mirror of https://github.com/openzim/zimit.git (synced 2025-12-31 04:23:15 +00:00)
fixup! Keep temporary folder when crawler or warc2zim fails, even if not asked for
parent ee82837aaa
commit b4ec60f316
2 changed files with 5 additions and 2 deletions
@@ -48,7 +48,7 @@ The image accepts the following parameters, **as well as any of the [warc2zim](h
 - `--exclude <regex>` - skip URLs that match the regex from crawling. Can be specified multiple times. An example is `--exclude="(\?q=|signup-landing\?|\?cid=)"`, where URLs that contain either `?q=` or `signup-landing?` or `?cid=` will be excluded.
 - `--workers N` - number of crawl workers to be run in parallel
 - `--wait-until` - Puppeteer setting for how long to wait for page load. See [page.goto waitUntil options](https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#pagegotourl-options). The default is `load`, but for static sites, `--wait-until domcontentloaded` may be used to speed up the crawl (to avoid waiting for ads to load for example).
-- `--keep` - if set, keep the WARC files in a temp directory inside the output directory
+- `--keep` - in case of failure, WARC files and other temporary files (which are stored as a subfolder of output directory) are always kept, otherwise they are automatically deleted. Use this flag to always keep WARC files, even in case of success.
 
 Example command:
 
@@ -334,7 +334,10 @@ def run(raw_args):
 
     parser.add_argument(
         "--keep",
-        help="If set, keep WARC files after crawl, don't delete",
+        help="In case of failure, WARC files and other temporary files (which are "
+        "stored as a subfolder of output directory) are always kept, otherwise "
+        "they are automatically deleted. Use this flag to always keep WARC files, "
+        "even in case of success.",
         action="store_true",
     )
 
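The retention rule described by the new `--keep` help text can be sketched as follows. This is a minimal illustration under stated assumptions, not zimit's actual code: `should_keep`, `cleanup_temp_dir`, and the `succeeded` parameter are hypothetical names introduced here, and the temporary folder is simulated with `tempfile`.

```python
import shutil
import tempfile
from pathlib import Path


def should_keep(keep_flag: bool, succeeded: bool) -> bool:
    """Return True when the temporary folder must be left on disk.

    Hypothetical helper: a failed run always preserves the folder,
    a successful run preserves it only when --keep was passed.
    """
    return keep_flag or not succeeded


def cleanup_temp_dir(temp_dir: Path, keep_flag: bool, succeeded: bool) -> bool:
    """Delete temp_dir unless it must be kept; return True if it was deleted."""
    if should_keep(keep_flag, succeeded):
        return False
    shutil.rmtree(temp_dir, ignore_errors=True)
    return True


if __name__ == "__main__":
    # Simulated output directory holding one temporary subfolder per run.
    output_dir = Path(tempfile.mkdtemp())
    for keep_flag, succeeded in [(False, True), (False, False), (True, True)]:
        temp_dir = Path(tempfile.mkdtemp(dir=output_dir))
        deleted = cleanup_temp_dir(temp_dir, keep_flag, succeeded)
        print(f"keep={keep_flag} succeeded={succeeded} deleted={deleted}")
    shutil.rmtree(output_dir)
```

Only the combination of a successful run and no `--keep` removes the folder; every other combination leaves it in place for inspection.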