Mention command line options when restarting (#577)

It's probably worth reminding people that the command line options need
to be passed in again since the crawl state doesn't include them.

Refs #568
Ed Summers 2024-05-21 10:57:50 -07:00 committed by GitHub
parent 1735c3d8e2
commit 2ef116d667


@@ -127,7 +127,7 @@ A crawl can be gracefully interrupted with Ctrl-C (SIGINT) or a SIGTERM (see bel
When a crawl is interrupted, the current crawl state is written to the `crawls` subdirectory inside the collection directory. The crawl state includes the current YAML config, if any, plus the current state of the crawl.
-This crawl state YAML file can then be used as `--config` option to restart the crawl from where it was left of previously.
+This crawl state YAML file can then be used as `--config` option to restart the crawl from where it was left of previously. When restarting a crawl you will need to include any command line options you used to start the original crawl (e.g. `--url`), since these are not persisted to the crawl state.
By default, the crawl interruption waits for current pages to finish. A subsequent SIGINT will cause the crawl to stop immediately. Any unfinished pages are recorded in the `pending` section of the crawl state (if gracefully finished, the section will be empty).