Mirror of https://github.com/webrecorder/browsertrix-crawler.git (synced 2026-04-18 15:10:21 +00:00)
Add --commitCrawlId and --cancelCrawlId options to the indexer, which will either:
- merge the data for a single crawl into the main index. This can take a variable amount of time, depending on the size of the index. Commit logging is also improved, and counts are never incremented twice, even if a long-running commit job is restarted.
- delete the data for a canceled crawl from the uncommitted crawls list (should be very fast).

Both of these tasks are also still done as part of the crawl itself, which remains useful for single-crawler operation. However, with Browsertrix controlled in k8s, these tasks are better handled as separate jobs, especially since a crawl may be committed without restarting the crawler container (e.g. for paused crawls).

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>

| Name |
|---|
| docs |
| gen-cli.sh |
| mkdocs.yml |