browsertrix-crawler/docs
Ilya Kreymer 5d1b2ea263
Add Indexer options to commit/cancel single crawl (#978)
Add --commitCrawlId and --cancelCrawlId options to the indexer, which will either:
- (--commitCrawlId) merge the data for a single crawl into the main index.
This can take a variable amount of time, depending on the size of the index.
- (--cancelCrawlId) delete the data for a canceled crawl from the uncommitted
crawls list (should be very fast).

Also improve logging for commit, and ensure counts are never incremented
twice, even if a long-running commit job is restarted.

Both of these tasks are also done as part of the crawl itself, which is
still useful for single-crawler operation. However, with Browsertrix
controlled in k8s, these tasks are better handled as separate jobs,
especially since a crawl may be committed without restarting the crawler
container (e.g. for paused crawls).
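
The commit and cancel behavior described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `DedupIndex` class, its fields, and its method names are hypothetical and are not taken from the crawler's code, and the real indexer's storage and count bookkeeping may work differently. It shows one way a commit can be made safe to re-run, so that a restarted job does not increment counts twice, while a cancel only drops the uncommitted data.

```python
class DedupIndex:
    """Toy in-memory stand-in for an index (hypothetical, not the crawler's storage)."""

    def __init__(self):
        self.main = {}              # hash -> URL in the committed main index
        self.uncommitted = {}       # crawl id -> {hash: URL} awaiting commit
        self.counted_ids = set()    # crawl ids whose counts were already applied
        self.total_urls = 0         # running count across committed crawls

    def add_crawl_data(self, crawl_id, entries):
        """Record data for a crawl that has not been committed yet."""
        self.uncommitted.setdefault(crawl_id, {}).update(entries)

    def commit_crawl(self, crawl_id):
        """Merge one crawl into the main index; safe to call more than once."""
        entries = self.uncommitted.get(crawl_id, {})
        # Merging is idempotent: an existing hash keeps its first value.
        for digest, url in entries.items():
            self.main.setdefault(digest, url)
        # Apply counts only once per crawl, even if the job is re-run.
        if crawl_id not in self.counted_ids:
            self.total_urls += len(entries)
            self.counted_ids.add(crawl_id)
        # Remove the crawl from the uncommitted list last.
        self.uncommitted.pop(crawl_id, None)

    def cancel_crawl(self, crawl_id):
        """Drop a canceled crawl's uncommitted data; the main index is untouched."""
        self.uncommitted.pop(crawl_id, None)


index = DedupIndex()
index.add_crawl_data("crawl-1", {"h1": "https://example.com/a",
                                 "h2": "https://example.com/b"})
index.add_crawl_data("crawl-2", {"h3": "https://example.com/c"})

index.commit_crawl("crawl-1")
index.commit_crawl("crawl-1")   # simulated restart: counts stay unchanged
index.cancel_crawl("crawl-2")   # only removes the pending entry
```

In this model the cancel is a single dictionary removal, which matches the expectation that it is very fast, while the commit cost grows with the amount of data being merged.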

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2026-02-19 11:53:12 -08:00
docs Add Indexer options to commit/cancel single crawl (#978) 2026-02-19 11:53:12 -08:00
gen-cli.sh Gracefully handle non-absolute path for create-login-profile --filename (#521) 2024-03-29 13:46:54 -07:00
mkdocs.yml Dedup Initial Implementation (#889) 2026-02-12 13:40:49 -08:00