browsertrix-crawler

mirror of https://github.com/webrecorder/browsertrix-crawler.git synced 2026-06-18 11:51:42 +00:00

Ilya Kreymer 5d1b2ea263 Add Indexer options to commit/cancel single crawl (#978)	2026-02-19 11:53:12 -08:00

Add --commitCrawlId and --cancelCrawlId options to the indexer, which will either:
- --commitCrawlId: merge the data for a single crawl into the main index. This can take a variable amount of time, depending on the size of the index. Logging for the commit is improved, and counts are never incremented twice, even if a long-running commit job is restarted.
- --cancelCrawlId: delete the data for a canceled crawl from the uncommitted crawls list (should be very fast).

Both of these tasks are also done as part of the crawl itself, which is still useful for single-crawler operation. However, with Browsertrix controlled in k8s, these tasks are better handled as separate jobs, especially since a crawl may be committed without restarting the crawler container (e.g. for paused crawls).

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>

docs	Add Indexer options to commit/cancel single crawl (#978)	2026-02-19 11:53:12 -08:00
gen-cli.sh	Gracefully handle non-absolute path for create-login-profile --filename (#521)	2024-03-29 13:46:54 -07:00
mkdocs.yml	Dedup Initial Implementation (#889)	2026-02-12 13:40:49 -08:00