browsertrix-crawler

mirror of https://github.com/webrecorder/browsertrix-crawler.git synced 2025-12-08 06:09:48 +00:00

Ilya Kreymer 00835fc4f2 Retry same queue (#757)	2025-02-06 18:48:40 -08:00

- follow-up to #743
- page retries are simply added back to the same queue with the `retry` param incremented and a higher score, after extraHops, to ensure retries are added at the end.
- score calculation is: `score = depth + (extraHops * MAX_DEPTH) + (retry * MAX_DEPTH * 2)`. This ensures that retries have lower priority than extraHops, and additional retries even lower priority (higher score).
- a warning is logged when a retry happens, an error only when all retries are exhausted.
- back to one failure list; URLs are added there only when all retries are exhausted.
- rename --numRetries -> --maxRetries / --retries for clarity
- state load: allow retrying previously failed URLs if --maxRetries is higher than on the previous run.
- ensure this works with --failOnFailedStatus: if provided, invalid status codes (>= 400) are retried along with page load failures
- fixes #132

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
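
The score formula quoted in the commit message can be sketched as follows. This is only an illustration of the arithmetic, not the crawler's actual code: the function name `queue_score`, its parameter names, and the value chosen for `MAX_DEPTH` are assumptions made for the example. A lower score means the URL is dequeued earlier.

```python
# Illustrative sketch of the queue score described in the commit message.
# MAX_DEPTH is an assumed placeholder; the crawler defines its own constant.
MAX_DEPTH = 1_000_000


def queue_score(depth: int, extra_hops: int = 0, retry: int = 0) -> int:
    """Return depth + (extraHops * MAX_DEPTH) + (retry * MAX_DEPTH * 2)."""
    return depth + (extra_hops * MAX_DEPTH) + (retry * MAX_DEPTH * 2)


# Compare a regular page, a page one extra hop out of scope,
# and the same kind of page after one and two retries.
regular = queue_score(depth=3)
one_extra_hop = queue_score(depth=3, extra_hops=1)
first_retry = queue_score(depth=3, retry=1)
second_retry = queue_score(depth=3, retry=2)

# Sorting by score gives the order in which these pages would be served.
ordering = sorted(
    [
        ("second_retry", second_retry),
        ("regular", regular),
        ("first_retry", first_retry),
        ("one_extra_hop", one_extra_hop),
    ],
    key=lambda item: item[1],
)
print([name for name, _ in ordering])
```

For the values chosen here, the regular page sorts first, then the extra-hop page, then the first retry, then the second retry, which matches the priority order the commit message describes.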
..
assets	Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494)	2024-03-16 14:59:32 -07:00
develop	Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494)	2024-03-16 14:59:32 -07:00
overrides	Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494)	2024-03-16 14:59:32 -07:00
stylesheets	SOCKS5 over SSH Tunnel Support (#671)	2024-08-28 18:47:24 -07:00
user-guide	Retry same queue (#757)	2025-02-06 18:48:40 -08:00
CNAME	CNAME: keep CNAME in docs/docs for mkdocs	2024-03-16 15:24:54 -07:00
index.md	Add crawler QA docs (#551)	2024-04-18 16:18:22 -04:00