browsertrix-crawler/docs/docs
Ilya Kreymer 00835fc4f2
Retry same queue (#757)
- follow up to #743
- page retries are simply added back to the same queue with `retry`
param incremented and a higher scope, after extraHops, to ensure retries
are added at the end.
- score calculation is: `score = depth + (extraHops * MAX_DEPTH) +
(retry * MAX_DEPTH * 2)`, this ensures that retries have lower priority
than extraHops, and additional retries even lower priority (higher
score).
- warning is logged when a retry happens, error only when all retries
are exhausted.
- back to one failure list, urls added there only when all retries are
exhausted.
- rename --numRetries -> --maxRetries / --retries for clarity
- state load: allow retrying previously failed URLs if --maxRetries is
higher then on previous run.
- ensure working with --failOnFailedStatus, if provided, invalid status
codes (>= 400) are retried along with page load failures
- fixes #132

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-06 18:48:40 -08:00
..
assets Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494) 2024-03-16 14:59:32 -07:00
develop Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494) 2024-03-16 14:59:32 -07:00
overrides Add MKDocs documentation site for Browsertrix Crawler 1.0.0 (#494) 2024-03-16 14:59:32 -07:00
stylesheets SOCKS5 over SSH Tunnel Support (#671) 2024-08-28 18:47:24 -07:00
user-guide Retry same queue (#757) 2025-02-06 18:48:40 -08:00
CNAME CNAME: keep CNAME in docs/docs for mkdocs 2024-03-16 15:24:54 -07:00
index.md Add crawler QA docs (#551) 2024-04-18 16:18:22 -04:00