browsertrix-crawler/tests
Ilya Kreymer 00835fc4f2
Retry same queue (#757)
- follow up to #743
- page retries are simply added back to the same queue with the `retry`
param incremented and a higher score, after extraHops, to ensure retries
are added at the end.
- score calculation is: `score = depth + (extraHops * MAX_DEPTH) +
(retry * MAX_DEPTH * 2)`. This ensures that retries have lower priority
than extraHops, and additional retries have even lower priority (higher
score).
- warning is logged when a retry happens, error only when all retries
are exhausted.
- back to one failure list, urls added there only when all retries are
exhausted.
- rename --numRetries -> --maxRetries / --retries for clarity
- state load: allow retrying previously failed URLs if --maxRetries is
higher than on the previous run.
- ensure this works with --failOnFailedStatus: if provided, invalid status
codes (>= 400) are retried along with page load failures
- fixes #132

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
2025-02-06 18:48:40 -08:00
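
The score formula from the commit message above can be sketched as follows. This is a minimal illustration, not the crawler's actual code: the function name `queueScore` and the value chosen for `MAX_DEPTH` are assumptions made for the example, and only the formula itself comes from the commit message.

```typescript
// Assumed value for illustration only; the commit message names MAX_DEPTH
// but does not state what it is set to.
const MAX_DEPTH = 1_000_000;

// Hypothetical helper implementing the formula quoted in the commit message:
// score = depth + (extraHops * MAX_DEPTH) + (retry * MAX_DEPTH * 2)
function queueScore(depth: number, extraHops: number, retry: number): number {
  return depth + extraHops * MAX_DEPTH + retry * MAX_DEPTH * 2;
}

// A lower score means the page is taken from the queue earlier.
const entries = [
  { label: "second retry", score: queueScore(3, 0, 2) },
  { label: "first retry", score: queueScore(3, 0, 1) },
  { label: "extra hop", score: queueScore(3, 1, 0) },
  { label: "first attempt", score: queueScore(3, 0, 0) },
];

// Sort ascending by score to see the order in which pages would be processed.
const order = entries
  .slice()
  .sort((a, b) => a.score - b.score)
  .map((e) => e.label);

console.log(order.join(" < "));
```

With these sample inputs (depth 3, at most one extra hop), a regular page sorts first, then the extra-hop page, then the first retry, then the second retry, which matches the ordering the commit message describes.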
custom-behaviors Support loading custom behaviors from URLs and/or filepaths (#707) 2024-11-04 20:30:53 -08:00
fixtures Support custom css selectors for extracting links (#689) 2024-11-08 11:04:41 -05:00
invalid-behaviors detect invalid custom behaviors on load: (#450) 2023-12-13 15:14:53 -05:00
.DS_Store tests text extraction (#30) 2021-03-01 16:00:23 -08:00
adblockrules.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
add-exclusion.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
basic_crawl.test.js Streaming in-place WACZ creation + CDXJ indexing (#673) 2024-08-29 13:21:20 -07:00
blockrules.test.js tests: disable blockrules youtube tests in CI (#698) 2024-10-04 17:37:13 -07:00
brave-query-redir.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
collection_name.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
config_file.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
config_stdin.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
crawl_overwrite.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
custom-behavior.test.js Support loading custom behaviors from git repo (#717) 2024-11-13 22:50:33 -08:00
custom_driver.test.js Support custom css selectors for extracting links (#689) 2024-11-08 11:04:41 -05:00
custom_selector.test.js Support custom css selectors for extracting links (#689) 2024-11-08 11:04:41 -05:00
dryrun.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
exclude-redirected.test.js Apply exclusions to redirects (#745) 2025-01-28 11:28:23 -08:00
extra_hops_depth.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
file_stats.test.js Retry same queue (#757) 2025-02-06 18:48:40 -08:00
http-auth.test.js http auth support per seed (supersedes #566): (#616) 2024-06-20 16:35:30 -07:00
limit_reached.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
log_filtering.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
mult_url_crawl_with_favicon.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
multi-instance-crawl.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
non-html-crawl.test.js Always download PDF + non HTML page cleanup + enterprise policy cleanup (#629) 2024-06-26 09:16:24 -07:00
pageinfo-records.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
proxy.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
qa_compare.test.js QA fix: ensure replay iframe actually been updated after goto call! (#756) 2025-02-06 10:41:38 -08:00
retry-failed.test.js Retry same queue (#757) 2025-02-06 18:48:40 -08:00
rollover-writer.test.js Autoclick Support (#729) 2025-01-16 09:38:11 -08:00
saved-state.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
scopes.test.js crawler args typing (#680) 2024-09-05 18:10:27 -07:00
screenshot.test.js Fix for --rolloverSize for individual WARCs in 1.x (#542) 2024-04-15 13:43:08 -07:00
seeds.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
sitemap-parse.test.js don't disable extraHops when using sitemaps: (#639) 2024-07-11 19:48:43 -07:00
storage.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
text-extract.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
upload-wacz.test.js pages redis: include 'depth', 'seed' and 'favIconUrl' in page data added to redis (#749) 2025-01-30 11:18:59 -08:00
url_file_list.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
warcinfo.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00