browsertrix-crawler/tests
Ilya Kreymer 850a6a6665
Don't remove excluded-on-redirect URLs from seen list (#936)
Fixes #937 
- Don't remove URLs from seen list
- Add new excluded key, add URLs to be excluded (out-of-scope on
redirect) to excluded set. The size of this set gives the number of
URLs that have been excluded in this way, which can be used to compute
the number of discovered URLs.
- Don't write urn:pageinfo records for excluded pages, along with not
writing to pages/extraPages.jsonl
2025-12-08 22:41:52 -08:00
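
The bookkeeping described in the commit message above can be illustrated with a minimal in-memory sketch. It is not the crawler's actual implementation: the class name `CrawlStateSketch`, its methods, and the `site.test` URLs are hypothetical, and the formula "discovered = seen minus excluded" is an assumption about how the size of the excluded set is meant to be used.

```javascript
// Hypothetical in-memory model of the "seen" and "excluded" sets.
// The real crawler's state storage and key names may differ.
class CrawlStateSketch {
  constructor() {
    this.seen = new Set(); // every URL ever queued
    this.excluded = new Set(); // URLs found out-of-scope after a redirect
  }

  // Returns true only the first time a URL is encountered.
  addToSeen(url) {
    if (this.seen.has(url)) return false;
    this.seen.add(url);
    return true;
  }

  // The URL stays in `seen`, so it is never queued again,
  // and is additionally recorded in `excluded`.
  markExcluded(url) {
    if (this.seen.has(url)) this.excluded.add(url);
  }

  // Assumption: discovered URLs = seen URLs minus excluded URLs.
  discoveredCount() {
    return this.seen.size - this.excluded.size;
  }
}

const state = new CrawlStateSketch();
state.addToSeen("https://site.test/a");
state.addToSeen("https://site.test/b");

// Suppose /b redirected out of scope while loading.
state.markExcluded("https://site.test/b");

// Because /b was not removed from `seen`, it cannot be re-queued.
const requeued = state.addToSeen("https://site.test/b");
const discovered = state.discoveredCount();
console.log({ requeued, discovered });
```

Keeping the excluded URL in the seen set is what prevents it from being queued a second time if another page links to it, while the separate excluded set lets the discovered total be adjusted without rewriting the seen set.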
custom-behaviors tests: remove example.com from tests (#885) 2025-09-19 23:21:47 -07:00
fixtures Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
invalid-behaviors tests: remove example.com from tests (#885) 2025-09-19 23:21:47 -07:00
.DS_Store tests text extraction (#30) 2021-03-01 16:00:23 -08:00
adblockrules.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
add-exclusion.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
basic_crawl.test.js tests: remove example.com from tests (#885) 2025-09-19 23:21:47 -07:00
blockrules.test.js tests: disable blockrules youtube tests in CI (#698) 2024-10-04 17:37:13 -07:00
brave-query-redir.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
collection_name.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
config_file.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
config_stdin.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
crawl_overwrite.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
custom-behavior-flow.test.js Support for behaviors from 'recorder flow' JSON created in devtools (#818) 2025-04-09 12:24:29 +02:00
custom-behavior.test.js tests: remove example.com from tests (#885) 2025-09-19 23:21:47 -07:00
custom_driver.test.js Support custom css selectors for extracting links (#689) 2024-11-08 11:04:41 -05:00
custom_selector.test.js flow behaviors: add scrolling into view (#892) 2025-10-07 08:17:56 -07:00
dryrun.test.js Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
exclude-redirected.test.js Don't remove excluded-on-redirect URLs from seen list (#936) 2025-12-08 22:41:52 -08:00
extra_hops_depth.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
file_stats.test.js Retry same queue (#757) 2025-02-06 18:48:40 -08:00
http-auth.test.js http auth support per seed (supersedes #566): (#616) 2024-06-20 16:35:30 -07:00
lang-code.test.js lang code fixes: (#834) 2025-05-12 16:06:29 -07:00
limit_reached.test.js Add more exit codes to detect interruption reason (#764) 2025-02-10 14:00:55 -08:00
log_filtering.test.js Better default crawlId (#806) 2025-04-01 13:40:03 -07:00
mult_url_crawl_with_favicon.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
multi-instance-crawl.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
non-html-crawl.test.js Async Fetch Refactor (#880) 2025-09-10 12:05:21 -07:00
pageinfo-records.test.js seed urls list: check for quoted URLs and remove quotes (#883) 2025-09-12 13:34:41 -07:00
profiles.test.js Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
proxy.test.js Support host-specific proxies with proxy config YAML (#837) 2025-08-20 16:07:29 -07:00
qa_compare.test.js tests: update qa test to use awp site 2025-03-21 13:06:53 -07:00
retry-failed.test.js tests: remove example.com from tests (#885) 2025-09-19 23:21:47 -07:00
robots_txt.test.js Rename robots flag to --useRobots, keep --robots as alias (#932) 2025-12-02 15:55:25 -08:00
rollover-writer.test.js Autoclick Support (#729) 2025-01-16 09:38:11 -08:00
saved-state.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
scopes.test.js Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
screenshot.test.js Fix for --rolloverSize for individual WARCs in 1.x (#542) 2024-04-15 13:43:08 -07:00
seeds.test.js tests: use old.webrecorder.net for testing (#710) 2024-10-31 13:24:58 -04:00
sitemap-parse.test.js Deps update 1.6.1 (#826) 2025-05-02 00:43:37 -07:00
storage.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00
text-extract.test.js Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
upload-wacz.test.js base: bump to brave 1.80.113 (#857) 2025-06-30 19:55:38 -07:00
url-normalize.test.js sort query args before queuing URLs (#935) 2025-12-08 15:51:50 -08:00
url_file_list.test.js Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
warcinfo.test.js tests: reduce logging (#596) 2024-06-26 13:05:13 -07:00