browsertrix-crawler/tests/fixtures
Ilya Kreymer 30646ca7ba
Add downloads dir to cache external dependency within the crawl (#921)
Fixes #920 
- Downloads profile, custom behavior, and seed list to `/downloads`
directory in the crawl
- Seed File: Downloaded into `/downloads`. Never refetched on subsequent
crawl restarts if it already exists.
- Custom Behaviors: Git: Downloaded into a dir, then moved to
/downloads/behaviors/<dir name>. If the directory already exists, a failed
download will reuse the existing directory.
- Custom Behaviors: File: Downloaded into a temp file, then moved to
/downloads/behaviors/<name.js>. If the file already exists, a failed
download will reuse the existing file.
- Profile: using `/profile` directory to contain the browser profile
- Profile: Downloaded to a temp file, then placed into
/downloads/profile.tar.gz. If the download fails but the profile already
exists, the existing /profile directory is used.
- Also fixes #897
2025-11-26 19:30:27 -08:00
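The download-then-move caching described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crawler's actual code: `cachedDownload`, `Fetcher`, and the `refetch` flag are hypothetical names, and the fetcher is injected so the sketch runs without network access.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical fetcher type: resolves with the downloaded bytes or rejects.
type Fetcher = (url: string) => Promise<Uint8Array>;

// Hypothetical helper illustrating the caching rules from the commit message.
async function cachedDownload(
  url: string,
  dest: string,
  fetcher: Fetcher,
  refetch = true,
): Promise<string> {
  // Seed-file rule: if a cached copy exists and refetching is off, reuse it.
  if (!refetch && fs.existsSync(dest)) {
    return dest;
  }

  const dir = path.dirname(dest);
  await fs.promises.mkdir(dir, { recursive: true });

  // Write to a temp file in the same directory so the final rename
  // stays on one filesystem.
  const tmp = path.join(dir, `.tmp-${process.pid}-${Date.now()}`);
  try {
    const data = await fetcher(url);
    await fs.promises.writeFile(tmp, data);
    await fs.promises.rename(tmp, dest);
  } catch (e) {
    // Clean up any partial temp file.
    await fs.promises.rm(tmp, { force: true });
    // Behavior/profile rule: a failed download falls back to the cached copy.
    if (!fs.existsSync(dest)) {
      throw e;
    }
  }
  return dest;
}
```

In this sketch a seed file would be fetched with `refetch` set to `false`, while custom behaviors and the profile would use the default and rely on the fallback when a download fails.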
proxies Support host-specific proxies with proxy config YAML (#837) 2025-08-20 16:07:29 -07:00
crawl-1.yaml Add Prettier to the repo, and format all the files! (#428) 2023-11-09 16:11:11 -08:00
crawl-2.yaml Add fields to warcinfo in combinedwarc (#60) 2021-07-07 15:56:52 -07:00
driver-1.mjs Support custom css selectors for extracting links (#689) 2024-11-08 11:04:41 -05:00
pages.jsonl tests text extraction (#30) 2021-03-01 16:00:23 -08:00
sample-profile.tar.gz Add downloads dir to cache external dependency within the crawl (#921) 2025-11-26 19:30:27 -08:00
urlSeedFile.txt seed urls list: check for quoted URLs and remove quotes (#883) 2025-09-12 13:34:41 -07:00