mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2026-04-18 07:00:22 +00:00
Fixes #965

Add `--reportSkipped` argument, which will enable the creation of a `reports/skippedPages.jsonl` file with the following elements for each URL encountered that was not queued:

- `url`
- `seedUrl`
- `depth`
- `reason` (one of `outOfScope`, `pageLimit`, `robotsTxt`, or `redirectToExcluded`)
- `ts`

The `reports/` directory is new and will likely be expanded with other crawl-time reporting moving forward.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>

| Name |
|---|
| .. |
| docs |
| gen-cli.sh |
| mkdocs.yml |
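
As a rough illustration of the `reports/skippedPages.jsonl` format described in the commit message above, the sketch below reads such a file's contents and tallies entries by `reason`. Only the five field names and the four `reason` values come from the commit message. The field types (`depth` as a number, `ts` as a string), the helper names `parseSkippedPages` and `countByReason`, and the `SkippedPage` interface are assumptions made for this example and are not taken from the crawler's code.

```typescript
// Hypothetical consumer of reports/skippedPages.jsonl.
// Field names follow the commit message; field types are assumptions.
type SkipReason =
  | "outOfScope"
  | "pageLimit"
  | "robotsTxt"
  | "redirectToExcluded";

interface SkippedPage {
  url: string;
  seedUrl: string;
  depth: number; // assumed to be numeric
  reason: SkipReason;
  ts: string; // assumed to be a timestamp string
}

const KNOWN_REASONS: readonly string[] = [
  "outOfScope",
  "pageLimit",
  "robotsTxt",
  "redirectToExcluded",
];

// Parse JSONL text (one JSON object per line), skipping blank lines
// and any record whose reason is not one of the four documented values.
function parseSkippedPages(jsonl: string): SkippedPage[] {
  const pages: SkippedPage[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const record = JSON.parse(trimmed) as SkippedPage;
    if (!KNOWN_REASONS.includes(record.reason)) continue;
    pages.push(record);
  }
  return pages;
}

// Count how many skipped URLs fall under each reason.
function countByReason(pages: SkippedPage[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const page of pages) {
    counts.set(page.reason, (counts.get(page.reason) ?? 0) + 1);
  }
  return counts;
}
```

A tally like this could help decide whether a crawl's scope rules or page limit are dropping more URLs than intended. In practice the input string would be read from the `reports/skippedPages.jsonl` file of a crawl run with `--reportSkipped`.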