browsertrix-crawler/docs
Tessa Walsh 1c6e814e15
Add option to write JSONL file with data on skipped pages (#966)
Fixes #965 

Add a `--reportSkipped` argument that enables creation of a
`reports/skippedPages.jsonl` file containing the following fields for each
URL that was encountered but not queued:

- `url`
- `seedUrl`
- `depth`
- `reason` (one of `outOfScope`, `pageLimit`, `robotsTxt`, or
`redirectToExcluded`)
- `ts`
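
As a rough illustration of how such a report might be consumed, the sketch below tallies entries per `reason`. The field names and reason values come from the list above. The sample records, the value formats (an integer `depth`, an ISO 8601 `ts`), and the helper name `count_skip_reasons` are assumptions made for this example, not confirmed output of the crawler.

```python
import json
from collections import Counter

# Reason values as listed in the commit message.
KNOWN_REASONS = {"outOfScope", "pageLimit", "robotsTxt", "redirectToExcluded"}


def count_skip_reasons(lines):
    """Count skipped-page entries per reason from an iterable of JSONL lines.

    Hypothetical helper: blank lines are ignored, and any reason not in
    KNOWN_REASONS is counted under "unknown".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        reason = entry.get("reason")
        counts[reason if reason in KNOWN_REASONS else "unknown"] += 1
    return counts


# Made-up sample records; the real value formats may differ (assumption).
SAMPLE = [
    '{"url": "https://example.com/a", "seedUrl": "https://example.com/", '
    '"depth": 1, "reason": "outOfScope", "ts": "2026-04-09T19:51:41Z"}',
    '{"url": "https://example.com/b", "seedUrl": "https://example.com/", '
    '"depth": 2, "reason": "robotsTxt", "ts": "2026-04-09T19:51:42Z"}',
    '{"url": "https://example.com/c", "seedUrl": "https://example.com/", '
    '"depth": 2, "reason": "outOfScope", "ts": "2026-04-09T19:51:43Z"}',
]

if __name__ == "__main__":
    for reason, n in count_skip_reasons(SAMPLE).most_common():
        print(reason, n)
```

Because an open text file iterates line by line, the same helper could be passed a file handle for `reports/skippedPages.jsonl` directly; where that directory sits relative to the crawl output is not stated here.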

The `reports/` directory is new and will likely be expanded with other
crawl-time reporting moving forward.

---------

Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2026-04-09 12:51:41 -07:00
docs Add option to write JSONL file with data on skipped pages (#966) 2026-04-09 12:51:41 -07:00
gen-cli.sh Dedupe docs (#989) 2026-03-10 12:49:30 -07:00
mkdocs.yml Dedupe docs (#989) 2026-03-10 12:49:30 -07:00