Retry support and additional fixes (#743)

- retries: for failed pages, set the retry count to 5, in case multiple retries
are needed.
- redirect: if a page URL `/path/` redirects to `/path`, don't add the redirect target as an extra seed
- proxy: don't use the global dispatcher; pass the dispatcher explicitly when
using a proxy, as the proxy may interfere with local network requests
- final exit flag: if the crawl is done and also interrupted, ensure the WACZ is
still written/uploaded by setting final exit to true
- hashtag-only change forces reload: if loading a page with the same URL but a
different hashtag, e.g. `https://example.com/#B` after
`https://example.com/#A`, do a full reload
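The redirect and hashtag-only fixes above can be sketched roughly as follows. These are hypothetical helpers for illustration, not the crawler's actual code:

```typescript
// Sketch of two of the checks described above (assumed helper names).

// redirect fix: a redirect that only strips (or adds) a trailing slash
// points at the same page, so it should not be queued as an extra seed.
function isTrailingSlashRedirect(seedUrl: string, redirectUrl: string): boolean {
  const strip = (u: string): string => {
    const url = new URL(u);
    // Remove a single trailing slash from the path before comparing
    url.pathname = url.pathname.replace(/\/$/, "");
    return url.href;
  };
  return strip(seedUrl) === strip(redirectUrl);
}

// hashtag fix: navigating to the same URL with only the fragment changed
// does not trigger a network load, so a full reload must be forced.
function isHashtagOnlyChange(prevUrl: string, nextUrl: string): boolean {
  const prev = new URL(prevUrl);
  const next = new URL(nextUrl);
  const sameHash = prev.hash === next.hash;
  // Compare the URLs with fragments stripped
  prev.hash = "";
  next.hash = "";
  return prev.href === next.href && !sameHash;
}
```

For example, `isHashtagOnlyChange("https://example.com/#A", "https://example.com/#B")` returns true, so the crawler would do a full reload rather than rely on in-page fragment navigation.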
Ilya Kreymer 2025-01-25 22:55:49 -08:00 committed by GitHub
parent 5d9c62e264
commit f7cbf9645b
12 changed files with 212 additions and 74 deletions

package.json

@@ -1,6 +1,6 @@
 {
   "name": "browsertrix-crawler",
-  "version": "1.5.0-beta.2",
+  "version": "1.5.0-beta.3",
   "main": "browsertrix-crawler",
   "type": "module",
   "repository": "https://github.com/webrecorder/browsertrix-crawler",