- Update to Chrome/Chromium 101 - (0.7.0 Beta 0) by @ikreymer in #144
- Add --netIdleWait, bump dependencies (0.7.0-beta.2) by @ikreymer in #145
- Update README.md by @atomotic in #147
- Wait Default + Logging Improvements by @ikreymer in #153
- Page-reuse concurrency + Browser Repair + Screencaster Cleanup Improvements by @ikreymer in #157
- Logging and browser improvements by @ikreymer in #158
- pending wait: set max pending request wait to 120 seconds by @ikreymer in #161
- Default Wait-Time Improvements by @ikreymer in #162
- Interrupt Handling Fixes by @ikreymer in #167
- Run in Docker as User by @edsu in #171
v0.6.0
- Add a --waitOnDone option, which has browsertrix crawler wait when finished (for use with Browsertrix Cloud)
- When running with redis shared state, set the :status field to running, failing/failed or done to let the job controller know whether the crawl is finished.
- Set redis state to failing in case of an exception, and to failed in case of 3 or more failed exits within 60 seconds (but don't mark as failed if all pages are finished and >0 pages).
- When receiving a SIGUSR1, don't wait on done (assume final exit due to scale down).
- More efficient screencasting: don't end screencasting when a page ends, only when the target is destroyed
- Keep the same screencasting connection from one page to the next, as targets are reused in 'window' concurrency mode
- Size limit (in bytes) via --sizeLimit
- Total time limit (in seconds) via --timeLimit
- Overwrite collection (delete existing) via --overwrite
- Fixes to interrupting a single instance in a shared state crawl
- Force all cookies, including session cookies, to a fixed duration in days, configurable via --cookieDays
- BlockRules Optimizations: don't intercept requests if no blockRules
- Profile Creation: Support extending existing profile by passing a --profile param to load on startup
- Profile Creation: Set default window size to 1600x900, add --windowSize param for setting custom size
- Behavior Timeouts: Add --behaviorTimeout to specify custom timeout for behaviors, in seconds (defaulting to 90 seconds)
- Load Wait Default: Switch to 'load,networkidle2' to speed up waiting for initial load
- Multi-platform build: Support building for amd64 and Arm using oldwebtoday/chrome:91 images (check for google-chrome and chromium-browser automatically)
- CI: Build a multi-platform (amd64 and arm64) image on each release
- YAML-based config, specifiable via the --config option or via stdin (with '--config stdin')
- Support for different scope types ('page', 'prefix', 'host', 'any', 'none') + crawl depth at crawl level
- Per-Seed scoping, including different scope types, or depth and include/exclude rules configurable per seed in 'seeds' list via YAML config
- Support for 'blockRules' for blocking certain URLs from being stored in WARCs, conditional blocking for iframe based on contents, and iframe URLs (see README for more details)
- Interactive profile creation: create profiles by interacting with an embedded browser displayed in your own browser (see README for more details).
- Screencasting: streaming the output of each window via websocket-based streaming, configurable with --screencastPort option
- New 'window' based parallelization: Open each worker in new window in same session
- Refactor arg parsing, other auxiliary functions into separate utils files
- Image customization: support for customizing the browser image, e.g. building with Chromium instead of Chrome, and support for ARM architecture builds (see README for more details).
- Update to latest pywb (2.5.0b4), browsertrix-behaviors (0.2.3), py-wacz (0.3.1)
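
The redis status handling listed under v0.6.0 (failing on an exception, failed after repeated failed exits within 60 seconds, unless all pages are finished) can be sketched roughly as follows. This is an illustration only, not the crawler's actual code: the function name, its parameters, and the reading of the threshold as "3 or more" are assumptions, and the list of exit times is assumed to include the current exit.

```python
from typing import List

FAIL_WINDOW_SECONDS = 60  # window mentioned in the changelog entry
MAX_FAILED_EXITS = 3      # assumed reading of the threshold: 3 or more


def status_after_exception(
    failed_exit_times: List[float],
    now: float,
    pages_finished: int,
    pages_pending: int,
) -> str:
    """Hypothetical helper: pick the status to store after an exception."""
    # Count only the failed exits that fall inside the 60-second window.
    recent = [t for t in failed_exit_times if now - t <= FAIL_WINDOW_SECONDS]
    too_many_failures = len(recent) >= MAX_FAILED_EXITS

    # Don't mark as failed if every page is finished and there was at least one.
    all_pages_finished = pages_pending == 0 and pages_finished > 0

    if too_many_failures and not all_pages_finished:
        return "failed"
    return "failing"
```

A job controller polling the status field would then treat failing as retryable and failed as final; that interpretation is likewise an assumption drawn from the entries above.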