update CHANGES for 0.5.0 release

Ilya Kreymer 2022-04-09 21:57:58 -07:00
parent 7ed5586bdb
commit bfd72835d1
2 changed files with 12 additions and 4 deletions

@@ -1,15 +1,23 @@
## CHANGES
v0.5.0
- State: Support for serialization and reloading of crawl state to config.yaml
- Scope: Support for `scopeType: domain` to include all subdomains and ignore 'www.' if specified in the seed.
- Profiles: support loading remote profile from URL as well as local file
- Non-HTML Pages: Load non-200 responses in browser, even if non-HTML, fix waiting issues with non-HTML pages (e.g. PDFs)
- Config options: Fix setting user-agent
- Page behavior: latest browsertrix-behaviors, also add experimental Cloudflare interstitial wait.
- Error handling: better error handling for redis errors
- State: Support loading of crawl state from config.yaml
- State: Support serialization of crawl state to `crawls` subdirectory, both while running (keeping last N states) and on exit.
- State: Graceful saving of crawl state on ctrl+c interrupt
- State: Memory or Redis based crawl state
- Config: Additional crawl config via env var
- Config: Support additional options via `CRAWL_ARGS` environment variable
- WACZ Upload: Support for S3 upload of WACZ upon crawl completion
- WACZ Upload: HTTP/Redis webhook to notify of upload completion
- Crawl Scope: Support for `extraHops` to optionally crawl an extra hop beyond scope
- Signing: Support for optional signing of WACZ
- Dependencies: update to latest pywb and wacz packages
- Dependencies: update to latest pywb, wacz and browsertrix-behaviors packages
v0.4.4
- Page Block Rules Fix: 'request already handled' errors by avoiding adding duplicate handlers to same page.

@@ -1,6 +1,6 @@
{
"name": "browsertrix-crawler",
"version": "0.5.0-beta.8",
"version": "0.5.0",
"main": "browsertrix-crawler",
"repository": "https://github.com/webrecorder/browsertrix-crawler",
"author": "Ilya Kreymer <ikreymer@gmail.com>, Webrecorder Software",