Initial (beta) support for QA/replay crawling!
- Supports running a crawl over a given WACZ / list of WACZ (multi WACZ) input, hosted in ReplayWeb.page
- Runs local http server with full-page, ui-less ReplayWeb.page embed
- ReplayWeb.page release version configured in the Dockerfile, pinned ui.js and sw.js fetched directly from cdnjs
Can be deployed with `webrecorder/browsertrix-crawler qa` entrypoint.
- Requires `--qaSource`, pointing to WACZ or multi-WACZ json that will be replay/QAd
- Also supports `--qaRedisKey` where QA comparison data will be pushed, if specified.
- Supports `--qaDebugImageDiff` for outputting crawl / replay/ diff
images.
- If using --writePagesToRedis, a `comparison` key is added to existing page data where:
```
comparison: {
screenshotMatch?: number;
textMatch?: number;
resourceCounts: {
crawlGood?: number;
crawlBad?: number;
replayGood?: number;
replayBad?: number;
};
};
```
- bump version to 1.1.0-beta.2
This adds prettier to the repo, and sets up the pre-commit hook to
auto-format as well as lint.
Also updates ignores files to exclude crawls, test-crawls, scratch, dist as needed.
- set done key correctly, just an int now
- also check if array for old-style save states (for backwards
compatibility)
- fixes#411
- tests: includes tests using redis: tests save state + dynamically
adding exclusions (follow up to #408)
- adds `--debugAccessRedis` flag to allow accessing local redis outside
container