Mirror of https://github.com/webrecorder/browsertrix-crawler.git, synced 2025-10-19 06:23:16 +00:00

Initial (beta) support for QA/replay crawling!

- Supports running a crawl over a given WACZ or list of WACZ files (multi-WACZ) as input, hosted in ReplayWeb.page
- Runs a local HTTP server with a full-page, UI-less ReplayWeb.page embed
- ReplayWeb.page release version is configured in the Dockerfile; pinned ui.js and sw.js are fetched directly from cdnjs

Can be deployed with the `webrecorder/browsertrix-crawler qa` entrypoint.

- Requires `--qaSource`, pointing to the WACZ or multi-WACZ JSON that will be replayed and QA'd
- Also supports `--qaRedisKey`, the key to which QA comparison data will be pushed, if specified
- Supports `--qaDebugImageDiff` for outputting crawl, replay, and diff images
- If using `--writePagesToRedis`, a `comparison` key is added to the existing page data, where:

```
comparison: {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: {
    crawlGood?: number;
    crawlBad?: number;
    replayGood?: number;
    replayBad?: number;
  };
};
```

- Bump version to 1.1.0-beta.2

- custom-behaviors
- fixtures
- invalid-behaviors
- .DS_Store
- adblockrules.test.js
- add-exclusion.test.js
- basic_crawl.test.js
- blockrules.test.js
- collection_name.test.js
- config_file.test.js
- config_stdin.test.js
- crawl_overwrite.js
- custom-behavior.test.js
- custom_driver.test.js
- extra_hops_depth.test.js
- file_stats.test.js
- limit_reached.test.js
- log_filtering.test.js
- mult_url_crawl_with_favicon.test.js
- pageinfo-records.test.js
- qa_compare.test.js
- redis_crawl_state.js
- saved-state.test.js
- scopes.test.js
- screenshot.test.js
- seeds.test.js
- sitemap-parse.test.js
- storage.test.js
- text-extract.test.js
- url_file_list.test.js
- warcinfo.test.js
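
The `comparison` object described in the commit message above could be consumed roughly as follows. This is a minimal sketch, not code from the repository: the type is copied from the commit message, while the `qaIssues` helper, its `minMatch` threshold, the assumption that match scores lie between 0 and 1 (with 1 meaning identical), and the reading of the "bad" counts as failed resources are all hypothetical.

```typescript
// Shape of the `comparison` key, as given in the commit message.
type PageComparison = {
  screenshotMatch?: number;
  textMatch?: number;
  resourceCounts: {
    crawlGood?: number;
    crawlBad?: number;
    replayGood?: number;
    replayBad?: number;
  };
};

// Hypothetical helper (not part of browsertrix-crawler): lists reasons a page
// may look worse on replay than it did at crawl time.
// Assumption: match scores are in 0..1, where 1 means an exact match.
function qaIssues(c: PageComparison, minMatch = 0.9): string[] {
  const issues: string[] = [];

  // Missing scores are skipped rather than treated as failures.
  if (c.screenshotMatch !== undefined && c.screenshotMatch < minMatch) {
    issues.push(`screenshot match ${c.screenshotMatch} is below ${minMatch}`);
  }
  if (c.textMatch !== undefined && c.textMatch < minMatch) {
    issues.push(`text match ${c.textMatch} is below ${minMatch}`);
  }

  // Assumption: "bad" counts are resources that failed to load.
  const crawlBad = c.resourceCounts.crawlBad ?? 0;
  const replayBad = c.resourceCounts.replayBad ?? 0;
  if (replayBad > crawlBad) {
    issues.push(`${replayBad - crawlBad} more bad resources on replay than on crawl`);
  }

  return issues;
}
```

Returning a list of reasons instead of a single boolean keeps the sketch useful for logging; an empty list means the page passed every check that had data available.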