mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00

* new options:
  - to support browsertrix-cloud, add a --waitOnDone option, which has browsertrix crawler wait when finished
  - when running with redis shared state, set the `<crawl id>:status` field to `running`, `failing`, `failed`, or `done` to let the job controller know the crawl is finished
  - set redis state to `failing` in case of exception; set to `failed` in case of more than 3 failed exits within 60 seconds (todo: make customizable)
  - when receiving a SIGUSR1, assume final shutdown and finalize files (eg. save WACZ) before exiting
  - also write WACZ if exiting because the size limit was exceeded, but not due to other interruptions
  - change sleep() to be in seconds
* misc fixes:
  - crawlstate.finished() -> isFinished()
  - return if >0 pages and none left in queue
  - don't fail crawl if isFinished() is true
  - don't keep looping in pending wait for urls to finish if an abort request was received
* screencast improvements (fix related to webrecorder/browsertrix-cloud#233):
  - more optimized screencasting; don't close and restart after every page
  - don't assume targets change after every page (they don't in window mode!)
  - only send 'close' message when target is actually closed
* bump to 0.6.0
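The `failing`/`failed` distinction above hinges on counting failed exits inside a sliding 60-second window. A minimal sketch of that policy, assuming a hypothetical `FailureTracker` helper (names are illustrative, not the actual browsertrix-crawler code):

```javascript
// Sketch of the exit-failure policy: a crawl reports "failing" on each
// failed exit, and escalates to "failed" once more than `maxFailures`
// failed exits land within a `windowSecs` sliding window.
class FailureTracker {
  constructor(maxFailures = 3, windowSecs = 60) {
    this.maxFailures = maxFailures;
    this.windowSecs = windowSecs;
    this.failures = []; // timestamps (in seconds) of recent failed exits
  }

  // Record a failed exit at time `nowSecs`; return the resulting status,
  // which a job controller could then write to the `<crawl id>:status` key.
  recordFailure(nowSecs) {
    this.failures.push(nowSecs);
    // Drop failures that fell out of the sliding window.
    this.failures = this.failures.filter((t) => nowSecs - t <= this.windowSecs);
    return this.failures.length > this.maxFailures ? "failed" : "failing";
  }
}
```

For example, three failures within a minute still report `failing`; a fourth tips the status to `failed`.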
17 lines
276 B
YAML
version: '3.5'

services:
  crawler:
    image: ${REGISTRY}webrecorder/browsertrix-crawler:latest
    build:
      context: ./

    volumes:
      - ./crawls:/crawls

    cap_add:
      - NET_ADMIN
      - SYS_ADMIN

    shm_size: 1gb