mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00

* new options:
  - to support browsertrix-cloud, add a --waitOnDone option, which has browsertrix crawler wait when finished
  - when running with redis shared state, set the `<crawl id>:status` field to `running`, `failing`, `failed`, or `done` to let the job controller know the crawl is finished
  - set redis state to `failing` in case of exception; set to `failed` in case of more than 3 failed exits within 60 seconds (todo: make customizable)
  - when receiving a SIGUSR1, assume final shutdown and finalize files (eg. save WACZ) before exiting
  - also write WACZ if exiting because the size limit was exceeded, but not due to other interruptions
  - change sleep() to be in seconds
* misc fixes:
  - crawlstate.finished() -> isFinished()
  - return if >0 pages and none left in queue
  - don't fail crawl if isFinished() is true
  - don't keep looping in pending wait for urls to finish if an abort request was received
* screencast improvements (fix related to webrecorder/browsertrix-cloud#233):
  - more optimized screencasting; don't close and restart after every page
  - don't assume targets change after every page (they don't in window mode!)
  - only send 'close' message when target is actually closed
* bump to 0.6.0
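The `failing`/`failed` distinction above hinges on counting failed exits inside a sliding 60-second window. A minimal sketch of that policy, assuming a hypothetical `FailureTracker` helper (names are illustrative, not the actual browsertrix-crawler code):

```javascript
// Sketch of the exit-failure policy: a crawl reports "failing" on each
// failed exit, and escalates to "failed" once more than `maxFailures`
// failed exits land within a `windowSecs` sliding window.
class FailureTracker {
  constructor(maxFailures = 3, windowSecs = 60) {
    this.maxFailures = maxFailures;
    this.windowSecs = windowSecs;
    this.failures = []; // timestamps (in seconds) of recent failed exits
  }

  // Record a failed exit at time `nowSecs`; return the resulting status,
  // which a job controller could then write to the `<crawl id>:status` key.
  recordFailure(nowSecs) {
    this.failures.push(nowSecs);
    // Drop failures that fell out of the sliding window.
    this.failures = this.failures.filter((t) => nowSecs - t <= this.windowSecs);
    return this.failures.length > this.maxFailures ? "failed" : "failing";
  }
}
```

For example, three failures within a minute still report `failing`; a fourth tips the status to `failed`.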
17 lines
276 B
YAML
version: '3.5'

services:
  crawler:
    image: ${REGISTRY}webrecorder/browsertrix-crawler:latest
    build:
      context: ./

    volumes:
      - ./crawls:/crawls

    cap_add:
      - NET_ADMIN
      - SYS_ADMIN

    shm_size: 1gb