version: '3.5'
services:
  crawler:
    # 0.4.1 Release! (#70)
    # * optimization: don't intercept requests if no blockRules set
    # * page load: set waitUntil to use networkidle2 instead of networkidle0 as reasonable default for most pages
    # * add --behaviorTimeout to set max running time for behaviors (defaults to 90 seconds)
    # * refactor profile loadProfile/saveProfile to util/browser.js
    #   - support augmenting existing profile when creating a new profile
    # * screencasting: convert newContext to window instead of page by default, instead of just warning about it
    # * shared multiplatform image support:
    #   - determine browser exe from list of options, getBrowserExe() returns current exe
    #   - supports running with 'google-chrome' under amd64, and 'chromium-browser' under arm64
    #   - update to multiplatform oldwebtoday/chrome:91 as browser image
    #   - enable multiplatform build with latest build-push-action@v2
    # * seeds: add trim() to seed URLs
    # * logging: reduce initial debug logging, enable only if '--logging debug' is set. log if profile, text-extraction enabled, and post-processing stages automatically
    # * profile creation: add --windowSize flag, set default to 1600x900, default to loading Application tab, tweak UI styles
    # * extractLinks: support passing in custom property to get link, and also loading as an attribute via getAttribute. Fixes #25
    # * update CHANGES and README with new features
    # * bump version to 0.4.1
    # 2021-07-22 14:24:51 -07:00
    image: webrecorder/browsertrix-crawler:0.4.1
    build:
      context: ./
    # Bind-mount ./crawls on the host to /crawls in the container.
    volumes:
      - ./crawls:/crawls
    # Extra Linux capabilities granted to the container.
    cap_add:
      - NET_ADMIN
      - SYS_ADMIN
    # Raise /dev/shm above Docker's 64 MB default.
    shm_size: 1gb