Commit graph

19 commits

Author SHA1 Message Date
Emma Dickson
0688674f6f
case insensitive params (#27)
* make --generateWacz, --generateCdx case insensitive with alias option
* fix eslint config and eslint issues

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2021-02-17 09:37:07 -08:00
Emma Dickson
9ef83e4ab4
update default collection name (#26)
Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
2021-02-15 20:06:18 -08:00
Ilya Kreymer
8c85ca2749
background behaviors refactor: (fixes #23)
- move auto-play, auto-fetch and auto-scroll behaviors to behaviors/global/*
- bgbehaviors manages these background behaviors
- command line --bgbehaviors option specifies which background behaviors to run (defaults to auto-fetch and auto-play)
2021-02-08 22:21:34 -08:00
Emma Dickson
7cfeefd19b
add ci and linting (#21)
* linting with eslint
* ci: validate linting and check basic single-page crawl with wacz creation

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
2021-02-08 09:45:46 -08:00
Ilya Kreymer
8af5e1487d
waitUntil improvements: (#22)
- puppeteer 'waitUntil' supports an array of options, supported here via a comma-separated list
- default to 'load,networkidle0'
- should fix #3
2021-02-04 22:42:03 -08:00
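
The comma-separated handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the crawler's actual code: the helper name `parseWaitUntil` and the validation step are assumptions; only the four event names are Puppeteer's documented `waitUntil` values.

```javascript
// Values accepted by Puppeteer's `waitUntil` navigation option.
const VALID_WAIT_UNTIL = ["load", "domcontentloaded", "networkidle0", "networkidle2"];

// Hypothetical helper (not from the crawler source): turns a comma-separated
// command-line value into the array form that Puppeteer accepts.
function parseWaitUntil(value) {
  const events = value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  // Assumption: unknown event names are rejected up front.
  for (const event of events) {
    if (!VALID_WAIT_UNTIL.includes(event)) {
      throw new Error(`Invalid waitUntil value: ${event}`);
    }
  }
  return events;
}

console.log(parseWaitUntil("load, networkidle0"));
```

The resulting array could then be passed to Puppeteer as `page.goto(url, { waitUntil: events })`.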
Ilya Kreymer
0a4f716a9c
version update:
- parametrize chrome version, set to 88 in Dockerfile and as BROWSER_VERSION env var
- bump docker image to 0.2.0
2021-02-03 22:24:38 -08:00
Emma Dickson
9c139eba2b
Add wacz support to browsertrix (#6)
* Add WACZ creation support, fixes #2
* --generateWACZ flag adds WACZ file (currently named <collection>/<collection>.wacz)
* page list generated in <collection>/pages/pages.jsonl, entry for each page is appended to end of file, includes url and title of page

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2021-02-03 21:28:32 -08:00
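
The append-only page list described in this commit might look something like the sketch below. The helper name `appendPage` is hypothetical, and the record is assumed to contain only the `url` and `title` fields named in the commit message; the real file may carry additional fields.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper (not from the crawler source): appends one JSON record
// per page to <collection>/pages/pages.jsonl, one record per line.
function appendPage(collectionDir, url, title) {
  const pagesDir = path.join(collectionDir, "pages");
  fs.mkdirSync(pagesDir, { recursive: true });

  const pagesFile = path.join(pagesDir, "pages.jsonl");
  // Appending keeps earlier entries intact; each line is a standalone JSON object.
  fs.appendFileSync(pagesFile, JSON.stringify({ url, title }) + "\n");
  return pagesFile;
}

// Usage, with a temporary directory standing in for the collection directory.
const collectionDir = fs.mkdtempSync(path.join(os.tmpdir(), "collection-"));
appendPage(collectionDir, "https://example.com/", "Example Domain");
const pagesFile = appendPage(collectionDir, "https://example.com/about", "About");
console.log(fs.readFileSync(pagesFile, "utf8"));
```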
rgaudin
789279021b
Added limit info to statsFilename (#5)
- added new `limit` dict to statsFilename
- `limit` dict composed of:
  - `max`: the limit requested (or `0`)
  - `hit`: boolean whether limit was reached or not
2021-01-29 10:26:55 -08:00
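
As a rough illustration of the `limit` entry this commit describes, the sketch below builds an object with the `max` and `hit` keys. The function name `buildLimitInfo`, its parameters, and the rule for deciding when the limit counts as reached are assumptions, not taken from the source.

```javascript
// Hypothetical helper (not from the crawler source): builds the `limit`
// entry with the two keys named in the commit message.
function buildLimitInfo(requestedLimit, pagesCrawled) {
  // `max` is the requested limit, or 0 when none was requested.
  const max = requestedLimit || 0;
  // Assumption: the limit counts as hit once the crawled page count reaches it.
  const hit = max > 0 && pagesCrawled >= max;
  return { limit: { max, hit } };
}

console.log(JSON.stringify(buildLimitInfo(10, 10)));        // {"limit":{"max":10,"hit":true}}
console.log(JSON.stringify(buildLimitInfo(undefined, 42))); // {"limit":{"max":0,"hit":false}}
```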
Ilya Kreymer
b7fe292021
work on latest update: (#7)
- fixes for iframes, as described in #4
- bump chrome to 88
- bump pywb to 2.5.0
- bump version to 0.1.5
2021-01-29 00:33:01 -08:00
Ilya Kreymer
62834735d1
stats: support json stats output to specified filename with --statsFilename flag (fixes openzim/zimit#39)
bump version to 0.1.3
2020-12-02 16:27:17 +00:00
Ilya Kreymer
082667099d
add support for sitemaps with --useSitemap flag, defaults to /sitemap.xml if no string provided
2020-11-14 21:56:30 +00:00
Ilya Kreymer
92b251f0cb
fix typo in setting userAgent
2020-11-14 20:51:07 +00:00
Ilya Kreymer
fe406b5f74
browser config settings:
- add support for --userAgent to override user agent
- add support for --mobileDevice to use puppeteer device emulation presets
- add support for --userAgentSuffix to append to default user agent (including device userAgent)
bump to 0.1.2
2020-11-14 19:32:31 +00:00
Ilya Kreymer
7a13535d78
dockerfile: add symlink to 'google-chrome'
crawler: get version for user-agent via 'google-chrome --product-version'
compose: build versioned image, version 0.1.0
2020-11-05 22:34:10 +00:00
raffaele messuti
5bf64be018
minor fixes (#1)
* Update README.md - fix incomplete docker run pywb

* Update crawler.js - fix generateCDX
2020-11-03 13:33:19 -08:00
Ilya Kreymer
8f740d4e24
support custom crawl directory with --cwd flag, default to /crawls
update README
2020-11-02 15:28:19 +00:00
Ilya Kreymer
a875aa90d3
Dockerfile: switch to cmd 'crawl', instead of entrypoint to support running 'pywb' also
update README with docker-compose and docker run examples, update commandline example
default output to './crawls' subdirectory
2020-11-01 21:35:00 -08:00
Ilya Kreymer
91b8994a08
refactor crawler and default driver:
- add extensible defaultDriver, wrap crawling functionality in Crawler class
- support headless/non-headless, custom driver
- support custom collection name for pywb, generate-cdx option
- autoplay: add slight delay for splash loading
2020-11-01 19:53:47 -08:00
Ilya Kreymer
ded83b52b3
initial commit after split from zimit
2020-10-31 13:16:37 -07:00