Commit graph

63 commits

Author SHA1 Message Date
Emma Dickson
9c139eba2b
Add wacz support to browsertrix (#6)
* Add WACZ creation support, fixes #2
* --generateWACZ flag adds WACZ file (currently named <collection>/<collection>.wacz)
* page list generated in <collection>/pages/pages.jsonl, entry for each page is appended to end of file, includes url and title of page

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
2021-02-03 21:28:32 -08:00
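The page list behaviour described in 9c139eba2b (one entry appended per page, holding the URL and title) can be sketched roughly as below. The `<collection>/pages/pages.jsonl` location comes from the commit message. The helper name `appendPage`, its parameters, and the exact JSON shape are assumptions made for illustration, not the crawler's actual code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: appends one JSON line per crawled page to
// <collectionDir>/pages/pages.jsonl (path layout taken from the commit message).
function appendPage(collectionDir, url, title) {
  const pagesDir = path.join(collectionDir, "pages");
  fs.mkdirSync(pagesDir, { recursive: true }); // create the directory on first use
  const pagesFile = path.join(pagesDir, "pages.jsonl");
  // Assumption: each entry carries only the url and title fields.
  fs.appendFileSync(pagesFile, JSON.stringify({ url, title }) + "\n");
  return pagesFile;
}

// Usage against a throwaway temporary directory standing in for a collection.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "collection-"));
appendPage(demoDir, "https://example.com/", "Example Domain");
const demoFile = appendPage(demoDir, "https://example.com/about", "About");
console.log(fs.readFileSync(demoFile, "utf8"));
```

Appending line by line means the page list stays valid even if the crawl is interrupted partway through.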
rgaudin
789279021b
Added limit info to statsFilename (#5)
- added new `limit` dict to statsFilename
- `limit` dict composed of:
  - `max`: the limit requested (or `0`)
  - `hit`: boolean whether limit was reached or not
2021-01-29 10:26:55 -08:00
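The `limit` dict described in 789279021b might be assembled along these lines. The field names `max` and `hit` come from the commit message. The helper `buildLimitInfo`, its parameter names, and the rule for deciding `hit` are assumptions for illustration only.

```javascript
// Hypothetical helper: builds the `limit` dict written to the stats file.
// `requestedLimit` and `pagesCrawled` are illustrative parameter names.
function buildLimitInfo(requestedLimit, pagesCrawled) {
  const max = requestedLimit || 0; // `0` when no limit was requested
  // Assumption: the limit counts as hit once the crawled page count reaches it.
  const hit = max > 0 && pagesCrawled >= max;
  return { max, hit };
}

console.log(JSON.stringify({ limit: buildLimitInfo(10, 10) }));
```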
Ilya Kreymer
b7fe292021
work on latest update: (#7)
- fixes for iframes, as described in #4
- bump chrome to 88
- bump pywb to 2.5.0
- bump version to 1.0.5
2021-01-29 00:33:01 -08:00
Ilya Kreymer
62834735d1 stats: support json stats output to specified filename with --statsFilename flag (fixes openzim/zimit#39)
bump version to 0.1.3
2020-12-02 16:27:17 +00:00
Ilya Kreymer
082667099d add support for sitemaps with --useSitemap flag, defaults to /sitemap.xml if no string provided
2020-11-14 21:56:30 +00:00
Ilya Kreymer
92b251f0cb fix typo in setting userAgent
2020-11-14 20:51:07 +00:00
Ilya Kreymer
fe406b5f74 browser config settings:
- add support for --userAgent to override user agent
- add support for --mobileDevice to use puppeteer device emulation presets
- add support for --userAgentSuffix to append to default user agent (including device userAgent)
bump to 0.1.2
2020-11-14 19:32:31 +00:00
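How the user agent flags from fe406b5f74 could interact is sketched below. The option names mirror the `--userAgent` and `--userAgentSuffix` flags. The helper `resolveUserAgent`, the `defaultUserAgent` parameter, the space separator, and the precedence rule are assumptions rather than the crawler's confirmed behaviour.

```javascript
// Hypothetical helper; `defaultUserAgent` stands for the browser's (or the
// emulated device's) own user agent string.
function resolveUserAgent({ userAgent, userAgentSuffix, defaultUserAgent }) {
  // Assumption: an explicit --userAgent replaces the default entirely.
  if (userAgent) {
    return userAgent;
  }
  // Otherwise the suffix, if given, is appended to the default user agent.
  return userAgentSuffix
    ? `${defaultUserAgent} ${userAgentSuffix}`
    : defaultUserAgent;
}

console.log(
  resolveUserAgent({ defaultUserAgent: "Mozilla/5.0", userAgentSuffix: "+MyCrawler" })
);
```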
Ilya Kreymer
7a13535d78 dockerfile: add symlink to 'google-chrome'
crawler: get version for user-agent via 'google-chrome --product-version'
compose: build versioned image, version 0.1.0
2020-11-05 22:34:10 +00:00
raffaele messuti
5bf64be018
minor fixes (#1)
* Update README.md - fix incomplete docker run pywb

* Update crawler.js - fix generateCDX
2020-11-03 13:33:19 -08:00
Ilya Kreymer
8f740d4e24 support custom crawl directory with --cwd flag, default to /crawls
update README
2020-11-02 15:28:19 +00:00
Ilya Kreymer
a875aa90d3 Dockerfile: switch to cmd 'crawl', instead of entrypoint to support running 'pywb' also
update README with docker-compose and docker run examples, update commandline example
default output to './crawls' subdirectory
2020-11-01 21:35:00 -08:00
Ilya Kreymer
91b8994a08 refactor crawler and default driver:
- add extensible defaultDriver, wrap crawling functionality in Crawler class
- support headless/non-headless, custom driver
- support custom collection name for pywb, generate-cdx option
- autoplay: add slight delay for splash loading
2020-11-01 19:53:47 -08:00
Ilya Kreymer
ded83b52b3 initial commit after split from zimit
2020-10-31 13:16:37 -07:00