Commit graph

14 commits

Author SHA1 Message Date
Emma Segal-Grossman
2a49406df7
Add Prettier to the repo, and format all the files! (#428)
This adds prettier to the repo, and sets up the pre-commit hook to
auto-format as well as lint.
Also updates ignore files to exclude crawls, test-crawls, scratch, and dist as needed.
2023-11-09 16:11:11 -08:00
Ilya Kreymer
3a4d318e90 CHANGES: update changes for 0.8.1 2023-02-24 18:33:29 -08:00
Sara Tavares
5b1f224dcb
fix typos (#232) 2023-02-24 11:09:40 -08:00
Ilya Kreymer
a4358f4622 CHANGES: update with latest PRs for release! 2023-02-04 16:49:17 -08:00
Ilya Kreymer
b513246b03
deps: bump pywb to 2.7.3, update CHANGES to current version (#222)
* deps: bump pywb to 2.7.3
bump to 0.8.0 for release

* update CHANGES
2023-02-03 17:56:30 -08:00
Ilya Kreymer
bfd72835d1 update CHANGES for 0.5.0 release 2022-04-09 21:59:44 -07:00
Ilya Kreymer
66ce6688eb
Add WACZ Signing Support (#99)
* initial support for wacz signing (using a custom version py-wacz)
- signing url and token set via env vars WACZ_SIGN_TOKEN and WACZ_SIGN_URL
- add CHANGELIST for 0.5.0
- bump pywb to 2.6.4
2022-01-26 16:06:10 -08:00
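
The commit above says the signing URL and token are set via the env vars WACZ_SIGN_URL and WACZ_SIGN_TOKEN. A minimal sketch of reading them follows. The helper name getSigningConfig and the rule that signing is skipped when no URL is set are assumptions for illustration, not the crawler's or py-wacz's actual behavior.

```javascript
// Hypothetical helper: collects the two env vars named in the commit message.
// Assumption: signing is treated as disabled when WACZ_SIGN_URL is unset.
function getSigningConfig(env = process.env) {
  const url = env.WACZ_SIGN_URL;
  const token = env.WACZ_SIGN_TOKEN;
  if (!url) {
    return null; // no signing endpoint configured
  }
  // The token is kept optional here; whether it is required is not stated in the source.
  return { url, token: token || null };
}

// Usage with an explicit env object (example.com is a placeholder endpoint).
const signing = getSigningConfig({
  WACZ_SIGN_URL: "https://example.com/sign",
  WACZ_SIGN_TOKEN: "secret",
});
console.log(signing ? `signing via ${signing.url}` : "signing disabled");
```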
Ilya Kreymer
8c8cf232de update CHANGES for 0.4.4! 2021-08-17 21:24:56 -07:00
Ilya Kreymer
be1ee53c3e
BlockRules Fixes (0.4.3) (#75)
- blockrules fix: when checking an iframe nav request, match inFrameUrl against the parent iframe, not current one
- blockrules: cleanup, always allow 'pywb.proxy' static files
- logging: when 'debug' logging enabled, log urls blocked and conditional iframe checks from blockrules
- tests: add more complex test for blockrules
- update CHANGES and support info in README
- bump to 0.4.3
2021-07-27 09:41:21 -07:00
Ilya Kreymer
0e0b85d7c3
Customizable extract selectors + typo fix (0.4.2) (#72)
* fix typo in setting crawler.capturePrefix which caused directFetchCapture() to fail, causing non-HTML urls to fail.
- wrap directFetchCapture() to retry browser loading in case of failure

* custom link extraction improvements (improvements for #25) 
- extractLinks() returns a list of link URLs to allow for more flexibility in custom driver
- rename queueUrls() to queueInScopeUrls() to indicate the filtering is performed
- loadPage accepts a list of select opts {selector, extract, isAttribute} and defaults to {"a[href]", "href", false}
- tests: add test for custom driver which uses custom selector

* tests
- tests: all tests use 'test-crawls' instead of crawls
- consolidation: combine initial crawl + rollover, combine warc, text tests into basic_crawl.test.js
- add custom driver test and fixture to test custom link extraction

* add to CHANGES, bump to 0.4.2
2021-07-23 18:31:43 -07:00
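
The select opts described in the commit above ({selector, extract, isAttribute}, defaulting to {"a[href]", "href", false}) can be sketched roughly as below. This is an illustration only: the body of extractLinks and the fakeRoot stand-in for a DOM are assumptions, and the crawler's real function runs inside the browser page.

```javascript
// Default taken from the commit message: {"a[href]", "href", false}.
const DEFAULT_SELECT_OPTS = [
  { selector: "a[href]", extract: "href", isAttribute: false },
];

// Hypothetical sketch: returns a list of link URLs for each select opt.
// `root` is anything with querySelectorAll(), e.g. `document` in a browser.
function extractLinks(root, selectOpts = DEFAULT_SELECT_OPTS) {
  const urls = [];
  for (const { selector, extract = "href", isAttribute = false } of selectOpts) {
    for (const elem of root.querySelectorAll(selector)) {
      // isAttribute: read the raw attribute; otherwise read the element property.
      const value = isAttribute ? elem.getAttribute(extract) : elem[extract];
      if (value) {
        urls.push(value);
      }
    }
  }
  return urls;
}

// Stand-in for a DOM so the sketch runs under Node without a browser.
const fakeRoot = {
  querySelectorAll(selector) {
    if (selector !== "a[href]") return [];
    return [
      { href: "https://example.com/a", getAttribute: () => "/a" },
    ];
  },
};

console.log(extractLinks(fakeRoot)); // property value: absolute URL
console.log(
  extractLinks(fakeRoot, [{ selector: "a[href]", extract: "href", isAttribute: true }])
); // raw attribute value: "/a"
```

The property/attribute split matters because an anchor's href property is resolved to an absolute URL by the browser, while getAttribute returns the text as written in the markup.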
Ilya Kreymer
f4c6b6a99f
0.4.1 Release! (#70)
* optimization: don't intercept requests if no blockRules set

* page load: set waitUntil to use networkidle2 instead of networkidle0 as reasonable default for most pages

* add --behaviorTimeout to set max running time for behaviors (defaults to 90 seconds)

* refactor profile loadProfile/saveProfile to util/browser.js
- support augmenting existing profile when creating a new profile

* screencasting: convert newContext to window instead of page by default, instead of just warning about it

* shared multiplatform image support:
- determine browser exe from list of options, getBrowserExe() returns current exe
- supports running with 'google-chrome' under amd64, and 'chromium-browser' under arm64
- update to multiplatform oldwebtoday/chrome:91 as browser image
- enable multiplatform build with latest build-push-action@v2

* seeds: add trim() to seed URLs

* logging: reduce initial debug logging, enable only if '--logging debug' is set. log if profile, text-extraction enabled, and post-processing stages automatically

* profile creation: add --windowSize flag, set default to 1600x900, default to loading Application tab, tweak UI styles

* extractLinks: support passing in custom property to get link, and also loading as an attribute via getAttribute. Fixes #25

* update CHANGES and README with new features

* bump version to 0.4.1
2021-07-22 14:24:51 -07:00
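
The multiplatform note in the commit above (determine the browser exe from a list of options, 'google-chrome' under amd64 and 'chromium-browser' under arm64) could look roughly like this. The function name pickBrowserExe and the injected isAvailable check are hypothetical; how the real getBrowserExe() probes for executables is not shown in the source.

```javascript
// Candidate names taken from the commit message.
const BROWSER_CANDIDATES = ["google-chrome", "chromium-browser"];

// Hypothetical sketch: return the first candidate the caller reports as available.
// `isAvailable` is injected so the example does not depend on the local system.
function pickBrowserExe(candidates, isAvailable) {
  for (const exe of candidates) {
    if (isAvailable(exe)) {
      return exe;
    }
  }
  return null; // nothing usable found
}

// Usage: pretend only chromium-browser is installed (as on an arm64 image).
const installed = new Set(["chromium-browser"]);
console.log(pickBrowserExe(BROWSER_CANDIDATES, (exe) => installed.has(exe)));
```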
Ilya Kreymer
6a65ea7a58 update CHANGES.md for 0.4.0
bump version to 0.4.0
remove extraneous logging
2021-07-20 23:06:15 -07:00
Emma Dickson
63376ab6ac
Add --urlFile param to specify text file with a list of URLs to crawl (#38)
* Resolves #12

* Make --url param optional. Only one of --url or --urlFile should be specified.

* Add ignoreScope option to queueUrls() to support adding specific URLs

* add tests for urlFile

* bump version to 0.3.2

Co-authored-by: Emma Dickson <emmadickson@Emmas-MacBook-Pro.local>
2021-05-12 22:57:06 -07:00
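
The --url / --urlFile rule from the commit above can be sketched as follows. The names parseUrlList and resolveSeeds are hypothetical, and the one-URL-per-line format, the skipping of blank lines, and the error when neither option is given are all assumptions; the commit only says that one of the two should be specified.

```javascript
// Hypothetical: split the contents of a URL list file into individual URLs.
// Assumes one URL per line; blank lines are ignored.
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical: enforce that exactly one of --url or --urlFile was given.
// `urlFileText` is the already-read file content, to keep the sketch free of I/O.
function resolveSeeds({ url, urlFileText } = {}) {
  const hasUrl = typeof url === "string" && url.length > 0;
  const hasFile = typeof urlFileText === "string";
  if (hasUrl === hasFile) {
    throw new Error("Specify exactly one of --url or --urlFile");
  }
  return hasUrl ? [url] : parseUrlList(urlFileText);
}

console.log(resolveSeeds({ urlFileText: "https://example.com/\n\nhttps://example.org/\n" }));
```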
Ilya Kreymer
51bb54e869 add CHANGES.md for 0.3.1 release! 2021-05-04 13:13:33 -07:00