mirror of https://github.com/openzim/warc2zim.git
synced 2025-10-19 14:33:17 +00:00
using scraperlib 1.6 (libzim 7.2)
parent 16d4bfafc1
commit c19c0eb1ef
2 changed files with 27 additions and 15 deletions
CHANGELOG.md (40 changes)
@@ -1,59 +1,71 @@
warc2zim
===
## Changelog

# 1.4.0
All notable changes to this project are documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) (as of version 1.4.0).

## [Unreleased]

### Added

* Additional fuzzy matching rules for youtube and vimeo, and additional test cases
* Support for youtube videos, which require POST request handling to work.
* Support for canonicalizing POST request data into URL for fuzzy matching (using cdxj-indexer)
* Support loading custom sw.js from a local file path

# 1.3.6
### Changed

* Updated zimscraperlib to 1.6 using libzim7.2
* Added support for {period} replacement in --zim-file
* Using fixed MarkupSafe version (Jinja2 dependency)

# [1.3.6]

* updated zimscraperlib (for libzim fix)

# 1.3.5
# [1.3.5]

* don't crash on records without WARC-Target-URI
* fixed failure if url contains a fragment
* updated wabac.js to 2.7.3

# 1.3.4
# [1.3.4]

* Added `--custom-css` option

# 1.3.3
# [1.3.3]

* Added `--progress-file` option

# 1.3.2
# [1.3.2]

* Update to wabac.js 2.1.6

# 1.3.1
# [1.3.1]

* Favicon loading fixes: In topFrame.html, load favicon URL directly from ZIM A/ record, bypassing service worker H/ lookup.

# 1.3.0
# [1.3.0]

* Supports 'fuzzy matching' with additional redirects add from normalized URL to exact URL
* Add fuzzy matching rules for youtube and '?timestamp' URLs
* Fix canonicaliziation where URLs that contain http/https were being incorrectly stripped (https://github.com/openzim/zimit/issues/37)

# 1.2.0
# [1.2.0]

* Accepts directory inputs as well as individual files. If directory given, which will process all .warc and .warc.gz files recursively in the directory.
* If trailing slash is missing on main URL, `--url https://example.com?test=value`, slash added and URL treated as `--url https://example.com/?test=value`

# 1.1.0
# [1.1.0]

* Now defaults to including all URLs unless --include-domains is specifief (removed `-a`)
* Arguments are now checked before starting. Also returns `100` on valid arguments but no WARC provided.

# 1.0.1
# [1.0.1]

* Now skipping WARC records that redirect to self (http -> https mostly)

# 1.0.0
# [1.0.0]

* Initial release

@@ -1,7 +1,7 @@
warcio>=1.7.3,<1.8
requests>=2.25.1,<3.0
beautifulsoup4>=4.9.3,<4.10
zimscraperlib>=1.4.1,<1.5
zimscraperlib>=1.6.0,<1.7
Babel>=2.9,<3.0
jinja2>=2.11,<3.0
# to support possible brotli content in warcs