benoit74
6a804e9a8e
Prepare for 2.1.2
2024-09-05 08:25:44 +00:00
benoit74
501520d07f
Release 2.1.1
2024-09-05 07:45:42 +00:00
benoit74
6b3c725eeb
More precise usage on diskUtilization setting
2024-09-03 18:06:07 +00:00
benoit74
7f76415710
Upgrade to browsertrix crawler 1.3.0-beta.0
Among other changes, it includes the upgrade to Ubuntu Noble, so we no
longer need the additional deadsnakes PPA in the Dockerfile.
2024-09-03 18:06:06 +00:00
benoit74
efdf7804c0
Stream file downloads to avoid exhausting memory
2024-08-12 19:56:05 +00:00
benoit74
af48be8f82
Add support for tar files in --warcs
2024-08-09 09:27:57 +00:00
benoit74
7e69d8ab75
Prepare for 2.1.1
2024-08-09 08:14:10 +00:00
benoit74
2e082c41a9
Release 2.1.0
2024-08-09 08:02:16 +00:00
benoit74
bc06e85ced
Upgrade dependencies
2024-08-09 07:53:11 +00:00
benoit74
eb32adfea7
Sort WARC directories passed to zimit by modification time
2024-08-07 12:16:08 +00:00
benoit74
8cd1db6eef
Add option to directly process WARC files
2024-08-07 12:06:44 +00:00
benoit74
459a30a226
Do not log number of WARC files found
2024-08-07 12:06:43 +00:00
benoit74
861751a7ed
Stop fetching and passing browsertrix crawler version as scraperSuffix to warc2zim
2024-08-07 12:06:43 +00:00
benoit74
6d078c4dcf
Automate daily tests of ZIM behavior - Youtube only for now
2024-08-07 10:34:19 +00:00
benoit74
f756c2c652
Fix CHANGELOG
2024-08-07 09:38:15 +00:00
benoit74
097613de29
Add test checking that expected entries are present
2024-08-07 09:38:08 +00:00
benoit74
6e3951dfa7
Fix README and Dockerfile for imprecisions (#314)
2024-08-07 09:32:37 +00:00
benoit74
80b6b26782
Add support for custom behaviors configuration
2024-08-07 09:28:07 +00:00
benoit74
a1efe8dccf
Make it clear that --profile argument can be an HTTP(S) URL (and not only a path)
2024-08-07 09:16:19 +00:00
benoit74
526019e095
Prepare for 2.0.7
2024-08-02 08:46:59 +00:00
benoit74
2452e60d9d
Release 2.0.6
2024-08-02 08:17:58 +00:00
benoit74
c92782bea0
Upgrade to Browsertrix Crawler 1.2.6
2024-08-02 08:07:46 +00:00
benoit74
7305f70300
Prepare for 2.0.6
2024-07-24 06:39:21 +00:00
benoit74
021654e6b3
Release 2.0.5
2024-07-24 06:37:27 +00:00
benoit74
8a64216ac0
Upgrade to warc2zim 2.0.3
2024-07-24 05:35:55 +00:00
benoit74
9d43636559
Upgrade to Browsertrix Crawler 1.2.5
2024-07-24 05:34:25 +00:00
benoit74
dcd6427b8a
Prepare for 2.0.5
2024-07-15 08:58:03 +00:00
benoit74
fbd01a77ce
Release 2.0.4
2024-07-15 08:52:48 +00:00
benoit74
91a53f70ec
Prepare for 2.0.4
2024-06-24 07:56:35 +00:00
benoit74
e8995a9f59
Release 2.0.3
2024-06-24 07:50:13 +00:00
benoit74
2be5650a8c
Upgrade to crawler 1.2.0
2024-06-24 06:48:38 +00:00
benoit74
de0720e301
Prepare for 2.0.3
2024-06-18 14:05:47 +00:00
benoit74
b73a3e04d0
Release 2.0.2
2024-06-18 13:44:13 +00:00
benoit74
baa0d9ecc7
Prepare for next release
2024-06-13 11:42:17 +00:00
benoit74
2835c7b078
Release 2.0.1
2024-06-13 11:32:13 +00:00
benoit74
77747ec1d3
Upgrade dependencies
2024-06-13 10:26:04 +00:00
benoit74
83690f410d
Prepare for 2.1.0
2024-06-04 15:14:43 +00:00
benoit74
d8e6d55f87
Release 2.0.0
2024-06-03 19:59:04 +00:00
benoit74
59057bdbb1
Fix documentation about --waitUntil allowed values and drop choices checks
- add networkidle0, networkidle2 and drop networkidle to reflect crawler
changes
- drop choices check since this is checked anyway at crawler startup, right
when the scraper starts (this is more permissive should one want to use a
different crawler version than the one supported in the Docker image)
2024-06-03 15:11:48 +00:00
benoit74
9e6c998816
Bump zimit to 2.0.0-dev5 + use warc2zim2 branch + remove zimit2 image workflow
2024-05-24 14:10:19 +00:00
benoit74
728784d6bf
Upgrade Browsertrix Crawler to 1.0.3
2024-03-27 15:08:59 +00:00
benoit74
e24479945f
Remove trailing characters when retrieving Browsertrix Crawler version
2024-03-27 15:08:58 +00:00
benoit74
5c716747b4
Add CHANGELOG
2024-03-07 10:16:57 +00:00
benoit74
9244f2e69c
Set zimit and browsertrix crawler versions in final ZIM 'Scraper' metadata
2024-01-31 15:10:08 +01:00
benoit74
a505df9fe0
Add support for --logging parameter of browsertrix crawler
2024-01-23 17:28:56 +01:00
benoit74
c0ffb74d8c
Adopt Python bootstrap conventions
2024-01-18 13:31:00 +01:00
benoit74
909b6e3da8
Merge branch 'main' into zimit2
2024-01-18 09:27:00 +01:00
benoit74
f46f2568ff
Prepare for next release
2024-01-18 09:16:18 +01:00
benoit74
19b4898326
Release 1.6.3
2024-01-18 09:12:36 +01:00
benoit74
eebf26f7cb
Upgrade to browsertrix crawler 0.12.4 and warc2zim 1.5.5
2024-01-18 09:05:06 +01:00