Commit graph

162 commits

Author SHA1 Message Date
benoit74
a352c0c402
Add temporary Github Actions workflow to build zimit2 image 2024-01-15 08:06:50 +01:00
benoit74
e034b08852
Update CHANGELOG 2024-01-15 08:06:50 +01:00
benoit74
bbc8a48bc9
Update CHANGELOG 2024-01-15 07:55:53 +01:00
benoit74
d6c0c6ce63
Fixes following review + we need to create one subdir per run to not mix data / cleanup correctly after run 2023-11-23 13:08:45 +01:00
benoit74
b98e8f7027
Fix handling of '--collection' parameter + add '--tmp' + enhance logging 2023-11-23 09:02:08 +01:00
benoit74
51ef841836
Prepare next release 2023-11-17 11:30:37 +01:00
benoit74
6e6c0e8b39
Release 1.6.2 2023-11-17 11:25:09 +01:00
benoit74
7ca08791e7
Upgrade to browsertrix crawler 0.12.3 2023-11-17 11:17:41 +01:00
benoit74
4ad41a7d54
Upgrade to browsertrix crawler 0.12.2 2023-11-15 15:26:49 +01:00
benoit74
d24775d70c
Fix logic passing args to crawler
- do not set arg only if value is None or False
- remove default value 0 from args (this was not passed but would be
  with new corrected code and would induce a different crawler behavior in fact)
2023-11-15 15:26:18 +01:00
benoit74
a73114d140
Release Browsertrix 0.12.1 2023-11-06 10:00:03 +01:00
benoit74
c98e4505a8
Prepare next release 2023-11-02 21:10:28 +01:00
benoit74
36ba61b0a5
Release v1.6.0 2023-11-02 20:54:07 +01:00
benoit74
56fb86e531
Update to browsertrix crawler 0.12.0-beta2 2023-10-30 11:25:58 +01:00
benoit74
2a317c91e4
User-Agent has a default and is used for check_url 2023-10-23 13:45:26 +02:00
benoit74
d8f6cef7f3
Fail on all HTTP error codes in check_url 2023-10-23 11:09:16 +02:00
renaud gaudin
00051453e1
releasing 1.5.3 with crawler 0.11.2 2023-10-02 10:51:06 +00:00
renaud gaudin
3769c77cd4
releasing with crawler 0.11.1 2023-09-19 09:04:23 +00:00
benoit74
df2403c6dd
Update CHANGELOG.md 2023-09-18 16:16:12 +02:00
renaud gaudin
2be5562553
releasing 1.5.1 with updated crawler and warc2zim 2023-09-18 08:28:09 +00:00
renaud gaudin
ea210bcd10
Using main warc2zim 2023-09-11 10:43:28 +00:00
benoit74
7e24388820
Do not create empty stats file 2023-08-28 13:10:07 +02:00
renaud gaudin
12dab25e61
v1.5.0 with --long-description 2023-08-23 16:33:46 +00:00
renaud gaudin
df0fa9bbaf
releasing 1.4.1 with crawler 0.10.4 2023-08-23 12:15:01 +00:00
renaud gaudin
1224476b41
crawler 0.10.3 and main warc2zim 2023-08-10 18:51:19 +00:00
renaud gaudin
906161ea51
fixed changelog (for 1.4.0) 2023-08-02 14:47:23 +00:00
renaud gaudin
cbaaa77a1f
releasing 1.4.0 2023-08-02 14:42:10 +00:00
renaud gaudin
61dc792653
Fixed #191: --lang to crawler, --zim-lang to warc2zim 2023-08-02 11:26:47 +00:00
renaud gaudin
941db5fdfc
using crawler 0.10.2 2023-08-02 11:26:42 +00:00
renaud gaudin
af8196095d
using 0.10.0-beta.4 2023-05-23 08:10:03 +00:00
renaud gaudin
70a80681a6
use beta3 and --failOnFailedSeed 2023-05-22 11:23:46 +00:00
renaud gaudin
c31e80608e Using browsertrix-crawler 0.10.0-beta.0 2023-04-27 11:35:14 +00:00
renaud gaudin
8b4ea950a8 Using browsertrix-crawler 0.9.1 2023-04-25 08:53:52 +00:00
renaud gaudin
8ecd0a3210 upgraded to browsertrix-crawler 0.9.0 2023-04-10 13:08:12 +00:00
renaud gaudin
4f676e37c7 Using browsertrix-crawler 0.9.0-beta.2 2023-04-04 08:49:25 +00:00
renaud gaudin
b7265b49b6 updated to crawler 0.9 (b1) 2023-03-24 07:26:33 +00:00
renaud gaudin
6324b7c7c5 Fixed #172: Disabled Chrome updates to prevent incidental inclusion of update data in WARC/ZIM 2023-03-10 12:10:06 +00:00
renaud gaudin
238d1a6016 using crawler 0.8.1 and warc2zim's main 2023-02-27 09:57:36 +00:00
renaud gaudin
64bc8bf09f releasing 1.3.1 2023-02-06 11:48:44 +01:00
renaud gaudin
459778d472 released v1.3.0 2023-02-02 16:31:45 +00:00
renaud gaudin
554fff5c87 Using browsertrix-crawler 0.8.0-beta.1 2023-01-31 10:34:32 +00:00
renaud gaudin
8fd9462e25 triggering a rebuild with updated (still main) warc2zim 2023-01-16 11:39:05 +00:00
renaud gaudin
3756c6612f Using browsertrix-crawler 0.8.0-beta.0 2023-01-13 09:59:07 +00:00
renaud gaudin
cf26f8c33a Using browsertrix-crawler 0.7.1 2022-11-16 11:20:39 +00:00
renaud gaudin
0624c50121 Using browsertrix-crawler 0.7.0 (release) 2022-10-12 14:57:01 +00:00
renaud gaudin
ce68493087 increased check_url timeouts 2022-07-25 08:41:08 +00:00
renaud gaudin
857e044c84 Fixed --allowHashUrls incorrectly requiring a value 2022-07-18 10:23:16 +00:00
renaud gaudin
8c6d2bfb45 using browsertrix-crawler 0.7 beta 2022-07-04 15:08:49 +00:00
renaud gaudin
b79ad1b138 use master warc2zim in-between releases 2022-06-30 09:42:50 +00:00
renaud gaudin
142970bc0a Fixed #137: normalizes homepage redirects to standard ports 2022-06-22 09:57:01 +00:00