warc2zim/tests/data
benoit74 93c866d6bd
Revisit retrieve_illustration logic to prefer best favicons unless user
provided a favicon to use.

Instead of preferring to use WARC items (or preferring to download, as it
was before #202), we prefer to use the most suitable favicon.

Potential favicons are sourced from the main HTML page.

All favicons are retrieved either from the WARC or downloaded to inspect
their sizes.

We use the most suitable one (i.e. 48x48 or bigger if possible, otherwise
the biggest one available).

We still fall back to the default ZIM illustration if no favicon is found,
to avoid losing all the time spent crawling the website.
2024-08-07 12:14:59 +00:00
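
The selection rule described in the commit message above can be sketched as follows. This is a minimal illustration, not the actual warc2zim code: the `Favicon` type and the `pick_favicon` function are hypothetical names, and choosing the smallest icon among those that are at least 48x48 is an assumption (the message only says "48x48 or bigger if possible or the biggest one"). Returning `None` stands in for falling back to the default ZIM illustration.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

# Minimum edge length in pixels, taken from the commit message.
TARGET_SIZE = 48


class Favicon(NamedTuple):
    """Hypothetical record for a candidate favicon whose size is known."""

    url: str
    width: int
    height: int


def pick_favicon(
    candidates: list[Favicon], user_favicon: Optional[Favicon] = None
) -> Optional[Favicon]:
    """Return the favicon to use, or None when the caller should fall back
    to the default illustration."""
    # A favicon explicitly provided by the user always wins.
    if user_favicon is not None:
        return user_favicon
    if not candidates:
        return None

    def area(icon: Favicon) -> int:
        return icon.width * icon.height

    # Candidates whose both edges reach the target size.
    big_enough = [c for c in candidates if min(c.width, c.height) >= TARGET_SIZE]
    if big_enough:
        # Assumption: prefer the smallest sufficient icon to limit downscaling.
        return min(big_enough, key=area)
    # Nothing reaches 48x48: take the biggest icon available.
    return max(candidates, key=area)
```

For example, given candidates of 16x16, 64x64 and 192x192, this sketch selects the 64x64 icon; given only the 16x16 one, it selects that one.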
bad-redirections.warc.gz Handle case where the redirect target is bad 2024-07-26 05:08:25 +00:00
content-resource-types.warc.gz Detect rewrite mode based on WARC-Resource-Type when available 2024-06-13 06:24:10 +00:00
empty-file fuzzy matching support: 2020-10-21 04:26:00 +00:00
example-response.warc initial warc2zim conversion and sample warcs 2020-07-22 13:24:33 -07:00
example-revisit.warc.gz Multi-Page Mode via SW (#28) 2020-08-03 09:47:32 -07:00
example-utf8.warc initial warc2zim conversion and sample warcs 2020-07-22 13:24:33 -07:00
example-with-timestamp.warc Handle HTTP return codes properly 2024-05-04 10:16:55 +00:00
http-return-codes.warc.gz Handle HTTP return codes properly 2024-05-04 10:16:55 +00:00
kiwix-with-redirects.warc.gz Add kiwix.org test case and fix favicon assertions 2024-03-07 07:30:43 +00:00
main-entry-403.warc.gz Exit with cleaner message when main entry is not processable 2024-06-27 06:31:53 +00:00
self-redirect.warc tests: add self-redirect.warc to test that self-redirect record is filtered out and not written 2020-09-25 03:06:13 +00:00
single-page-test.warc tests: use better test WARC (https://lesfondamentaux.reseau-canope.fr/ page) to test include-domains, favicon and language detection 2020-08-03 10:41:23 -07:00
solidaritenum.warc.gz Revisit retrieve_illustration logic to prefer best favicons unless user 2024-08-07 12:14:59 +00:00
video-vimeo.warc.gz update to cdxj-indexer 1.4.3 2021-10-29 16:52:36 +00:00
video-yt-2.warc.gz update to cdxj-indexer 1.4.3 2021-10-29 16:52:36 +00:00
video-yt.warc.gz update to cdxj-indexer 1.4.3 2021-10-29 16:52:36 +00:00