Retry same queue (#757)

- follow-up to #743
- page retries are simply added back to the same queue with the `retry`
param incremented and a higher score (after extraHops) to ensure retries
are added at the end.
- score calculation is: `score = depth + (extraHops * MAX_DEPTH) +
(retry * MAX_DEPTH * 2)`; this ensures that retries have lower priority
(a higher score) than extraHops URLs, and that additional retries have
even lower priority (see the sketch after this list).
- a warning is logged when a retry happens; an error is logged only when
all retries are exhausted.
- back to a single failure list; URLs are added there only when all
retries are exhausted.
- rename --numRetries -> --maxPageRetries / --retries for clarity
- state load: allow retrying previously failed URLs if --maxPageRetries
is higher than on the previous run.
- ensure this works with --failOnFailedStatus: if provided, pages
returning invalid status codes (>= 400) are retried along with page load
failures
- fixes #132
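
To make the queue ordering concrete, here is a minimal TypeScript sketch of the scoring and requeue behavior described above. The names (`MAX_DEPTH` value, `QueueEntry`, `requeueRetry`, and the plain array standing in for the Redis-backed queue) are illustrative assumptions, not the crawler's actual implementation.

```ts
// Minimal sketch of the retry scoring described above.
const MAX_DEPTH = 1_000_000; // assumed cap on crawl depth

interface QueueEntry {
  url: string;
  depth: number;
  extraHops: number;
  retry: number; // how many times this page has already been retried
}

// Lower score == crawled sooner. extraHops URLs sort after regular URLs,
// retries sort after extraHops, and each additional retry sorts even later.
function score(entry: QueueEntry): number {
  return entry.depth + entry.extraHops * MAX_DEPTH + entry.retry * MAX_DEPTH * 2;
}

// Re-add a failed page to the same queue with retry incremented; returns
// false once retries are exhausted so the caller can log an error and
// record the URL in the failure list.
function requeueRetry(
  queue: QueueEntry[],
  entry: QueueEntry,
  maxRetries: number,
): boolean {
  if (entry.retry >= maxRetries) {
    return false;
  }
  queue.push({ ...entry, retry: entry.retry + 1 });
  // keep the queue ordered by score so retries land at the end
  queue.sort((a, b) => score(a) - score(b));
  return true;
}
```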

---------

Co-authored-by: Tessa Walsh <tessa@bitarchivist.net>
Ilya Kreymer 2025-02-06 18:48:40 -08:00 committed by GitHub
parent 5c9d808651
commit 00835fc4f2
7 changed files with 218 additions and 131 deletions


@@ -240,8 +240,9 @@ Options:
                                        s [boolean] [default: false]
       --writePagesToRedis              If set, write page objects to redis
                                                      [boolean] [default: false]
-      --numRetries                     If set, number of times to retry a p
-                                       age that failed to load
+      --maxPageRetries, --retries      If set, number of times to retry a p
+                                       age that failed to load before page
+                                       is considered to have failed
                                                          [number] [default: 1]
       --failOnFailedSeed               If set, crawler will fail with exit
                                        code 1 if any seed fails. When combi