browsertrix-crawler

Mirror of https://github.com/webrecorder/browsertrix-crawler.git, synced 2025-10-19 06:23:16 +00:00


Ilya Kreymer 65933c6b12 Interrupt Handling Fixes (#167)	2022-09-20 17:09:52 -07:00
* interrupts: simplify interrupt behavior:
  - SIGTERM/SIGINT behave the same way: both trigger a graceful shutdown after the page load completes.
  Improvements to remote state / parallel crawlers (for browsertrix-cloud):
  - SIGUSR1 before SIGINT/SIGTERM ensures data is saved and marks the crawler as done; for use when gracefully stopping a crawl.
  - SIGUSR2 before SIGINT/SIGTERM ensures data is saved but does not mark the crawler as done; for use when scaling down a single crawler.
* scope check: check the scope of each URL retrieved from the queue (in case scoping rules have changed); URLs matching a seed are automatically in scope.
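The interrupt rules in the #167 commit message amount to a small state machine, and the scope rule to a simple check at dequeue time. The sketch below shows one way to express both in Node.js. It is hypothetical: `CrawlerShutdown`, `afterPageLoad`, `isInScope`, and the returned action strings are illustrative names, not browsertrix-crawler's actual code. Signals are simulated with an `EventEmitter` so the example runs without sending real signals. The commit message does not say how state is handled on a plain SIGINT/SIGTERM with no prior SIGUSR1/SIGUSR2, so that case just returns `"graceful-exit"`.

```javascript
// Hypothetical sketch of the #167 interrupt protocol; all names are illustrative,
// not browsertrix-crawler's real API.
const { EventEmitter } = require("events");

class CrawlerShutdown {
  constructor() {
    this.interrupted = false; // set by SIGINT or SIGTERM
    this.saveState = false;   // set by a prior SIGUSR1 or SIGUSR2
    this.markDone = false;    // set only by SIGUSR1
  }

  // Register handlers on `proc` (the real `process`, or an EventEmitter in tests).
  install(proc = process) {
    proc.on("SIGUSR1", () => { this.saveState = true; this.markDone = true; });
    proc.on("SIGUSR2", () => { this.saveState = true; this.markDone = false; });
    const stop = () => { this.interrupted = true; }; // SIGINT and SIGTERM behave the same
    proc.on("SIGINT", stop);
    proc.on("SIGTERM", stop);
    return this;
  }

  // Checked once the current page load finishes, so the shutdown is graceful.
  afterPageLoad() {
    if (!this.interrupted) return "continue";
    if (!this.saveState) return "graceful-exit"; // behavior here is an assumption
    return this.markDone ? "save-and-mark-done" : "save-only";
  }
}

// Re-check scope for each URL taken off the queue, since scoping rules may have
// changed after it was queued; a URL that matches a seed is always in scope.
function isInScope(url, seedUrls, includeRegexes) {
  if (seedUrls.includes(url)) return true;
  return includeRegexes.some((re) => re.test(url));
}

// Usage: simulate "SIGUSR2 then SIGTERM", as when scaling down one crawler.
const fakeProcess = new EventEmitter();
const shutdown = new CrawlerShutdown().install(fakeProcess);
fakeProcess.emit("SIGUSR2");
fakeProcess.emit("SIGTERM");
console.log(shutdown.afterPageLoad()); // prints "save-only"
```

Deferring the decision to `afterPageLoad` reflects the commit's point that shutdown happens only after the page load completes, rather than inside the signal handler itself.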
File	Last commit	Date
argParser.js	Default Wait-Time Improvements (#162)	2022-09-08 23:39:26 -07:00
blockrules.js	Page Resource Block Rules Avoid Duplicate Handlers + Ignore top-level pages + README update (0.4.4) (#81)	2021-08-17 20:54:18 -07:00
browser.js	Logging and browser improvements: (#158)	2022-08-21 00:30:25 -07:00
constants.js	Customizable extract selectors + typo fix (0.4.2) (#72)	2021-07-23 18:31:43 -07:00
redis.js	Support for uploading to S3 (#95)	2021-11-23 12:53:30 -08:00
screencaster.js	Page-reuse concurrency + Browser Repair + Screencaster Cleanup Improvements (#157)	2022-08-19 09:23:40 -07:00
seeds.js	Interrupt Handling Fixes (#167)	2022-09-20 17:09:52 -07:00
state.js	0.6.0 Wait State + Screencasting Fixes (#141)	2022-06-17 11:58:44 -07:00
storage.js	Health Check + Size Limits + Profile fixes (#138)	2022-05-18 22:51:55 -07:00
textextract.js	Arg Parsing Refactor + Support for YAML Config Support (take 2!) (#59)	2021-06-23 19:45:40 -07:00
windowconcur.js	Page-reuse concurrency + Browser Repair + Screencaster Cleanup Improvements (#157)	2022-08-19 09:23:40 -07:00