Fixes#575
- Adds a missing await to fetching the number of failed pages from Redis
- Fixes a typo in the fatal logging message
- Adds a test to ensure that the crawl fails with exit code 17 if
--failOnInvalidStatus and --failOnFailedLimit 1 are set with a url that
will 404
This adds prettier to the repo, and sets up the pre-commit hook to
auto-format as well as lint.
Also updates ignores files to exclude crawls, test-crawls, scratch, dist as needed.
* optimize link extraction: (fixes#376)
- dedup urls in browser first
- don't return entire list of URLs, process one-at-a-time via callback
- add exposeFunction per page in setupPage, then register 'addLink' callback for each pages' handler
- optimize addqueue: atomically check if already at max urls and if url already seen in one redis call
- add QueueState enum to indicate possible states: url added, limit hit, or dupe url
- better logging: log rejected promises for link extraction
- tests: add test for exact page limit being reached