Mirror of https://github.com/webrecorder/browsertrix-crawler.git (synced 2025-10-19 22:43:17 +00:00)
browsertrix-crawler/defaultDriver.js (commit d723f95cb9)
Blame history:
- Lines 1 and 3, 2020-11-01 19:22:53 -08:00: "refactor crawler and default driver"
  - add extensible defaultDriver, wrap crawling functionality in Crawler class
  - support headless/non-headless, custom driver
  - support custom collection name for pywb, generate-cdx option
  - autoplay: add a slight delay for splash loading
- Line 2, 2021-06-26 13:11:29 -07:00: "Per-Seed Scoping Rules + Crawl Depth (#63)"
  - scoped seeds:
    - support per-seed scoping (include + exclude), allowHash, depth, and sitemap options
    - support maxDepth per seed (#16)
    - combine --url, --seed and --urlFile/--seedFile URLs into a unified seed list
  - arg parsing:
    - simplify seed file options into --seedFile/--urlFile, move option in help display
    - rename --maxDepth -> --depth, supported globally and per seed
    - ensure custom parsed params from argParser are passed back correctly (behaviors, logging, device emulation)
    - update to latest js-yaml
    - rename --yamlConfig -> --config
    - config: support reading config from stdin if --config is set to 'stdin'
  - scope: fix typo in 'prefix' scope
  - update browsertrix-behaviors to 0.2.2
  - tests: add test for passing config via stdin, also adding --excludes via cmdline
  - update README:
    - latest CLI, add docs on config via stdin
    - rename --yamlConfig -> --config, consolidate --seedFile/--urlFile, move arg position
    - info on scoped seeds
    - list current scope types

```js
// Default driver: receives the queued page's data, the Puppeteer page, and the
// crawler instance, and hands page loading back to the crawler.
module.exports = async ({data, page, crawler}) => {
  await crawler.loadPage(page, data);
};
```