mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 14:33:17 +00:00

* scoped seeds:
  - support per-seed scoping (include + exclude), allowHash, depth, and sitemap options
  - support maxDepth per seed #16
  - combine --url, --seed and --urlFile/--seedFile urls into a unified seed list
  arg parsing:
  - simplify seed file options into --seedFile/--urlFile, move option in help display
  - rename --maxDepth -> --depth, supported globally and per seed
  - ensure custom parsed params from argParser passed back correctly (behaviors, logging, device emulation)
  - update to latest js-yaml
  - rename --yamlConfig -> --config
  - config: support reading config from stdin if --config set to 'stdin'
* scope: fix typo in 'prefix' scope
* update browsertrix-behaviors to 0.2.2
* tests: add test for passing config via stdin, also adding --excludes via cmdline
* update README:
  - latest cli, add docs on config via stdin
  - rename --yamlConfig -> --config, consolidate --seedFile/--urlFile, move arg position
  - info on scoped seeds
  - list current scope types
4 lines
94 B
JavaScript
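module.exports = async ({data, page, crawler}) => {
  // Driver function: receives the page data, the browser page and the
  // crawler instance, and delegates loading to crawler.loadPage().
  await crawler.loadPage(page, data);
};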
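The driver above only delegates to `crawler.loadPage()`. As a hedged sketch of how a custom driver might build on it, the hypothetical `customDriver` below performs the same load and then logs the page's URL and title. It assumes `page` is a Puppeteer `Page` (so `page.url()` and `page.title()` are available) and that `crawler.loadPage(page, data)` behaves as in the driver above; it is an illustration, not part of browsertrix-crawler.

```javascript
// Hypothetical custom driver, sketched for illustration only.
// Assumes `page` is a Puppeteer Page and that `crawler.loadPage(page, data)`
// behaves as in the driver shown above.
const customDriver = async ({ data, page, crawler }) => {
  // Run the crawler's standard page load first.
  await crawler.loadPage(page, data);

  // Then read some basic information from the loaded page.
  const title = await page.title();
  console.log(`Loaded ${page.url()} (title: ${title})`);
};

module.exports = customDriver;
```

Keeping the `loadPage()` call first means any extra per-page work runs only after the crawler's own loading logic has finished. How the crawler is pointed at a custom driver file is not covered here; check its `--help` output.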