mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00

* support hashtag for page-scoped crawls:
  - allow hashtags for current page, automatically set scope to current w/ different hashtags
  - also allow hashtags for URLs specified via urlFile
  - driver: simplify driver, move default driver function to loadPage()
  - bump version to 0.4.0-beta.0
* add --allowHash option to allow hashtags in URLs, enabled for --spaMode but can be set for crawling as well
* graceful shutdown: ensure redis and pywb processes shut down on exit (for use with singularity, outside of docker)
* replace spaMode with more generic --scopeType, a shortcut to setting the scope via regex. scopeType options include:
  - prefix - scope is prefix of current page (default)
  - page - scope is current page + hashtags (spa mode)
  - domain - scope is domain/origin of current page
  - any - scope is any url (default for urlFile)
  - bump version to 0.4.0-beta.1
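The commit message describes --scopeType as a shortcut for setting the scope via regex. The following is a minimal sketch of how such a mapping could look, not the crawler's actual implementation: the helper names `escapeRegExp` and `scopeRegexFor` are hypothetical, and the exact regexes the crawler builds for each option are an assumption based only on the one-line descriptions above.

```javascript
// Hypothetical helper (not part of browsertrix-crawler): escape a literal
// string so it can be embedded safely in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical sketch: derive a scope regex from a scopeType and a seed URL.
// The option names come from the commit message; the regexes are assumptions.
function scopeRegexFor(scopeType, seedUrl) {
  const url = new URL(seedUrl);

  switch (scopeType) {
    case "prefix": {
      // Everything under the seed's directory (path up to the last slash).
      const dir = url.pathname.slice(0, url.pathname.lastIndexOf("/") + 1);
      return new RegExp("^" + escapeRegExp(url.origin + dir));
    }
    case "page": {
      // The seed page itself, optionally followed by a hashtag.
      const pageUrl = url.origin + url.pathname + url.search;
      return new RegExp("^" + escapeRegExp(pageUrl) + "(#.*)?$");
    }
    case "domain":
      // Any URL on the same origin as the seed.
      return new RegExp("^" + escapeRegExp(url.origin) + "/");
    case "any":
      // No restriction at all.
      return /.*/;
    default:
      throw new Error(`unknown scopeType: ${scopeType}`);
  }
}
```

With this sketch, `scopeRegexFor("page", "https://example.com/app")` would accept `https://example.com/app#/view` but reject `https://example.com/app/other`, which matches the "current page + hashtags" description.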
5 lines
115 B
JavaScript
module.exports = async ({data, page, crawler}) => {
  const {url} = data;

  await crawler.loadPage(page, url);
};
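Since the default driver only delegates to `crawler.loadPage()`, a custom driver can presumably run its own setup before that call. The sketch below assumes that `page` is a Puppeteer `Page` (so `page.setViewport()` is available) and that the driver receives the same `{data, page, crawler}` argument as above. The name `customDriver` and the viewport size are illustrative only.

```javascript
// Hypothetical custom driver: same signature as the default driver above.
// Assumption: `page` is a Puppeteer Page, so setViewport() exists on it.
const customDriver = async ({data, page, crawler}) => {
  const {url} = data;

  // Illustrative per-page setup before loading; the size is arbitrary.
  await page.setViewport({width: 1280, height: 800});

  // Hand off to the crawler's default page-loading logic.
  await crawler.loadPage(page, url);
};

module.exports = customDriver;
```

Keeping the final `crawler.loadPage(page, url)` call leaves the default loading behavior in place, so the custom driver only adds steps around it.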