Mirror of https://github.com/webrecorder/browsertrix-crawler.git, synced 2025-10-19 06:23:16 +00:00.
Autoclick Support (#729)
Adds support for autoclick behavior:

- Adds a new `autoclick` option to `--behaviors`. It is not enabled by default.
- Adds support for a newly exposed function, `__bx_addSet`, which lets the autoclick behavior persist state about links that have already been clicked so they are not clicked again. This is only used if the link has an `href`.
- Adds a new `pageFinished` flag to the worker state.
- Adds an `on('dialog')` handler that rejects `onbeforeunload` page navigations while behaviors are running (page not finished) and accepts them once the page is finished, so navigation away is allowed only after behaviors are done.
- Updates to browsertrix-behaviors 0.7.0, which supports autoclick.
- Adds a `--clickSelector` option to customize which elements are clicked, defaulting to `a`.
- Adds `--linkSelector` as an alias for `--selectLinks`, for consistency.
- Unknown options for `--behaviors` are now printed as warnings instead of causing a hard exit, for forward compatibility with new behavior types.

Fixes #728, also #216, #665, #31
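The two pieces of per-page state described above (clicked-link dedup and the `pageFinished` rule for beforeunload dialogs) can be sketched as follows. This is a minimal, self-contained illustration, not the crawler's code: `makeAddSet`, `WorkerState`, `DialogLike`, `shouldAcceptDialog`, and `handleDialog` are hypothetical names, the dedup set is kept in memory (where the crawler actually stores clicked-link state is not shown here), and `DialogLike` stands in for the subset of Puppeteer's `Dialog` (`type()`, `accept()`, `dismiss()`) the handler needs. Only the `pageFinished` flag and the beforeunload rule come from the commit message.

```typescript
// Hypothetical in-memory stand-in for the dedup check behind __bx_addSet:
// returns true only the first time a value is added, so an autoclick loop
// can skip links whose href it has already clicked.
function makeAddSet(): (value: string) => boolean {
  const seen = new Set<string>();
  return (value: string): boolean => {
    if (seen.has(value)) {
      return false; // already clicked, skip
    }
    seen.add(value);
    return true; // first time seen, OK to click
  };
}

// Hypothetical worker state; only the pageFinished flag comes from the commit.
interface WorkerState {
  pageFinished: boolean;
}

// Assumed subset of Puppeteer's Dialog used by the handler below.
interface DialogLike {
  type(): string;
  accept(): Promise<void>;
  dismiss(): Promise<void>;
}

// Pure decision: block beforeunload navigations while behaviors are still
// running, allow them once the page is finished.
function shouldAcceptDialog(dialogType: string, state: WorkerState): boolean {
  if (dialogType === "beforeunload") {
    return state.pageFinished;
  }
  // Assumption: other dialogs (alert, confirm, prompt) are accepted so they
  // do not stall the page.
  return true;
}

// Thin async wrapper that applies the decision to a dialog object.
async function handleDialog(dialog: DialogLike, state: WorkerState): Promise<void> {
  if (shouldAcceptDialog(dialog.type(), state)) {
    await dialog.accept();
  } else {
    await dialog.dismiss();
  }
}

// Usage: deduplicate hrefs the way a click loop might; the repeated
// "https://example.com/a" is filtered out.
const addSet = makeAddSet();
const hrefs = ["https://example.com/a", "https://example.com/b", "https://example.com/a"];
const toClick = hrefs.filter((href) => addSet(href));
console.log(toClick);
```

In a Puppeteer-based worker, the handler would be attached with something like `page.on("dialog", (dialog) => handleDialog(dialog, state))`, with `state.pageFinished` set to `true` once behaviors complete. Accepting dialogs other than beforeunload is an assumption of this sketch; the commit message only describes the beforeunload case.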
Commit b7150f1343 (parent 871490758a): 14 changed files with 259 additions and 108 deletions.
```diff
@@ -50,11 +50,14 @@ Options:
  e-page-application crawling or when
  different hashtags load dynamic cont
  ent
- --selectLinks one or more selectors for extracting
+ --selectLinks, --linkSelector One or more selectors for extracting
  links, in the format [css selector]
  ->[property to use],[css selector]->
  @[attribute to use]
  [array] [default: ["a[href]->href"]]
+ --clickSelector Selector for elements to click when
+ using the autoclick behavior
+ [string] [default: "a"]
  --blockRules Additional rules for blocking certai
  n URLs from being loaded, by URL reg
  ex and optionally via text match in
```
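The `--selectLinks` / `--linkSelector` value format in the first hunk (`[css selector]->[property to use]` or `[css selector]->@[attribute to use]`) can be parsed roughly as below. `parseLinkSelector` and `LinkSelector` are hypothetical names and this is a sketch, not the crawler's parser; splitting on the last `->` is a choice made here so the property or attribute name is always the trailing segment.

```typescript
// Hypothetical parsed form of one --selectLinks / --linkSelector value.
interface LinkSelector {
  selector: string; // CSS selector, e.g. "a[href]"
  extract: string; // property or attribute name, e.g. "href"
  isAttribute: boolean; // true for the "->@attribute" form
}

// Parse "[css selector]->[property]" or "[css selector]->@[attribute]".
function parseLinkSelector(value: string): LinkSelector {
  const sep = value.lastIndexOf("->");
  if (sep <= 0) {
    throw new Error(
      `Invalid link selector "${value}": expected [css selector]->[property] or [css selector]->@[attribute]`,
    );
  }
  const selector = value.slice(0, sep).trim();
  let extract = value.slice(sep + 2).trim();
  const isAttribute = extract.startsWith("@");
  if (isAttribute) {
    extract = extract.slice(1); // drop the "@" marker
  }
  if (!selector || !extract) {
    throw new Error(`Invalid link selector "${value}": empty selector or property name`);
  }
  return { selector, extract, isAttribute };
}

// The documented default, plus an attribute-based example.
console.log(parseLinkSelector("a[href]->href"));
console.log(parseLinkSelector("div.card->@data-url"));
```

A page script would then collect links with `document.querySelectorAll(selector)` and read either `el[extract]` (property) or `el.getAttribute(extract)` (attribute). For the default `a[href]->href`, that reads the resolved `href` property of each matching anchor.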
```diff
@@ -75,7 +78,8 @@ Options:
  [string] [default: "crawl-@ts"]
  --headless Run in headless mode, otherwise star
  t xvfb [boolean] [default: false]
- --driver JS driver for the crawler [string]
+ --driver Custom driver for the crawler, if an
+ y [string]
  --generateCDX, --generatecdx, --gene If set, generate index (CDXJ) for us
  rateCdx e with pywb after crawl is done
  [boolean] [default: false]
```
```diff
@@ -142,8 +146,7 @@ Options:
  o crawl working directory) [string]
  --behaviors Which background behaviors to enable
  on each page
- [array] [choices: "autoplay", "autofetch", "autoscroll", "siteSpecific"] [defa
- ult: ["autoplay","autofetch","autoscroll","siteSpecific"]]
+ [array] [default: ["autoplay","autofetch","autoscroll","siteSpecific"]]
  --behaviorTimeout If >0, timeout (in seconds) for in-p
  age behavior will run on each page.
  If 0, a behavior can run until finis
```
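The hunk above drops the fixed `[choices: ...]` list from `--behaviors`, which matches the commit's switch from a hard exit to a warning for unknown behavior names. A minimal sketch of that validation follows, assuming a hypothetical `checkBehaviors` helper with an injectable `warn` callback so the result can be tested. The commit message does not say whether unknown names are dropped or passed on; this sketch passes them through unchanged.

```typescript
// Behavior names mentioned in this commit. "autoclick" is new and is not
// part of the default set.
const KNOWN_BEHAVIORS: readonly string[] = ["autoplay", "autofetch", "autoscroll", "siteSpecific", "autoclick"];
const DEFAULT_BEHAVIORS: readonly string[] = ["autoplay", "autofetch", "autoscroll", "siteSpecific"];

// Hypothetical validation: warn about unknown names instead of exiting.
// Unknown names are kept (an assumption of this sketch), so a newer
// behaviors library could still act on them.
function checkBehaviors(requested: readonly string[], warn: (msg: string) => void): string[] {
  const known = new Set(KNOWN_BEHAVIORS);
  for (const name of requested) {
    if (!known.has(name)) {
      warn(`Unknown behavior "${name}" specified; passing it through`);
    }
  }
  return requested.slice();
}

// Usage: the defaults plus autoclick and a misspelled name.
const warnings: string[] = [];
const enabled = checkBehaviors([...DEFAULT_BEHAVIORS, "autoclick", "autoscrol"], (msg) => warnings.push(msg));
console.log(enabled);
console.log(warnings);
```

Collecting warnings through a callback keeps the check free of process-level side effects; a real CLI would route the same messages to its logger.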
```diff
@@ -163,8 +166,10 @@ Options:
  hich contains the browser profile di
  rectory [string]
  --screenshot Screenshot options for crawler, can
- include: view, thumbnail, fullPage
- [array] [choices: "view", "thumbnail", "fullPage"] [default: []]
+ include: view, thumbnail, fullPage,
+ fullPageFinal
+ [array] [choices: "view", "thumbnail", "fullPage", "fullPageFinal"] [default:
+ []]
  --screencastPort If set to a non-zero value, starts a
  n HTTP server with screencast access
  ible on this port
```
```diff
@@ -251,9 +256,15 @@ Options:
  failing due to non-200 responses
  [boolean] [default: false]
  --customBehaviors Custom behavior files to inject. Val
- ues can be URLs, paths to individual
- behavior files, or paths to a direc
- tory of behavior files
+ id values: URL to file, path to file
+ , path to directory of behaviors, UR
+ L to Git repo of behaviors (prefixed
+ with git+, optionally specify branc
+ h and relative path to a directory w
+ ithin repo as branch and path query
+ parameters, e.g. --customBehaviors "
+ git+https://git.example.com/repo.git
+ ?branch=dev&path=some/dir"
  [array] [default: []]
  --debugAccessRedis if set, runs internal redis without
  protected mode to allow external acc
```
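The new `--customBehaviors` help text describes Git sources as `git+` URLs with optional `branch` and `path` query parameters. A sketch of splitting such a value with the standard `URL` API follows; `parseGitBehaviorSource` and `GitBehaviorSource` are hypothetical names, and dropping the entire query string from the repository URL assumes `branch` and `path` are the only parameters that matter.

```typescript
// Hypothetical parsed form of a git+ --customBehaviors value.
interface GitBehaviorSource {
  repoUrl: string; // repository URL without the git+ prefix or query
  branch: string | null; // value of ?branch=..., if given
  path: string | null; // value of ?path=..., a directory within the repo
}

// Returns null for values that are not git+ URLs (plain URLs, files,
// directories). new URL() throws a TypeError on malformed input.
function parseGitBehaviorSource(value: string): GitBehaviorSource | null {
  const prefix = "git+";
  if (!value.startsWith(prefix)) {
    return null;
  }
  const url = new URL(value.slice(prefix.length));
  const branch = url.searchParams.get("branch");
  const path = url.searchParams.get("path");
  // Assumption: branch and path are the only query parameters that matter,
  // so the whole query string is stripped from the repository URL.
  url.search = "";
  return { repoUrl: url.href, branch, path };
}

// The example from the help text.
console.log(parseGitBehaviorSource("git+https://git.example.com/repo.git?branch=dev&path=some/dir"));
```

Values without the `git+` prefix return `null` here and would be handled as plain URLs, files, or directories.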