diff --git a/docs/docs/user-guide/behaviors.md b/docs/docs/user-guide/behaviors.md
index a4a234be..b0c34a61 100644
--- a/docs/docs/user-guide/behaviors.md
+++ b/docs/docs/user-guide/behaviors.md
@@ -268,3 +268,26 @@ Some of these functions which may be of use to behaviors authors are:
 - `getState`: increment a state counter and return all state counters + string message
 
 More detailed references will be added in the future.
+
+## Fail On Content Check
+
+In Browsertrix Crawler 1.7.0 and higher, the `--failOnContentCheck` option will result in a crawl failing if a behavior detects the presence or absence of certain content on a page in its `awaitPageLoad()` callback. By default, this is used to fail a crawl if site-specific behaviors determine that the user is not logged in on the following sites:
+
+- Facebook
+- Instagram
+- TikTok
+- X
+
+It is also used to fail crawls with YouTube videos if one of the videos is found not to play.
+
+It is possible to add content checks to custom behaviors. To do so, include an `awaitPageLoad` method on the behavior and use the `ctx.Lib` function `assertContentValid` to check for content and fail the behavior with a specified reason if it is not found.
+
+For an example, see the following `awaitPageLoad` example from the site-specific behavior for X:
+
+```javascript
+async awaitPageLoad(ctx: any) {
+  const { sleep, assertContentValid } = ctx.Lib;
+  await sleep(5);
+  assertContentValid(() => !document.documentElement.outerHTML.match(/Log In/i), "not_logged_in");
+}
+```
diff --git a/docs/docs/user-guide/cli-options.md b/docs/docs/user-guide/cli-options.md
index 298b366e..d37160b0 100644
--- a/docs/docs/user-guide/cli-options.md
+++ b/docs/docs/user-guide/cli-options.md
@@ -261,6 +261,10 @@ Options:
                                         ailOnFailedSeed may result in crawl
                                         failing due to non-200 responses
                                                    [boolean] [default: false]
+      --failOnContentCheck              If set, allows for behaviors to fail
+                                        a crawl with custom reason based on
+                                        content (e.g. logged out)
+                                                   [boolean] [default: false]
       --customBehaviors                 Custom behavior files to inject. Val
                                         id values: URL to file, path to file
                                         , path to directory of behaviors, UR
@@ -272,6 +276,10 @@ Options:
                                         git+https://git.example.com/repo.git
                                         ?branch=dev&path=some/dir"
                                                         [array] [default: []]
+      --saveStorage                     if set, will store the localStorage/
+                                        sessionStorage data for each page as
+                                        part of WARC-JSON-Metadata field
+                                                                    [boolean]
       --debugAccessRedis                if set, runs internal redis without
                                         protected mode to allow external acc
                                         ess (for debugging)         [boolean]
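
The content check described in the `behaviors.md` hunk might look roughly like this inside a custom behavior. This is only a sketch: the class name, `id`, hostname, and the "Please sign in" marker are hypothetical, and the class shape (`static id`, `isMatch`, `init`, `run`) is assumed to follow the usual custom-behavior layout. `makeStubCtx` is not part of the crawler; it is a stand-in for `ctx.Lib` that throws on a failed check, so the logic can be tried outside a crawl. How the real `assertContentValid` reports a failure is not shown in this diff.

```javascript
// Pure helper so the check can be exercised without a browser.
// The marker text is a hypothetical "logged out" indicator.
function looksLoggedIn(html) {
  return !/Please sign in/i.test(html);
}

// Hypothetical custom behavior; names and hostname are illustrative only.
class MembersAreaBehavior {
  static id = "Members Area (example)";

  static isMatch() {
    return window.location.hostname === "members.example.com";
  }

  static init() {
    return {};
  }

  // Called after page load; hands the check and a failure reason to
  // assertContentValid, as described in the docs text of the diff.
  async awaitPageLoad(ctx) {
    const { assertContentValid } = ctx.Lib;
    assertContentValid(
      () => looksLoggedIn(document.documentElement.outerHTML),
      "not_logged_in",
    );
  }

  async *run(ctx) {
    // Page interaction would go here.
  }
}

// Stand-in for ctx, for local experiments only (assumption: failing a check
// is modelled here by throwing an Error carrying the reason).
function makeStubCtx() {
  return {
    Lib: {
      assertContentValid(check, reason) {
        if (!check()) {
          throw new Error(reason);
        }
      },
    },
  };
}
```

Per the option description in the `cli-options.md` hunk, a failed check would only fail the crawl when `--failOnContentCheck` is set.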