Add MkDocs documentation site for Browsertrix Crawler 1.0.0 (#494)

Fixes #493 

This PR updates the documentation for Browsertrix Crawler 1.0.0 and
moves it from the project README to an MkDocs site.

Initial docs site set to https://crawler.docs.browsertrix.com/

Many thanks to @Shrinks99 for help setting this up!

---------

Co-authored-by: Henry Wilkinson <henry@wilkinson.graphics>
Co-authored-by: Ilya Kreymer <ikreymer@gmail.com>
Tessa Walsh 2024-03-16 17:59:32 -04:00 committed by GitHub
parent 6d04c9575f
commit e1fe028c7c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
47 changed files with 1238 additions and 795 deletions


@ -0,0 +1,11 @@
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" fill-rule="evenodd"
stroke-linejoin="round" stroke-miterlimit="2" clip-rule="evenodd" viewBox="0 0 24 24">
<style>
.b{fill: #0891b2;}.g{fill: #4d7c0f;}
@media (prefers-color-scheme: dark) {
.b{ fill: #0AAED7; }.g{ fill: #65A414;}
}
</style>
<path class="b" d="m18.59 15.34-5.78-3.62a.24.24 0 0 1 0-.4l5.77-3.62a7.16 7.16 0 0 1 0 7.64Z"/>
<path class="g" d="M22.04 17.5c.06.03.1.08.11.15.01.06 0 .13-.03.18a11.52 11.52 0 1 1 0-12.62.24.24 0 0 1-.08.33L18.58 7.7a7.2 7.2 0 1 0 0 7.64s2.7 1.67 3.46 2.16Z"/>
</svg>


@ -0,0 +1,8 @@
<svg xmlns="http://www.w3.org/2000/svg" xml:space="preserve" fill-rule="evenodd"
stroke-linejoin="round" stroke-miterlimit="2" clip-rule="evenodd" viewBox="0 0 24 24">
<path fill="none" d="M0 0h23.04v23.04H0z"/>
<g fill="#fff">
<path d="m18.59 15.34-5.78-3.62a.24.24 0 0 1 0-.4l5.77-3.62a7.16 7.16 0 0 1 0 7.64Z"/>
<path d="M22.04 17.5c.06.03.1.08.11.15.01.06 0 .13-.03.18a11.52 11.52 0 1 1 0-12.62.24.24 0 0 1-.08.33L18.58 7.7a7.2 7.2 0 1 0 0 7.64s2.7 1.67 3.46 2.16Z"/>
</g>
</svg>


Binary file not shown.

Binary file not shown.

Binary file not shown.

docs/docs/develop/docs.md Normal file

@ -0,0 +1,23 @@
# Documentation
This documentation is built with the [MkDocs](https://www.mkdocs.org/) static site generator.
## Docs Setup
Python is required to build the docs. To install the docs dependencies, run:
`pip install mkdocs-material`
## Docs Server
To start the docs server, run:
`mkdocs serve`
The documentation will then be available at `http://localhost:8000/`
The command-line options documentation is regenerated using the `docs/gen-cli.sh` script.
Refer to the [MkDocs](https://www.mkdocs.org/) and [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) pages
for more info about the documentation.


@ -0,0 +1,39 @@
# Development
## Usage with Docker Compose
Many examples in the User Guide demonstrate running Browsertrix Crawler with `docker run`.
Docker Compose is recommended for building the image and for simple configurations. A simple Docker Compose configuration file is included in the Git repository.
To build the latest image, run:
```sh
docker-compose build
```
Docker Compose also simplifies some config options, such as mounting the volume for the crawls.
The following command starts a crawl with 2 workers and generates the CDX:
```sh
docker-compose run crawler crawl --url https://webrecorder.net/ --generateCDX --collection wr-net --workers 2
```
In this example, the crawl data is written to `./crawls/collections/wr-net` by default.
While the crawl is running, the crawl status and progress are printed to the JSON-L log output. This can be disabled by using the `--logging` option and not including `stats`.
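For example, assuming the Docker Compose setup above, a crawl that logs only debug messages (and therefore omits the periodic stats output) could be started as follows; the URL and collection name simply reuse the earlier example:
```sh
# Reuses the Docker Compose setup above; passing --logging without "stats"
# omits the periodic progress output.
docker-compose run crawler crawl --url https://webrecorder.net/ --collection wr-net --logging debug
```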
## Multi-Platform Build / Support for Apple Silicon
Browsertrix Crawler uses a browser image which supports amd64 and arm64.
This means Browsertrix Crawler can be built natively on Apple Silicon systems using the default settings. Running `docker-compose build` on an Apple Silicon machine should build a native version suitable for development.
## Modifying Browser Image
It is also possible to build Browsertrix Crawler with a different browser image. Currently, browser images using Brave Browser and Chrome/Chromium (depending on host system chip architecture) are supported via [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base); however, only Brave Browser receives regular version updates from us.
The browser base image used is specified and can be changed at the top of the Dockerfile in the Browsertrix Crawler repo.
Custom browser images can be used by forking [browsertrix-browser-base](https://github.com/webrecorder/browsertrix-browser-base), locally building or publishing an image, and then modifying the Dockerfile in this repo to build from that image.

docs/docs/index.md Normal file

@ -0,0 +1,40 @@
---
hide:
- navigation
- toc
---
# Home
Welcome to the Browsertrix Crawler official documentation.
Browsertrix Crawler is a simplified browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container. Browsertrix Crawler uses [Puppeteer](https://github.com/puppeteer/puppeteer) to control one or more [Brave Browser](https://brave.com/) browser windows in parallel. Data is captured through the [Chrome Devtools Protocol (CDP)](https://chromedevtools.github.io/devtools-protocol/) in the browser.
!!! note
This documentation applies to Browsertrix Crawler versions 1.0.0 and above. Documentation for earlier versions of the crawler is available in the README file of older commits in the [Browsertrix Crawler GitHub repository](https://github.com/webrecorder/browsertrix-crawler).
## Features
- Single-container, browser-based crawling with a headless/headful browser running pages in multiple windows.
- Support for custom browser behaviors using [Browsertrix Behaviors](https://github.com/webrecorder/browsertrix-behaviors), including autoscroll, video autoplay, and site-specific behaviors.
- YAML-based configuration, passed via file or via stdin.
- Seed lists and per-seed scoping rules.
- URL blocking rules to block capture of specific URLs (including by iframe URL and/or by iframe contents).
- Screencasting: Ability to watch crawling in real-time.
- Screenshotting: Ability to take thumbnails, full page screenshots, and/or screenshots of the initial page view.
- Optimized (non-browser) capture of non-HTML resources.
- Extensible Puppeteer driver script for customizing behavior per crawl or page.
- Ability to create and reuse browser profiles interactively or via automated user/password login using an embedded browser.
- Multi-platform support — prebuilt Docker images available for Intel/AMD and Apple Silicon (M1/M2) CPUs.
## Documentation
If something is missing, unclear, or seems incorrect, please open an [issue](https://github.com/webrecorder/browsertrix-crawler/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc) and we'll try to make sure that your questions get answered here in the future!
## Code
Browsertrix Crawler is free and open source software, with all code available in the [main repository on GitHub](https://github.com/webrecorder/browsertrix-crawler).


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-bug-fill" viewBox="0 0 16 16">
<path d="M4.978.855a.5.5 0 1 0-.956.29l.41 1.352A4.985 4.985 0 0 0 3 6h10a4.985 4.985 0 0 0-1.432-3.503l.41-1.352a.5.5 0 1 0-.956-.29l-.291.956A4.978 4.978 0 0 0 8 1a4.979 4.979 0 0 0-2.731.811l-.29-.956z"/>
<path d="M13 6v1H8.5v8.975A5 5 0 0 0 13 11h.5a.5.5 0 0 1 .5.5v.5a.5.5 0 1 0 1 0v-.5a1.5 1.5 0 0 0-1.5-1.5H13V9h1.5a.5.5 0 0 0 0-1H13V7h.5A1.5 1.5 0 0 0 15 5.5V5a.5.5 0 0 0-1 0v.5a.5.5 0 0 1-.5.5H13zm-5.5 9.975V7H3V6h-.5a.5.5 0 0 1-.5-.5V5a.5.5 0 0 0-1 0v.5A1.5 1.5 0 0 0 2.5 7H3v1H1.5a.5.5 0 0 0 0 1H3v1h-.5A1.5 1.5 0 0 0 1 11.5v.5a.5.5 0 1 0 1 0v-.5a.5.5 0 0 1 .5-.5H3a5 5 0 0 0 4.5 4.975z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-chat-left-text-fill" viewBox="0 0 16 16">
<path d="M0 2a2 2 0 0 1 2-2h12a2 2 0 0 1 2 2v8a2 2 0 0 1-2 2H4.414a1 1 0 0 0-.707.293L.854 15.146A.5.5 0 0 1 0 14.793V2zm3.5 1a.5.5 0 0 0 0 1h9a.5.5 0 0 0 0-1h-9zm0 2.5a.5.5 0 0 0 0 1h9a.5.5 0 0 0 0-1h-9zm0 2.5a.5.5 0 0 0 0 1h5a.5.5 0 0 0 0-1h-5z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-check-circle-fill" viewBox="0 0 16 16">
<path d="M16 8A8 8 0 1 1 0 8a8 8 0 0 1 16 0zm-3.97-3.03a.75.75 0 0 0-1.08.022L7.477 9.417 5.384 7.323a.75.75 0 0 0-1.06 1.06L6.97 11.03a.75.75 0 0 0 1.079-.02l3.992-4.99a.75.75 0 0 0-.01-1.05z"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-check-circle" viewBox="0 0 16 16">
<path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
<path d="M10.97 4.97a.235.235 0 0 0-.02.022L7.477 9.417 5.384 7.323a.75.75 0 0 0-1.06 1.06L6.97 11.03a.75.75 0 0 0 1.079-.02l3.992-4.99a.75.75 0 0 0-1.071-1.05"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-dash-circle" viewBox="0 0 16 16">
<path d="M8 15A7 7 0 1 1 8 1a7 7 0 0 1 0 14m0 1A8 8 0 1 0 8 0a8 8 0 0 0 0 16"/>
<path d="M4 8a.5.5 0 0 1 .5-.5h7a.5.5 0 0 1 0 1h-7A.5.5 0 0 1 4 8"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-exclamation-circle-fill" viewBox="0 0 16 16">
<path d="M16 8A8 8 0 1 1 0 8a8 8 0 0 1 16 0zM8 4a.905.905 0 0 0-.9.995l.35 3.507a.552.552 0 0 0 1.1 0l.35-3.507A.905.905 0 0 0 8 4zm.002 6a1 1 0 1 0 0 2 1 1 0 0 0 0-2z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-exclamation-diamond-fill" viewBox="0 0 16 16">
<path d="M9.05.435c-.58-.58-1.52-.58-2.1 0L.436 6.95c-.58.58-.58 1.519 0 2.098l6.516 6.516c.58.58 1.519.58 2.098 0l6.516-6.516c.58-.58.58-1.519 0-2.098L9.05.435zM8 4c.535 0 .954.462.9.995l-.35 3.507a.552.552 0 0 1-1.1 0L7.1 4.995A.905.905 0 0 1 8 4zm.002 6a1 1 0 1 1 0 2 1 1 0 0 1 0-2z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-exclamation-triangle-fill" viewBox="0 0 16 16">
<path d="M8.982 1.566a1.13 1.13 0 0 0-1.96 0L.165 13.233c-.457.778.091 1.767.98 1.767h13.713c.889 0 1.438-.99.98-1.767L8.982 1.566zM8 5c.535 0 .954.462.9.995l-.35 3.507a.552.552 0 0 1-1.1 0L7.1 5.995A.905.905 0 0 1 8 5zm.002 6a1 1 0 1 1 0 2 1 1 0 0 1 0-2z"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-exclamation-triangle" viewBox="0 0 16 16">
<path d="M7.938 2.016A.13.13 0 0 1 8.002 2a.13.13 0 0 1 .063.016.146.146 0 0 1 .054.057l6.857 11.667c.036.06.035.124.002.183a.163.163 0 0 1-.054.06.116.116 0 0 1-.066.017H1.146a.115.115 0 0 1-.066-.017.163.163 0 0 1-.054-.06.176.176 0 0 1 .002-.183L7.884 2.073a.147.147 0 0 1 .054-.057zm1.044-.45a1.13 1.13 0 0 0-1.96 0L.165 13.233c-.457.778.091 1.767.98 1.767h13.713c.889 0 1.438-.99.98-1.767L8.982 1.566z"/>
<path d="M7.002 12a1 1 0 1 1 2 0 1 1 0 0 1-2 0zM7.1 5.995a.905.905 0 1 1 1.8 0l-.35 3.507a.552.552 0 0 1-1.1 0z"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-eye" viewBox="0 0 16 16">
<path d="M16 8s-3-5.5-8-5.5S0 8 0 8s3 5.5 8 5.5S16 8 16 8zM1.173 8a13.133 13.133 0 0 1 1.66-2.043C4.12 4.668 5.88 3.5 8 3.5c2.12 0 3.879 1.168 5.168 2.457A13.133 13.133 0 0 1 14.828 8c-.058.087-.122.183-.195.288-.335.48-.83 1.12-1.465 1.755C11.879 11.332 10.119 12.5 8 12.5c-2.12 0-3.879-1.168-5.168-2.457A13.134 13.134 0 0 1 1.172 8z"/>
<path d="M8 5.5a2.5 2.5 0 1 0 0 5 2.5 2.5 0 0 0 0-5zM4.5 8a3.5 3.5 0 1 1 7 0 3.5 3.5 0 0 1-7 0z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-file-earmark-text-fill" viewBox="0 0 16 16">
<path d="M9.293 0H4a2 2 0 0 0-2 2v12a2 2 0 0 0 2 2h8a2 2 0 0 0 2-2V4.707A1 1 0 0 0 13.707 4L10 .293A1 1 0 0 0 9.293 0zM9.5 3.5v-2l3 3h-2a1 1 0 0 1-1-1zM4.5 9a.5.5 0 0 1 0-1h7a.5.5 0 0 1 0 1h-7zM4 10.5a.5.5 0 0 1 .5-.5h7a.5.5 0 0 1 0 1h-7a.5.5 0 0 1-.5-.5zm.5 2.5a.5.5 0 0 1 0-1h4a.5.5 0 0 1 0 1h-4z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-github" viewBox="0 0 16 16">
<path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.012 8.012 0 0 0 16 8c0-4.42-3.58-8-8-8z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-globe" viewBox="0 0 16 16">
<path d="M0 8a8 8 0 1 1 16 0A8 8 0 0 1 0 8zm7.5-6.923c-.67.204-1.335.82-1.887 1.855A7.97 7.97 0 0 0 5.145 4H7.5V1.077zM4.09 4a9.267 9.267 0 0 1 .64-1.539 6.7 6.7 0 0 1 .597-.933A7.025 7.025 0 0 0 2.255 4H4.09zm-.582 3.5c.03-.877.138-1.718.312-2.5H1.674a6.958 6.958 0 0 0-.656 2.5h2.49zM4.847 5a12.5 12.5 0 0 0-.338 2.5H7.5V5H4.847zM8.5 5v2.5h2.99a12.495 12.495 0 0 0-.337-2.5H8.5zM4.51 8.5a12.5 12.5 0 0 0 .337 2.5H7.5V8.5H4.51zm3.99 0V11h2.653c.187-.765.306-1.608.338-2.5H8.5zM5.145 12c.138.386.295.744.468 1.068.552 1.035 1.218 1.65 1.887 1.855V12H5.145zm.182 2.472a6.696 6.696 0 0 1-.597-.933A9.268 9.268 0 0 1 4.09 12H2.255a7.024 7.024 0 0 0 3.072 2.472zM3.82 11a13.652 13.652 0 0 1-.312-2.5h-2.49c.062.89.291 1.733.656 2.5H3.82zm6.853 3.472A7.024 7.024 0 0 0 13.745 12H11.91a9.27 9.27 0 0 1-.64 1.539 6.688 6.688 0 0 1-.597.933zM8.5 12v2.923c.67-.204 1.335-.82 1.887-1.855.173-.324.33-.682.468-1.068H8.5zm3.68-1h2.146c.365-.767.594-1.61.656-2.5h-2.49a13.65 13.65 0 0 1-.312 2.5zm2.802-3.5a6.959 6.959 0 0 0-.656-2.5H12.18c.174.782.282 1.623.312 2.5h2.49zM11.27 2.461c.247.464.462.98.64 1.539h1.835a7.024 7.024 0 0 0-3.072-2.472c.218.284.418.598.597.933zM10.855 4a7.966 7.966 0 0 0-.468-1.068C9.835 1.897 9.17 1.282 8.5 1.077V4h2.355z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-info-circle-fill" viewBox="0 0 16 16">
<path d="M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16zm.93-9.412-1 4.705c-.07.34.029.533.304.533.194 0 .487-.07.686-.246l-.088.416c-.287.346-.92.598-1.465.598-.703 0-1.002-.422-.808-1.319l.738-3.468c.064-.293.006-.399-.287-.47l-.451-.081.082-.381 2.29-.287zM8 5.5a1 1 0 1 1 0-2 1 1 0 0 1 0 2z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-mastodon" viewBox="0 0 16 16">
<path d="M11.19 12.195c2.016-.24 3.77-1.475 3.99-2.603.348-1.778.32-4.339.32-4.339 0-3.47-2.286-4.488-2.286-4.488C12.062.238 10.083.017 8.027 0h-.05C5.92.017 3.942.238 2.79.765c0 0-2.285 1.017-2.285 4.488l-.002.662c-.004.64-.007 1.35.011 2.091.083 3.394.626 6.74 3.78 7.57 1.454.383 2.703.463 3.709.408 1.823-.1 2.847-.647 2.847-.647l-.06-1.317s-1.303.41-2.767.36c-1.45-.05-2.98-.156-3.215-1.928a3.614 3.614 0 0 1-.033-.496s1.424.346 3.228.428c1.103.05 2.137-.064 3.188-.189zm1.613-2.47H11.13v-4.08c0-.859-.364-1.295-1.091-1.295-.804 0-1.207.517-1.207 1.541v2.233H7.168V5.89c0-1.024-.403-1.541-1.207-1.541-.727 0-1.091.436-1.091 1.296v4.079H3.197V5.522c0-.859.22-1.541.66-2.046.456-.505 1.052-.764 1.793-.764.856 0 1.504.328 1.933.983L8 4.39l.417-.695c.429-.655 1.077-.983 1.934-.983.74 0 1.336.259 1.791.764.442.505.661 1.187.661 2.046v4.203z"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-mortarboard-fill" viewBox="0 0 16 16">
<path d="M8.211 2.047a.5.5 0 0 0-.422 0l-7.5 3.5a.5.5 0 0 0 .025.917l7.5 3a.5.5 0 0 0 .372 0L14 7.14V13a1 1 0 0 0-1 1v2h3v-2a1 1 0 0 0-1-1V6.739l.686-.275a.5.5 0 0 0 .025-.917l-7.5-3.5Z"/>
<path d="M4.176 9.032a.5.5 0 0 0-.656.327l-.5 1.7a.5.5 0 0 0 .294.605l4.5 1.8a.5.5 0 0 0 .372 0l4.5-1.8a.5.5 0 0 0 .294-.605l-.5-1.7a.5.5 0 0 0-.656-.327L8 10.466 4.176 9.032Z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-pencil-fill" viewBox="0 0 16 16">
<path d="M12.854.146a.5.5 0 0 0-.707 0L10.5 1.793 14.207 5.5l1.647-1.646a.5.5 0 0 0 0-.708l-3-3zm.646 6.061L9.793 2.5 3.293 9H3.5a.5.5 0 0 1 .5.5v.5h.5a.5.5 0 0 1 .5.5v.5h.5a.5.5 0 0 1 .5.5v.5h.5a.5.5 0 0 1 .5.5v.207l6.5-6.5zm-7.468 7.468A.5.5 0 0 1 6 13.5V13h-.5a.5.5 0 0 1-.5-.5V12h-.5a.5.5 0 0 1-.5-.5V11h-.5a.5.5 0 0 1-.5-.5V10h-.5a.499.499 0 0 1-.175-.032l-.179.178a.5.5 0 0 0-.11.168l-2 5a.5.5 0 0 0 .65.65l5-2a.5.5 0 0 0 .168-.11l.178-.178z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-pencil" viewBox="0 0 16 16">
<path d="M12.146.146a.5.5 0 0 1 .708 0l3 3a.5.5 0 0 1 0 .708l-10 10a.5.5 0 0 1-.168.11l-5 2a.5.5 0 0 1-.65-.65l2-5a.5.5 0 0 1 .11-.168l10-10zM11.207 2.5 13.5 4.793 14.793 3.5 12.5 1.207 11.207 2.5zm1.586 3L10.5 3.207 4 9.707V10h.5a.5.5 0 0 1 .5.5v.5h.5a.5.5 0 0 1 .5.5v.5h.293l6.5-6.5zm-9.761 5.175-.106.106-1.528 3.821 3.821-1.528.106-.106A.5.5 0 0 1 5 12.5V12h-.5a.5.5 0 0 1-.5-.5V11h-.5a.5.5 0 0 1-.468-.325z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-question-circle-fill" viewBox="0 0 16 16">
<path d="M16 8A8 8 0 1 1 0 8a8 8 0 0 1 16 0zM5.496 6.033h.825c.138 0 .248-.113.266-.25.09-.656.54-1.134 1.342-1.134.686 0 1.314.343 1.314 1.168 0 .635-.374.927-.965 1.371-.673.489-1.206 1.06-1.168 1.987l.003.217a.25.25 0 0 0 .25.246h.811a.25.25 0 0 0 .25-.25v-.105c0-.718.273-.927 1.01-1.486.609-.463 1.244-.977 1.244-2.056 0-1.511-1.276-2.241-2.673-2.241-1.267 0-2.655.59-2.75 2.286a.237.237 0 0 0 .241.247zm2.325 6.443c.61 0 1.029-.394 1.029-.927 0-.552-.42-.94-1.029-.94-.584 0-1.009.388-1.009.94 0 .533.425.927 1.01.927z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-quote" viewBox="0 0 16 16">
<path d="M12 12a1 1 0 0 0 1-1V8.558a1 1 0 0 0-1-1h-1.388c0-.351.021-.703.062-1.054.062-.372.166-.703.31-.992.145-.29.331-.517.559-.683.227-.186.516-.279.868-.279V3c-.579 0-1.085.124-1.52.372a3.322 3.322 0 0 0-1.085.992 4.92 4.92 0 0 0-.62 1.458A7.712 7.712 0 0 0 9 7.558V11a1 1 0 0 0 1 1h2Zm-6 0a1 1 0 0 0 1-1V8.558a1 1 0 0 0-1-1H4.612c0-.351.021-.703.062-1.054.062-.372.166-.703.31-.992.145-.29.331-.517.559-.683.227-.186.516-.279.868-.279V3c-.579 0-1.085.124-1.52.372a3.322 3.322 0 0 0-1.085.992 4.92 4.92 0 0 0-.62 1.458A7.712 7.712 0 0 0 3 7.558V11a1 1 0 0 0 1 1h2Z"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-x-octagon-fill" viewBox="0 0 16 16">
<path d="M11.46.146A.5.5 0 0 0 11.107 0H4.893a.5.5 0 0 0-.353.146L.146 4.54A.5.5 0 0 0 0 4.893v6.214a.5.5 0 0 0 .146.353l4.394 4.394a.5.5 0 0 0 .353.146h6.214a.5.5 0 0 0 .353-.146l4.394-4.394a.5.5 0 0 0 .146-.353V4.893a.5.5 0 0 0-.146-.353L11.46.146zm-6.106 4.5L8 7.293l2.646-2.647a.5.5 0 0 1 .708.708L8.707 8l2.647 2.646a.5.5 0 0 1-.708.708L8 8.707l-2.646 2.647a.5.5 0 0 1-.708-.708L7.293 8 4.646 5.354a.5.5 0 1 1 .708-.708z"/>
</svg>


@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-x-octagon" viewBox="0 0 16 16">
<path d="M4.54.146A.5.5 0 0 1 4.893 0h6.214a.5.5 0 0 1 .353.146l4.394 4.394a.5.5 0 0 1 .146.353v6.214a.5.5 0 0 1-.146.353l-4.394 4.394a.5.5 0 0 1-.353.146H4.893a.5.5 0 0 1-.353-.146L.146 11.46A.5.5 0 0 1 0 11.107V4.893a.5.5 0 0 1 .146-.353L4.54.146zM5.1 1 1 5.1v5.8L5.1 15h5.8l4.1-4.1V5.1L10.9 1z"/>
<path d="M4.646 4.646a.5.5 0 0 1 .708 0L8 7.293l2.646-2.647a.5.5 0 0 1 .708.708L8.707 8l2.647 2.646a.5.5 0 0 1-.708.708L8 8.707l-2.646 2.647a.5.5 0 0 1-.708-.708L7.293 8 4.646 5.354a.5.5 0 0 1 0-.708"/>
</svg>


@ -0,0 +1,3 @@
<svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" fill="currentColor" class="bi bi-youtube" viewBox="0 0 16 16">
<path d="M8.051 1.999h.089c.822.003 4.987.033 6.11.335a2.01 2.01 0 0 1 1.415 1.42c.101.38.172.883.22 1.402l.01.104.022.26.008.104c.065.914.073 1.77.074 1.957v.075c-.001.194-.01 1.108-.082 2.06l-.008.105-.009.104c-.05.572-.124 1.14-.235 1.558a2.007 2.007 0 0 1-1.415 1.42c-1.16.312-5.569.334-6.18.335h-.142c-.309 0-1.587-.006-2.927-.052l-.17-.006-.087-.004-.171-.007-.171-.007c-1.11-.049-2.167-.128-2.654-.26a2.007 2.007 0 0 1-1.415-1.419c-.111-.417-.185-.986-.235-1.558L.09 9.82l-.008-.104A31.4 31.4 0 0 1 0 7.68v-.123c.002-.215.01-.958.064-1.778l.007-.103.003-.052.008-.104.022-.26.01-.104c.048-.519.119-1.023.22-1.402a2.007 2.007 0 0 1 1.415-1.42c.487-.13 1.544-.21 2.654-.26l.17-.007.172-.006.086-.003.171-.007A99.788 99.788 0 0 1 7.858 2h.193zM6.4 5.209v4.818l4.157-2.408L6.4 5.209z"/>
</svg>


@ -0,0 +1,2 @@
{% extends "base.html" %} {% block icons %} {% set icon_path =
"overrides/.icons/bootstrap/" %} {{ super() }} {% endblock %}


@ -0,0 +1,173 @@
/* Font style definitions */
@font-face {
font-family: "Recursive";
font-style: oblique 0deg 15deg;
font-weight: 300 1000;
src: url("../assets/fonts/Recursive_VF_1.084.woff2") format("woff2");
font-feature-settings: "ss12";
}
@font-face {
font-family: "Inter";
font-weight: 100 900;
font-display: swap;
font-style: normal;
src: url("../assets/fonts/Inter.var.woff2") format("woff2");
font-feature-settings: "ss03";
}
@font-face {
font-family: "Inter";
font-weight: 100 900;
font-display: swap;
font-style: italic;
src: url("../assets/fonts/Inter-Italic.var.woff2") format("woff2");
font-feature-settings: "ss03";
}
:root {
--md-code-font: "Recursive", monospace;
--md-text-font: "Inter", "Helvetica", "Arial", sans-serif;
--wr-blue-primary: #0891B2;
--wr-orange-primary: #C96509;
}
[data-md-color-scheme="webrecorder"] {
--md-primary-fg-color: #4D7C0F;
--md-primary-fg-color--light: #0782A1;
--md-primary-fg-color--dark: #066B84;
--md-typeset-color: black;
--md-accent-fg-color: #0782A1;
--md-typeset-a-color: #066B84;
--md-code-bg-color: #F9FAFB;
}
/* Nav changes */
.md-header__title {
font-family: var(--md-code-font);
font-variation-settings: "MONO" 0.51;
}
.md-header__title--active {
font-family: var(--md-text-font);
font-weight: 600;
}
/* Custom menu item hover */
.md-tabs__link {
font-family: var(--md-code-font);
font-weight: 400;
opacity: 0.9;
transition:
0.4s cubic-bezier(0.1, 0.7, 0.1, 1),
opacity 0.25s;
}
.md-tabs__link:hover {
font-weight: 600;
}
/* Custom body typography rules */
.md-typeset a {
text-decoration: underline;
}
.headerlink {
text-decoration: none !important;
}
code,
pre,
kbd {
font-variation-settings: "MONO" 1;
font-feature-settings: "ss01", "ss02", "ss08";
}
code {
border-width: 1px;
border-color: #d1d5db;
border-style: solid;
}
.md-typeset h1,
h2,
h3,
h4,
h5 {
color: black;
}
.md-typeset h1,
h2,
h3 {
font-weight: 650 !important;
font-variation-settings: "OPSZ" 35;
}
/* Custom badge classes, applies custom overrides to inline-code blocks */
.badge-blue {
background-color: var(--wr-blue-primary) !important;
border-color: var(--wr-blue-primary) !important;
color: white !important;
font-family: var(--md-text-font);
font-weight: 600;
}
.badge-green {
background-color: hsl(142 76% 36%) !important;
border-color: hsl(142 76% 36%) !important;
color: white !important;
font-family: var(--md-text-font);
font-weight: 600;
}
.badge-orange {
background-color: var(--wr-orange-primary) !important;
border-color: var(--wr-orange-primary) !important;
color: white !important;
font-family: var(--md-text-font);
font-weight: 600;
}
/* Status Styling */
.status-success {
font-family: var(--md-code-font);
font-weight: 500;
white-space: nowrap;
& svg {
color: hsl(142.1 76.2% 36.3%);
}
}
.status-warning {
font-family: var(--md-code-font);
font-weight: 500;
white-space: nowrap;
& svg {
color: hsl(32.1 94.6% 43.7%);
}
}
.status-danger {
font-family: var(--md-code-font);
font-weight: 500;
white-space: nowrap;
& svg {
color: hsl(0 72.2% 50.6%);
}
}
.status-waiting {
font-family: var(--md-code-font);
font-weight: 500;
white-space: nowrap;
& svg {
color: hsl(271.5 81.3% 55.9%);
}
}


@ -0,0 +1,23 @@
# Browser Behaviors
Browsertrix Crawler supports automatically running customized in-browser behaviors. The behaviors auto-play videos (when possible), auto-fetch content that is not loaded by default, and also run custom behaviors on certain sites.
To run behaviors, specify them via a comma-separated list passed to the `--behaviors` option. All behaviors are enabled by default, the equivalent of `--behaviors autoscroll,autoplay,autofetch,siteSpecific`. To enable only a single behavior, such as autoscroll, use `--behaviors autoscroll`.
The site-specific behavior (or autoscroll) will start running after the page has finished its initial load (as defined by the `--waitUntil` settings). The behavior will then run until finished or until the behavior timeout is exceeded. This timeout can be set (in seconds) via the `--behaviorTimeout` flag (90 seconds by default). Setting the timeout to 0 will allow the behavior to run until it is finished.
See [Browsertrix Behaviors](https://github.com/webrecorder/browsertrix-behaviors) for more info on all of the currently available behaviors.
Browsertrix Crawler includes a `--pageExtraDelay`/`--delay` option, which can be used to have the crawler sleep for a configurable number of seconds after behaviors before moving on to the next page.
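As an illustration of combining these flags (the URL and timing values below are arbitrary), the following runs only the autoscroll and autoplay behaviors, gives each behavior up to 120 seconds, and waits an extra 5 seconds after behaviors on each page:
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --behaviors autoscroll,autoplay \
  --behaviorTimeout 120 \
  --pageExtraDelay 5
```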
## Additional Custom Behaviors
Custom behaviors can be mounted into the crawler and loaded from there. For example:
```sh
docker run -v $PWD/test-crawls:/crawls -v $PWD/tests/custom-behaviors/:/custom-behaviors/ webrecorder/browsertrix-crawler crawl --url https://example.com/ --customBehaviors /custom-behaviors/
```
This will load all the custom behaviors stored in the `tests/custom-behaviors` directory. The first behavior which returns true for `isMatch()` will be run on a given page.
Each behavior should contain a single class that implements the behavior interface. See [the behaviors tutorial](https://github.com/webrecorder/browsertrix-behaviors/blob/main/docs/TUTORIAL.md) for more info on how to write behaviors.


@ -0,0 +1,87 @@
# Creating and Using Browser Profiles
Browsertrix Crawler can use existing browser profiles when running a crawl. This allows the browser to be pre-configured by logging in to certain sites or changing other settings, before running a crawl. By creating a logged in profile, the actual login credentials are not included in the crawl, only (temporary) session cookies.
## Interactive Profile Creation
Interactive profile creation is used for creating profiles of more complex sites, or logging in to multiple sites at once.
To use this mode, don't specify the `--username` or `--password` flags, and expose two ports on the Docker container to allow DevTools to connect to the browser and to serve a status page.
In profile creation mode, Browsertrix Crawler launches a browser which uses a VNC server (via [noVNC](https://novnc.com/)) running on port 6080 to provide a 'remote desktop' for interacting with the browser.
After interactively logging into desired sites or configuring other settings, _Create Profile_ should be clicked to initiate profile creation. Browsertrix Crawler will then stop the browser, and save the browser profile.
To start in interactive profile creation mode, run:
```sh
docker run -p 6080:6080 -p 9223:9223 -v $PWD/crawls/profiles:/crawls/profiles/ -it webrecorder/browsertrix-crawler create-login-profile --url "https://example.com/"
```
Then, open a browser pointing to `http://localhost:9223/` and use the embedded browser to log in to any sites or configure any settings as needed.
Click _Create Profile_ at the top when done. The profile will then be created in `./crawls/profiles/profile.tar.gz` containing the settings of this browsing session.
It is also possible to use an existing profile via the `--profile` flag. This allows previous browsing sessions to be extended as needed.
```sh
docker run -p 6080:6080 -p 9223:9223 -v $PWD/crawls/profiles:/crawls/profiles -it webrecorder/browsertrix-crawler create-login-profile --url "https://example.com/" --filename "/crawls/profiles/newProfile.tar.gz" --profile "/crawls/profiles/oldProfile.tar.gz"
```
## Headless vs Headful Profiles
Browsertrix Crawler supports both headful and headless crawling. We have historically recommended using headful crawling to be most accurate to user experience; however, headless crawling may be faster and in recent versions of Chromium-based browsers should be much closer in fidelity to headful crawling.
To use profiles in headless mode, profiles should also be created with `--headless` flag.
When creating a browser profile in headless mode, Browsertrix Crawler will use the DevTools Protocol on port 9222 to stream the browser interface.
To create a profile in headless mode, run:
```sh
docker run -p 9222:9222 -p 9223:9223 -v $PWD/crawls/profiles:/crawls/profiles/ -it webrecorder/browsertrix-crawler create-login-profile --headless --url "https://example.com/"
```
## Automated Profile Creation for User Login
If the `--automated` flag is provided, Browsertrix Crawler will attempt to create a profile automatically after logging in to sites with a username and password. The username and password can be provided via `--username` and `--password` flags or, if omitted, from a command-line prompt.
When using `--automated` or `--username` / `--password`, Browsertrix Crawler will not launch an interactive browser and instead will attempt to finish automatically.
The automated profile creation system will log in to a single website with supplied credentials and then save the profile.
The profile creation script also takes a screenshot so you can check whether the login succeeded.
!!! example "Example: Launch a browser and login to the digipres.club Mastodon instance"
To automatically create a logged-in browser profile, run:
```bash
docker run -v $PWD/crawls/profiles:/crawls/profiles -it webrecorder/browsertrix-crawler create-login-profile --url "https://digipres.club/"
```
The script will then prompt you for login credentials, attempt to login, and create a tar.gz file in `./crawls/profiles/profile.tar.gz`.
- The `--url` parameter should specify the URL of a login page.
- To specify a custom filename, pass the `--filename` parameter.
- To specify the username and password on the command line (for automated profile creation), pass the `--username` and `--password` flags.
- To specify headless mode, add the `--headless` flag. Note that for crawls run with the `--headless` flag, it is recommended to also create the profile with `--headless` to ensure the profile is compatible.
- To specify the window size for the profile creation embedded browser, specify `--windowSize WIDTH,HEIGHT`. (The default is 1600x900.)
The profile creation script attempts to detect the username and password fields on a site as generically as possible, but may not work for all sites.
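For example, the options above can be combined to create a profile without the interactive browser; the login URL and credentials below are placeholders, and the profile is written to the default `./crawls/profiles/profile.tar.gz`:
```sh
docker run -v $PWD/crawls/profiles:/crawls/profiles -it webrecorder/browsertrix-crawler create-login-profile \
  --url "https://example.com/login" \
  --username "myuser" \
  --password "mypassword" \
  --windowSize 1600,900
```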
## Using Browser Profile with a Crawl
To use a previously created profile with a crawl, use the `--profile` flag or `profile` option. The `--profile` flag can then be used to specify any Brave Browser profile stored as a tarball. Using profiles created with same or older version of Browsertrix Crawler is recommended to ensure compatibility. This option allows running a crawl with the browser already pre-configured, logged in to certain sites, language settings configured, etc.
After running the above command, you can now run a crawl with the profile, as follows:
```bash
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --profile /crawls/profiles/profile.tar.gz --url https://digipres.club/ --generateWACZ --collection test-with-profile
```
Profiles can also be loaded from an http/https URL, eg. `--profile https://example.com/path/to/profile.tar.gz`.


@ -0,0 +1,243 @@
# All Command-Line Options
The Browsertrix Crawler Docker image currently accepts the following parameters:
```
crawler [options]
Options:
--help Show help [boolean]
--version Show version number [boolean]
--seeds, --url The URL to start crawling from
[array] [default: []]
--seedFile, --urlFile If set, read a list of seed urls, on
e per line, from the specified
[string]
-w, --workers The number of workers to run in para
llel [number] [default: 1]
--crawlId, --id A user provided ID for this crawl or
crawl configuration (can also be se
t via CRAWL_ID env var, defaults to
hostname) [string]
--waitUntil Puppeteer page.goto() condition to w
ait for before continuing, can be mu
ltiple separated by ','
[array] [choices: "load", "domcontentloaded", "networkidle0", "networkidle2"]
[default: ["load","networkidle2"]]
--depth The depth of the crawl for all seeds
[number] [default: -1]
--extraHops Number of extra 'hops' to follow, be
yond the current scope
[number] [default: 0]
--pageLimit, --limit Limit crawl to this number of pages
[number] [default: 0]
--maxPageLimit Maximum pages to crawl, overriding
pageLimit if both are set
[number] [default: 0]
--pageLoadTimeout, --timeout Timeout for each page to load (in se
conds) [number] [default: 90]
--scopeType A predefined scope of the crawl. For
more customization, use 'custom' an
d set scopeIncludeRx regexes
[string] [choices: "page", "page-spa", "prefix", "host", "domain", "any", "cus
tom"]
--scopeIncludeRx, --include Regex of page URLs that should be in
cluded in the crawl (defaults to the
immediate directory of URL)
--scopeExcludeRx, --exclude Regex of page URLs that should be ex
cluded from the crawl.
--allowHashUrls Allow Hashtag URLs, useful for singl
e-page-application crawling or when
different hashtags load dynamic cont
ent
--blockRules Additional rules for blocking certai
n URLs from being loaded, by URL reg
ex and optionally via text match in
an iframe [array] [default: []]
--blockMessage If specified, when a URL is blocked,
a record with this error message is
added instead [string]
--blockAds, --blockads If set, block advertisements from be
ing loaded (based on Stephen Black's
blocklist)
[boolean] [default: false]
--adBlockMessage If specified, when an ad is blocked,
a record with this error message is
added instead [string]
-c, --collection Collection name to crawl to (replay
will be accessible under this name i
n pywb preview)
[string] [default: "crawl-@ts"]
--headless Run in headless mode, otherwise star
t xvfb [boolean] [default: false]
--driver JS driver for the crawler
[string] [default: "./defaultDriver.js"]
--generateCDX, --generatecdx, --gene If set, generate index (CDXJ) for us
rateCdx e with pywb after crawl is done
[boolean] [default: false]
--combineWARC, --combinewarc, --comb If set, combine the warcs
ineWarc [boolean] [default: false]
--rolloverSize If set, declare the rollover size
[number] [default: 1000000000]
--generateWACZ, --generatewacz, --ge If set, generate wacz
nerateWacz [boolean] [default: false]
--logging Logging options for crawler, can inc
lude: stats (enabled by default), js
errors, debug
[array] [default: ["stats"]]
--logLevel Comma-separated list of log levels t
o include in logs
[array] [default: []]
--context, --logContext Comma-separated list of contexts to
include in logs
[array] [choices: "general", "worker", "recorder", "recorderNetwork", "writer"
, "state", "redis", "storage", "text", "exclusion", "screenshots", "screencast
", "originOverride", "healthcheck", "browser", "blocking", "behavior", "behavi
orScript", "jsError", "fetch", "pageStatus", "memoryStatus", "crawlStatus", "l
inks", "sitemap"] [default: []]
--logExcludeContext Comma-separated list of contexts to
NOT include in logs
[array] [choices: "general", "worker", "recorder", "recorderNetwork", "writer"
, "state", "redis", "storage", "text", "exclusion", "screenshots", "screencast
", "originOverride", "healthcheck", "browser", "blocking", "behavior", "behavi
orScript", "jsError", "fetch", "pageStatus", "memoryStatus", "crawlStatus", "l
inks", "sitemap"] [default: ["recorderNetwork","jsError","screencast"]]
--text Extract initial (default) or final t
ext to pages.jsonl or WARC resource
record(s)
[array] [choices: "to-pages", "to-warc", "final-to-warc"]
--cwd Crawl working directory for captures
(pywb root). If not set, defaults t
o process.cwd()
[string] [default: "/crawls"]
--mobileDevice Emulate mobile device by name from:
https://github.com/puppeteer/puppete
er/blob/main/src/common/DeviceDescri
ptors.ts [string]
--userAgent Override user-agent with specified s
tring [string]
--userAgentSuffix Append suffix to existing browser us
er-agent (ex: +MyCrawler, info@examp
le.com) [string]
--useSitemap, --sitemap If enabled, check for sitemaps at /s
itemap.xml, or custom URL if URL is
specified
--sitemapFromDate, --sitemapFrom If set, filter URLs from sitemaps to
those greater than or equal to prov
ided ISO Date string (YYYY-MM-DD or
YYYY-MM-DDTHH:MM:SS or partial date)
--statsFilename If set, output stats as JSON to this
file. (Relative filename resolves t
o crawl working directory)
--behaviors Which background behaviors to enable
on each page
[array] [choices: "autoplay", "autofetch", "autoscroll", "siteSpecific"] [defa
ult: ["autoplay","autofetch","autoscroll","siteSpecific"]]
--behaviorTimeout If >0, timeout (in seconds) for in-p
age behavior will run on each page.
If 0, a behavior can run until finis
h. [number] [default: 90]
--pageExtraDelay, --delay If >0, amount of time to sleep (in s
econds) after behaviors before movin
g on to next page
[number] [default: 0]
--dedupPolicy Deduplication policy
[string] [choices: "skip", "revisit", "keep"] [default: "skip"]
--profile Path to tar.gz file which will be ex
tracted and used as the browser prof
ile [string]
--screenshot Screenshot options for crawler, can
include: view, thumbnail, fullPage
[array] [choices: "view", "thumbnail", "fullPage"] [default: []]
--screencastPort If set to a non-zero value, starts a
n HTTP server with screencast access
ible on this port
[number] [default: 0]
--screencastRedis If set, will use the state store red
is pubsub for screencasting. Require
s --redisStoreUrl to be set
[boolean] [default: false]
--warcInfo, --warcinfo Optional fields added to the warcinf
o record in combined WARCs
--redisStoreUrl If set, url for remote redis server
to store state. Otherwise, using in-
memory store
[string] [default: "redis://localhost:6379/0"]
--saveState If the crawl state should be seriali
zed to the crawls/ directory. Defaul
ts to 'partial', only saved when cra
wl is interrupted
[string] [choices: "never", "partial", "always"] [default: "partial"]
--saveStateInterval If save state is set to 'always', al
so save state during the crawl at th
is interval (in seconds)
[number] [default: 300]
--saveStateHistory Number of save states to keep during
the duration of a crawl
[number] [default: 5]
--sizeLimit If set, save state and exit if size
limit exceeds this value
[number] [default: 0]
--diskUtilization If set, save state and exit if disk
utilization exceeds this percentage
value [number] [default: 90]
--timeLimit If set, save state and exit after ti
me limit, in seconds
[number] [default: 0]
--healthCheckPort port to run healthcheck on
[number] [default: 0]
--overwrite overwrite current crawl data: if set
, existing collection directory will
be deleted before crawl is started
[boolean] [default: false]
--waitOnDone if set, wait for interrupt signal wh
en finished instead of exiting
[boolean] [default: false]
--restartsOnError if set, assume will be restarted if
interrupted, don't run post-crawl pr
ocesses on interrupt
[boolean] [default: false]
--netIdleWait if set, wait for network idle after
page load and after behaviors are do
ne (in seconds). if -1 (default), de
termine based on scope
[number] [default: -1]
--lang if set, sets the language used by th
e browser, should be ISO 639 languag
e[-country] code [string]
--title If set, write supplied title into WA
CZ datapackage.json metadata[string]
--description, --desc If set, write supplied description i
nto WACZ datapackage.json metadata
[string]
--originOverride if set, will redirect requests from
each origin in key to origin in the
value, eg. --originOverride https://
host:port=http://alt-host:alt-port
[array] [default: []]
--logErrorsToRedis If set, write error messages to redi
s [boolean] [default: false]
--writePagesToRedis If set, write page objects to redis
[boolean] [default: false]
--failOnFailedSeed If set, crawler will fail with exit
code 1 if any seed fails
[boolean] [default: false]
--failOnFailedLimit If set, save state and exit if numbe
r of failed pages exceeds this value
[number] [default: 0]
--failOnInvalidStatus If set, will treat pages with non-20
0 response as failures. When combine
d with --failOnFailedLimit or --fail
OnFailedSeedmay result in crawl fail
ing due to non-200 responses
[boolean] [default: false]
--customBehaviors injects a custom behavior file or se
t of behavior files in a directory
[string]
--debugAccessRedis if set, runs internal redis without
protected mode to allow external acc
ess (for debugging) [boolean]
--warcPrefix prefix for WARC files generated, inc
luding WARCs added to WACZ [string]
--config Path to YAML config file
```


@ -0,0 +1,122 @@
# Commonly-Used Options
## Waiting for Page Load
One of the key nuances of browser-based crawling is determining when a page is finished loading. This can be configured with the `--waitUntil` flag.
The default is `load,networkidle2`, which waits until page load and until no more than 2 requests remain active, but for static sites, `--waitUntil domcontentloaded` may be used to speed up the crawl (to avoid waiting for ads to load, for example). `--waitUntil networkidle0` may make sense for sites where absolutely all requests must complete before proceeding.
See [page.goto waitUntil options](https://pptr.dev/api/puppeteer.page.goto#remarks) for more info on the options that can be used with this flag from the Puppeteer docs.
The `--pageLoadTimeout`/`--timeout` option sets the timeout in seconds for page load, defaulting to 90 seconds. Behaviors will run on the page once either the page load condition or the page load timeout is met, whichever happens first.
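For instance, a crawl of a mostly static site might use a quicker load condition and a shorter page timeout; the URL and values here are only illustrative:
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --waitUntil domcontentloaded \
  --pageLoadTimeout 60
```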
## Ad blocking
Brave Browser, the browser used by Browsertrix Crawler for crawling, has some ad and tracker blocking features enabled by default. These [Shields](https://brave.com/shields/) can be disabled or customized using [Browser Profiles](browser-profiles.md).
Browsertrix Crawler also supports blocking ads from being loaded during capture based on [Stephen Black's list of known ad hosts](https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts). To enable ad blocking based on this list, use the `--blockAds` option. If `--adBlockMessage` is set, a record with the specified error message will be added in the ad's place.
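A minimal sketch of enabling ad blocking with a replacement message (the URL and message text are arbitrary):
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --blockAds \
  --adBlockMessage "Ad resource blocked during crawl"
```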
## Custom Warcinfo Fields
Custom fields can be added to the `warcinfo` WARC record, generated for each combined WARC. The fields can be specified in the YAML config under the `warcinfo` section or specified individually via the command line.
For example, the following are equivalent ways to add additional warcinfo fields:
via yaml config:
```yaml
warcinfo:
operator: my-org
hostname: hostname.my-org
```
via command-line:
```sh
--warcinfo.operator my-org --warcinfo.hostname hostname.my-org
```
## Screenshots
Browsertrix Crawler includes the ability to take screenshots of each page crawled via the `--screenshot` option.
Three screenshot options are available:
- `--screenshot view`: Takes a png screenshot of the initially visible viewport (1920x1080)
- `--screenshot fullPage`: Takes a png screenshot of the full page
- `--screenshot thumbnail`: Takes a jpeg thumbnail of the initially visible viewport (1920x1080)
These can be combined using a comma-separated list passed via the `--screenshot` option, e.g.: `--screenshot thumbnail,view,fullPage` or passed in separately `--screenshot thumbnail --screenshot view --screenshot fullPage`.
Screenshots are written into a `screenshots.warc.gz` WARC file in the `archives/` directory. If the `--generateWACZ` command line option is used, the screenshots WARC is written into the `archive` directory of the WACZ file and indexed alongside the other WARCs.
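For example, to capture both a thumbnail and a full-page screenshot for each page and bundle everything into a WACZ (the collection name is arbitrary):
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --screenshot thumbnail,fullPage \
  --generateWACZ \
  --collection screenshot-example
```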
## Screencasting
Browsertrix Crawler includes a screencasting option which allows watching the crawl in real-time via screencast (connected via a websocket).
To enable, add `--screencastPort` command-line option and also map the port on the docker container. An example command might be:
```sh
docker run -p 9037:9037 -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --url https://www.example.com --screencastPort 9037
```
Then, open `http://localhost:9037/` and watch the crawl!
## Text Extraction
Browsertrix Crawler supports text extraction via the `--text` flag, which accepts one or more of the following extraction options:
- `--text to-pages` — Extract initial text and add it to the text field in pages.jsonl
- `--text to-warc` — Extract initial page text and add it to a `urn:text:<url>` WARC resource record
- `--text final-to-warc` — Extract the final page text after all behaviors have run and add it to a `urn:textFinal:<url>` WARC resource record
The options can be passed separately or combined into a comma-separated list; e.g. `--text to-warc,final-to-warc` and `--text to-warc --text final-to-warc`
are equivalent. For backwards compatibility, `--text` alone is equivalent to `--text to-pages`.
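For example, to add initial text to pages.jsonl and also store the final page text as WARC resource records (the URL and collection name are placeholders):
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --text to-pages,final-to-warc \
  --generateWACZ \
  --collection text-example
```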
## Uploading Crawl Outputs to S3-Compatible Storage
Browsertrix Crawler includes support for uploading WACZ files to S3-compatible storage, and notifying a webhook when the upload succeeds.
S3 upload is only supported when WACZ output is enabled and will not work for WARC output.
This feature can currently be enabled by setting environment variables (for security reasons, these settings are not passed in as part of the command-line or YAML config at this time).
Environment variables for S3-uploads include:
- `STORE_ACCESS_KEY` / `STORE_SECRET_KEY` — S3 credentials
- `STORE_ENDPOINT_URL` — S3 endpoint URL
- `STORE_PATH` — optional path appended to endpoint, if provided
- `STORE_FILENAME` — filename or template for filename to put on S3
- `STORE_USER` — optional username to pass back as part of the webhook callback
- `CRAWL_ID` — unique crawl id (defaults to container hostname)
- `WEBHOOK_URL` — the URL of the webhook (can be http://, https://, or redis://)
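A minimal sketch of passing these variables to a crawl via Docker; the endpoint, credentials, filename, and webhook URL below are placeholders, and the exact endpoint format depends on your S3-compatible provider:
```sh
docker run -v $PWD/crawls:/crawls/ \
  -e STORE_ACCESS_KEY=EXAMPLEACCESSKEY \
  -e STORE_SECRET_KEY=EXAMPLESECRETKEY \
  -e STORE_ENDPOINT_URL="https://s3.example.com/example-bucket/" \
  -e STORE_FILENAME=example-crawl.wacz \
  -e WEBHOOK_URL="https://example.com/crawl-webhook" \
  webrecorder/browsertrix-crawler crawl --url https://example.com/ --generateWACZ --collection s3-example
```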
### Webhook Notification
The webhook URL can be an HTTP URL which receives a JSON POST request OR a Redis URL, which specifies a redis list key to which the JSON data is pushed as a string.
Webhook notification JSON includes:
- `id` — crawl id (value of `CRAWL_ID`)
- `userId` — user id (value of `STORE_USER`)
- `filename` — bucket path + filename of the file
- `size` — size of WACZ file
- `hash` — SHA-256 of WACZ file
- `completed` — boolean indicating whether the crawl fully completed or was only partially completed (due to an interrupt signal or other error).
## Saving Crawl State: Interrupting and Restarting the Crawl
A crawl can be gracefully interrupted with Ctrl-C (SIGINT) or a SIGTERM.
When a crawl is interrupted, the current crawl state is written to the `crawls` subdirectory inside the collection directory. The crawl state includes the current YAML config, if any, plus the current state of the crawl.
This crawl state YAML file can then be passed as the `--config` option to restart the crawl from where it left off.
By default, the crawl interruption waits for current pages to finish. A subsequent SIGINT will cause the crawl to stop immediately. Any unfinished pages are recorded in the `pending` section of the crawl state (if gracefully finished, the section will be empty).
By default, the crawl state is only written when a crawl is interrupted before completing. The `--saveState` CLI option can be set to `always` or `never` to control when the crawl state file should be written.
### Periodic State Saving
When `--saveState` is set to `always`, Browsertrix Crawler will also save the state automatically during the crawl, at the interval set by the `--saveStateInterval` setting. The crawler will keep the last `--saveStateHistory` save states and delete older ones. This provides an extra backup: in the event that the crawl fails unexpectedly or is not terminated via Ctrl-C, several previous crawl states are still available.
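As a sketch, the first command below starts a crawl that always saves its state every 5 minutes and keeps the last 5 save states; the second resumes from a saved state file. The state file name shown is a placeholder; the actual file is written to the `crawls` subdirectory inside the collection directory, as described above.
```sh
# Start a crawl that saves its state every 5 minutes, keeping the last 5 states
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ --collection resumable-example \
  --saveState always --saveStateInterval 300 --saveStateHistory 5

# Resume later from a saved state file (placeholder filename)
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --config /crawls/collections/resumable-example/crawls/crawl-state.yaml
```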


@ -0,0 +1,152 @@
# Crawl Scope
## Configuring Pages Included or Excluded from a Crawl
The crawl scope can be configured globally for all seeds, or customized per seed, by specifying the `--scopeType` command-line option or setting the `type` property for each seed.
The `depth` option also limits how many pages will be crawled for that seed, while the `limit` option sets the total number of pages crawled from any seed.
The scope controls which linked pages are included and which pages are excluded from the crawl.
To make this configuration as simple as possible, there are several predefined scope types. The available types are:
- `page` — crawl only this page and no additional links.
- `page-spa` — crawl only this page, but load any links that include different hashtags. Useful for single-page apps that may load different content based on hashtag.
- `prefix` — crawl any pages in the same directory, eg. starting from `https://example.com/path/page.html`, crawl anything under `https://example.com/path/` (default)
- `host` — crawl pages that share the same host.
- `domain` — crawl pages that share the same domain and subdomains, eg. given `https://example.com/` will also crawl `https://anysubdomain.example.com/`
- `any` — crawl any and all pages linked from this page.
- `custom` — crawl based on the `--include` regular expression rules.
The scope settings for multi-page crawls (page-spa, prefix, host, domain) also include http/https versions, eg. given a prefix of `http://example.com/path/`, `https://example.com/path/` is also included.
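For example, to crawl an entire host starting from a single seed (the URL is a placeholder):
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/startpage.html \
  --scopeType host
```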
## Custom Scope Inclusion Rules
Instead of setting a scope type, it is possible to configure a custom scope regular expression (regex) by setting `--include` to one or more regular expressions. If using the YAML config, the `include` field can contain a list of regexes.
Extracted links that match the regular expression will be considered 'in scope' and included.
## Custom Scope Exclusion Rules
In addition to the inclusion rules, Browsertrix Crawler supports a separate list of exclusion regexes that, if matched, override the inclusion rules and exclude a URL from the crawl.
The exclusion regexes are often used with a custom scope, but could be used with a predefined scopeType as well.
## Extra 'Hops' Beyond Current Scope
Occasionally, it may be useful to augment the scope by allowing extra links N 'hops' beyond the current scope.
For example, this is most useful when crawling with a `host` or `prefix` scope, but also wanting to include 'one extra hop' — any link to external pages beyond the current host — but not following any of the links on those pages. This is possible with the `extraHops` setting, which defaults to 0, but can be set to a higher value N (usually 1) to go beyond the current scope.
The `--extraHops` setting can be set globally or per seed to allow expanding the current inclusion scope N 'hops' beyond the configured scope. Note that this mechanism only expands the inclusion scope, and any exclusion rules are still applied. If a URL is to be excluded via the exclusion rules, that will take precedence over the `--extraHops`.
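A sketch of a host-scoped crawl that also captures pages one link beyond the host (the URL is a placeholder):
```sh
docker run -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl \
  --url https://example.com/ \
  --scopeType host \
  --extraHops 1
```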
## Scope Rule Examples
!!! example "Regular expression exclude rules"
A crawl started with this config will start on `https://example.com/startpage.html` and crawl all pages on the `https://example.com/` domain except pages that match the exclusion rules — URLs that contain the strings `example.com/skip` or `example.com/search` followed by any number of characters, and URLs that contain the string `postfeed`.
`https://example.com/page.html` will be crawled but `https://example.com/skip/postfeed`, `https://example.com/skip/this-page.html`, and `https://example.com/search?q=searchstring` will not.
```yaml
seeds:
- url: https://example.com/startpage.html
scopeType: "host"
exclude:
- example.com/skip.*
- example.com/search.*
- postfeed
```
!!! example "Regular expression include and exclude rules"
In this example config, the scope includes regular expressions that will crawl all page URLs that match `example.com/(crawl-this|crawl-that)`, and exclude any URLs that terminate with exactly `skip`.
`https://example.com/crawl-this/page.html` and `https://example.com/crawl-this/page/skipme/not` will be crawled but `https://example.com/crawl-this/page/skip` will not.
```yaml
seeds:
- url: https://example.com/startpage.html
include: example.com/(crawl-this|crawl-that)
exclude:
- skip$
```
!!! example "More complicated regular expressions"
This example exclusion rule targets characters and numbers after `search` until the string `ID=`, followed by any amount of numbers.
`https://example.com/search/ID=5819`, `https://example.com/search/6vH8R4Tm`, and `https://example.com/search/2o3Jq89cID=5ag8h19` will be crawled but `https://example.com/search/6vH8R4TmID=5819` will not.
```yaml
seeds:
- url: https://example.com/startpage.html
scopeType: "host"
exclude:
- example.com/search/[A-Za-z0-9]+ID=[0-9]+
```
The `include`, `exclude`, `scopeType`, and `depth` settings can be configured per seed or globally for the entire crawl.
The per-seed settings override the per-crawl settings, if any.
See the test suite [tests/scopes.test.js](https://github.com/webrecorder/browsertrix-crawler/blob/main/tests/scopes.test.js) for additional examples of configuring scope inclusion and exclusion rules.
!!! note
Include and exclude rules are always regular expressions. For rules to match, you may have to escape special characters that commonly appear in urls like `?`, `+`, or `.` by placing a `\` before the character. For example: `youtube.com/watch\?rdwz7QiG0lk`.
Browsertrix Crawler does not log excluded URLs.
## Page Resource Block Rules
While scope rules define which pages are to be crawled, it is also possible to block page resources, URLs loaded within a page or within an iframe on a page.
For example, this is useful for blocking ads or other unwanted content that is loaded within multiple pages.
The page resource block rules can be specified as a list in the `blockRules` field. Each rule can contain one of the following fields:
- `url`: regex for URL to match (required)
- `type`: can be `block` or `allowOnly`. The block rule blocks the specified match, while allowOnly inverts the match, allowing only the matched URLs and blocking all others.
- `inFrameUrl`: if specified, indicates that the rule only applies when `url` is loaded in a specific iframe or top-level frame.
- `frameTextMatch`: if specified, the text of the specified URL is checked for the regex, and the rule applies only if there is an additional match. When specified, this field makes the block rule apply only to frame-level resources, eg. URLs loaded directly in an iframe or top-level frame.
For example, a very simple block rule that blocks all URLs from 'googleanalytics.com' on any page can be added with:
```yaml
blockRules:
- url: googleanalytics.com
```
To instead block 'googleanalytics.com' only if loaded within pages or iframes that match the regex 'example.com/no-analytics', add:
```yaml
blockRules:
  - url: googleanalytics.com
    inFrameUrl: example.com/no-analytics
```
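The `allowOnly` type is not shown above; as an illustrative sketch (the regex is hypothetical), a single rule can invert the matching so that only matching resources are loaded:

```yaml
blockRules:
  # allow only resources whose URLs match this regex; all other resources
  # loaded by crawled pages are blocked
  - url: '(example\.com|cdn\.example\.net)'
    type: allowOnly
```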
For additional examples of block rules, see the [tests/blockrules.test.js](https://github.com/webrecorder/browsertrix-crawler/blob/main/tests/blockrules.test.js) file in the test suite.
If the `--blockMessage` option is also specified, the response for a blocked URL is replaced with the specified message (added as a WARC resource record).
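For example, here is a sketch combining a block rule with a custom message (the message text is arbitrary) in a YAML config:

```yaml
blockRules:
  - url: googleanalytics.com
blockMessage: "This resource was blocked by the crawl configuration."
```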
## Page Resource Block Rules vs Scope Rules
If it is unclear which rules should be used, here is a quick way to decide:
- If you'd like to restrict _the pages that are being crawled_, use the crawl scope rules (defined above).
- If you'd like to restrict _parts of a page_ that are being loaded, use the page resource block rules described in this section.
The `blockRules` add a filter to each URL loaded on a page and incur extra overhead, so they should only be used in advanced use cases where part of a page needs to be blocked.
These rules cannot be used to prevent entire pages from loading; use the scope exclusion rules for that instead (a warning will be printed if a page resource block rule matches a top-level page).
@ -0,0 +1,44 @@
# Browsertrix Crawler User Guide
Welcome to the Browsertrix Crawler User Guide. This page covers the basics of using Browsertrix Crawler, Webrecorder's browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container.
## Getting Started
Browsertrix Crawler requires [Docker](https://docs.docker.com/get-docker/) to be installed on the machine running the crawl.
Assuming Docker is installed, you can run a crawl and test your archive with the following steps.
You don't even need to clone the Browsertrix Crawler repo; just choose a directory where you'd like the crawl data to be placed, then run
the following commands. Replace `[URL]` with the website you'd like to crawl.
1. Run `docker pull webrecorder/browsertrix-crawler`
2. `docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url [URL] --generateWACZ --text --collection test`
3. The crawl will now run and logs in [JSON Lines](https://jsonlines.org/) format will be output to the console. Depending on the size of the site, this may take a bit!
4. Once the crawl is finished, a WACZ file will be created at `crawls/collections/test/test.wacz`, relative to the directory where you ran the crawl!
5. You can go to [ReplayWeb.page](https://replayweb.page) and open the generated WACZ file and browse your newly crawled archive!
## Getting Started with Command-Line Options
Here's how you can use some of the more common command-line options to configure the crawl:
- To include automated text extraction for full-text search in `pages.jsonl`, add the `--text` flag. To write extracted text to WARCs instead of, or in addition to, `pages.jsonl`, see [Text Extraction](common-options.md#text-extraction).
- To limit the crawl to a maximum number of pages, add `--limit P` where P is the number of pages that will be crawled.
- To limit the crawl to a maximum size, set `--sizeLimit` (size in bytes).
- To limit the crawl time, set `--timeLimit` (in seconds).
- To run more than one browser worker and crawl in parallel, add `--workers N`, where N is the number of browsers to run in parallel. More browsers require more CPU and network bandwidth and do not guarantee faster crawling.
- To crawl into a new directory, specify a different name for the `--collection` param. If omitted, a new collection directory based on the current time will be created. Adding the `--overwrite` flag will delete the collection directory at the start of the crawl, if it exists.
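For example, here is a sketch of a single crawl command combining several of these options (the limit values are arbitrary):

```sh
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --url https://webrecorder.net/ --generateWACZ --text --limit 100 --timeLimit 3600 --workers 2 --collection wr-net-test
```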
Browsertrix Crawler includes a number of additional command-line options, explained in detail throughout this User Guide.
## Published Releases / Production Use
When using Browsertrix Crawler in production, it is recommended to use a specific, published version of the image, e.g. `webrecorder/browsertrix-crawler:[VERSION]` instead of `webrecorder/browsertrix-crawler`, where `[VERSION]` corresponds to one of the published release tags.
All released Docker Images are available from [Docker Hub, listed by release tag here](https://hub.docker.com/r/webrecorder/browsertrix-crawler/tags?page=1&ordering=last_updated).
Details for each corresponding release tag are also available on GitHub under [Releases](https://github.com/webrecorder/browsertrix-crawler/releases).
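For example (the version tag shown here is illustrative; substitute the release you want to pin):

```sh
docker pull webrecorder/browsertrix-crawler:1.0.0
docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler:1.0.0 crawl --url https://webrecorder.net/ --generateWACZ --collection pinned-test
```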
@ -0,0 +1,48 @@
# YAML Crawl Config
Browsertrix Crawler supports the use of a YAML file to set parameters for a crawl. This can be used by passing a valid YAML file to the `--config` option.
The YAML file can contain the same parameters as the command-line arguments. If a parameter is set on the command line and in the YAML file, the value from the command line will be used. For example, the following command starts a crawl with the config in `crawl-config.yaml`:
```sh
docker run -v $PWD/crawl-config.yaml:/app/crawl-config.yaml -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config /app/crawl-config.yaml
```
The config can also be passed via stdin, which can simplify the command. Note that this requires running `docker run` with the `-i` flag. To read the config from stdin, pass `--config stdin`:
```sh
cat ./crawl-config.yaml | docker run -i -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config stdin
```
An example config file (e.g. `crawl-config.yaml`) might contain:
```yaml
seeds:
  - https://example.com/
  - https://www.iana.org/

combineWARC: true
```
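Because command-line values take precedence over the YAML file, the same config can also be run with an option added or overridden on the command line, for example (a sketch):

```sh
docker run -v $PWD/crawl-config.yaml:/app/crawl-config.yaml -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config /app/crawl-config.yaml --workers 2
```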
The list of seeds can be loaded via an external file by specifying the filename via the `seedFile` config or command-line option.
## Seed File
The URL seed file should be a text file formatted so that each line of the file is a URL. An example file is available in the GitHub repository's fixtures folder as [urlSeedFile.txt](https://github.com/webrecorder/browsertrix-crawler/blob/main/tests/fixtures/urlSeedFile.txt).
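A minimal seed file (the URLs are illustrative) would look like:

```text
https://example.com/
https://www.iana.org/
https://webrecorder.net/
```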
The seed file must be passed as a volume to the docker container. Your Docker command should be formatted similar to the following:
```sh
docker run -v $PWD/seedFile.txt:/app/seedFile.txt -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --seedFile /app/seedFile.txt
```
## Per-Seed Settings
Certain settings such as scope type, scope includes and excludes, and depth can also be configured per-seed directly in the YAML file, for example:
```yaml
seeds:
  - url: https://webrecorder.net/
    depth: 1
    scopeType: "prefix"
```
14
docs/gen-cli.sh Executable file
@ -0,0 +1,14 @@
#!/bin/bash
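# Regenerate the user guide's CLI options page from the crawler's --help output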
CURR=$(dirname "${BASH_SOURCE[0]}")
out=$CURR/docs/user-guide/cli-options.md
echo "# All Command-Line Options" > $out
echo "" >> $out
echo "The Browsertrix Crawler Docker image currently accepts the following parameters:" >> $out
echo "" >> $out
echo '```' >> $out
#node $CURR/../dist/main.js --help >> $out
docker run webrecorder/browsertrix-crawler crawl --help >> $out
echo '```' >> $out
94
docs/mkdocs.yml Normal file
@ -0,0 +1,94 @@
site_name: Browsertrix Crawler Docs
repo_url: https://github.com/webrecorder/browsertrix-crawler/
repo_name: Browsertrix Crawler
edit_uri: edit/main/docs/
extra_css:
  - stylesheets/extra.css
theme:
  name: material
  custom_dir: docs/overrides
  features:
    - navigation.sections
    - navigation.tabs
    - navigation.tabs.sticky
    - navigation.instant
    - navigation.tracking
    - navigation.indexes
    - navigation.footer
    - content.code.copy
    - content.action.edit
    - content.tooltips
    - search.suggest
  palette:
    scheme: webrecorder
  logo: assets/brand/browsertrix-crawler-white.svg
  favicon: assets/brand/browsertrix-crawler-icon-color-dynamic.svg
  icon:
    admonition:
      note: bootstrap/pencil-fill
      abstract: bootstrap/file-earmark-text-fill
      info: bootstrap/info-circle-fill
      tip: bootstrap/exclamation-circle-fill
      success: bootstrap/check-circle-fill
      question: bootstrap/question-circle-fill
      warning: bootstrap/exclamation-triangle-fill
      failure: bootstrap/x-octagon-fill
      danger: bootstrap/exclamation-diamond-fill
      bug: bootstrap/bug-fill
      example: bootstrap/mortarboard-fill
      quote: bootstrap/quote
    repo: bootstrap/github
    edit: bootstrap/pencil
    view: bootstrap/eye
nav:
  - index.md
  - Develop:
      - develop/index.md
      - develop/docs.md
  - User Guide:
      - user-guide/index.md
      - user-guide/common-options.md
      - user-guide/crawl-scope.md
      - user-guide/yaml-config.md
      - user-guide/browser-profiles.md
      - user-guide/behaviors.md
      - user-guide/cli-options.md
markdown_extensions:
  - toc:
      toc_depth: 3
      permalink: true
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.emoji:
      emoji_index: !!python/name:material.extensions.emoji.twemoji
      emoji_generator: !!python/name:material.extensions.emoji.to_svg
      options:
        custom_icons:
          - docs/overrides/.icons
  - admonition
  - pymdownx.inlinehilite
  - pymdownx.details
  - pymdownx.superfences
  - pymdownx.keys
  - def_list
  - attr_list
extra:
  generator: false
  social:
    - icon: bootstrap/globe
      link: https://webrecorder.net
    - icon: bootstrap/chat-left-text-fill
      link: https://forum.webrecorder.net/
    - icon: bootstrap/mastodon
      link: https://digipres.club/@webrecorder
    - icon: bootstrap/youtube
      link: https://www.youtube.com/@webrecorder
copyright: "Creative Commons Attribution 4.0 International (CC BY 4.0)"
plugins:
  - search