http auth support per seed (supersedes #566): (#616)

- parse URL username/password and store it in the seed's 'auth' field, or pass the 'auth' field directly (from YAML config)
- add 'Authorization' header with base64-encoded basic auth via setExtraHTTPHeaders()
- tests: add test for crawling with auth, using http-server to serve a local docs build (docs are now built as part of CI)
- docs: add HTTP Auth to YAML config section

---------

Co-authored-by: Ed Summers <ehs@pobox.com>
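The flow described above can be sketched in TypeScript. Credentials come either from the seed URL or from an explicit `auth` value, and are then sent as a base64-encoded `Authorization: Basic` header. This is a minimal sketch, not the crawler's actual code. The helper names `extractSeedAuth` and `basicAuthHeaders` are hypothetical. Two behaviors are also assumptions: an explicit `auth` value takes precedence over credentials embedded in the URL, and credentials are stripped from the URL afterwards.

```typescript
// Hypothetical helpers illustrating per-seed HTTP Basic Auth; not Browsertrix Crawler's code.
interface SeedAuth {
  url: string; // seed URL with any embedded credentials removed (assumption)
  auth?: string; // "username:password", if any
}

function extractSeedAuth(rawUrl: string, auth?: string): SeedAuth {
  const parsed = new URL(rawUrl);
  // Assumption: an explicit auth value (e.g. from YAML) wins over URL credentials.
  if (!auth && (parsed.username || parsed.password)) {
    // URL userinfo is percent-encoded, so decode it before building the header.
    auth =
      decodeURIComponent(parsed.username) +
      ":" +
      decodeURIComponent(parsed.password);
  }
  // Clearing both fields drops the "user:pass@" part from the serialized URL.
  parsed.username = "";
  parsed.password = "";
  return { url: parsed.href, auth };
}

function basicAuthHeaders(auth?: string): Record<string, string> {
  if (!auth) {
    return {};
  }
  // Basic auth sends base64("username:password").
  const encoded = Buffer.from(auth, "utf-8").toString("base64");
  return { Authorization: `Basic ${encoded}` };
}
```

In a Puppeteer-based crawler, the object returned by `basicAuthHeaders()` has the `Record<string, string>` shape that `page.setExtraHTTPHeaders()` accepts. That Puppeteer call is the mechanism the commit names for attaching the header.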
2025-10-19 06:23:16 +00:00 · 2024-06-20 16:35:30 -07:00 · 2024-06-20 16:35:30 -07:00 · 3339374092
commit 3339374092
parent 6329b19a20
8 changed files with 437 additions and 9 deletions
--- a/docs/docs/user-guide/yaml-config.md
+++ b/docs/docs/user-guide/yaml-config.md
@@ -46,3 +46,16 @@ seeds:
    depth: 1
    scopeType: "prefix"
 ```
+
+## HTTP Auth
+
+Browsertrix Crawler supports HTTP Basic Auth, which can be provided on a per-seed basis as part of the URL, for example:
+`--url https://username:password@example.com/`.
+
+Alternatively, credentials can be added to the `auth` field for each seed:
+
+```yaml
+seeds:
+  - url: https://example.com/
+    auth: username:password
+```