http auth support per seed (supersedes #566): (#616)

- parse URL username/password, store in 'auth' field in seed, or pass in 'auth' field directly (from yaml config)
- add 'Authorization' header with base64 encoded basic auth via setExtraHTTPHeaders()
- tests: add test for crawling with auth using http-server using local docs build (now build docs as part of CI)
- docs: add HTTP Auth to YAML config section
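
A minimal sketch of the flow described in the bullets above, not the crawler's actual code: `parseSeedAuth` and `basicAuthHeader` are hypothetical names, and the precedence between URL credentials and an explicit `auth` value is an assumption. It splits credentials out of a seed URL and builds the header object that would be handed to `setExtraHTTPHeaders()`.

```typescript
// Hypothetical helper: separate "username:password" from a seed URL.
// Assumption: credentials embedded in the URL win over an explicit `auth` value.
function parseSeedAuth(
  seedUrl: string,
  auth?: string,
): { url: string; auth: string | null } {
  const parsed = new URL(seedUrl);
  let credentials: string | null = auth ?? null;

  if (parsed.username || parsed.password) {
    // URL exposes these percent-encoded, so decode before joining.
    const user = decodeURIComponent(parsed.username);
    const pass = decodeURIComponent(parsed.password);
    credentials = `${user}:${pass}`;
    // Strip the credentials so the stored seed URL is clean.
    parsed.username = "";
    parsed.password = "";
  }

  return { url: parsed.href, auth: credentials };
}

// Hypothetical helper: build the extra-headers object for HTTP Basic Auth.
// btoa() only accepts Latin-1 input; non-Latin-1 credentials would need
// to be UTF-8 encoded first.
function basicAuthHeader(credentials: string): Record<string, string> {
  return { Authorization: `Basic ${btoa(credentials)}` };
}

const seed = parseSeedAuth("https://username:password@example.com/");
const headers = seed.auth ? basicAuthHeader(seed.auth) : {};
console.log(seed.url, headers);
```

The resulting `headers` object has the shape that Puppeteer's `page.setExtraHTTPHeaders()` accepts. Keeping the credentials out of the stored URL means only the header carries them.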

---------
Co-authored-by: Ed Summers <ehs@pobox.com>
Ilya Kreymer 2024-06-20 16:35:30 -07:00 committed by GitHub
parent 6329b19a20
commit 3339374092
8 changed files with 437 additions and 9 deletions

@@ -46,3 +46,16 @@ seeds:
depth: 1
scopeType: "prefix"
```
## HTTP Auth
Browsertrix Crawler supports HTTP Basic Auth, which can be provided on a per-seed basis as part of the URL, for example:
`--url https://username:password@example.com/`.
Alternatively, credentials can be added to the `auth` field for each seed:
```yaml
seeds:
  - url: https://example.com/
    auth: username:password
```