mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00
Support loading custom behaviors from git repo (#717)
Fixes #712

- Also expands the existing documentation about behaviors and adds a test.
- Uses query arg for 'branch' and 'path' to specify git branch and subpath in repo, respectively.

---------

Co-authored-by: Ilya Kreymer <ikreymer@users.noreply.github.com>
parent ea05307528
commit 60c84b342e
4 changed files with 121 additions and 10 deletions
@@ -14,12 +14,41 @@ To disable behaviors for a crawl, use `--behaviors ""`.
## Additional Custom Behaviors
Custom behaviors can be mounted into the crawler and loaded from there. For example:
```sh
docker run -v $PWD/test-crawls:/crawls -v $PWD/tests/custom-behaviors/:/custom-behaviors/ webrecorder/browsertrix-crawler crawl --url https://example.com/ --customBehaviors /custom-behaviors/
```
This will load all the custom behaviors stored in the `tests/custom-behaviors` directory. The first behavior which returns true for `isMatch()` will be run on a given page.
Custom behaviors can be mounted into the crawler and run from there, or downloaded from a URL.
Each behavior should contain a single class that implements the behavior interface. See [the behaviors tutorial](https://github.com/webrecorder/browsertrix-behaviors/blob/main/docs/TUTORIAL.md) for more info on how to write behaviors.
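
As a rough sketch of what such a file might look like, the class below follows the general shape used in the behaviors tutorial: a static `id`, static `isMatch()` and `init()` methods, and an async generator `run()`. The class name, the hostname check, and the yielded message are made up for illustration, so treat the linked tutorial as the authority on the actual interface.

```javascript
// Hypothetical custom behavior; names and values are illustrative only.
class ExampleBehavior {
  // Identifier for this behavior (assumed to be used for logging).
  static id = "Example Behavior";

  // Decide whether this behavior applies to the current page.
  static isMatch() {
    return window.location.hostname === "example.com";
  }

  // Initial state for the behavior.
  static init() {
    return {};
  }

  // Steps of the behavior; each yield reports progress.
  async *run(ctx) {
    window.scrollTo(0, document.body.scrollHeight);
    yield { msg: "scrolled to bottom of page" };
  }
}
```

Saved as a single file, a class like this could be passed to `--customBehaviors` using any of the forms listed in this section.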
On a given page, the first behavior whose `isMatch()` returns true is the one that runs.
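
The first-match rule can be pictured with a small sketch. This is not the crawler's own code: `selectBehavior()` and both classes are hypothetical, and the page URL is passed to `isMatch()` as an argument only so the example runs outside a browser (real behaviors would inspect the page themselves).

```javascript
// Illustrative only: these classes and selectBehavior() are hypothetical
// and not part of browsertrix-crawler.
class ExampleOrgBehavior {
  static id = "example-org";
  static isMatch(url) {
    return new URL(url).hostname === "example.org";
  }
}

class CatchAllBehavior {
  static id = "catch-all";
  static isMatch(url) {
    return true;
  }
}

// Return the first behavior whose isMatch() is true, or null if none match.
function selectBehavior(behaviors, url) {
  return behaviors.find((b) => b.isMatch(url)) ?? null;
}

// Order matters: the catch-all is listed last so specific behaviors win.
const behaviors = [ExampleOrgBehavior, CatchAllBehavior];
```

Under this model, a broadly matching behavior placed first would shadow every behavior after it, so narrow `isMatch()` checks are the safer default.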
The repeatable `--customBehaviors` flag can accept:
- A path to a directory of behavior files
- A path to a single behavior file
- A URL for a single behavior file to download
- A URL for a git repository of the form `git+https://git.example.com/repo.git`, with optional query parameters `branch` (to specify a particular branch to use) and `path` (to specify a relative path to a directory within the git repository where the custom behaviors are located)
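
To make the git URL form concrete, here is a minimal sketch of how such a value could be split into a repository URL plus the optional `branch` and `path` parameters, using the standard `URL` API. The function name `parseGitBehaviorUrl` and the shape of the returned object are assumptions for illustration; the crawler's actual parsing may differ.

```javascript
// Hypothetical helper, not the crawler's implementation.
// Splits "git+https://host/repo?branch=...&path=..." into its parts.
function parseGitBehaviorUrl(spec) {
  const prefix = "git+";
  if (!spec.startsWith(prefix)) {
    return null; // not a git repository spec
  }

  // Drop the "git+" prefix and parse the rest as an ordinary URL.
  const url = new URL(spec.slice(prefix.length));

  // Both query parameters are optional; get() returns null when absent.
  const branch = url.searchParams.get("branch");
  const path = url.searchParams.get("path");

  // Remove the query string so only the clonable repo URL remains.
  url.search = "";

  return { repoUrl: url.toString(), branch, path };
}
```

Note that on the command line the whole value should be quoted, since an unquoted `&` would be interpreted by the shell.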
### Examples
#### Local filepath (directory)
```sh
docker run -v $PWD/test-crawls:/crawls -v $PWD/tests/custom-behaviors/:/custom-behaviors/ webrecorder/browsertrix-crawler crawl --url https://specs.webrecorder.net --customBehaviors /custom-behaviors/
```
#### Local filepath (file)
```sh
docker run -v $PWD/test-crawls:/crawls -v $PWD/tests/custom-behaviors/:/custom-behaviors/ webrecorder/browsertrix-crawler crawl --url https://specs.webrecorder.net --customBehaviors /custom-behaviors/custom.js
```
#### URL
```sh
docker run -v $PWD/test-crawls:/crawls webrecorder/browsertrix-crawler crawl --url https://specs.webrecorder.net --customBehaviors https://example.com/custom-behavior-1 --customBehaviors https://example.org/custom-behavior-2
```
#### Git repository
```sh
docker run -v $PWD/test-crawls:/crawls webrecorder/browsertrix-crawler crawl --url https://example.com/ --customBehaviors "git+https://git.example.com/custom-behaviors?branch=dev&path=path/to/behaviors"
```