mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00
Build simplification: Use :latest Version By default + README update (#71)
* docker-compose: just use the ':latest' tag for local builds, allowing users working with a local docker-compose.yml to build the latest image
* ci: add the 'latest' tag to the release CI build so releases automatically update 'latest' as well
* README: remove '[VERSION]' and refer to the latest version of the image in all examples
* README: mention using a specific released tag version for production
parent f4c6b6a99f
commit bd44190ab2

6 changed files with 25 additions and 18 deletions
.github/workflows/release.yaml (2 changes, vendored)
@@ -25,7 +25,7 @@ jobs:
           elif [[ $GITHUB_REF == refs/pull/* ]]; then
             VERSION=pr-${{ github.event.number }}
           fi
-          TAGS="${DOCKER_IMAGE}:${VERSION}"
+          TAGS="${DOCKER_IMAGE}:${VERSION},latest"
           echo ::set-output name=tags::${TAGS}
       - name: Set up QEMU
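The tag computation changed in this hunk can be exercised outside of CI with a small standalone sketch (illustrative: only the `refs/pull/*` branch is visible in the hunk, the other ref types are elided from the diff, and the example `GITHUB_REF` value is hypothetical):

```shell
#!/bin/sh
# Sketch of the tag string built in .github/workflows/release.yaml.
# Only the refs/pull/* branch appears in the hunk; other ref types
# handled by the workflow are omitted here.
DOCKER_IMAGE="webrecorder/browsertrix-crawler"
GITHUB_REF="refs/pull/71/merge"   # supplied by GitHub Actions in CI

VERSION=unknown
case "$GITHUB_REF" in
  refs/pull/*)
    # In the workflow this comes from ${{ github.event.number }}
    VERSION="pr-$(echo "$GITHUB_REF" | cut -d/ -f3)"
    ;;
esac

# Before this commit the line read: TAGS="${DOCKER_IMAGE}:${VERSION}"
TAGS="${DOCKER_IMAGE}:${VERSION},latest"
echo "$TAGS"
```

Note that `docker/build-push-action` treats `tags` as a comma-separated list of full image references, so the bare `latest` entry would likely need to be written as `${DOCKER_IMAGE}:latest` to resolve against the same image.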
README.md (29 changes)

@@ -169,21 +169,21 @@ See [page.goto waitUntil options](https://github.com/puppeteer/puppeteer/blob/ma
 Browsertrix Crawler supports the use of a yaml file to set parameters for a crawl. This can be used by passing a valid yaml file to the `--config` option.
 
-The YAML file can contain the same parameters as the command-line arguments. If a parameter is set on the command-line and in the yaml file, the value from the command-line will be used. For example, the following should start a crawl with config in `crawl-config.yaml` (where [VERSION] represents the version of the browsertrix-crawler image you're working with). The current [VERSION] can be found by checking the package.json file.
+The YAML file can contain the same parameters as the command-line arguments. If a parameter is set on the command-line and in the yaml file, the value from the command-line will be used. For example, the following should start a crawl with config in `crawl-config.yaml`.
 
 ```
-docker run -v $PWD/crawl-config.yaml:/app/crawl-config.yaml -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:[VERSION] crawl --config /app/crawl-config.yaml
+docker run -v $PWD/crawl-config.yaml:/app/crawl-config.yaml -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config /app/crawl-config.yaml
 ```
 
 The config can also be passed via stdin, which can simplify the command. Note that this requires running `docker run` with the `-i` flag. To read the config from stdin, pass `--config stdin`.
 
 ```
-cat ./crawl-config.yaml | docker run -i -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:[VERSION] crawl --config stdin
+cat ./crawl-config.yaml | docker run -i -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config stdin
 ```
 
-An example config file might contain:
+An example config file (eg. crawl-config.yaml) might contain:
 
 ```
 seeds:
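The example config is cut off by the diff view at `seeds:`; a minimal illustrative crawl-config.yaml (the values are hypothetical, and the field names mirror command-line options mentioned elsewhere in the README) might look like:

```yaml
seeds:
  - https://example.com/
workers: 2
generateWACZ: true
collection: example-crawl
```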
@@ -202,7 +202,7 @@ The URL seed file should be a text file formatted so that each line of the file
 The seed file must be passed as a volume to the docker container. To do that, you can format your docker command similar to the following:
 
 ```
-docker run -v $PWD/seedFile.txt:/app/seedFile.txt -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:[VERSION] crawl --seedFile /app/seedFile.txt
+docker run -v $PWD/seedFile.txt:/app/seedFile.txt -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --seedFile /app/seedFile.txt
 ```
 
 #### Per-Seed Settings
@@ -308,7 +308,7 @@ With version 0.4.0, Browsertrix Crawler includes an experimental 'screencasting'
 To enable, add the `--screencastPort` command-line option and also map the port on the docker container. An example command might be:
 
 ```
-docker run -p 9037:9037 -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:[VERSION] crawl --url https://www.example.com --screencastPort 9037
+docker run -p 9037:9037 -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --url https://www.example.com --screencastPort 9037
 ```
 
 Then, you can open `http://localhost:9037/` and watch the crawl.
@@ -318,7 +318,7 @@ Note: If specifying multiple workers, the crawler should additionally be instructed
 For example,
 
 ```
-docker run -p 9037:9037 -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler:[VERSION] crawl --url https://www.example.com --screencastPort 9037 --newContext window --workers 3
+docker run -p 9037:9037 -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --url https://www.example.com --screencastPort 9037 --newContext window --workers 3
 ```
 
 will start a crawl with 3 workers, and show the screen of each of the workers from `http://localhost:9037/`.
@@ -335,7 +335,7 @@ The script profile creation system also takes a screenshot so you can check if the
 For example, to create a profile logged in to Twitter, you can run:
 
 ```bash
-docker run -v $PWD/crawls/profiles:/output/ -it webrecorder/browsertrix-crawler:[VERSION] create-login-profile --url "https://twitter.com/login"
+docker run -v $PWD/crawls/profiles:/output/ -it webrecorder/browsertrix-crawler create-login-profile --url "https://twitter.com/login"
 ```
 
 The script will then prompt you for login credentials, attempt to log in, and create a tar.gz file in `./crawls/profiles/profile.tar.gz`.
@@ -367,7 +367,7 @@ Browsertrix Crawler will then create a profile as before using the current state
 For example, to start in interactive profile creation mode, run:
 
 ```
-docker run -p 9222:9222 -p 9223:9223 -v $PWD/profiles:/output/ -it webrecorder/browsertrix-crawler:[VERSION] create-login-profile --interactive --url "https://example.com/"
+docker run -p 9222:9222 -p 9223:9223 -v $PWD/profiles:/output/ -it webrecorder/browsertrix-crawler create-login-profile --interactive --url "https://example.com/"
 ```
 
 Then, open a browser pointing to `http://localhost:9223/` and use the embedded browser to log in to any sites or configure any settings as needed.
@@ -376,7 +376,7 @@ Click 'Create Profile' at the top when done. The profile will then be created in
 It is also possible to extend an existing profile by also passing in an existing profile via the `--profile` flag. In this way, it is possible to build new profiles by extending previous browsing sessions as needed.
 
 ```
-docker run -p 9222:9222 -p 9223:9223 -v $PWD/profiles:/profiles -it webrecorder/browsertrix-crawler:[VERSION] create-login-profile --interactive --filename /profiles/newProfile.tar.gz --url "https://example.com/" --profile /profiles/oldProfile.tar.gz
+docker run -p 9222:9222 -p 9223:9223 -v $PWD/profiles:/profiles -it webrecorder/browsertrix-crawler create-login-profile --interactive --filename /profiles/newProfile.tar.gz --url "https://example.com/" --profile /profiles/oldProfile.tar.gz
 ```
 
 ### Using Browser Profile with a Crawl
@@ -390,6 +390,15 @@ After running the above command, you can now run a crawl with the profile, as fo
 docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertrix-crawler crawl --profile /crawls/profiles/profile.tar.gz --url https://twitter.com/ --generateWACZ --collection test-with-profile
 ```
 
+## Published Releases / Production Use
+
+When using Browsertrix Crawler in production, it is recommended to use a specific, published version of the image, eg. `webrecorder/browsertrix-crawler:[VERSION]` instead of `webrecorder/browsertrix-crawler`, where `[VERSION]` corresponds to one of the published release tags.
+
+All released Docker images are available from Docker Hub, listed by release tag here: https://hub.docker.com/r/webrecorder/browsertrix-crawler/tags?page=1&ordering=last_updated
+
+Details for each release tag are also available on GitHub at: https://github.com/webrecorder/browsertrix-crawler/releases
+
+
 ## Architecture
 
 The Docker container provided here packages up several components used in Browsertrix.
docker-compose.yml

@@ -2,7 +2,7 @@ version: '3.5'
 services:
   crawler:
-    image: webrecorder/browsertrix-crawler:0.4.1
+    image: webrecorder/browsertrix-crawler:latest
     build:
       context: ./

@@ -14,3 +14,4 @@ services:
       - SYS_ADMIN
 
+    shm_size: 1gb
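Combining the two hunks, the crawler service in docker-compose.yml after this commit looks roughly like the following (reconstructed from the visible diff lines only; any fields elided by the diff, such as volumes or port mappings, are omitted):

```yaml
version: '3.5'

services:
  crawler:
    image: webrecorder/browsertrix-crawler:latest
    build:
      context: ./

    cap_add:
      - SYS_ADMIN

    shm_size: 1gb
```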
@@ -10,8 +10,7 @@ function runCrawl(name, config, commandExtra = "") {
   const configYaml = yaml.dump(config);
 
   try {
-    const version = require("../package.json").version;
-    const proc = child_process.execSync(`docker run -i -v $PWD/test-crawls:/crawls webrecorder/browsertrix-crawler:${version} crawl --config stdin ${commandExtra}`, {input: configYaml, stdin: "inherit", encoding: "utf8"});
+    const proc = child_process.execSync(`docker run -i -v $PWD/test-crawls:/crawls webrecorder/browsertrix-crawler crawl --config stdin ${commandExtra}`, {input: configYaml, stdin: "inherit", encoding: "utf8"});
 
     console.log(proc);
   }
@@ -9,8 +9,7 @@ test("pass config file via stdin", async () => {
   const config = yaml.load(configYaml);
 
   try {
-    const version = require("../package.json").version;
-    const proc = child_process.execSync(`docker run -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler:${version} crawl --config stdin --scopeExcludeRx webrecorder.net/202`, {input: configYaml, stdin: "inherit", encoding: "utf8"});
+    const proc = child_process.execSync("docker run -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler crawl --config stdin --scopeExcludeRx webrecorder.net/202", {input: configYaml, stdin: "inherit", encoding: "utf8"});
 
     console.log(proc);
   }
@@ -7,8 +7,7 @@ test("check that the warcinfo file works as expected on the command line", async
 
   try{
     const configYaml = fs.readFileSync("tests/fixtures/crawl-2.yaml", "utf8");
-    const version = require("../package.json").version;
-    const proc = child_process.execSync(`docker run -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler:${version} crawl --config stdin --limit 1 --collection warcinfo --combineWARC`, {input: configYaml, stdin: "inherit", encoding: "utf8"});
+    const proc = child_process.execSync("docker run -i -v $PWD/crawls:/crawls webrecorder/browsertrix-crawler crawl --config stdin --limit 1 --collection warcinfo --combineWARC", {input: configYaml, stdin: "inherit", encoding: "utf8"});
 
     console.log(proc);
   }