mirror of
https://github.com/webrecorder/browsertrix-crawler.git
synced 2025-10-19 06:23:16 +00:00

This PR provides improved support for running the crawler as non-root, matching the user to the uid/gid of the crawl volume.

This fixes the initial regression from 0.12.4 (#502), where `chmod u+x` was used instead of `chmod a+x` on the node binary files.

However, that was not enough to fully support the same signal handling / graceful shutdown as when running as the same user. To make the different-user path work the same way:
- switch to `gosu` instead of `su` (added in Brave 1.64.109 image)
- run all child processes as detached (redis-server, socat, wacz, etc.) to avoid them automatically being killed via SIGINT/SIGTERM
- running detached is controlled via the `DETACHED_CHILD_PROC=1` env variable, set to 1 by default in the Dockerfile (to allow for overrides just in case)

A test has been added which runs one of the tests with a non-root `test-crawls` directory to exercise the different-user path. The test (saved-state.test.js) sends interrupt signals and performs a graceful shutdown, which allows testing of those features for a non-root gosu execution.

Also bumping crawler version to 1.0.1
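The difference between `chmod u+x` and `chmod a+x` named in the commit message can be checked on a scratch file. This is a minimal sketch, assuming GNU `stat` (for `-c '%a'`); the temporary file only stands in for a binary, and the variable names are illustrative rather than taken from the crawler.

```shell
#!/bin/sh
# Scratch file standing in for a binary (hypothetical; not the real node path).
tmpfile=$(mktemp)
chmod 644 "$tmpfile"

# u+x adds execute permission for the owning user only.
chmod u+x "$tmpfile"
mode_user_only=$(stat -c '%a' "$tmpfile")

# a+x adds execute permission for user, group, and others, so a UID that
# does not own the file (such as one matched to the volume) can run it.
chmod a+x "$tmpfile"
mode_all=$(stat -c '%a' "$tmpfile")

echo "u+x: $mode_user_only, a+x: $mode_all"
rm -f "$tmpfile"
```

With `u+x` the group and other bits stay without execute permission, which is why a process running under a different UID could not execute the file.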
27 lines
623 B
Bash
Executable file
#!/bin/sh

# Get UID/GID from volume dir

VOLUME_UID=$(stat -c '%u' /crawls)
VOLUME_GID=$(stat -c '%g' /crawls)

# Get the UID/GID we are running as

MY_UID=$(id -u)
MY_GID=$(id -g)

# If we aren't running as the owner of the /crawls/ dir then add a new user
# btrix with the same UID/GID of the /crawls dir and run as that user instead.

if [ "$MY_GID" != "$VOLUME_GID" ] || [ "$MY_UID" != "$VOLUME_UID" ]; then
    groupadd btrix
    groupmod -o --gid "$VOLUME_GID" btrix

    useradd -ms /bin/bash -g "$VOLUME_GID" btrix
    usermod -o -u "$VOLUME_UID" btrix > /dev/null

    exec gosu btrix:btrix "$@"
else
    exec "$@"
fi
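The branch condition in the entrypoint can be exercised without root or a `/crawls` volume by isolating the comparison. The following is a sketch only: `user_switch_decision` is a hypothetical helper, not part of the script, and the numeric IDs are made-up sample values. It reports which branch would be taken and does not create users or exec anything.

```shell
#!/bin/sh
# Hypothetical helper mirroring the entrypoint's comparison.
# Arguments: my_uid my_gid volume_uid volume_gid
# Prints "switch" if either ID differs, "same" otherwise.
user_switch_decision() {
    if [ "$2" != "$4" ] || [ "$1" != "$3" ]; then
        echo "switch"
    else
        echo "same"
    fi
}

# root (0:0) running against a volume owned by 1000:1000
root_vs_volume=$(user_switch_decision 0 0 1000 1000)
# already running as the volume owner
matching_ids=$(user_switch_decision 1000 1000 1000 1000)
# GID matches but UID does not: still a mismatch
gid_only_match=$(user_switch_decision 0 1000 1000 1000)

echo "$root_vs_volume $matching_ids $gid_only_match"
```

Because the two tests are joined with `||`, a mismatch in either the UID or the GID is enough to take the `gosu` path.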