In some newer Linux distros, systemd forces
all mount namespaces to be shared, starting
at /. This disables the CLONE_NEWNS
flag in unshare(2) and clone(2).
While this problem is most commonly seen
on systems with systemd, it can happen anywhere,
due to how Linux namespaces now work.
Hence, to create a private mount namespace,
it is not sufficient to just set
CLONE_NEWS; you have to call mount(2) to change
the behavior of namespaces, i.e.
mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL)
This is tested and working and we can now correctly
start child process with private namespaces on Linux
distros that use systemd.
The new test works correctly on Ubuntu 16.04.2 LTS.
It fails if I comment out the new Mount, and
succeeds otherwise. In each case it correctly
cleans up after itself.
Fixes#19661
Change-Id: I52240b59628e3772b529d9bbef7166606b0c157d
Reviewed-on: https://go-review.googlesource.com/38471
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This greatly improves the latency of starting a child process when
the Go process is using a lot of memory. Even though the kernel uses
copy-on-write, preparation for that can take up to several 100ms under
certain conditions. All other goroutines are suspended while starting
a subprocess so this latency directly affects total throughput.
With CLONE_VM the child process shares the same memory with the parent
process. On its own this would lead to conflicting use of the same
memory, so CLONE_VFORK is used to suspend the parent process until the
child releases the memory when switching to to the new program binary
via the exec syscall. When the parent process continues to run, one
has to consider the changes to memory that the child process did,
namely the return address of the syscall function needs to be restored
from a register.
A simple benchmark has shown a difference in latency of 16ms vs. 0.5ms
at 10GB memory usage. However, much higher latencies of several 100ms
have been observed in real world scenarios. For more information see
comments on #5838.
Fixes#5838
Change-Id: I6377d7bd8dcd00c85ca0c52b6683e70ce2174ba6
Reviewed-on: https://go-review.googlesource.com/37439
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If the caller set ups a Credential in os/exec.Command,
os/exec.Command.Start will end up calling setgroups(2), even if no
supplementary groups were given.
Only root can call setgroups(2) on BSD kernels, which causes Start to
fail for non-root users when they try to set uid and gid for the new
process.
We fix by introducing a new field to syscall.Credential named
NoSetGroups, and setgroups(2) is only called if it is false.
We make this field with inverted logic to preserve backward
compatibility.
RELNOTES=yes
Change-Id: I3cff1f21c117a1430834f640ef21fd4e87e06804
Reviewed-on: https://go-review.googlesource.com/36697
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Skip setgroups only for one particular case: GidMappings != nil and
GidMappingsEnableSetgroup == false and list of supplementary groups is
empty.
This patch returns pre-1.5 behavior for simple exec and still allows to
use GidMappings with non-empty Credential.
Change-Id: Ia91c77e76ec5efab7a7f78134ffb529910108fc1
Reviewed-on: https://go-review.googlesource.com/23524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For symmetry with Cloneflags and it looks slightly weird because there
is syscall.Unshare method.
Change-Id: I3d710177ca8f27c05b344407f212cbbe3435094b
Reviewed-on: https://go-review.googlesource.com/23612
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
This patch adds Unshare flags to SysProcAttr for Linux systems.
Fixes#1954
Change-Id: Id819c3f92b1474e5a06dd8d55f89d74a43eb770c
Reviewed-on: https://go-review.googlesource.com/23233
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
In syscall.forkAndExecInChild, blocks of code labelled Pass 1
and Pass 2 permute the file descriptors (if necessary) which are
passed to the child process. If Pass 1 begins with fds = {0,2,1},
nextfd = 4 and pipe = 4, then the statement labelled "don't stomp
on pipe" is too late -- the pipe (which will be needed to pass
exec status back to the parent) will have been closed by the
preceding DUP call.
Moving the "don't stomp" test earlier ensures that the pipe is
protected.
Fixes#14979
Change-Id: I890c311527f6aa255be48b3277c1e84e2049ee22
Reviewed-on: https://go-review.googlesource.com/21184
Run-TryBot: David du Colombier <0intro@gmail.com>
Reviewed-by: David du Colombier <0intro@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Linux ABI takes arguments in a different order on s390x.
Change-Id: Ic9cfcc22a5ea3d8ef77d4dd0b915fc266ff3e5f7
Reviewed-on: https://go-review.googlesource.com/20960
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use a go:norace comment rather than having the compiler know the special
name syscall.forkAndExecInChild.
Change-Id: I69bc6aa6fc40feb2148d23f269ff32453696fb28
Reviewed-on: https://go-review.googlesource.com/16097
Reviewed-by: Minux Ma <minux@golang.org>
Setgroups with zero-length groups is no-op for changing groups and
supposed to be used only for determining curent groups length. Also
because we deny setgroups by default if use GidMappings we have
unnecessary error from that no-op syscall.
Change-Id: I8f74fbca9190a3dcbbef1d886c518e01fa05eb62
Reviewed-on: https://go-review.googlesource.com/13938
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Linux 3.19 made a change in the handling of setgroups and the 'gid_map' file to
address a security issue.
The upshot of the 3.19 changes is that in order to update the 'gid_maps' file,
use of the setgroups() system call in this user namespace must first be disabled
by writing "deny" to one of the /proc/PID/setgroups files for this namespace.
Also added tests for remapping uid_map and gid_map inside new user
namespace.
Fixes#10626
Change-Id: I4d2539acbab741a37092d277e10f31fc39a8feb7
Reviewed-on: https://go-review.googlesource.com/10670
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Fix bug on Linux SysProcAttr handling: setting both Pdeathsig and
Credential caused Pdeathsig to be ignored. This is because the kernel
clears the deathsignal field when performing a setuid/setgid
system call.
Avoid this by moving Pdeathsig handling after Credential handling.
Fixes#9686
Change-Id: Id01896ad4e979b8c448e0061f00aa8762ca0ac94
Reviewed-on: https://go-review.googlesource.com/3290
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On Unix, when placing a child in a new process group, allow that group
to become the foreground process group. Also, allow a child process to
join a specific process group.
When setting the foreground process group, Ctty is used as the file
descriptor of the controlling terminal. Ctty has been added to the BSD
and Solaris SysProcAttr structures and the handling of Setctty changed
to match Linux.
Change-Id: I18d169a6c5ab8a6a90708c4ff52eb4aded50bc8c
Reviewed-on: https://go-review.googlesource.com/5130
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Updates #9974
This change is in preparation for merging the arm64 platform.
Arm64 does not support SYS_DUP2 at all, so define a new constant to be
the minimum dup(2) version supported. This constant defaults to SYS_DUP2
on all existing platforms.
Change-Id: If405878105082c7c880f8541c1491970124c9ce4
Reviewed-on: https://go-review.googlesource.com/7123
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Handles the case where the parent is pid 1 (common in docker
containers).
Attempted and failed to write a test for this.
Fixes#9263.
Change-Id: I5c6036446c99e66259a4fab1660b6a594f875020
Reviewed-on: https://go-review.googlesource.com/1372
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Breaks reading from stdin in parent after exec with SysProcAttr{Setpgid: true}.
package main
import (
"fmt"
"os"
"os/exec"
"syscall"
)
func main() {
cmd := exec.Command("true")
cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Run()
fmt.Printf("Hit enter:")
os.Stdin.Read(make([]byte, 100))
fmt.Printf("Bye\n")
}
In go1.3, I type enter at the prompt and the program exits.
With the CL being rolled back, the program wedges at the
prompt.
««« original CL description
syscall: SysProcAttr job control changes
Making the child's process group the foreground process group and
placing the child in a specific process group involves co-ordination
between the parent and child that must be done post-fork but pre-exec.
LGTM=iant
R=golang-codereviews, gobot, iant, mikioh.mikioh
CC=golang-codereviews
https://golang.org/cl/131750044
»»»
LGTM=minux, dneil
R=dneil, minux
CC=golang-codereviews, iant, michael.p.macinnis
https://golang.org/cl/174450043
Making the child's process group the foreground process group and
placing the child in a specific process group involves co-ordination
between the parent and child that must be done post-fork but pre-exec.
LGTM=iant
R=golang-codereviews, gobot, iant, mikioh.mikioh
CC=golang-codereviews
https://golang.org/cl/131750044