In Android version 11 and earlier, pidfd-related system calls
are not allowed by the seccomp policy, which causes crashes due
to SIGSYS signals.
Fixes#69065
Change-Id: Ib29631639a5cf221ac11b4d82390cb79436b8657
GitHub-Last-Rev: aad6b3b32c
GitHub-Pull-Request: golang/go#69543
Reviewed-on: https://go-review.googlesource.com/c/go/+/614277
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I45a530422207dd94b5ad4eee51216c9410a84040
Reviewed-on: https://go-review.googlesource.com/c/go/+/613261
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Running with few threads usually does not need 500ms to crash, so let it
crash as soon as possible. While the test may caused more time on slow
machine, try to expand the sleep time in test.
Updates #64752
Change-Id: I635fab846bd5e1735808d4d47bb9032d5a04cc2b
GitHub-Last-Rev: 84f3844ac0
GitHub-Pull-Request: golang/go#65018
Reviewed-on: https://go-review.googlesource.com/c/go/+/554615
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
If a signal lands on a non-Go thread, and Go code doesn't want to
handle it, currently we re-raise the signal in the signal handler
after uninstalling our handler, so the C code can handle it.
But if there is no C signal handler and the signal is ignored,
there is no need to re-raise the signal. Just ignore it. This
avoids uninstalling and reinstalling our handler, which, for some
reason, changes errno when TSAN is used. And TSAN does not like
errno being changed in the signal handler.
Not really sure if this is the bset of complete fix, but it does
fix the immediate problem, and it seems a reasonable thing to do
by itself.
Test case is CL 581722.
Fixes#66427.
Change-Id: I7a043d53059f1ff4080f4fc8ef4065d76ee7d78a
Reviewed-on: https://go-review.googlesource.com/c/go/+/582077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This reverts CL 546135.
Reason for revert: Causes occasional throw during panic
For #65416.
Change-Id: I78c15637da18f85ede785363b777aa7d1dead3c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/560455
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
When we are crashing from an unrecovered panic, we freeze the
world, and print stack traces for all goroutines if GOTRACEBACK is
set to a high enough level. Freezing the world is best effort, so
there could still be goroutines that are not preempted, and so its
stack trace is unavailable and printed as "goroutine running on
other thread".
As we're crashing and not resuming execution on preempted
goroutines, we can make preemption more aggressive, preempting
cases that are not safe for resumption or stack scanning. This may
make goroutines more likely to be preempted in freezing the world
and have their stacks available.
Change-Id: Ie16269e2a05e007efa61368b695addc28d7a97ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/546135
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Let the fault thread to crash the program to make sure while gdb coredump file could see the correct backtrace in the number one thread in gdb.
Fixes#63277.
Change-Id: Ie4473f76f0feba596091433918bcd35a4ff7e11b
GitHub-Last-Rev: f4615c23f6
GitHub-Pull-Request: golang/go#63666
Reviewed-on: https://go-review.googlesource.com/c/go/+/536895
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
After CL 528817, *sigctxt.fault of all arches return uintptr, so
there is no need to convert 'c.fault()' to uintptr anymore.
Change-Id: I062283b578adaaee69d8f439b109a573eeb15110
GitHub-Last-Rev: 3ce3a75a66
GitHub-Pull-Request: golang/go#63133
Reviewed-on: https://go-review.googlesource.com/c/go/+/529995
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently, package runtime runs `osinit` before dynamic initialization
of package-scope variables; but on GOOS=linux, `osinit` involves
mutating `sigsetAllExiting`.
This currently works because cmd/compile and gccgo have
non-spec-conforming optimizations that statically initialize
`sigsetAllExiting`, but disabling that optimization causes
`sigsetAllExiting` to be dynamically initialized instead. This in turn
causes the mutations in `osinit` to get lost.
This CL moves the initialization of `sigsetAllExiting` from `osinit`
into its initialization expression, and then removes the special case
for continuing to perform the static-initialization optimization for
package runtime.
Updates #51913.
Change-Id: I3be31454277c103372c9701d227dc774b2311dad
Reviewed-on: https://go-review.googlesource.com/c/go/+/405549
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
On Unix platforms, the runtime previously did nothing special when a
program was run with either the SUID or SGID bits set. This can be
dangerous in certain cases, such as when dumping memory state, or
assuming the status of standard i/o file descriptors.
Taking cues from glibc, this change implements a set of protections when
a binary is run with SUID or SGID bits set (or is SUID/SGID-like). On
Linux, whether to enable these protections is determined by whether the
AT_SECURE flag is passed in the auxiliary vector. On platforms which
have the issetugid syscall (the BSDs, darwin, and Solaris/Illumos), that
is used. On the remaining platforms (currently only AIX) we check
!(getuid() == geteuid() && getgid == getegid()).
Currently when we determine a binary is "tainted" (using the glibc
terminology), we implement two specific protections:
1. we check if the file descriptors 0, 1, and 2 are open, and if they
are not, we open them, pointing at /dev/null (or fail).
2. we force GOTRACKBACK=none, and generally prevent dumping of
trackbacks and registers when a program panics/aborts.
In the future we may add additional protections.
This change requires implementing issetugid on the platforms which
support it, and implementing getuid, geteuid, getgid, and getegid on
AIX.
Thanks to Vincent Dehors from Synacktiv for reporting this issue.
Fixes#60272
Fixes CVE-2023-29403
Change-Id: I73fc93f2b7a8933c192ce3eabbf1db359db7d5fa
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1878434
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Russ Cox <rsc@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/501223
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
On Go <= 1.20 signals that caused the program to exit would eventually
call runtime.fatal. After the changes made in go.dev/cl/462437 but it
would still be nice if debuggers (eg. Delve) had a function they could
hook to intercept fatal signals.
Change-Id: Icf2b65187f95d52e60825c84f386806a75b38f6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/495736
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.
CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).
[Original CL 485500 description]
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 492987 description]
On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.
This is usually okay as we don't check usually stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see it is on the g0 stack, and throws.
This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).
Also add some debug print for the "handler not on signal stack"
throw.
Fixes#51676.
Fixes#59294.
Fixes#59678.
Fixes#60007.
Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Like arm64, ppc64 and risv64, on loong64, the G register may be temporarily
broken during a VDSO call. If a signal is received during a VDSO call, an
invalid G may be obtained.
See #34391.
Change-Id: Iaffa8cce4f0ef8ef74225c355ec3c20ed238025f
Reviewed-on: https://go-review.googlesource.com/c/go/+/426355
Reviewed-by: WANG Xuerui <git@xen0n.name>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: WANG Xuerui <git@xen0n.name>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
There are quite a few locations that get/put Ms from the extra M list,
but the API is pretty clumsy to use. Add an easier to use getExtraM /
putExtraM API.
There are only two minor semantic changes:
1. dropm no longer calls setg(nil) inside the lockextra critical
section. It is important that this thread no longer references the G
(and in turn M) once it is published to the extra M list and another
thread could acquire it. But there is no reason that needs to happen
only after lockextra.
2. extraMLength (renamed from extraMCount) is no longer protected by
lockextra and is instead simply an atomic (though writes are still in
the critical section). The previous readers all dropped lockextra
before using the value they read anyway.
For #60004.
Change-Id: Ifca4d6c84d605423855d89f49af400ca07de56f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/492742
Run-TryBot: Michael Pratt <mpratt@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
This reverts CL 485500.
Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.
Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 485500 description]
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
Fixes#51676.
Fixes#59294.
Fixes#59678.
Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
This reverts commit CL 486381.
Submitted out of order and breaks bootstrap.
Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875
Reviewed-on: https://go-review.googlesource.com/c/go/+/486915
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Previously we did not permit them as Go programs generated enormous
core dumps on macOS. However, according to an investigation in #59446,
they are OK now.
For #59446
Change-Id: I1d7a3f500a6bc525aa6de8dfa8a1d8dbb15feadc
Reviewed-on: https://go-review.googlesource.com/c/go/+/483015
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
Fixes#51676.
Fixes#59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 479915.
Reason for revert: breaks a lot google internal tests.
Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51
Reviewed-on: https://go-review.googlesource.com/c/go/+/481055
Reviewed-by: Chressie Himpel <chressie@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
For #59294.
Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527
Reviewed-on: https://go-review.googlesource.com/c/go/+/479915
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
If a panicking signal (e.g. SIGSEGV) happens on a g0 stack, we're
either in the runtime or running C code. Either way we cannot
recover and sigpanic will immediately throw. Further, injecting a
sigpanic could make the C stack unwinder and the debugger fail to
unwind the stack. So don't inject a sigpanic.
If we have cgo traceback and symbolizer attached, if it panics in
a C function ("CF" for the example below), previously it shows
something like
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x45f1ef]
runtime stack:
runtime.throw({0x485460?, 0x0?})
.../runtime/panic.go:1076 +0x5c fp=0x7ffd77f60f58 sp=0x7ffd77f60f28 pc=0x42e39c
runtime.sigpanic()
.../runtime/signal_unix.go:821 +0x3e9 fp=0x7ffd77f60fb8 sp=0x7ffd77f60f58 pc=0x442229
goroutine 1 [syscall]:
CF
/tmp/pp/c.c:6 pc=0x45f1ef
runtime.asmcgocall
.../runtime/asm_amd64.s:869 pc=0x458007
runtime.cgocall(0x45f1d0, 0xc000053f70)
.../runtime/cgocall.go:158 +0x51 fp=0xc000053f48 sp=0xc000053f10 pc=0x404551
main._Cfunc_CF()
_cgo_gotypes.go:39 +0x3f fp=0xc000053f70 sp=0xc000053f48 pc=0x45f0bf
Now it shows
SIGSEGV: segmentation violation
PC=0x45f1ef m=0 sigcode=1
signal arrived during cgo execution
goroutine 1 [syscall]:
CF
/tmp/pp/c.c:6 pc=0x45f1ef
runtime.asmcgocall
.../runtime/asm_amd64.s:869 pc=0x458007
runtime.cgocall(0x45f1d0, 0xc00004ef70)
.../runtime/cgocall.go:158 +0x51 fp=0xc00004ef48 sp=0xc00004ef10 pc=0x404551
main._Cfunc_CF()
_cgo_gotypes.go:39 +0x3f fp=0xc00004ef70 sp=0xc00004ef48 pc=0x45f0bf
I think the new one is reasonable.
For #57698.
Change-Id: I4f7af91761374e9b569dce4c7587499d4799137e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462437
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.
When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes#51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Use a single writeErrStr function. Avoid using global variables.
Use a single version of some error messages rather than duplicating
the messages in OS-specific files.
Change-Id: If259fbe78faf797f0a21337d14472160ca03efa0
Reviewed-on: https://go-review.googlesource.com/c/go/+/447055
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.
The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.
This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
choice is made because in practice 64 MiB appears to be way too large
of an area for most real-world use-cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
list and when there are no pointers left pointing into them, they're
allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
only types and slice backing stores whose sizes are 1/4th the size of
a chunk or less may be used. Previously larger sizes, up to the size
of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
termination (a non-public patch once did this; I don't see a reason
not to continue that).
For #51317.
Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
On Linux a signal sent using tgkill will have si_code == SI_TKILL,
not SI_USER. Treat the two cases the same. Add a Linux-specific test.
Change the test to use the C pause function rather than sleeping
for a second, as that achieves the same effect.
This is a roll forward of CL 431255 which was rolled back in CL 431715.
This new version skips flaky tests on more systems, and marks a new method
nosplit.
Change-Id: Ibf2d3e6fc43d63d0a71afa8fcca6a11fda03f291
Reviewed-on: https://go-review.googlesource.com/c/go/+/432136
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
On Linux a signal sent using tgkill will have si_code == SI_TKILL,
not SI_USER. Treat the two cases the same. Add a Linux-specific test.
Change the test to use the C pause function rather than sleeping
for a second, as that achieves the same effect.
Change-Id: I2a36646aecabcab9ec42ed9a048b07c2ff0a3987
Reviewed-on: https://go-review.googlesource.com/c/go/+/431255
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
This converts several unsynchronized reads (reads without holding
prof.signalLock) into atomic reads.
For #53821.
For #52912.
Change-Id: I421b96a22fbe26d699bcc21010c8a9e0f4efc276
Reviewed-on: https://go-review.googlesource.com/c/go/+/420196
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
sighandler has gp, the goroutine running when the signal arrived, and
gsignal, the goroutine executing the signal handler. The latter is
usually mp.gsignal, except in the case noted by the delayedSignal check.
Like previous CLs, cases where the getg() G is used only to access the M
are replaced with direct uses of mp.
Change-Id: I2dc7894da7004af17682712e07a0be5f9a235d81
Reviewed-on: https://go-review.googlesource.com/c/go/+/418580
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
* The gp argument to canpanic is always equivalent to getg(), so no need
to pass it at all.
* gp must not be nil or _g_.m would have crashed, so no need to check
for nil.
* Use acquirem to better reason about preemption.
Change-Id: Ic7dc8dc1e56ab4c1644965f6aeba16807cdb2df4
Reviewed-on: https://go-review.googlesource.com/c/go/+/418575
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Cgo TSAN (not the Go race detector) intercepts signals and calls
the signal handler at a later time. When the signal handler is
called, the memory may have changed, but the signal context
remains old. As the signal context and the memory don't match, it
is unsafe to unwind the stack from the signal PC and SP. We have
to ignore the signal.
It is probably also not safe to do async preemption, which relies
on the signal PC, and inspects and even writes to the stack (for
call injection).
We also inspect the stack for fatal signals (e.g. SIGSEGV), but I
think they are not delayed. For other signals we don't inspect
the stack, so they are probably fine.
Fixes#27540.
Change-Id: I5c80a7512265b8ea4a91422954dbff32c6c3a0d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/408218
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds support for vDSO for s390x architecture. This avoids the use of system calls in nanotime and walltime and accelerates them by factor 4-5.
Benchmarks:
100,000,000 x time.Now():
syscall fallback 13923ms 139.23 ns/op
vDSO enabled 2640ms 26.40 ns/op
Change-Id: Ic679fe31048379e59ccf83b400140f13c9d49696
GitHub-Last-Rev: 8f6e918a45
GitHub-Pull-Request: golang/go#49717
Reviewed-on: https://go-review.googlesource.com/c/go/+/365995
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Jonathan Albrecht <jonathan.albrecht@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
This gives explicit names to the possible states of throwing (-1, 0, 1).
m.throwing is now one of:
throwTypeOff: not throwing, previously == 0
throwTypeUser: user throw, previously == -1
throwTypeRuntime: runtime throw, previously == 1
For runtime throws, we now always include frame metadata and system
goroutines regardless of GOTRACEBACK to aid in debugging the runtime.
For user throws, we no longer include frame metadata or runtime frames,
unless GOTRACEBACK=system or higher.
For #51485.
Change-Id: If252e2377a0b6385ce7756b937929be4273a56c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/390421
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
No test because we already have a test in the syscall package.
The issue reports 1 failure per 100,000 iterations, which is rare enough
that our builders won't catch the problem.
Fixes#52226
Change-Id: I17633ff6cf676b6d575356186dce42cdacad0746
Reviewed-on: https://go-review.googlesource.com/c/go/+/400315
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
A future change to gofmt will rewrite
// Doc comment.
//go:foo
to
// Doc comment.
//
//go:foo
Apply that change preemptively to all comments (not necessarily just doc comments).
For #51082.
Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
CL 383434 forgot to enable these paths for android, which is still linux
just not via GOOS.
Fixes#51213.
Change-Id: I102e53e8671403ded6edb4ba04789154d7a0730b
Reviewed-on: https://go-review.googlesource.com/c/go/+/385954
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>