Commit graph

25 commits

Author SHA1 Message Date
Daniel Maslowski
3cf1aaf8b9 runtime: use futexes with 64-bit time on Linux
Linux introduced new syscalls to fix the year 2038 issue.
To still be able to use the old ones, the Kconfig option
COMPAT_32BIT_TIME would be necessary.

Use the new syscall with 64-bit values for futex by default.
Define _ENOSYS for detecting if it's not available.
Add a fallback to use the older syscall in case the new one is
not available, since Go runs on Linux from 2.6.32 on, per
https://go.dev/wiki/MinimumRequirements.

Updates #75133

Change-Id: I65daff0a3d06b55440ff05d8f5a9aa1c07eb201d
GitHub-Last-Rev: 96dd1bd84b
GitHub-Pull-Request: golang/go#75306
Reviewed-on: https://go-review.googlesource.com/c/go/+/701615
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-18 21:04:12 -07:00
Michael Anthony Knyszek
9f9bb26880 runtime: avoid MADV_HUGEPAGE for heap memory
Currently the runtime marks all new memory as MADV_HUGEPAGE on Linux and
manages its hugepage eligibility status. Unfortunately, the default
THP behavior on most Linux distros is that MADV_HUGEPAGE blocks while
the kernel eagerly reclaims and compacts memory to allocate a hugepage.

This direct reclaim and compaction is unbounded, and may result in
significant application thread stalls. In really bad cases, this can
exceed 100s of ms or even seconds.

Really all we want is to undo MADV_NOHUGEPAGE marks and let the default
Linux paging behavior take over, but the only way to unmark a region as
MADV_NOHUGEPAGE is to also mark it MADV_HUGEPAGE.

The overall strategy of trying to keep hugepages for the heap unbroken
however is sound. So instead let's use the new shiny MADV_COLLAPSE if it
exists.

MADV_COLLAPSE makes a best-effort synchronous attempt at collapsing the
physical memory backing a memory region into a hugepage. We'll use
MADV_COLLAPSE where we would've used MADV_HUGEPAGE, and stop using
MADV_NOHUGEPAGE altogether.

Because MADV_COLLAPSE is synchronous, it's also important to not
re-collapse huge pages if the huge pages are likely part of some large
allocation. Although in many cases it's advantageous to back these
allocations with hugepages because they're contiguous, eagerly
collapsing every hugepage means having to page in at least part of the
large allocation.

However, because we won't use MADV_NOHUGEPAGE anymore, we'll no longer
handle the fact that khugepaged might come in and back some memory we
returned to the OS with a hugepage. I've come to the conclusion that
this is basically unavoidable without a new madvise flag and that it's
just not a good default. If this change lands, advice about Linux huge
page settings will be added to the GC guide.

Verified that this change doesn't regress Sweet, at least not on my
machine with:

/sys/kernel/mm/transparent_hugepage/enabled [always or madvise]
/sys/kernel/mm/transparent_hugepage/defrag [madvise]
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none [0 or 511]

Unfortunately, this workaround means that we only get forced hugepages
on Linux 6.1+.

Fixes #61718.

Change-Id: I7f4a7ba397847de29f800a99f9cb66cb2720a533
Reviewed-on: https://go-review.googlesource.com/c/go/+/516795
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-08-22 19:05:10 +00:00
Michael Knyszek
e4435cb844 runtime: add page tracer
This change adds a new GODEBUG flag called pagetrace that writes a
low-overhead trace of how pages of memory are managed by the Go runtime.

The page tracer is kept behind a GOEXPERIMENT flag due to a potential
security risk for setuid binaries.

Change-Id: I6f4a2447d02693c25214400846a5d2832ad6e5c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/444157
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-18 03:45:30 +00:00
Andrew Pogrebnoy
c7cc2b94c6 runtime: move epoll syscalls to runtime/internal/syscall
This change moves Linux epoll's syscalls implementation to the
"runtime/internal/syscall" package. The intention in this CL was to
minimise behavioural changes but make the code more generalised. This
also will allow adding new syscalls (like epoll_pwait2) without the
need to implement assembly stubs for each arch.

It also drops epoll_create as not all architectures provide this call.
epoll_create1 was added to the kernel in version 2.6.27 and Go requires
Linux kernel version 2.6.32 or later since Go 1.18. So it is safe to
always use epoll_create1.

This is a resubmit as the previous CL 421994 was reverted due to test
failures after the merge with the master. The issue was fixed in
CL 438615

For #53824
For #51087

Change-Id: I1bd0f23a85b4f9b80178c5dd36fd3e95ff4f9648
Reviewed-on: https://go-review.googlesource.com/c/go/+/440115
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-10-07 18:28:11 +00:00
Michael Pratt
4a49af5755 Revert "runtime: move epoll syscalls to runtime/internal/syscall"
This reverts CL 421994.

Reason for revert: breaks runtime.TestCheckPtr2

For #53824
For #51087

Change-Id: I044ea4d6efdffe0a4b7fb0d2bb3717d9f391fc59
Reviewed-on: https://go-review.googlesource.com/c/go/+/437295
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2022-09-30 19:07:13 +00:00
Andrew Pogrebnoy
4e6f963469 runtime: move epoll syscalls to runtime/internal/syscall
This change moves Linux epoll's syscalls implementation to the
"runtime/internal/syscall" package. The intention in this CL was to
minimise behavioural changes but make the code more generalised. This
also will allow adding new syscalls (like epoll_pwait2) without the
need to implement assembly stubs for each arch.

It also drops epoll_create as not all architectures provide this call.
epoll_create1 was added to the kernel in version 2.6.27 and Go requires
Linux kernel version 2.6.32 or later since Go 1.18. So it is safe to
always use epoll_create1.

For #53824
For #51087

Change-Id: I9a6a26b7f2075a38e041de1bab4691da0ecb94fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/421994
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2022-09-30 17:35:24 +00:00
Tobias Klauser
9d34fc5108 runtime: remove fallback to pipe on platforms with pipe2
On Linux, the minimum required kernel version for Go 1.18 was be changed
to 2.6.32, see #45964. The pipe2 syscall was added in 2.6.27.

All other platforms already provide the pipe2 syscall in the minimum
supported version:
- DragonFly BSD added it in version 4.2, see
  https://www.dragonflybsd.org/release42/
- FreeBSD added it in version 10.0, see
  https://www.freebsd.org/cgi/man.cgi?pipe(2)#end
- NetBSD added it in version 6.0, see
  https://man.netbsd.org/pipe2.2#HISTORY
- OpenBSD added it in version 5.7, see
  https://man.openbsd.org/pipe.2#HISTORY
- Illumos supports it since 2013, see
  https://www.illumos.org/issues/3714
- Solaris supports it since 11.4

This also allows to remove setNonblock which was only used in the pipe
fallback path on these platforms.

Change-Id: I1f40d32fd3065d74e22af77b9ff2292b9cf66706
Reviewed-on: https://go-review.googlesource.com/c/go/+/389354
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-03-03 20:47:17 +00:00
Michael Pratt
0a5fae2a0e runtime, syscall: reimplement AllThreadsSyscall using only signals.
In issue 50113, we see that a thread blocked in a system call can result
in a hang of AllThreadsSyscall. To resolve this, we must send a signal
to these threads to knock them out of the system call long enough to run
the per-thread syscall.

Stepping back, if we need to send signals anyway, it should be possible
to implement this entire mechanism on top of signals. This CL does so,
vastly simplifying the mechanism, both as a direct result of
newly-unnecessary code as well as some ancillary simplifications to make
things simpler to follow.

Major changes:

* The rest of the mechanism is moved to os_linux.go, with fields in mOS
  instead of m itself.
* 'Fixup' fields and functions are renamed to 'perThreadSyscall' so they
  are more precise about their purpose.
* Rather than getting passed a closure, doAllThreadsSyscall takes the
  syscall number and arguments. This avoids a lot of hairy behavior:
    * The closure may potentially only be live in fields in the M,
      hidden from the GC. Not necessary with no closure.
    * The need to loan out the race context. A direct RawSyscall6 call
      does not require any race context.
    * The closure previously conditionally panicked in strange
      locations, like a signal handler. Now we simply throw.
* All manual fixup synchronization with mPark, sysmon, templateThread,
  sigqueue, etc is gone. The core approach is much simpler:
  doAllThreadsSyscall sends a signal to every thread in allm, which
  executes the system call from the signal handler. We use (SIGRTMIN +
  1), aka SIGSETXID, the same signal used by glibc for this purpose. As
  such, we are careful to only handle this signal on non-cgo binaries.

Synchronization with thread creation is a key part of this CL. The
comment near the top of doAllThreadsSyscall describes the required
synchronization semantics and how they are achieved.

Note that current use of allocmLock protects the state mutations of allm
that are also protected by sched.lock. allocmLock is used instead of
sched.lock simply to avoid holding sched.lock for so long.

Fixes #50113

Change-Id: Ic7ea856dc66cf711731540a54996e08fc986ce84
Reviewed-on: https://go-review.googlesource.com/c/go/+/383434
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-15 15:40:35 +00:00
Rhys Hiltner
a97c527ac4 runtime: add padding to Linux kernel structures
Go exchanges siginfo and sigevent structures with the kernel. They
contain unions, but Go's use is limited to the first few fields. Pad out
the rest so the size Go sees is the same as what the Linux kernel sees.

This is a follow-up to CL 342052 which added the sigevent struct without
padding, and to CL 353136 which added the padding but with an assertion
that confused several type-checkers. It updates the siginfo struct as
well so there are no bad examples in the defs_linux_*.go files.

Reviewed-on: https://go-review.googlesource.com/c/go/+/353136

Change-Id: I9610632ff0ec43eba91f560536f5441fa907b36f
Reviewed-on: https://go-review.googlesource.com/c/go/+/360094
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-02 05:43:05 +00:00
Michael Pratt
7ee4c16654 Revert "runtime: add padding to Linux kernel structures"
This reverts commit f0db7eae74.

Reason for revert: Breaks linux-386 tests

Change-Id: Ia51fbf97460ab52920b67d6db6177ac2d6b0058e
Reviewed-on: https://go-review.googlesource.com/c/go/+/353432
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2021-10-04 22:08:53 +00:00
Rhys Hiltner
f0db7eae74 runtime: add padding to Linux kernel structures
Go exchanges siginfo and sigevent structures with the kernel. They
contain unions, but Go's use is limited to the first few fields. Pad out
the rest so the size Go sees is the same as what the Linux kernel sees.

This is a follow-up to CL 342052 which added the sigevent struct without
padding. It updates the siginfo struct as well so there are no bad
examples in the defs_linux_*.go files.

Change-Id: Id991d4a57826677dd7e6cc30ad113fa3b321cddf
Reviewed-on: https://go-review.googlesource.com/c/go/+/353136
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
2021-10-04 21:10:16 +00:00
Rhys Hiltner
3d795ea798 runtime: add timer_create syscalls for Linux
Updates #35057

Change-Id: Id702b502fa4e4005ba1e450a945bc4420a8a8b8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/342052
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Than McIntosh <thanm@google.com>
2021-09-27 18:57:20 +00:00
Austin Clements
92bda33d27 runtime: revert signal stack mlocking
Go 1.14 included a (rather awful) workaround for a Linux kernel bug
that corrupted vector registers on x86 CPUs during signal delivery
(https://bugzilla.kernel.org/show_bug.cgi?id=205663). This bug was
introduced in Linux 5.2 and fixed in 5.3.15, 5.4.2 and all 5.5 and
later kernels. The fix was also back-ported by major distros. This
workaround was necessary, but had unfortunate downsides, including
causing Go programs to exceed the mlock ulimit in many configurations
(#37436).

We're reasonably confident that by the Go 1.16 release, the number of
systems running affected kernels will be vanishingly small. Hence,
this CL removes this workaround.

This effectively reverts CLs 209597 (version parser), 209899 (mlock
top of signal stack), 210299 (better failure message), 223121 (soft
mlock failure handling), and 244059 (special-case patched Ubuntu
kernels). The one thing we keep is the osArchInit function. It's empty
everywhere now, but is a reasonable hook to have.

Updates #35326, #35777 (the original register corruption bugs).
Updates #40184 (request to revert in 1.15).
Fixes #35979.

Change-Id: Ie213270837095576f1f3ef46bf3de187dc486c50
Reviewed-on: https://go-review.googlesource.com/c/go/+/246200
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-08-13 02:17:17 +00:00
Tobias Klauser
90c71cec5b runtime: remove unused _F_SETFL const on linux
This constant is only used on libc-based platforms (aix, darwin,
solaris).

Change-Id: Ic57d1fe3b1501c5b552eddb9aba11f1e02510082
Reviewed-on: https://go-review.googlesource.com/c/go/+/220421
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-02-24 15:55:01 +00:00
Austin Clements
1c8d1f45ba runtime: mlock top of signal stack on both amd64 and 386
CL 209899 worked around an issue that corrupts vector registers in
recent versions of the Linux kernel by mlocking the top page of every
signal stack on amd64. However, the underlying issue also affects the
XMM registers on 386. This CL applies the mlock fix to both amd64 and
386.

Fixes #35777 (again).

Change-Id: I9886f2dc4c23625421296bd5518d5fd3288bfe48
Reviewed-on: https://go-review.googlesource.com/c/go/+/210345
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-12-09 14:41:00 +00:00
Ian Lance Taylor
3b0aa546d2 runtime: define nonblockingPipe
This requires defining pipe, pipe2, and setNonblock for various platforms.

The new function is currently only used on AIX. It will be used by
later CLs in this series.

Updates #27707

Change-Id: Id2f987b66b4c66a3ef40c22484ff1d14f58e9b31
Reviewed-on: https://go-review.googlesource.com/c/go/+/171822
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-20 21:15:55 +00:00
Ian Lance Taylor
0a7bc8f430 runtime: introduce and consistently use setNsec for timespec
The general code for setting a timespec value sometimes used set_nsec
and sometimes used a combination of set_sec and set_nsec. Standardize
on a setNsec function that takes a number of nanoseconds and splits
them up to set the tv_sec and tv_nsec fields. Consistently mark
setNsec as go:nosplit, since it has to be that way on some systems
including Darwin and GNU/Linux. Consistently use timediv on 32-bit
systems to help stay within split-stack limits on processors that
don't have a 64-bit division instruction.

Change-Id: I6396bb7ddbef171a96876bdeaf7a1c585a6d725b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167389
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-15 03:37:49 +00:00
Tobias Klauser
77f9b2728e runtime: use MADV_FREE on Linux if available
On Linux, sysUnused currently uses madvise(MADV_DONTNEED) to signal the
kernel that a range of allocated memory contains unneeded data. After a
successful call, the range (but not the data it contained before the
call to madvise) is still available but the first access to that range
will unconditionally incur a page fault (needed to 0-fill the range).

A faster alternative is MADV_FREE, available since Linux 4.5. The
mechanism is very similar, but the page fault will only be incurred if
the kernel, between the call to madvise and the first access, decides to
reuse that memory for something else.

In sysUnused, test whether MADV_FREE is supported and fall back to
MADV_DONTNEED in case it isn't. This requires making the return value of
the madvise syscall available to the caller, so change runtime.madvise
to return it.

Fixes #23687

Change-Id: I962c3429000dd9f4a00846461ad128b71201bb04
Reviewed-on: https://go-review.googlesource.com/135395
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-09-18 15:41:44 +00:00
Ian Lance Taylor
d15295c679 runtime: unify handling of alternate signal stack
Change all Unix systems to use stackt for the alternate signal
stack (some were using sigaltstackt). Add OS-specific setSignalstackSP
function to handle different types for ss_sp field, and unify all
OS-specific signalstack functions into one. Unify handling of alternate
signal stack in OS-specific minit and sigtrampgo functions via new
functions minitSignalstack and setGsignalStack.

Change-Id: Idc316dc69b1dd725717acdf61a1cd8b9f33ed174
Reviewed-on: https://go-review.googlesource.com/29757
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-26 04:07:31 +00:00
Tim Cooijmans
34db31d5f5 src/runtime: Add missing defs for android/386.
Change-Id: I63bf6d2fdf41b49ff8783052d5d6c53b20e2f050
Reviewed-on: https://go-review.googlesource.com/13760
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2015-08-27 15:14:41 +00:00
Keith Randall
7e1b61c718 runtime: mark pages we return to kernel as NOHUGEPAGE
We return memory to the kernel with madvise(..., DONTNEED).
Also mark returned memory with NOHUGEPAGE to keep the kernel from
merging this memory into a huge page, effectively reallocating it.

Only known to be a problem on linux/{386,amd64,amd64p32} at the moment.
It may come up on other os/arch combinations in the future.

Fixes #8832

Change-Id: Ifffc6627a0296926e3f189a8a9b6e4bdb54c79eb
Reviewed-on: https://go-review.googlesource.com/5660
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-02-25 21:16:18 +00:00
Austin Clements
f4a525452e [dev.cc] runtime: add explicit siginfo.si_addr field
struct siginfo_t's si_addr field is part of a union.
Previously, we represented this union in Go using an opaque
byte array and accessed the si_addr field using unsafe (and
wrong on 386 and arm!) pointer arithmetic.  Since si_addr is
the only field we use from this union, this replaces the
opaque byte array with an explicit declaration of the si_addr
field and accesses it directly.

LGTM=minux, rsc
R=rsc, minux
CC=golang-codereviews
https://golang.org/cl/179970044
2014-11-19 14:56:49 -05:00
Russ Cox
580cba42f4 [dev.cc] runtime: change set_sec to take int64
Fixes build.
Tested that all these systems can make.bash.

TBR=austin
CC=golang-codereviews
https://golang.org/cl/177770043
2014-11-14 14:50:00 -05:00
Russ Cox
a87e4a2d01 [dev.cc] runtime: fix linux build
TBR=austin
CC=golang-codereviews
https://golang.org/cl/176760044
2014-11-14 12:55:10 -05:00
Russ Cox
580ef3e4af [dev.cc] runtime: convert defs_$GOOS_$GOARCH.h to Go
The conversion was done with an automated tool and then
modified only as necessary to make it compile and run.

In a few cases, defs_$GOOS_$GOARCH.go already existed,
so the target here is defs1_$GOOS_$GOARCH.go.

[This CL is part of the removal of C code from package runtime.
See golang.org/s/dev.cc for an overview.]

LGTM=r
R=r
CC=austin, dvyukov, golang-codereviews, iant, khr
https://golang.org/cl/171490043
2014-11-11 17:07:37 -05:00