Mark nextFreeFast as not inlined, as it is too expensive to inline on riscv64.
Also remove riscv64 from the non-atomic inline architectures, as we now have
atomic intrinsics.
Updates #22239
Change-Id: I6e0e72c1192070e39f065bee486f48df4cc74b35
Reviewed-on: https://go-review.googlesource.com/c/go/+/227808
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Based on the riscv-go port.
Updates #27532
Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This change renames the "round" function to the more appropriately named
"alignUp", which rounds an integer up to the next multiple of a power of
two.
This change also adds the alignDown function, which is the counterpart of
alignUp: it rounds down to the previous multiple of a power of two.
With these two functions in place, we also replace manual rounding code
with them where we can.
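For reference, a minimal sketch of the two helpers as described (a must be a
power of two; this mirrors the usual bit trick and may differ from the exact
runtime code):

// alignUp rounds n up to the next multiple of a, which must be a power of two.
func alignUp(n, a uintptr) uintptr {
    return (n + a - 1) &^ (a - 1)
}

// alignDown rounds n down to the previous multiple of a, which must be a power of two.
func alignDown(n, a uintptr) uintptr {
    return n &^ (a - 1)
}

For example, alignUp(13, 8) == 16 and alignDown(13, 8) == 8.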
Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
You were a useful port and you've served your purpose.
Thanks for all the play.
A subsequent CL will remove amd64p32 (including assembly files and
toolchain bits) and remaining bits. The amd64p32 removal will be
separated into its own CL in case we want to support the Linux x32 ABI
in the future and want our old amd64p32 support as a starting point.
Updates #30439
Change-Id: Ia3a0c7d49804adc87bf52a4dea7e3d3007f2b1cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/199499
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Avoid interface calls, enable inlining, and store the rngSource close to the
Mutex for better memory locality.
Also add a benchmark to properly measure the thread-safe nature of globalRand.
On a linux/amd64 VM:
name                       old time/op  new time/op  delta
Int63Threadsafe-4          36.4ns ±12%  20.6ns ±11%  -43.52%  (p=0.000 n=30+30)
Int63ThreadsafeParallel-4  79.3ns ± 5%  56.5ns ± 5%  -28.69%  (p=0.000 n=29+30)
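The layout change amounts to keeping a concrete source right next to its lock,
so the hot path avoids interface dispatch and touches one small object; a
hedged, self-contained sketch (type and field names here are stand-ins, not
the final math/rand code):

package main

import (
    "fmt"
    "sync"
)

// fastSource is a stand-in for math/rand's unexported rngSource; using a
// concrete type (rather than a Source interface) is what lets the compiler
// devirtualize and inline the Int63 call.
type fastSource struct{ state uint64 }

func (s *fastSource) Int63() int64 {
    // xorshift-style step, purely illustrative
    s.state ^= s.state << 13
    s.state ^= s.state >> 7
    s.state ^= s.state << 17
    return int64(s.state >> 1)
}

// lockedSource keeps the source right next to the mutex, so the hot path
// works on one small, contiguous object.
type lockedSource struct {
    lk  sync.Mutex
    src fastSource
}

func (r *lockedSource) Int63() int64 {
    r.lk.Lock()
    n := r.src.Int63()
    r.lk.Unlock()
    return n
}

func main() {
    r := &lockedSource{src: fastSource{state: 1}}
    fmt.Println(r.Int63())
}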
Change-Id: I6ab912c1a1e9afc7bacd8e72c82d4d50d546a510
Reviewed-on: https://go-review.googlesource.com/c/go/+/191538
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
After golang.org/cl/33895, the function adjustctxt cannot be inlined:
its cost of 82 exceeds the budget of 80.
Change-Id: Ie559ed80ea2c251add940a99f11b2983f6cbddbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/187977
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Most of the decoding time is spent in the first Decode loop, since the
rest of the function only deals with the few remaining bytes. Any
unnecessary work done in that loop body matters tremendously.
One such unnecessary bottleneck was the use of the enc.decodeMap table.
Since enc is a pointer receiver, and the field is used within the
non-inlineable function decode64, the decoder must perform a nil check
at every iteration.
To fix that, move the enc.decodeMap uses to the parent function, where
we can lift the nil check outside the loop. That gives roughly a 15%
speed-up. The function no longer performs decoding per se, so rename it.
While at it, remove the now unnecessary receivers.
An unfortunate side effect of this change is that the loop now contains
eight bounds checks on src instead of just one. However, not having to
slice src, plus the removal of the nil check, more than outweighs the added cost.
The other piece that made decode64 slow was that it wasn't inlined, and
had multiple branches. Use a simple bitwise-or trick suggested by Roger
Peppe, and collapse the rest of the bitwise logic into a single
expression. Inlinability and the reduced branching give a further 10%
speed-up.
Finally, add these two functions to TestIntendedInlining, since we want
them to stay inlinable.
Apply the same refactor to decode32 for consistency, and to let 32-bit
architectures see a similar performance gain for large inputs.
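For illustration, the shape of the trick (a sketch, not necessarily the exact
final code): the decode map yields 0xff for invalid input bytes, so a single
bitwise OR of the eight looked-up values detects any invalid byte with one
branch, and the remaining logic collapses into one shift-and-or expression.

// assemble64 combines eight 6-bit values, already looked up in the decode
// map by the caller, into a single uint64. Invalid input bytes map to 0xff,
// so one OR detects them with a single branch.
func assemble64(n1, n2, n3, n4, n5, n6, n7, n8 byte) (dn uint64, ok bool) {
    if n1|n2|n3|n4|n5|n6|n7|n8 == 0xff {
        return 0, false
    }
    return uint64(n1)<<58 |
            uint64(n2)<<52 |
            uint64(n3)<<46 |
            uint64(n4)<<40 |
            uint64(n5)<<34 |
            uint64(n6)<<28 |
            uint64(n7)<<22 |
            uint64(n8)<<16,
        true
}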
name                  old time/op    new time/op    delta
DecodeString/2-8      47.3ns ± 1%    45.8ns ± 0%     -3.28%  (p=0.002 n=6+6)
DecodeString/4-8      55.8ns ± 2%    51.5ns ± 0%     -7.71%  (p=0.004 n=5+6)
DecodeString/8-8      64.9ns ± 0%    61.7ns ± 0%     -4.99%  (p=0.004 n=5+6)
DecodeString/64-8      238ns ± 0%     198ns ± 0%    -16.54%  (p=0.002 n=6+6)
DecodeString/8192-8   19.5µs ± 0%    14.6µs ± 0%    -24.96%  (p=0.004 n=6+5)

name                  old speed      new speed      delta
DecodeString/2-8      84.6MB/s ± 1%  87.4MB/s ± 0%   +3.38%  (p=0.002 n=6+6)
DecodeString/4-8       143MB/s ± 2%   155MB/s ± 0%   +8.41%  (p=0.004 n=5+6)
DecodeString/8-8       185MB/s ± 0%   195MB/s ± 0%   +5.29%  (p=0.004 n=5+6)
DecodeString/64-8      369MB/s ± 0%   442MB/s ± 0%  +19.78%  (p=0.002 n=6+6)
DecodeString/8192-8    560MB/s ± 0%   746MB/s ± 0%  +33.27%  (p=0.004 n=6+5)
Updates #19636.
Change-Id: Ib839577b0e3f5a2bb201f5cae580c61365d92894
Reviewed-on: https://go-review.googlesource.com/c/go/+/151177
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: roger peppe <rogpeppe@gmail.com>
mustBe was barely over budget, so manually inlining the first flag.kind
call is enough. Add a TODO to reverse that in the future, once the
compiler gets better.
mustBeExported and mustBeAssignable were over budget by a larger amount,
so add slow path functions instead. This is the same strategy used in
the sync package for common methods like Once.Do, for example.
Lots of exported reflect.Value methods call these assert-like unexported
methods, so avoiding the function call overhead in the common case does
shave off a percent from most exported APIs.
Finally, add the methods to TestIntendedInlining.
While at it, replace a couple of uses of the 0 Kind with its descriptive
name, Invalid.
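The fast-path/slow-path split looks roughly like this (a hedged sketch with
simplified, made-up flag bits; the real reflect code differs in detail):

package main

import "fmt"

// flag is a simplified stand-in for reflect's flag type.
type flag uintptr

const flagRO flag = 1 << 5 // hypothetical "read-only / unexported" bit

// mustBeExported stays within the inlining budget: the common case is a
// couple of cheap checks, and only the failing case pays for a call.
func (f flag) mustBeExported() {
    if f == 0 || f&flagRO != 0 {
        f.mustBeExportedSlow()
    }
}

// mustBeExportedSlow holds the panic logic, keeping it out of the fast path.
func (f flag) mustBeExportedSlow() {
    if f == 0 {
        panic("reflect: use of zero Value") // the Kind here would be Invalid
    }
    panic("reflect: use of unexported field")
}

func main() {
    var f flag = 1 << 7 // some valid, exported value
    f.mustBeExported()  // fast path: inlined, no call overhead
    fmt.Println("ok")
}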
name     old time/op  new time/op  delta
Call-8   68.0ns ± 1%  66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
PtrTo-8  8.00ns ± 2%  7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)
Updates #7818.
Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Value.CanInterface and Value.pointer are now inlinable, since we have a
limited form of mid-stack inlining. Their calls to panic were preventing
that in previous Go releases. The other three methods still go over
budget, so update that comment.
In recent commits, sync.Once.Do and multiple lock/unlock methods have
also been made inlinable, so add those as well. They have standalone
tests like test/inline_sync.go already, but it's best if the funcs are
in this global test table too. They aren't inlinable on every platform
yet, though.
Finally, use math/bits.UintSize to check if GOARCH is 64-bit, now that
we can.
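The pointer-size check mentioned above boils down to a one-line condition;
a minimal example:

package main

import (
    "fmt"
    "math/bits"
)

func main() {
    // bits.UintSize is 64 on 64-bit GOARCHes and 32 on 32-bit ones, so it
    // can stand in for a hand-maintained list of 64-bit architectures.
    if bits.UintSize == 64 {
        fmt.Println("64-bit platform: include the 64-bit-only funcs")
    }
}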
Change-Id: I65cc681b77015f7746dba3126637e236dcd494e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/166461
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
golang.org/cl/151977 slightly decreased the cost of inlining an extra
call from 60 to 57, since it was a safe change that could help in some
scenarios.
One notable change spotted in that CL is that bytes.Buffer.Grow is now
inlinable, meaning that a fixedbugs test needed updating.
For consistency, add the test case to TestIntendedInlining too,
alongside other commonly used bytes.Buffer methods.
Change-Id: I4fb402fc684ef4c543fc65aea343ca1a4d73a189
Reviewed-on: https://go-review.googlesource.com/c/151979
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This adds a debug check to throw immediately if any pointers are added
to the gcWork buffer after the mark completion barrier. The intent is
to catch the source of the cached GC work that occasionally produces
"P has cached GC work at end of mark termination" failures.
The result should be that we get "throwOnGCWork" throws instead of "P
has cached GC work at end of mark termination" throws, but with useful
stack traces.
This should be reverted before the release. I've been unable to
reproduce this issue locally, but this issue appears fairly regularly
on the builders, so the intent is to catch it on the builders.
This probably slows down the GC slightly.
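Conceptually, the check is a global flag that is set at the mark completion
barrier and consulted on every put into the work buffer; a hedged,
self-contained model (using panic in place of the runtime's throw):

package main

// throwOnGCWork models the debug flag: once the mark completion barrier
// sets it, any further attempt to enqueue GC work fails loudly, with a
// stack trace pointing at the culprit.
var throwOnGCWork bool

type gcWork struct{ buf []uintptr }

func (w *gcWork) put(p uintptr) {
    if throwOnGCWork {
        panic("gcWork.put called after mark completion barrier")
    }
    w.buf = append(w.buf, p)
}

func main() {
    var w gcWork
    w.put(0x1234)        // fine before the barrier
    throwOnGCWork = true // set at the mark completion barrier
    // w.put(0x5678)     // would panic with a useful stack trace
}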
For #27993.
Change-Id: I5035e14058ad313bfbd3d68c41ec05179147a85c
Reviewed-on: https://go-review.googlesource.com/c/149969
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change introduces two optimizations together,
one for recursive and one for non-recursive stacks.
For recursive stacks, we introduce the new entry
at the beginning of the cache, so it can be found first.
This adds an extra read and write.
While we're here, switch from fastrandn, which does a multiply,
to fastrand % n, which does a shift.
For non-recursive stacks, split the cache from [16]pcvalueCacheEnt
into [2][8]pcvalueCacheEnt, and add a very cheap associative lookup.
name                old time/op  new time/op  delta
StackCopyPtr-8       118ms ± 1%   106ms ± 2%  -9.56%  (p=0.000 n=17+18)
StackCopy-8         95.8ms ± 1%  87.0ms ± 3%  -9.11%  (p=0.000 n=19+20)
StackCopyNoCache-8   135ms ± 2%   139ms ± 1%  +3.06%  (p=0.000 n=19+18)
During make.bash, the association function used has this return distribution:
percent  count   return value
 53.23%  678797  1
 46.74%  596094  0
It is definitely not perfect, but it is pretty good,
and that's all we need.
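A hedged sketch of what the split cache and its cheap associative lookup can
look like (the index function and field names are illustrative, not the
runtime's exact code):

// pcvalueCacheEnt caches one pcvalue result for a (targetpc, off) pair.
type pcvalueCacheEnt struct {
    targetpc uintptr
    off      int32
    val      int32
}

// Two associative sets of eight entries each, instead of one flat [16] array.
type pcvalueCache struct {
    entries [2][8]pcvalueCacheEnt
}

// cacheIndex picks one of the two sets from a couple of pc bits; it only
// needs to be cheap and roughly balanced, not a real hash.
func cacheIndex(targetpc uintptr) uintptr {
    return (targetpc >> 4) & 1
}

func (c *pcvalueCache) lookup(targetpc uintptr, off int32) (int32, bool) {
    set := &c.entries[cacheIndex(targetpc)]
    for i := range set {
        if ent := &set[i]; ent.targetpc == targetpc && ent.off == off {
            return ent.val, true
        }
    }
    return 0, false
}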
Change-Id: I2cabb1d26b99c5111bc28f427016a2a5e6c620fd
Reviewed-on: https://go-review.googlesource.com/c/110564
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
All uses of these have been converted to use runtime/internal/math
functions for overflow checking.
Fixes #21588
Change-Id: I0ba57028e471803dc7d445e66d77a8f87edfdafb
Reviewed-on: https://go-review.googlesource.com/c/144037
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This CL adds a new internal math package for use by the runtime.
The new package exports a MulUintptr function that takes uintptr arguments
a and b and returns uintptr(a*b) together with a boolean reporting whether
the full-width product a*b overflows the uintptr value range
(uintptr(a*b) != a*b).
Uses of MulUintptr in the runtime, and intrinsics for performance,
will be added in follow-up CLs.
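A possible shape for the function, consistent with the description above
(a sketch; the real code may differ, and intrinsics replace it where
available):

const MaxUintptr = ^uintptr(0)

// MulUintptr returns a*b and reports whether the full-width product
// overflows the uintptr range. If both operands fit in half the pointer
// width (or a is zero), the product cannot overflow and we skip the divide.
func MulUintptr(a, b uintptr) (uintptr, bool) {
    const ptrBits = 32 << (^uintptr(0) >> 63) // 32 or 64
    if a|b < 1<<(ptrBits/2) || a == 0 {
        return a * b, false
    }
    overflow := b > MaxUintptr/a
    return a * b, overflow
}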
Updates #21588
Change-Id: Ia5a02eeabc955249118e4edf68c67d9fc0858058
Reviewed-on: https://go-review.googlesource.com/c/91755
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Now the registration phase looks like:
var cases [4]runtime.scase
var order [8]uint16
selectsend(&cases[0], c1, &v1)
selectrecv(&cases[1], c2, &v2, nil)
selectrecv(&cases[2], c3, &v3, &ok)
selectdefault(&cases[3])
chosen := selectgo(&cases[0], &order[0], 4)
Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.
As a minor benefit, order can now be laid out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.
Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
getArgInfo is called a lot during stack copying.
In the common case it doesn't do much work,
but it cannot be inlined.
This change works around that.
name                old time/op  new time/op  delta
StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)
Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
Reviewed-on: https://go-review.googlesource.com/108945
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently, bulkBarrierPreWrite uses inheap to decide whether the
destination is in the heap or whether to check for stack or global
data. However, this isn't the best question to ask.
Instead, get the span directly and query its state. This lets us
directly determine whether this might be a global, or is stack memory,
or is heap memory.
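In sketch form, the new question is a three-way classification rather than a
boolean (names below are illustrative stand-ins, not runtime code):

package main

import "fmt"

// memKind models the answer bulkBarrierPreWrite actually needs: not
// "is this in the heap?" but "what kind of memory is this?".
type memKind int

const (
    kindNoSpan memKind = iota // spanOf returned nil: global or otherwise unmanaged data
    kindManual                // manually-managed span, i.e. stack memory
    kindHeap                  // ordinary in-use heap span
)

func describe(k memKind) string {
    switch k {
    case kindNoSpan:
        return "global data"
    case kindManual:
        return "stack memory"
    default:
        return "heap memory"
    }
}

func main() {
    fmt.Println(describe(kindHeap))
}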
At this point, inheap is no longer used in the hot path, so drop it
from the must-be-inlined list and substitute spanOf.
This will help in a circuitous way with #23862, since fixing that is
going to push inheap very slightly over the inline-able threshold on a
few platforms.
Change-Id: I5360fc1181183598502409f12979899e1e4d45f7
Reviewed-on: https://go-review.googlesource.com/95495
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
It has always been problematic that there was no way to specify
tool flags that applied only to the build of certain packages;
it was only possible to specify flags for all packages being built.
The usual workaround was to install all dependencies of something,
then build just that one thing with different flags. Since the
dependencies appeared to be up-to-date, they were not rebuilt
with the different flags. The new content-based staleness
(up-to-date) checks see through this trick, because they detect
changes in flags. This forces us to address the underlying problem
of providing a way to specify per-package flags.
The solution is to allow -gcflags=pattern=flags, which means
that flags apply to packages matching pattern, in addition to the
usual -gcflags=flags, which is now redefined to apply only to
the packages named on the command line.
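For example (illustrative usage), 'go build -gcflags=fmt=-m std' applies -m
only when compiling package fmt, while the special pattern 'all' applies the
flags to every package in the build.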
See #22527 for discussion and rationale.
Fixes #22527.
Change-Id: I6716bed69edc324767f707b5bbf3aaa90e8e7302
Reviewed-on: https://go-review.googlesource.com/76551
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Since the inlining budget calculation was fixed in CL 70151,
runtime.nextFreeFast is no longer inlineable on MIPS64x, because that
architecture does not support Ctz64 as an intrinsic. Skip the test there.
Updates #22239.
Change-Id: Id00d55628ddb4b48d27aebfa10377a896765d569
Reviewed-on: https://go-review.googlesource.com/72271
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Add the package to the table and start it off with a few small, basic
functions. Inspired by CL 66331, which added flag.ro.
Updates #21851.
Change-Id: I3995cde1ff7bb09a718110473bed8b193c2232a5
Reviewed-on: https://go-review.googlesource.com/66990
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is based on a list that Austin Clements provided in mid-2016. It is
mostly untouched, except that the wbufptr funcs were removed from the
runtime and thus have been removed from the list here too.
Add a section for these GC funcs, since there are quite a lot of them
and the runtime has tons of funcs that we want to inline. As before,
sort this section too.
Also place some of these funcs outside the GC section, as they are not
directly related to the GC.
Updates #21851.
Change-Id: I35eb777a4c50b5f655618920dc2bc568c7c30ff5
Reviewed-on: https://go-review.googlesource.com/65654
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
In golang.org/cl/42813, a test was added in the bytes package to check
if a Buffer method was being inlined, using 'go tool nm'.
Now that we have a compiler test that verifies that certain funcs are
inlineable, merge it there. Knowing whether the funcs are inlineable is
also more reliable than checking whether their symbol appears in the
binary. For example, under some circumstances, such as when closures are
used, inlineable funcs still can't be inlined.
While at it, add a few more bytes.Buffer methods that are currently
inlined and should clearly stay that way.
Updates #21851.
Change-Id: I62066e32ef5542d37908bd64f90bda51276da4de
Reviewed-on: https://go-review.googlesource.com/65658
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The reason why adjustctxt wasn't being inlined was reported as:
function too complex: cost 92 exceeds budget 80
However, after tweaking the code to be under the budget limit, we see
the real blocker:
non-leaf function
There is little we can do about this one in particular at the moment.
Create a section with funcs that will need mid-stack inlining to be
inlineable, since this will likely come up again in other cases.
Change-Id: I3a8eb1546b289a060ac896506a007b0496946e84
Reviewed-on: https://go-review.googlesource.com/65650
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This is based on a list that Keith Randall provided in mid-2016. These
are all funcs that, at the time, were important and small enough that
they should be clearly inlined.
The runtime has changed a bit since then. Ctz16 and Ctz8 were removed,
so don't add them. stringtoslicebytetmp was moved to the backend, so
it's no longer a Go function. And itabhash was moved to itabHashFunc.
The only other outlier is adjustctxt, which is not inlineable at the
moment. I've added a TODO and will address it myself in a separate
commit.
While at it, error if any funcs in the input table are duplicated.
They're never useful and typos could lead to unintentionally thinking a
function is inlineable when it actually isn't.
And, since the lists are getting long, start sorting alphabetically.
Finally, rotl_31 is only defined on 64-bit architectures, and the added
runtime/internal/sys funcs are assembly on 386 and thus non-inlineable
in that case.
Updates #21851.
Change-Id: Ib99ab53d777860270e8fd4aefc41adb448f13662
Reviewed-on: https://go-review.googlesource.com/65351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
If we use -gcflags='-m -m', the compiler should give us a reason why a
func couldn't be inlined. Add the extra -m necessary for that extra info
and use it to give better test failures. For example, for the func in
the TODO:
--- FAIL: TestIntendedInlining (1.53s)
inl_test.go:104: runtime.nextFreeFast was not inlined: function too complex
We might increase the number of -m flags to get more information at some
later point, such as getting details on how close the func was to the
inlining budget.
Also started using regexes, as the output parsing is getting a bit too
complex for manual string handling.
While at it, also refactored the test to not buffer the entire output
into memory. This is fine in practice, but it won't scale well as we add
more packages or we depend more on the compiler's debugging output.
For example, "go build -a -gcflags='-m -m' std" prints nearly 40MB of
plaintext - and we only need to see the output line by line anyway.
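A hedged sketch of the streaming approach (the package list, error handling,
and the exact diagnostic format are simplified assumptions):

package main

import (
    "bufio"
    "fmt"
    "io"
    "os/exec"
    "regexp"
)

func main() {
    // Ask the compiler for inlining diagnostics and scan them line by line,
    // rather than buffering the full (potentially huge) output in memory.
    cmd := exec.Command("go", "build", "-gcflags=-m -m", "runtime")
    pr, pw := io.Pipe()
    cmd.Stdout = pw
    cmd.Stderr = pw // the -m diagnostics arrive on stderr
    go func() {
        cmd.Run() // error handling elided in this sketch
        pw.Close()
    }()

    canInline := regexp.MustCompile(`^\S+: can inline (\S+)`)
    cannotInline := regexp.MustCompile(`^\S+: cannot inline (\S+): (.*)`)
    scanner := bufio.NewScanner(pr)
    for scanner.Scan() {
        line := scanner.Text()
        if m := canInline.FindStringSubmatch(line); m != nil {
            fmt.Println("inlinable:", m[1])
        } else if m := cannotInline.FindStringSubmatch(line); m != nil {
            fmt.Printf("not inlined: %s: %s\n", m[1], m[2])
        }
    }
}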
Updates #21851.
Change-Id: I00986ff360eb56e4e9737b65a6be749ef8540643
Reviewed-on: https://go-review.googlesource.com/63810
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
https://golang.org/cl/22598 made nextFreeFast inlinable.
But during https://golang.org/cl/63611 it was discovered that it was no longer inlinable.
Reduce the number of statements below the inlining threshold to make it inlinable again.
Also update the tests to prevent regressions.
This doesn't reduce readability.
Change-Id: Ia672784dd48ed3b1ab46e390132f1094fe453de5
Reviewed-on: https://go-review.googlesource.com/65030
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Rework the test to work with any number of std packages. This was done
to include a few funcs from unicode/utf8. Adding more will be much
simpler too.
While at it, add more runtime funcs by searching for "inlined" or
"inlining" in the git log of its directory. These are: addb, subtractb,
fastrand and noescape.
Updates #21851.
Change-Id: I4fb2bd8aa6a5054218f9b36cb19d897ac533710e
Reviewed-on: https://go-review.googlesource.com/63611
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Move it from the runtime package, as we will soon add more packages and
functions for it to check.
The test used the testEnv func, which cleaned certain environment
variables from a command; that func was moved to internal/testenv under a
more descriptive (and less ambiguous) name. Add a simple godoc to it
too.
For #21851.
Change-Id: I6f39c1f23b45377718355fafe66ffd87047d8ab6
Reviewed-on: https://go-review.googlesource.com/63550
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>