Commit graph

34 commits

Author SHA1 Message Date
Joel Sing
d3d21d0a42 cmd/compile: update TestIntendedInlining for riscv64
Mark nextFreeFast as not inline, as it is too expensive to inline on riscv64.
Also remove riscv64 from non-atomic inline architectures, as we now have
atomic intrisics.

Updates #22239

Change-Id: I6e0e72c1192070e39f065bee486f48df4cc74b35
Reviewed-on: https://go-review.googlesource.com/c/go/+/227808
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-13 17:29:23 +00:00
Joel Sing
98d2717499 cmd/compile: implement compiler for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-18 14:41:40 +00:00
Michael Anthony Knyszek
383b447e0d runtime: clean up power-of-two rounding code with align functions
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.

This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.

With these two functions, we also go and replace manual rounding code
with it where we can.

Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-04 23:41:34 +00:00
Brad Fitzpatrick
a38a917aee all: remove the nacl port (part 1)
You were a useful port and you've served your purpose.
Thanks for all the play.

A subsequent CL will remove amd64p32 (including assembly files and
toolchain bits) and remaining bits. The amd64p32 removal will be
separated into its own CL in case we want to support the Linux x32 ABI
in the future and want our old amd64p32 support as a starting point.

Updates #30439

Change-Id: Ia3a0c7d49804adc87bf52a4dea7e3d3007f2b1cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/199499
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-09 06:14:44 +00:00
Carlo Alberto Ferraris
931365763a math/rand: devirtualize interface in lockedSource
Avoid interface calls, enable inlining, and store the rngSource close to the
Mutex to exploit better memory locality.

Also add a benchmark to properly measure the threadsafe nature of globalRand.

On a linux/amd64 VM:

name                       old time/op  new time/op  delta
Int63Threadsafe-4          36.4ns ±12%  20.6ns ±11%  -43.52%  (p=0.000 n=30+30)
Int63ThreadsafeParallel-4  79.3ns ± 5%  56.5ns ± 5%  -28.69%  (p=0.000 n=29+30)

Change-Id: I6ab912c1a1e9afc7bacd8e72c82d4d50d546a510
Reviewed-on: https://go-review.googlesource.com/c/go/+/191538
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-29 14:27:05 +00:00
LE Manh Cuong
324cf21f78 cmd/compile: remove adjustctx from inline test
After golang.org/cl/33895, function adjustctx can not be inlined,
cost 82 exceeds budget 80

Change-Id: Ie559ed80ea2c251add940a99f11b2983f6cbddbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/187977
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 16:47:06 +00:00
Daniel Martí
5f40351708 encoding/base64: speed up the decoder
Most of the decoding time is spent in the first Decode loop, since the
rest of the function only deals with the few remaining bytes. Any
unnecessary work done in that loop body matters tremendously.

One such unnecessary bottleneck was the use of the enc.decodeMap table.
Since enc is a pointer receiver, and the field is used within the
non-inlineable function decode64, the decoder must perform a nil check
at every iteration.

To fix that, move the enc.decodeMap uses to the parent function, where
we can lift the nil check outside the loop. That gives roughly a 15%
speed-up. The function no longer performs decoding per se, so rename it.
While at it, remove the now unnecessary receivers.

An unfortunate side effect of this change is that the loop now contains
eight bounds checks on src instead of just one. However, not having to
slice src plus the nil check removal well outweigh the added cost.

The other piece that made decode64 slow was that it wasn't inlined, and
had multiple branches. Use a simple bitwise-or trick suggested by Roger
Peppe, and collapse the rest of the bitwise logic into a single
expression. Inlinability and the reduced branching give a further 10%
speed-up.

Finally, add these two functions to TestIntendedInlining, since we want
them to stay inlinable.

Apply the same refactor to decode32 for consistency, and to let 32-bit
architectures see a similar performance gain for large inputs.

name                 old time/op    new time/op    delta
DecodeString/2-8       47.3ns ± 1%    45.8ns ± 0%   -3.28%  (p=0.002 n=6+6)
DecodeString/4-8       55.8ns ± 2%    51.5ns ± 0%   -7.71%  (p=0.004 n=5+6)
DecodeString/8-8       64.9ns ± 0%    61.7ns ± 0%   -4.99%  (p=0.004 n=5+6)
DecodeString/64-8       238ns ± 0%     198ns ± 0%  -16.54%  (p=0.002 n=6+6)
DecodeString/8192-8    19.5µs ± 0%    14.6µs ± 0%  -24.96%  (p=0.004 n=6+5)

name                 old speed      new speed      delta
DecodeString/2-8     84.6MB/s ± 1%  87.4MB/s ± 0%   +3.38%  (p=0.002 n=6+6)
DecodeString/4-8      143MB/s ± 2%   155MB/s ± 0%   +8.41%  (p=0.004 n=5+6)
DecodeString/8-8      185MB/s ± 0%   195MB/s ± 0%   +5.29%  (p=0.004 n=5+6)
DecodeString/64-8     369MB/s ± 0%   442MB/s ± 0%  +19.78%  (p=0.002 n=6+6)
DecodeString/8192-8   560MB/s ± 0%   746MB/s ± 0%  +33.27%  (p=0.004 n=6+5)

Updates #19636.

Change-Id: Ib839577b0e3f5a2bb201f5cae580c61365d92894
Reviewed-on: https://go-review.googlesource.com/c/go/+/151177
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: roger peppe <rogpeppe@gmail.com>
2019-03-13 10:39:58 +00:00
Josh Bleecher Snyder
243c8eb8c2 cmd/compile: add pure Go math/big functions to TestIntendedInlining
Change-Id: Id29a9e48a09965e457f923a0ff023722b38b27ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/165157
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-09 21:26:04 +00:00
Daniel Martí
788e038e5d reflect: make all flag.mustBe* methods inlinable
mustBe was barely over budget, so manually inlining the first flag.kind
call is enough. Add a TODO to reverse that in the future, once the
compiler gets better.

mustBeExported and mustBeAssignable were over budget by a larger amount,
so add slow path functions instead. This is the same strategy used in
the sync package for common methods like Once.Do, for example.

Lots of exported reflect.Value methods call these assert-like unexported
methods, so avoiding the function call overhead in the common case does
shave off a percent from most exported APIs.

Finally, add the methods to TestIntendedInlining.

While at it, replace a couple of uses of the 0 Kind with its descriptive
name, Invalid.

name     old time/op    new time/op    delta
Call-8     68.0ns ± 1%    66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
PtrTo-8    8.00ns ± 2%    7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)

Updates #7818.

Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-09 19:50:24 +00:00
Daniel Martí
cc5dc00150 cmd/compile: update TestIntendedInlining
Value.CanInterface and Value.pointer are now inlinable, since we have a
limited form of mid-stack inlining. Their calls to panic were preventing
that in previous Go releases. The other three methods still go over
budget, so update that comment.

In recent commits, sync.Once.Do and multiple lock/unlock methods have
also been made inlinable, so add those as well. They have standalone
tests like test/inline_sync.go already, but it's best if the funcs are
in this global test table too. They aren't inlinable on every platform
yet, though.

Finally, use math/bits.UintSize to check if GOARCH is 64-bit, now that
we can.

Change-Id: I65cc681b77015f7746dba3126637e236dcd494e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/166461
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-09 18:39:22 +00:00
Leon Klingele
d090429ea9 all: fix typos as reported by 'misspell'
Change-Id: I904b8655f21743189814bccf24073b6fbb9fc56d
GitHub-Last-Rev: b032c14394
GitHub-Pull-Request: golang/go#29997
Reviewed-on: https://go-review.googlesource.com/c/160421
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-02-26 23:02:05 +00:00
Daniel Martí
b397248168 cmd/compile: add Buffer.Grow to TestIntendedInlining
golang.org/cl/151977 slightly decreased the cost of inlining an extra
call from 60 to 57, since it was a safe change that could help in some
scenarios.

One notable change spotted in that CL is that bytes.Buffer.Grow is now
inlinable, meaning that a fixedbugs test needed updating.

For consistency, add the test case to TestIntendedInlining too,
alongside other commonly used bytes.Buffer methods.

Change-Id: I4fb402fc684ef4c543fc65aea343ca1a4d73a189
Reviewed-on: https://go-review.googlesource.com/c/151979
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-01 17:31:02 +00:00
Austin Clements
9098d1d854 runtime: debug code to catch bad gcWork.puts
This adds a debug check to throw immediately if any pointers are added
to the gcWork buffer after the mark completion barrier. The intent is
to catch the source of the cached GC work that occasionally produces
"P has cached GC work at end of mark termination" failures.

The result should be that we get "throwOnGCWork" throws instead of "P
has cached GC work at end of mark termination" throws, but with useful
stack traces.

This should be reverted before the release. I've been unable to
reproduce this issue locally, but this issue appears fairly regularly
on the builders, so the intent is to catch it on the builders.

This probably slows down the GC slightly.

For #27993.

Change-Id: I5035e14058ad313bfbd3d68c41ec05179147a85c
Reviewed-on: https://go-review.googlesource.com/c/149969
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-21 16:28:17 +00:00
Josh Bleecher Snyder
7d6b5e340c runtime: reduce linear search through pcvalue cache
This change introduces two optimizations together,
one for recursive and one for non-recursive stacks.

For recursive stacks, we introduce the new entry
at the beginning of the cache, so it can be found first.
This adds an extra read and write.
While we're here, switch from fastrandn, which does a multiply,
to fastrand % n, which does a shift.

For non-recursive stacks, split the cache from [16]pcvalueCacheEnt
into [2][8]pcvalueCacheEnt, and add a very cheap associative lookup.

name                old time/op  new time/op  delta
StackCopyPtr-8       118ms ± 1%   106ms ± 2%  -9.56%  (p=0.000 n=17+18)
StackCopy-8         95.8ms ± 1%  87.0ms ± 3%  -9.11%  (p=0.000 n=19+20)
StackCopyNoCache-8   135ms ± 2%   139ms ± 1%  +3.06%  (p=0.000 n=19+18)

During make.bash, the association function used has this return distribution:

percent count  return value
 53.23% 678797 1
 46.74% 596094 0

It is definitely not perfect, but it is pretty good,
and that's all we need.

Change-Id: I2cabb1d26b99c5111bc28f427016a2a5e6c620fd
Reviewed-on: https://go-review.googlesource.com/c/110564
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-11-09 16:06:56 +00:00
Martin Möhrmann
642792350c runtime: remove unused maxSliceCap function and maxElems array
All uses of these have been converted to use runtime/internal/math
functions for overflow checking.

Fixes #21588

Change-Id: I0ba57028e471803dc7d445e66d77a8f87edfdafb
Reviewed-on: https://go-review.googlesource.com/c/144037
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-23 16:58:43 +00:00
Ilya Tocar
fa913a36a2 cmd/compile/internal/gc: inline autogenerated (*T).M wrappers
Currently all inlining of autogenerated wrappers is disabled,
because it causes build failures, when indexed export format is enabled.
Turns out we can reenable it for common case of (*T).M wrappers.
This fixes most performance degradation of 1.11 vs 1.10.

encoding/binary:
name                    old time/op   new time/op   delta
ReadSlice1000Int32s-6    14.8µs ± 2%   11.5µs ± 2%  -22.01%  (p=0.000 n=10+10)
WriteSlice1000Int32s-6   14.8µs ± 2%   11.7µs ± 2%  -20.95%  (p=0.000 n=10+10)

bufio:
name           old time/op    new time/op    delta
WriterFlush-6    32.4ns ± 1%    28.8ns ± 0%  -11.17%  (p=0.000 n=9+10)

sort:
SearchWrappers-6       231ns ± 1%   231ns ± 0%     ~     (p=0.129 n=9+10)
SortString1K-6         365µs ± 1%   298µs ± 1%  -18.43%  (p=0.000 n=9+10)
SortString1K_Slice-6   274µs ± 2%   276µs ± 1%     ~     (p=0.105 n=10+10)
StableString1K-6       490µs ± 1%   373µs ± 1%  -23.73%  (p=0.000 n=10+10)
SortInt1K-6            210µs ± 1%   142µs ± 1%  -32.69%  (p=0.000 n=10+10)
StableInt1K-6          243µs ± 0%   151µs ± 1%  -37.75%  (p=0.000 n=10+10)
StableInt1K_Slice-6    130µs ± 1%   130µs ± 0%     ~     (p=0.237 n=10+8)
SortInt64K-6          19.9ms ± 1%  13.5ms ± 1%  -32.32%  (p=0.000 n=10+10)
SortInt64K_Slice-6    11.5ms ± 1%  11.5ms ± 1%     ~     (p=0.912 n=10+10)
StableInt64K-6        21.5ms ± 0%  13.5ms ± 1%  -37.30%  (p=0.000 n=9+10)
Sort1e2-6              108µs ± 2%    83µs ± 3%  -23.26%  (p=0.000 n=10+10)
Stable1e2-6            218µs ± 0%   161µs ± 1%  -25.99%  (p=0.000 n=8+9)
Sort1e4-6             22.6ms ± 1%  16.8ms ± 0%  -25.45%  (p=0.000 n=10+7)
Stable1e4-6           67.6ms ± 1%  49.7ms ± 0%  -26.48%  (p=0.000 n=10+10)
Sort1e6-6              3.44s ± 0%   2.55s ± 1%  -26.05%  (p=0.000 n=8+9)
Stable1e6-6            13.7s ± 0%    9.9s ± 1%  -27.68%  (p=0.000 n=8+10)

Fixes #27621
Updates #25338

Change-Id: I6fe633202f63fa829a6ab849c44d7e45f8835dff
Reviewed-on: https://go-review.googlesource.com/c/135697
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-10-16 22:02:06 +00:00
Martin Möhrmann
c9130cae9a runtime/internal/math: add multiplication with overflow check
This CL adds a new internal math package for use by the runtime.
The new package exports a MulUintptr function with uintptr arguments
a and b and returns uintptr(a*b) and whether the full-width product
x*y does overflow the uintptr value range (uintptr(x*y) != x*y).

Uses of MulUinptr in the runtime and intrinsics for performance
will be added in followup CLs.

Updates #21588

Change-Id: Ia5a02eeabc955249118e4edf68c67d9fc0858058
Reviewed-on: https://go-review.googlesource.com/c/91755
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 17:58:06 +00:00
Josh Bleecher Snyder
31cfa7f2f2 runtime: allow inlining of stackmapdata
Also do very minor code cleanup.

name                old time/op  new time/op  delta
StackCopyPtr-8      84.8ms ± 6%  82.9ms ± 5%  -2.19%  (p=0.000 n=95+94)
StackCopy-8         68.4ms ± 5%  65.3ms ± 4%  -4.54%  (p=0.000 n=99+99)
StackCopyNoCache-8   107ms ± 2%   105ms ± 2%  -2.13%  (p=0.000 n=91+95)

Change-Id: I2d85ede48bffada9584d437a08a82212c0da6d00
Reviewed-on: https://go-review.googlesource.com/109001
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 19:14:17 +00:00
Matthew Dempsky
3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
Josh Bleecher Snyder
13cd006139 runtime: add fast version of getArgInfo
getArgInfo is called a lot during stack copying.
In the common case it doesn't do much work,
but it cannot be inlined.

This change works around that.

name                old time/op  new time/op  delta
StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)

Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
Reviewed-on: https://go-review.googlesource.com/108945
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-04-29 03:33:09 +00:00
isharipo
d2a5263a9c math/big: speedup nat.setBytes for bigger slices
Set up to _S (number of bytes in Uint) bytes at time
by using BigEndian.Uint32 and BigEndian.Uint64.

The performance improves for slices bigger than _S bytes.
This is the case for 128/256bit arith that initializes
it's objects from bytes.

name               old time/op  new time/op  delta
NatSetBytes/8-4    29.8ns ± 1%  11.4ns ± 0%  -61.63%  (p=0.000 n=9+8)
NatSetBytes/24-4    109ns ± 1%    56ns ± 0%  -48.75%  (p=0.000 n=9+8)
NatSetBytes/128-4   420ns ± 2%   110ns ± 1%  -73.83%  (p=0.000 n=10+10)
NatSetBytes/7-4    26.2ns ± 1%  21.3ns ± 2%  -18.63%  (p=0.000 n=8+9)
NatSetBytes/23-4    106ns ± 1%    67ns ± 1%  -36.93%  (p=0.000 n=9+10)
NatSetBytes/127-4   410ns ± 2%   121ns ± 0%  -70.46%  (p=0.000 n=9+8)

Found this optimization opportunity by looking at ethereum_corevm
community benchmark cpuprofile.

name        old time/op  new time/op  delta
OpDiv256-4   715ns ± 1%   596ns ± 1%  -16.57%  (p=0.008 n=5+5)
OpDiv128-4   373ns ± 1%   314ns ± 1%  -15.83%  (p=0.008 n=5+5)
OpDiv64-4    301ns ± 0%   285ns ± 1%   -5.12%  (p=0.008 n=5+5)

Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57
Reviewed-on: https://go-review.googlesource.com/98775
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-08 18:50:10 +00:00
Austin Clements
3e214e5693 runtime: simplify bulkBarrierPreWrite
Currently, bulkBarrierPreWrite uses inheap to decide whether the
destination is in the heap or whether to check for stack or global
data. However, this isn't the best question to ask.

Instead, get the span directly and query its state. This lets us
directly determine whether this might be a global, or is stack memory,
or is heap memory.

At this point, inheap is no longer used in the hot path, so drop it
from the must-be-inlined list and substitute spanOf.

This will help in a circuitous way with #23862, since fixing that is
going to push inheap very slightly over the inline-able threshold on a
few platforms.

Change-Id: I5360fc1181183598502409f12979899e1e4d45f7
Reviewed-on: https://go-review.googlesource.com/95495
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:33 +00:00
Russ Cox
5993251c01 cmd/go: implement per-package asmflags, gcflags, ldflags, gccgoflags
It has always been problematic that there was no way to specify
tool flags that applied only to the build of certain packages;
it was only to specify flags for all packages being built.
The usual workaround was to install all dependencies of something,
then build just that one thing with different flags. Since the
dependencies appeared to be up-to-date, they were not rebuilt
with the different flags. The new content-based staleness
(up-to-date) checks see through this trick, because they detect
changes in flags. This forces us to address the underlying problem
of providing a way to specify per-package flags.

The solution is to allow -gcflags=pattern=flags, which means
that flags apply to packages matching pattern, in addition to the
usual -gcflags=flags, which is now redefined to apply only to
the packages named on the command line.

See #22527 for discussion and rationale.

Fixes #22527.

Change-Id: I6716bed69edc324767f707b5bbf3aaa90e8e7302
Reviewed-on: https://go-review.googlesource.com/76551
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-11-09 15:04:04 +00:00
Cherry Zhang
5c18a3ca70 cmd/compile: skip runtime.nextFreeFast inlining test on MIPS64x
Since inlining budget calculation is fixed in CL 70151
runtime.nextFreeFast is no longer inlineable on MIPS64x because
it does not support Ctz64 as intrinsic. Skip the test.

Updates #22239.

Change-Id: Id00d55628ddb4b48d27aebfa10377a896765d569
Reviewed-on: https://go-review.googlesource.com/72271
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-20 20:20:15 +00:00
Daniel Martí
a2993456bb cmd/compile: add reflect to TestIntendedInlining
Add the package to the table and start it off with a few small, basic
functions. Inspired by CL 66331, which added flag.ro.

Updates #21851.

Change-Id: I3995cde1ff7bb09a718110473bed8b193c2232a5
Reviewed-on: https://go-review.googlesource.com/66990
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-28 21:09:46 +00:00
Daniel Martí
4d3d333468 cmd/compile: add runtime GC funcs to inlining test
This is based on a list that Austin Clements provided in mid-2016. It is
mostly untouched, except for the fact that the wbufptr funcs were
removed from the runtime thus removed from the lits here too.

Add a section for these GC funcs, since there are quite a lot of them
and the runtime has tons of funcs that we want to inline. As before,
sort this section too.

Also place some of these funcs out of the GC section, as they are not
directly related to the GC.

Updates #21851.

Change-Id: I35eb777a4c50b5f655618920dc2bc568c7c30ff5
Reviewed-on: https://go-review.googlesource.com/65654
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-25 21:55:06 +00:00
Ilya Tocar
1ae67965e4 regexp: make (*bitState).push inlinable
By refactoring job.arg from int with 0/1 as the only valid values into bool
and simplifying (*bitState).push, we reduce the number of nodes below the inlining threshold.
This improves backtracking regexp performance by 5-10% and go1 geomean  by 1.7%
Full performance data below:

name                                      old time/op    new time/op     delta
Find-6                                       510ns ± 0%      480ns ± 1%   -5.90%  (p=0.000 n=10+10)
FindString-6                                 504ns ± 1%      479ns ± 1%   -5.10%  (p=0.000 n=10+10)
FindSubmatch-6                               689ns ± 1%      659ns ± 1%   -4.27%  (p=0.000 n=9+10)
FindStringSubmatch-6                         659ns ± 0%      628ns ± 1%   -4.69%  (p=0.000 n=8+10)
Literal-6                                    174ns ± 1%      171ns ± 1%   -1.50%  (p=0.000 n=10+10)
NotLiteral-6                                2.89µs ± 1%     2.72µs ± 0%   -5.84%  (p=0.000 n=10+9)
MatchClass-6                                4.65µs ± 1%     4.28µs ± 1%   -7.96%  (p=0.000 n=10+10)
MatchClass_InRange-6                        4.15µs ± 1%     3.80µs ± 0%   -8.61%  (p=0.000 n=10+8)
ReplaceAll-6                                2.72µs ± 1%     2.60µs ± 1%   -4.68%  (p=0.000 n=10+10)
AnchoredLiteralShortNonMatch-6               158ns ± 1%      153ns ± 1%   -3.03%  (p=0.000 n=10+10)
AnchoredLiteralLongNonMatch-6                176ns ± 1%      176ns ± 0%     ~     (p=1.000 n=10+9)
AnchoredShortMatch-6                         260ns ± 0%      255ns ± 1%   -1.84%  (p=0.000 n=9+10)
AnchoredLongMatch-6                          456ns ± 0%      455ns ± 0%   -0.19%  (p=0.008 n=8+10)
OnePassShortA-6                             1.13µs ± 1%     1.12µs ± 0%   -0.57%  (p=0.046 n=10+8)
NotOnePassShortA-6                          1.14µs ± 1%     1.14µs ± 1%     ~     (p=0.162 n=10+10)
OnePassShortB-6                              908ns ± 0%      893ns ± 0%   -1.60%  (p=0.000 n=8+9)
NotOnePassShortB-6                           857ns ± 0%      803ns ± 1%   -6.34%  (p=0.000 n=8+10)
OnePassLongPrefix-6                          190ns ± 0%      190ns ± 1%     ~     (p=0.059 n=8+10)
OnePassLongNotPrefix-6                       722ns ± 1%      722ns ± 1%     ~     (p=0.451 n=10+10)
MatchParallelShared-6                        810ns ± 2%      807ns ± 2%     ~     (p=0.643 n=10+10)
MatchParallelCopied-6                       72.1ns ± 1%     69.4ns ± 1%   -3.81%  (p=0.000 n=10+10)
QuoteMetaAll-6                               213ns ± 2%      216ns ± 3%     ~     (p=0.284 n=10+10)
QuoteMetaNone-6                             89.7ns ± 1%     89.8ns ± 1%     ~     (p=0.616 n=10+10)
Match/Easy0/32-6                             127ns ± 1%      127ns ± 1%     ~     (p=0.977 n=10+10)
Match/Easy0/1K-6                             566ns ± 0%      566ns ± 0%     ~     (p=1.000 n=8+8)
Match/Easy0/32K-6                           9.30µs ± 1%     9.28µs ± 1%     ~     (p=0.529 n=10+10)
Match/Easy0/1M-6                             460µs ± 1%      460µs ± 1%     ~     (p=0.853 n=10+10)
Match/Easy0/32M-6                           15.0ms ± 0%     15.1ms ± 0%   +0.77%  (p=0.000 n=9+8)
Match/Easy0i/32-6                           2.10µs ± 1%     1.98µs ± 0%   -6.02%  (p=0.000 n=10+8)
Match/Easy0i/1K-6                           61.5µs ± 0%     57.2µs ± 0%   -6.97%  (p=0.000 n=10+9)
Match/Easy0i/32K-6                          2.75ms ± 0%     2.72ms ± 0%   -1.10%  (p=0.000 n=9+9)
Match/Easy0i/1M-6                           88.0ms ± 0%     86.9ms ± 1%   -1.29%  (p=0.000 n=8+10)
Match/Easy0i/32M-6                           2.82s ± 0%      2.77s ± 1%   -1.81%  (p=0.000 n=8+10)
Match/Easy1/32-6                             123ns ± 1%      124ns ± 1%   +0.90%  (p=0.001 n=10+10)
Match/Easy1/1K-6                            1.70µs ± 1%     1.65µs ± 0%   -3.18%  (p=0.000 n=9+10)
Match/Easy1/32K-6                           69.1µs ± 0%     68.4µs ± 1%   -0.95%  (p=0.000 n=8+10)
Match/Easy1/1M-6                            2.46ms ± 1%     2.42ms ± 1%   -1.66%  (p=0.000 n=10+10)
Match/Easy1/32M-6                           78.4ms ± 1%     77.5ms ± 0%   -1.08%  (p=0.000 n=10+9)
Match/Medium/32-6                           2.07µs ± 1%     1.91µs ± 1%   -7.69%  (p=0.000 n=10+10)
Match/Medium/1K-6                           62.8µs ± 0%     58.0µs ± 1%   -7.70%  (p=0.000 n=8+10)
Match/Medium/32K-6                          2.63ms ± 1%     2.58ms ± 1%   -2.14%  (p=0.000 n=10+10)
Match/Medium/1M-6                           84.6ms ± 0%     82.5ms ± 0%   -2.37%  (p=0.000 n=8+9)
Match/Medium/32M-6                           2.71s ± 0%      2.64s ± 0%   -2.46%  (p=0.000 n=10+9)
Match/Hard/32-6                             3.26µs ± 1%     2.98µs ± 1%   -8.49%  (p=0.000 n=10+10)
Match/Hard/1K-6                              100µs ± 0%       90µs ± 1%   -9.55%  (p=0.000 n=9+10)
Match/Hard/32K-6                            3.82ms ± 0%     3.82ms ± 1%     ~     (p=0.515 n=8+10)
Match/Hard/1M-6                              122ms ± 1%      123ms ± 0%   +0.66%  (p=0.000 n=10+8)
Match/Hard/32M-6                             3.89s ± 1%      3.91s ± 1%     ~     (p=0.105 n=10+10)
Match/Hard1/32-6                            18.1µs ± 1%     16.1µs ± 1%  -11.31%  (p=0.000 n=10+10)
Match/Hard1/1K-6                             565µs ± 0%      493µs ± 1%  -12.65%  (p=0.000 n=8+10)
Match/Hard1/32K-6                           18.8ms ± 0%     18.8ms ± 1%     ~     (p=0.905 n=9+10)
Match/Hard1/1M-6                             602ms ± 1%      602ms ± 1%     ~     (p=0.278 n=9+10)
Match/Hard1/32M-6                            19.1s ± 1%      19.2s ± 1%   +0.31%  (p=0.035 n=9+10)
Match_onepass_regex/32-6                    6.32µs ± 1%     6.34µs ± 1%     ~     (p=0.060 n=10+10)
Match_onepass_regex/1K-6                     204µs ± 1%      204µs ± 1%     ~     (p=0.842 n=9+10)
Match_onepass_regex/32K-6                   6.53ms ± 0%     6.55ms ± 1%   +0.36%  (p=0.005 n=10+10)
Match_onepass_regex/1M-6                     209ms ± 0%      208ms ± 1%   -0.65%  (p=0.034 n=8+10)
Match_onepass_regex/32M-6                    6.72s ± 0%      6.68s ± 1%   -0.74%  (p=0.000 n=9+10)
CompileOnepass/^(?:(?:(?:.(?:$))?))...-6    7.02µs ± 1%     7.02µs ± 1%     ~     (p=0.671 n=10+10)
CompileOnepass/^abcd$-6                     5.65µs ± 1%     5.65µs ± 1%     ~     (p=0.411 n=10+9)
CompileOnepass/^(?:(?:a{0,})*?)$-6          7.06µs ± 1%     7.06µs ± 1%     ~     (p=0.912 n=10+10)
CompileOnepass/^(?:(?:a+)*)$-6              6.40µs ± 1%     6.41µs ± 1%     ~     (p=0.699 n=10+10)
CompileOnepass/^(?:(?:a|(?:aa)))$-6         8.18µs ± 2%     8.16µs ± 1%     ~     (p=0.529 n=10+10)
CompileOnepass/^(?:[^\s\S])$-6              5.08µs ± 1%     5.17µs ± 1%   +1.77%  (p=0.000 n=9+10)
CompileOnepass/^(?:(?:(?:a*)+))$-6          6.86µs ± 1%     6.85µs ± 0%     ~     (p=0.190 n=10+9)
CompileOnepass/^[a-c]+$-6                   5.14µs ± 1%     5.11µs ± 0%   -0.53%  (p=0.041 n=10+10)
CompileOnepass/^[a-c]*$-6                   5.62µs ± 1%     5.63µs ± 1%     ~     (p=0.382 n=10+10)
CompileOnepass/^(?:a*)$-6                   5.76µs ± 1%     5.73µs ± 1%   -0.41%  (p=0.008 n=9+10)
CompileOnepass/^(?:(?:aa)|a)$-6             7.89µs ± 1%     7.84µs ± 1%   -0.66%  (p=0.020 n=10+10)
CompileOnepass/^...$-6                      5.38µs ± 1%     5.38µs ± 1%     ~     (p=0.857 n=9+10)
CompileOnepass/^(?:a|(?:aa))$-6             7.80µs ± 2%     7.82µs ± 1%     ~     (p=0.342 n=10+10)
CompileOnepass/^a((b))c$-6                  7.75µs ± 1%     7.78µs ± 1%     ~     (p=0.172 n=10+10)
CompileOnepass/^a.[l-nA-Cg-j]?e$-6          8.39µs ± 1%     8.42µs ± 1%     ~     (p=0.138 n=10+10)
CompileOnepass/^a((b))$-6                   6.92µs ± 1%     6.95µs ± 1%     ~     (p=0.159 n=10+10)
CompileOnepass/^a(?:(b)|(c))c$-6            10.0µs ± 1%     10.0µs ± 1%     ~     (p=0.896 n=10+10)
CompileOnepass/^a(?:b|c)$-6                 5.62µs ± 1%     5.66µs ± 1%   +0.71%  (p=0.023 n=10+10)
CompileOnepass/^a(?:b?|c)$-6                8.49µs ± 1%     8.43µs ± 1%   -0.69%  (p=0.010 n=10+10)
CompileOnepass/^a(?:b?|c+)$-6               9.26µs ± 1%     9.28µs ± 1%     ~     (p=0.448 n=10+10)
CompileOnepass/^a(?:bc)+$-6                 6.52µs ± 1%     6.46µs ± 2%   -1.02%  (p=0.003 n=10+10)
CompileOnepass/^a(?:[bcd])+$-6              6.29µs ± 1%     6.32µs ± 1%     ~     (p=0.256 n=10+10)
CompileOnepass/^a((?:[bcd])+)$-6            7.77µs ± 1%     7.79µs ± 1%     ~     (p=0.105 n=10+10)
CompileOnepass/^a(:?b|c)*d$-6               14.0µs ± 1%     13.9µs ± 1%   -0.69%  (p=0.003 n=10+10)
CompileOnepass/^.bc(d|e)*$-6                8.96µs ± 1%     9.06µs ± 1%   +1.20%  (p=0.000 n=10+9)
CompileOnepass/^loooooooooooooooooo...-6     219µs ± 1%      220µs ± 1%   +0.63%  (p=0.006 n=9+10)
[Geo mean]                                  31.6µs          31.1µs        -1.82%

name                                      old speed      new speed       delta
QuoteMetaAll-6                            65.5MB/s ± 2%   64.8MB/s ± 3%     ~     (p=0.315 n=10+10)
QuoteMetaNone-6                            290MB/s ± 1%    290MB/s ± 1%     ~     (p=0.755 n=10+10)
Match/Easy0/32-6                           250MB/s ± 0%    251MB/s ± 1%     ~     (p=0.277 n=8+9)
Match/Easy0/1K-6                          1.81GB/s ± 0%   1.81GB/s ± 0%     ~     (p=0.408 n=8+10)
Match/Easy0/32K-6                         3.52GB/s ± 1%   3.53GB/s ± 1%     ~     (p=0.529 n=10+10)
Match/Easy0/1M-6                          2.28GB/s ± 1%   2.28GB/s ± 1%     ~     (p=0.853 n=10+10)
Match/Easy0/32M-6                         2.24GB/s ± 0%   2.23GB/s ± 0%   -0.76%  (p=0.000 n=9+8)
Match/Easy0i/32-6                         15.2MB/s ± 1%   16.2MB/s ± 0%   +6.43%  (p=0.000 n=10+9)
Match/Easy0i/1K-6                         16.6MB/s ± 0%   17.9MB/s ± 0%   +7.48%  (p=0.000 n=10+9)
Match/Easy0i/32K-6                        11.9MB/s ± 0%   12.0MB/s ± 0%   +1.11%  (p=0.000 n=9+9)
Match/Easy0i/1M-6                         11.9MB/s ± 0%   12.1MB/s ± 1%   +1.31%  (p=0.000 n=8+10)
Match/Easy0i/32M-6                        11.9MB/s ± 0%   12.1MB/s ± 1%   +1.84%  (p=0.000 n=8+10)
Match/Easy1/32-6                           260MB/s ± 1%    258MB/s ± 1%   -0.91%  (p=0.001 n=10+10)
Match/Easy1/1K-6                           601MB/s ± 1%    621MB/s ± 0%   +3.28%  (p=0.000 n=9+10)
Match/Easy1/32K-6                          474MB/s ± 0%    479MB/s ± 1%   +0.96%  (p=0.000 n=8+10)
Match/Easy1/1M-6                           426MB/s ± 1%    433MB/s ± 1%   +1.68%  (p=0.000 n=10+10)
Match/Easy1/32M-6                          428MB/s ± 1%    433MB/s ± 0%   +1.09%  (p=0.000 n=10+9)
Match/Medium/32-6                         15.4MB/s ± 1%   16.7MB/s ± 1%   +8.23%  (p=0.000 n=10+9)
Match/Medium/1K-6                         16.3MB/s ± 1%   17.7MB/s ± 1%   +8.43%  (p=0.000 n=9+10)
Match/Medium/32K-6                        12.5MB/s ± 1%   12.7MB/s ± 1%   +2.15%  (p=0.000 n=10+10)
Match/Medium/1M-6                         12.4MB/s ± 0%   12.7MB/s ± 0%   +2.44%  (p=0.000 n=8+9)
Match/Medium/32M-6                        12.4MB/s ± 0%   12.7MB/s ± 0%   +2.52%  (p=0.000 n=10+9)
Match/Hard/32-6                           9.82MB/s ± 1%  10.73MB/s ± 1%   +9.29%  (p=0.000 n=10+10)
Match/Hard/1K-6                           10.2MB/s ± 0%   11.3MB/s ± 1%  +10.56%  (p=0.000 n=9+10)
Match/Hard/32K-6                          8.58MB/s ± 0%   8.58MB/s ± 1%     ~     (p=0.554 n=8+10)
Match/Hard/1M-6                           8.59MB/s ± 1%   8.53MB/s ± 0%   -0.70%  (p=0.000 n=10+8)
Match/Hard/32M-6                          8.62MB/s ± 1%   8.59MB/s ± 1%     ~     (p=0.098 n=10+10)
Match/Hard1/32-6                          1.77MB/s ± 1%   1.99MB/s ± 1%  +12.40%  (p=0.000 n=10+8)
Match/Hard1/1K-6                          1.81MB/s ± 1%   2.08MB/s ± 1%  +14.55%  (p=0.000 n=10+10)
Match/Hard1/32K-6                         1.74MB/s ± 0%   1.74MB/s ± 0%     ~     (p=0.108 n=9+10)
Match/Hard1/1M-6                          1.74MB/s ± 0%   1.74MB/s ± 1%     ~     (p=1.000 n=9+10)
Match/Hard1/32M-6                         1.75MB/s ± 0%   1.75MB/s ± 1%     ~     (p=0.157 n=9+10)
Match_onepass_regex/32-6                  5.05MB/s ± 0%   5.05MB/s ± 1%     ~     (p=0.262 n=8+10)
Match_onepass_regex/1K-6                  5.02MB/s ± 1%   5.02MB/s ± 1%     ~     (p=0.677 n=9+10)
Match_onepass_regex/32K-6                 5.02MB/s ± 0%   4.99MB/s ± 0%   -0.47%  (p=0.000 n=10+9)
Match_onepass_regex/1M-6                  5.01MB/s ± 0%   5.04MB/s ± 1%   +0.68%  (p=0.017 n=8+10)
Match_onepass_regex/32M-6                 4.99MB/s ± 0%   5.03MB/s ± 1%   +0.74%  (p=0.000 n=10+10)
[Geo mean]                                29.1MB/s        29.8MB/s        +2.44%

go1 data for reference

name                     old time/op    new time/op    delta
BinaryTree17-6              4.39s ± 1%     4.37s ± 0%   -0.58%  (p=0.006 n=9+9)
Fannkuch11-6                5.13s ± 0%     5.18s ± 0%   +0.87%  (p=0.000 n=8+8)
FmtFprintfEmpty-6          74.2ns ± 0%    71.7ns ± 3%   -3.41%  (p=0.000 n=10+10)
FmtFprintfString-6          120ns ± 1%     122ns ± 2%     ~     (p=0.333 n=10+10)
FmtFprintfInt-6             127ns ± 1%     127ns ± 1%     ~     (p=0.809 n=10+10)
FmtFprintfIntInt-6          186ns ± 0%     188ns ± 1%   +1.02%  (p=0.002 n=8+10)
FmtFprintfPrefixedInt-6     223ns ± 1%     222ns ± 2%     ~     (p=0.421 n=10+10)
FmtFprintfFloat-6           374ns ± 0%     376ns ± 1%   +0.43%  (p=0.030 n=8+10)
FmtManyArgs-6               795ns ± 0%     788ns ± 1%   -0.79%  (p=0.000 n=8+9)
GobDecode-6                10.9ms ± 1%    10.9ms ± 0%     ~     (p=0.079 n=10+9)
GobEncode-6                8.60ms ± 1%    8.56ms ± 0%   -0.52%  (p=0.004 n=10+10)
Gzip-6                      378ms ± 1%     386ms ± 1%   +2.28%  (p=0.000 n=10+10)
Gunzip-6                   63.7ms ± 0%    62.3ms ± 0%   -2.22%  (p=0.000 n=9+8)
HTTPClientServer-6          120µs ± 3%     114µs ± 3%   -4.99%  (p=0.000 n=10+10)
JSONEncode-6               20.3ms ± 1%    19.9ms ± 0%   -1.90%  (p=0.000 n=9+10)
JSONDecode-6               84.3ms ± 0%    83.7ms ± 0%   -0.76%  (p=0.000 n=8+8)
Mandelbrot200-6            6.91ms ± 0%    6.89ms ± 0%   -0.31%  (p=0.000 n=9+8)
GoParse-6                  5.49ms ± 0%    5.47ms ± 1%     ~     (p=0.101 n=8+10)
RegexpMatchEasy0_32-6       130ns ± 0%     128ns ± 0%   -1.54%  (p=0.002 n=8+10)
RegexpMatchEasy0_1K-6       322ns ± 1%     322ns ± 0%     ~     (p=0.525 n=10+9)
RegexpMatchEasy1_32-6       124ns ± 0%     124ns ± 0%   -0.32%  (p=0.046 n=8+10)
RegexpMatchEasy1_1K-6       570ns ± 0%     548ns ± 1%   -3.76%  (p=0.000 n=10+10)
RegexpMatchMedium_32-6      196ns ± 0%     183ns ± 1%   -6.61%  (p=0.000 n=8+10)
RegexpMatchMedium_1K-6     64.3µs ± 0%    59.0µs ± 1%   -8.31%  (p=0.000 n=8+10)
RegexpMatchHard_32-6       3.08µs ± 0%    2.80µs ± 0%   -8.96%  (p=0.000 n=8+9)
RegexpMatchHard_1K-6       93.0µs ± 0%    84.5µs ± 1%   -9.17%  (p=0.000 n=8+9)
Revcomp-6                   647ms ± 2%     646ms ± 1%     ~     (p=0.720 n=10+9)
Template-6                 92.3ms ± 0%    91.7ms ± 0%   -0.65%  (p=0.000 n=8+8)
TimeParse-6                 490ns ± 0%     488ns ± 0%   -0.43%  (p=0.000 n=10+10)
TimeFormat-6                513ns ± 0%     513ns ± 1%     ~     (p=0.144 n=9+10)
[Geo mean]                 79.1µs         77.7µs        -1.73%

name                     old speed      new speed      delta
GobDecode-6              70.1MB/s ± 1%  70.3MB/s ± 0%     ~     (p=0.078 n=10+9)
GobEncode-6              89.2MB/s ± 1%  89.7MB/s ± 0%   +0.52%  (p=0.004 n=10+10)
Gzip-6                   51.4MB/s ± 1%  50.2MB/s ± 1%   -2.23%  (p=0.000 n=10+10)
Gunzip-6                  304MB/s ± 0%   311MB/s ± 0%   +2.27%  (p=0.000 n=9+8)
JSONEncode-6             95.8MB/s ± 1%  97.7MB/s ± 0%   +1.93%  (p=0.000 n=9+10)
JSONDecode-6             23.0MB/s ± 0%  23.2MB/s ± 0%   +0.76%  (p=0.000 n=8+8)
GoParse-6                10.6MB/s ± 0%  10.6MB/s ± 1%     ~     (p=0.111 n=8+10)
RegexpMatchEasy0_32-6     244MB/s ± 0%   249MB/s ± 0%   +2.06%  (p=0.000 n=9+10)
RegexpMatchEasy0_1K-6    3.18GB/s ± 1%  3.17GB/s ± 0%     ~     (p=0.211 n=10+9)
RegexpMatchEasy1_32-6     257MB/s ± 0%   258MB/s ± 0%   +0.37%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-6    1.80GB/s ± 0%  1.87GB/s ± 1%   +3.91%  (p=0.000 n=10+10)
RegexpMatchMedium_32-6   5.08MB/s ± 0%  5.43MB/s ± 1%   +7.03%  (p=0.000 n=8+10)
RegexpMatchMedium_1K-6   15.9MB/s ± 0%  17.4MB/s ± 1%   +9.08%  (p=0.000 n=8+10)
RegexpMatchHard_32-6     10.4MB/s ± 0%  11.4MB/s ± 0%   +9.82%  (p=0.000 n=8+9)
RegexpMatchHard_1K-6     11.0MB/s ± 0%  12.1MB/s ± 1%  +10.10%  (p=0.000 n=8+9)
Revcomp-6                 393MB/s ± 2%   394MB/s ± 1%     ~     (p=0.720 n=10+9)
Template-6               21.0MB/s ± 0%  21.2MB/s ± 0%   +0.66%  (p=0.000 n=8+8)
[Geo mean]               74.2MB/s       76.2MB/s        +2.70%

Updates #21851

Change-Id: Ie88455db925f422a828f8528293790726a9c036b
Reviewed-on: https://go-review.googlesource.com/65491
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 19:34:26 +00:00
Daniel Martí
6945c67e10 cmd/compile: merge bytes inline test with the rest
In golang.org/cl/42813, a test was added in the bytes package to check
if a Buffer method was being inlined, using 'go tool nm'.

Now that we have a compiler test that verifies that certain funcs are
inlineable, merge it there. Knowing whether the funcs are inlineable is
also more reliable than whether or not their symbol appears in the
binary, too. For example, under some circumstances, inlineable funcs
can't be inlined, such as if closures are used.

While at it, add a few more bytes.Buffer methods that are currently
inlined and should clearly stay that way.

Updates #21851.

Change-Id: I62066e32ef5542d37908bd64f90bda51276da4de
Reviewed-on: https://go-review.googlesource.com/65658
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 06:52:31 +00:00
Daniel Martí
6936671ed1 cmd/compile: clarify adjustctxt inlining comment
The reason why adjustctxt wasn't being inlined was reported as:

	function too complex: cost 92 exceeds budget 80

However, after tweaking the code to be under the budget limit, we see
the real blocker:

	non-leaf function

There is little we can do about this one in particular at the moment.
Create a section with funcs that will need mid-stack inlining to be
inlineable, since this will likely come up again in other cases.

Change-Id: I3a8eb1546b289a060ac896506a007b0496946e84
Reviewed-on: https://go-review.googlesource.com/65650
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-23 20:24:10 +00:00
Daniel Martí
f366379d84 cmd/compile: add more runtime funcs to inline test
This is based from a list that Keith Randall provided in mid-2016. These
are all funcs that, at the time, were important and small enough that
they should be clearly inlined.

The runtime has changed a bit since then. Ctz16 and Ctz8 were removed,
so don't add them. stringtoslicebytetmp was moved to the backend, so
it's no longer a Go function. And itabhash was moved to itabHashFunc.

The only other outlier is adjustctxt, which is not inlineable at the
moment. I've added a TODO and will address it myself in a separate
commit.

While at it, error if any funcs in the input table are duplicated.
They're never useful and typos could lead to unintentionally thinking a
function is inlineable when it actually isn't.

And, since the lists are getting long, start sorting alphabetically.

Finally, rotl_31 is only defined on 64-bit architectures, and the added
runtime/internal/sys funcs are assembly on 386 and thus non-inlineable
in that case.

Updates #21851.

Change-Id: Ib99ab53d777860270e8fd4aefc41adb448f13662
Reviewed-on: https://go-review.googlesource.com/65351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-22 17:19:01 +00:00
Daniel Martí
3d741349f5 cmd/compile: collect reasons in inlining test
If we use -gcflags='-m -m', the compiler should give us a reason why a
func couldn't be inlined. Add the extra -m necessary for that extra info
and use it to give better test failures. For example, for the func in
the TODO:

	--- FAIL: TestIntendedInlining (1.53s)
		inl_test.go:104: runtime.nextFreeFast was not inlined: function too complex

We might increase the number of -m flags to get more information at some
later point, such as getting details on how close the func was to the
inlining budget.

Also started using regexes, as the output parsing is getting a bit too
complex for manual string handling.

While at it, also refactored the test to not buffer the entire output
into memory. This is fine in practice, but it won't scale well as we add
more packages or we depend more on the compiler's debugging output.

For example, "go build -a -gcflags='-m -m' std" prints nearly 40MB of
plaintext - and we only need to see the output line by line anyway.

Updates #21851.

Change-Id: I00986ff360eb56e4e9737b65a6be749ef8540643
Reviewed-on: https://go-review.googlesource.com/63810
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-22 10:59:14 +00:00
Ilya Tocar
101fbc2c82 runtime: make nextFreeFast inlinable
https://golang.org/cl/22598 made nextFreeFast inlinable.
But during https://golang.org/cl/63611 it was discovered, that it is no longer inlinable.
Reduce number of statements below inlining threshold to make it inlinable again.
Also update tests, to prevent regressions.
Doesn't reduce readability.

Change-Id: Ia672784dd48ed3b1ab46e390132f1094fe453de5
Reviewed-on: https://go-review.googlesource.com/65030
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-09-20 20:27:13 +00:00
Daniel Martí
f351dbfa4d cmd/compile: expand inlining test to multiple pkgs
Rework the test to work with any number of std packages. This was done
to include a few funcs from unicode/utf8. Adding more will be much
simpler too.

While at it, add more runtime funcs by searching for "inlined" or
"inlining" in the git log of its directory. These are: addb, subtractb,
fastrand and noescape.

Updates #21851.

Change-Id: I4fb2bd8aa6a5054218f9b36cb19d897ac533710e
Reviewed-on: https://go-review.googlesource.com/63611
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-14 04:49:58 +00:00
Daniel Martí
0d8a3b208c cmd/compile: add TestIntendedInlining from runtime
Move it from the runtime package, as we will soon add more packages and
functions for it to check.

The test used the testEnv func, which cleaned certain environment
variables from a command, so it was moved to internal/testenv under a
more descriptive (and less ambiguous) name. Add a simple godoc to it
too.

For #21851.

Change-Id: I6f39c1f23b45377718355fafe66ffd87047d8ab6
Reviewed-on: https://go-review.googlesource.com/63550
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-09-13 18:10:31 +00:00