Commit graph

24 commits

Author SHA1 Message Date
Leon Klingele
d090429ea9 all: fix typos as reported by 'misspell'
Change-Id: I904b8655f21743189814bccf24073b6fbb9fc56d
GitHub-Last-Rev: b032c14394
GitHub-Pull-Request: golang/go#29997
Reviewed-on: https://go-review.googlesource.com/c/160421
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-02-26 23:02:05 +00:00
Daniel Martí
b397248168 cmd/compile: add Buffer.Grow to TestIntendedInlining
golang.org/cl/151977 slightly decreased the cost of inlining an extra
call from 60 to 57, since it was a safe change that could help in some
scenarios.

One notable change spotted in that CL is that bytes.Buffer.Grow is now
inlinable, meaning that a fixedbugs test needed updating.

For consistency, add the test case to TestIntendedInlining too,
alongside other commonly used bytes.Buffer methods.

Change-Id: I4fb402fc684ef4c543fc65aea343ca1a4d73a189
Reviewed-on: https://go-review.googlesource.com/c/151979
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-01 17:31:02 +00:00
Austin Clements
9098d1d854 runtime: debug code to catch bad gcWork.puts
This adds a debug check to throw immediately if any pointers are added
to the gcWork buffer after the mark completion barrier. The intent is
to catch the source of the cached GC work that occasionally produces
"P has cached GC work at end of mark termination" failures.

The result should be that we get "throwOnGCWork" throws instead of "P
has cached GC work at end of mark termination" throws, but with useful
stack traces.

This should be reverted before the release. I've been unable to
reproduce this issue locally, but this issue appears fairly regularly
on the builders, so the intent is to catch it on the builders.

This probably slows down the GC slightly.

For #27993.

Change-Id: I5035e14058ad313bfbd3d68c41ec05179147a85c
Reviewed-on: https://go-review.googlesource.com/c/149969
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-21 16:28:17 +00:00
Josh Bleecher Snyder
7d6b5e340c runtime: reduce linear search through pcvalue cache
This change introduces two optimizations together,
one for recursive and one for non-recursive stacks.

For recursive stacks, we introduce the new entry
at the beginning of the cache, so it can be found first.
This adds an extra read and write.
While we're here, switch from fastrandn, which does a multiply,
to fastrand % n, which does a shift.

For non-recursive stacks, split the cache from [16]pcvalueCacheEnt
into [2][8]pcvalueCacheEnt, and add a very cheap associative lookup.

name                old time/op  new time/op  delta
StackCopyPtr-8       118ms ± 1%   106ms ± 2%  -9.56%  (p=0.000 n=17+18)
StackCopy-8         95.8ms ± 1%  87.0ms ± 3%  -9.11%  (p=0.000 n=19+20)
StackCopyNoCache-8   135ms ± 2%   139ms ± 1%  +3.06%  (p=0.000 n=19+18)

During make.bash, the association function used has this return distribution:

percent count  return value
 53.23% 678797 1
 46.74% 596094 0

It is definitely not perfect, but it is pretty good,
and that's all we need.

Change-Id: I2cabb1d26b99c5111bc28f427016a2a5e6c620fd
Reviewed-on: https://go-review.googlesource.com/c/110564
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-11-09 16:06:56 +00:00
Martin Möhrmann
642792350c runtime: remove unused maxSliceCap function and maxElems array
All uses of these have been converted to use runtime/internal/math
functions for overflow checking.

Fixes #21588

Change-Id: I0ba57028e471803dc7d445e66d77a8f87edfdafb
Reviewed-on: https://go-review.googlesource.com/c/144037
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-23 16:58:43 +00:00
Ilya Tocar
fa913a36a2 cmd/compile/internal/gc: inline autogenerated (*T).M wrappers
Currently all inlining of autogenerated wrappers is disabled,
because it causes build failures, when indexed export format is enabled.
Turns out we can reenable it for common case of (*T).M wrappers.
This fixes most performance degradation of 1.11 vs 1.10.

encoding/binary:
name                    old time/op   new time/op   delta
ReadSlice1000Int32s-6    14.8µs ± 2%   11.5µs ± 2%  -22.01%  (p=0.000 n=10+10)
WriteSlice1000Int32s-6   14.8µs ± 2%   11.7µs ± 2%  -20.95%  (p=0.000 n=10+10)

bufio:
name           old time/op    new time/op    delta
WriterFlush-6    32.4ns ± 1%    28.8ns ± 0%  -11.17%  (p=0.000 n=9+10)

sort:
SearchWrappers-6       231ns ± 1%   231ns ± 0%     ~     (p=0.129 n=9+10)
SortString1K-6         365µs ± 1%   298µs ± 1%  -18.43%  (p=0.000 n=9+10)
SortString1K_Slice-6   274µs ± 2%   276µs ± 1%     ~     (p=0.105 n=10+10)
StableString1K-6       490µs ± 1%   373µs ± 1%  -23.73%  (p=0.000 n=10+10)
SortInt1K-6            210µs ± 1%   142µs ± 1%  -32.69%  (p=0.000 n=10+10)
StableInt1K-6          243µs ± 0%   151µs ± 1%  -37.75%  (p=0.000 n=10+10)
StableInt1K_Slice-6    130µs ± 1%   130µs ± 0%     ~     (p=0.237 n=10+8)
SortInt64K-6          19.9ms ± 1%  13.5ms ± 1%  -32.32%  (p=0.000 n=10+10)
SortInt64K_Slice-6    11.5ms ± 1%  11.5ms ± 1%     ~     (p=0.912 n=10+10)
StableInt64K-6        21.5ms ± 0%  13.5ms ± 1%  -37.30%  (p=0.000 n=9+10)
Sort1e2-6              108µs ± 2%    83µs ± 3%  -23.26%  (p=0.000 n=10+10)
Stable1e2-6            218µs ± 0%   161µs ± 1%  -25.99%  (p=0.000 n=8+9)
Sort1e4-6             22.6ms ± 1%  16.8ms ± 0%  -25.45%  (p=0.000 n=10+7)
Stable1e4-6           67.6ms ± 1%  49.7ms ± 0%  -26.48%  (p=0.000 n=10+10)
Sort1e6-6              3.44s ± 0%   2.55s ± 1%  -26.05%  (p=0.000 n=8+9)
Stable1e6-6            13.7s ± 0%    9.9s ± 1%  -27.68%  (p=0.000 n=8+10)

Fixes #27621
Updates #25338

Change-Id: I6fe633202f63fa829a6ab849c44d7e45f8835dff
Reviewed-on: https://go-review.googlesource.com/c/135697
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-10-16 22:02:06 +00:00
Martin Möhrmann
c9130cae9a runtime/internal/math: add multiplication with overflow check
This CL adds a new internal math package for use by the runtime.
The new package exports a MulUintptr function with uintptr arguments
a and b and returns uintptr(a*b) and whether the full-width product
x*y does overflow the uintptr value range (uintptr(x*y) != x*y).

Uses of MulUinptr in the runtime and intrinsics for performance
will be added in followup CLs.

Updates #21588

Change-Id: Ia5a02eeabc955249118e4edf68c67d9fc0858058
Reviewed-on: https://go-review.googlesource.com/c/91755
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-15 17:58:06 +00:00
Josh Bleecher Snyder
31cfa7f2f2 runtime: allow inlining of stackmapdata
Also do very minor code cleanup.

name                old time/op  new time/op  delta
StackCopyPtr-8      84.8ms ± 6%  82.9ms ± 5%  -2.19%  (p=0.000 n=95+94)
StackCopy-8         68.4ms ± 5%  65.3ms ± 4%  -4.54%  (p=0.000 n=99+99)
StackCopyNoCache-8   107ms ± 2%   105ms ± 2%  -2.13%  (p=0.000 n=91+95)

Change-Id: I2d85ede48bffada9584d437a08a82212c0da6d00
Reviewed-on: https://go-review.googlesource.com/109001
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 19:14:17 +00:00
Matthew Dempsky
3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
Josh Bleecher Snyder
13cd006139 runtime: add fast version of getArgInfo
getArgInfo is called a lot during stack copying.
In the common case it doesn't do much work,
but it cannot be inlined.

This change works around that.

name                old time/op  new time/op  delta
StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)

Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
Reviewed-on: https://go-review.googlesource.com/108945
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-04-29 03:33:09 +00:00
isharipo
d2a5263a9c math/big: speedup nat.setBytes for bigger slices
Set up to _S (number of bytes in Uint) bytes at time
by using BigEndian.Uint32 and BigEndian.Uint64.

The performance improves for slices bigger than _S bytes.
This is the case for 128/256bit arith that initializes
it's objects from bytes.

name               old time/op  new time/op  delta
NatSetBytes/8-4    29.8ns ± 1%  11.4ns ± 0%  -61.63%  (p=0.000 n=9+8)
NatSetBytes/24-4    109ns ± 1%    56ns ± 0%  -48.75%  (p=0.000 n=9+8)
NatSetBytes/128-4   420ns ± 2%   110ns ± 1%  -73.83%  (p=0.000 n=10+10)
NatSetBytes/7-4    26.2ns ± 1%  21.3ns ± 2%  -18.63%  (p=0.000 n=8+9)
NatSetBytes/23-4    106ns ± 1%    67ns ± 1%  -36.93%  (p=0.000 n=9+10)
NatSetBytes/127-4   410ns ± 2%   121ns ± 0%  -70.46%  (p=0.000 n=9+8)

Found this optimization opportunity by looking at ethereum_corevm
community benchmark cpuprofile.

name        old time/op  new time/op  delta
OpDiv256-4   715ns ± 1%   596ns ± 1%  -16.57%  (p=0.008 n=5+5)
OpDiv128-4   373ns ± 1%   314ns ± 1%  -15.83%  (p=0.008 n=5+5)
OpDiv64-4    301ns ± 0%   285ns ± 1%   -5.12%  (p=0.008 n=5+5)

Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57
Reviewed-on: https://go-review.googlesource.com/98775
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-08 18:50:10 +00:00
Austin Clements
3e214e5693 runtime: simplify bulkBarrierPreWrite
Currently, bulkBarrierPreWrite uses inheap to decide whether the
destination is in the heap or whether to check for stack or global
data. However, this isn't the best question to ask.

Instead, get the span directly and query its state. This lets us
directly determine whether this might be a global, or is stack memory,
or is heap memory.

At this point, inheap is no longer used in the hot path, so drop it
from the must-be-inlined list and substitute spanOf.

This will help in a circuitous way with #23862, since fixing that is
going to push inheap very slightly over the inline-able threshold on a
few platforms.

Change-Id: I5360fc1181183598502409f12979899e1e4d45f7
Reviewed-on: https://go-review.googlesource.com/95495
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:33 +00:00
Russ Cox
5993251c01 cmd/go: implement per-package asmflags, gcflags, ldflags, gccgoflags
It has always been problematic that there was no way to specify
tool flags that applied only to the build of certain packages;
it was only to specify flags for all packages being built.
The usual workaround was to install all dependencies of something,
then build just that one thing with different flags. Since the
dependencies appeared to be up-to-date, they were not rebuilt
with the different flags. The new content-based staleness
(up-to-date) checks see through this trick, because they detect
changes in flags. This forces us to address the underlying problem
of providing a way to specify per-package flags.

The solution is to allow -gcflags=pattern=flags, which means
that flags apply to packages matching pattern, in addition to the
usual -gcflags=flags, which is now redefined to apply only to
the packages named on the command line.

See #22527 for discussion and rationale.

Fixes #22527.

Change-Id: I6716bed69edc324767f707b5bbf3aaa90e8e7302
Reviewed-on: https://go-review.googlesource.com/76551
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2017-11-09 15:04:04 +00:00
Cherry Zhang
5c18a3ca70 cmd/compile: skip runtime.nextFreeFast inlining test on MIPS64x
Since inlining budget calculation is fixed in CL 70151
runtime.nextFreeFast is no longer inlineable on MIPS64x because
it does not support Ctz64 as intrinsic. Skip the test.

Updates #22239.

Change-Id: Id00d55628ddb4b48d27aebfa10377a896765d569
Reviewed-on: https://go-review.googlesource.com/72271
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-20 20:20:15 +00:00
Daniel Martí
a2993456bb cmd/compile: add reflect to TestIntendedInlining
Add the package to the table and start it off with a few small, basic
functions. Inspired by CL 66331, which added flag.ro.

Updates #21851.

Change-Id: I3995cde1ff7bb09a718110473bed8b193c2232a5
Reviewed-on: https://go-review.googlesource.com/66990
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-28 21:09:46 +00:00
Daniel Martí
4d3d333468 cmd/compile: add runtime GC funcs to inlining test
This is based on a list that Austin Clements provided in mid-2016. It is
mostly untouched, except for the fact that the wbufptr funcs were
removed from the runtime thus removed from the lits here too.

Add a section for these GC funcs, since there are quite a lot of them
and the runtime has tons of funcs that we want to inline. As before,
sort this section too.

Also place some of these funcs out of the GC section, as they are not
directly related to the GC.

Updates #21851.

Change-Id: I35eb777a4c50b5f655618920dc2bc568c7c30ff5
Reviewed-on: https://go-review.googlesource.com/65654
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-25 21:55:06 +00:00
Ilya Tocar
1ae67965e4 regexp: make (*bitState).push inlinable
By refactoring job.arg from int with 0/1 as the only valid values into bool
and simplifying (*bitState).push, we reduce the number of nodes below the inlining threshold.
This improves backtracking regexp performance by 5-10% and go1 geomean  by 1.7%
Full performance data below:

name                                      old time/op    new time/op     delta
Find-6                                       510ns ± 0%      480ns ± 1%   -5.90%  (p=0.000 n=10+10)
FindString-6                                 504ns ± 1%      479ns ± 1%   -5.10%  (p=0.000 n=10+10)
FindSubmatch-6                               689ns ± 1%      659ns ± 1%   -4.27%  (p=0.000 n=9+10)
FindStringSubmatch-6                         659ns ± 0%      628ns ± 1%   -4.69%  (p=0.000 n=8+10)
Literal-6                                    174ns ± 1%      171ns ± 1%   -1.50%  (p=0.000 n=10+10)
NotLiteral-6                                2.89µs ± 1%     2.72µs ± 0%   -5.84%  (p=0.000 n=10+9)
MatchClass-6                                4.65µs ± 1%     4.28µs ± 1%   -7.96%  (p=0.000 n=10+10)
MatchClass_InRange-6                        4.15µs ± 1%     3.80µs ± 0%   -8.61%  (p=0.000 n=10+8)
ReplaceAll-6                                2.72µs ± 1%     2.60µs ± 1%   -4.68%  (p=0.000 n=10+10)
AnchoredLiteralShortNonMatch-6               158ns ± 1%      153ns ± 1%   -3.03%  (p=0.000 n=10+10)
AnchoredLiteralLongNonMatch-6                176ns ± 1%      176ns ± 0%     ~     (p=1.000 n=10+9)
AnchoredShortMatch-6                         260ns ± 0%      255ns ± 1%   -1.84%  (p=0.000 n=9+10)
AnchoredLongMatch-6                          456ns ± 0%      455ns ± 0%   -0.19%  (p=0.008 n=8+10)
OnePassShortA-6                             1.13µs ± 1%     1.12µs ± 0%   -0.57%  (p=0.046 n=10+8)
NotOnePassShortA-6                          1.14µs ± 1%     1.14µs ± 1%     ~     (p=0.162 n=10+10)
OnePassShortB-6                              908ns ± 0%      893ns ± 0%   -1.60%  (p=0.000 n=8+9)
NotOnePassShortB-6                           857ns ± 0%      803ns ± 1%   -6.34%  (p=0.000 n=8+10)
OnePassLongPrefix-6                          190ns ± 0%      190ns ± 1%     ~     (p=0.059 n=8+10)
OnePassLongNotPrefix-6                       722ns ± 1%      722ns ± 1%     ~     (p=0.451 n=10+10)
MatchParallelShared-6                        810ns ± 2%      807ns ± 2%     ~     (p=0.643 n=10+10)
MatchParallelCopied-6                       72.1ns ± 1%     69.4ns ± 1%   -3.81%  (p=0.000 n=10+10)
QuoteMetaAll-6                               213ns ± 2%      216ns ± 3%     ~     (p=0.284 n=10+10)
QuoteMetaNone-6                             89.7ns ± 1%     89.8ns ± 1%     ~     (p=0.616 n=10+10)
Match/Easy0/32-6                             127ns ± 1%      127ns ± 1%     ~     (p=0.977 n=10+10)
Match/Easy0/1K-6                             566ns ± 0%      566ns ± 0%     ~     (p=1.000 n=8+8)
Match/Easy0/32K-6                           9.30µs ± 1%     9.28µs ± 1%     ~     (p=0.529 n=10+10)
Match/Easy0/1M-6                             460µs ± 1%      460µs ± 1%     ~     (p=0.853 n=10+10)
Match/Easy0/32M-6                           15.0ms ± 0%     15.1ms ± 0%   +0.77%  (p=0.000 n=9+8)
Match/Easy0i/32-6                           2.10µs ± 1%     1.98µs ± 0%   -6.02%  (p=0.000 n=10+8)
Match/Easy0i/1K-6                           61.5µs ± 0%     57.2µs ± 0%   -6.97%  (p=0.000 n=10+9)
Match/Easy0i/32K-6                          2.75ms ± 0%     2.72ms ± 0%   -1.10%  (p=0.000 n=9+9)
Match/Easy0i/1M-6                           88.0ms ± 0%     86.9ms ± 1%   -1.29%  (p=0.000 n=8+10)
Match/Easy0i/32M-6                           2.82s ± 0%      2.77s ± 1%   -1.81%  (p=0.000 n=8+10)
Match/Easy1/32-6                             123ns ± 1%      124ns ± 1%   +0.90%  (p=0.001 n=10+10)
Match/Easy1/1K-6                            1.70µs ± 1%     1.65µs ± 0%   -3.18%  (p=0.000 n=9+10)
Match/Easy1/32K-6                           69.1µs ± 0%     68.4µs ± 1%   -0.95%  (p=0.000 n=8+10)
Match/Easy1/1M-6                            2.46ms ± 1%     2.42ms ± 1%   -1.66%  (p=0.000 n=10+10)
Match/Easy1/32M-6                           78.4ms ± 1%     77.5ms ± 0%   -1.08%  (p=0.000 n=10+9)
Match/Medium/32-6                           2.07µs ± 1%     1.91µs ± 1%   -7.69%  (p=0.000 n=10+10)
Match/Medium/1K-6                           62.8µs ± 0%     58.0µs ± 1%   -7.70%  (p=0.000 n=8+10)
Match/Medium/32K-6                          2.63ms ± 1%     2.58ms ± 1%   -2.14%  (p=0.000 n=10+10)
Match/Medium/1M-6                           84.6ms ± 0%     82.5ms ± 0%   -2.37%  (p=0.000 n=8+9)
Match/Medium/32M-6                           2.71s ± 0%      2.64s ± 0%   -2.46%  (p=0.000 n=10+9)
Match/Hard/32-6                             3.26µs ± 1%     2.98µs ± 1%   -8.49%  (p=0.000 n=10+10)
Match/Hard/1K-6                              100µs ± 0%       90µs ± 1%   -9.55%  (p=0.000 n=9+10)
Match/Hard/32K-6                            3.82ms ± 0%     3.82ms ± 1%     ~     (p=0.515 n=8+10)
Match/Hard/1M-6                              122ms ± 1%      123ms ± 0%   +0.66%  (p=0.000 n=10+8)
Match/Hard/32M-6                             3.89s ± 1%      3.91s ± 1%     ~     (p=0.105 n=10+10)
Match/Hard1/32-6                            18.1µs ± 1%     16.1µs ± 1%  -11.31%  (p=0.000 n=10+10)
Match/Hard1/1K-6                             565µs ± 0%      493µs ± 1%  -12.65%  (p=0.000 n=8+10)
Match/Hard1/32K-6                           18.8ms ± 0%     18.8ms ± 1%     ~     (p=0.905 n=9+10)
Match/Hard1/1M-6                             602ms ± 1%      602ms ± 1%     ~     (p=0.278 n=9+10)
Match/Hard1/32M-6                            19.1s ± 1%      19.2s ± 1%   +0.31%  (p=0.035 n=9+10)
Match_onepass_regex/32-6                    6.32µs ± 1%     6.34µs ± 1%     ~     (p=0.060 n=10+10)
Match_onepass_regex/1K-6                     204µs ± 1%      204µs ± 1%     ~     (p=0.842 n=9+10)
Match_onepass_regex/32K-6                   6.53ms ± 0%     6.55ms ± 1%   +0.36%  (p=0.005 n=10+10)
Match_onepass_regex/1M-6                     209ms ± 0%      208ms ± 1%   -0.65%  (p=0.034 n=8+10)
Match_onepass_regex/32M-6                    6.72s ± 0%      6.68s ± 1%   -0.74%  (p=0.000 n=9+10)
CompileOnepass/^(?:(?:(?:.(?:$))?))...-6    7.02µs ± 1%     7.02µs ± 1%     ~     (p=0.671 n=10+10)
CompileOnepass/^abcd$-6                     5.65µs ± 1%     5.65µs ± 1%     ~     (p=0.411 n=10+9)
CompileOnepass/^(?:(?:a{0,})*?)$-6          7.06µs ± 1%     7.06µs ± 1%     ~     (p=0.912 n=10+10)
CompileOnepass/^(?:(?:a+)*)$-6              6.40µs ± 1%     6.41µs ± 1%     ~     (p=0.699 n=10+10)
CompileOnepass/^(?:(?:a|(?:aa)))$-6         8.18µs ± 2%     8.16µs ± 1%     ~     (p=0.529 n=10+10)
CompileOnepass/^(?:[^\s\S])$-6              5.08µs ± 1%     5.17µs ± 1%   +1.77%  (p=0.000 n=9+10)
CompileOnepass/^(?:(?:(?:a*)+))$-6          6.86µs ± 1%     6.85µs ± 0%     ~     (p=0.190 n=10+9)
CompileOnepass/^[a-c]+$-6                   5.14µs ± 1%     5.11µs ± 0%   -0.53%  (p=0.041 n=10+10)
CompileOnepass/^[a-c]*$-6                   5.62µs ± 1%     5.63µs ± 1%     ~     (p=0.382 n=10+10)
CompileOnepass/^(?:a*)$-6                   5.76µs ± 1%     5.73µs ± 1%   -0.41%  (p=0.008 n=9+10)
CompileOnepass/^(?:(?:aa)|a)$-6             7.89µs ± 1%     7.84µs ± 1%   -0.66%  (p=0.020 n=10+10)
CompileOnepass/^...$-6                      5.38µs ± 1%     5.38µs ± 1%     ~     (p=0.857 n=9+10)
CompileOnepass/^(?:a|(?:aa))$-6             7.80µs ± 2%     7.82µs ± 1%     ~     (p=0.342 n=10+10)
CompileOnepass/^a((b))c$-6                  7.75µs ± 1%     7.78µs ± 1%     ~     (p=0.172 n=10+10)
CompileOnepass/^a.[l-nA-Cg-j]?e$-6          8.39µs ± 1%     8.42µs ± 1%     ~     (p=0.138 n=10+10)
CompileOnepass/^a((b))$-6                   6.92µs ± 1%     6.95µs ± 1%     ~     (p=0.159 n=10+10)
CompileOnepass/^a(?:(b)|(c))c$-6            10.0µs ± 1%     10.0µs ± 1%     ~     (p=0.896 n=10+10)
CompileOnepass/^a(?:b|c)$-6                 5.62µs ± 1%     5.66µs ± 1%   +0.71%  (p=0.023 n=10+10)
CompileOnepass/^a(?:b?|c)$-6                8.49µs ± 1%     8.43µs ± 1%   -0.69%  (p=0.010 n=10+10)
CompileOnepass/^a(?:b?|c+)$-6               9.26µs ± 1%     9.28µs ± 1%     ~     (p=0.448 n=10+10)
CompileOnepass/^a(?:bc)+$-6                 6.52µs ± 1%     6.46µs ± 2%   -1.02%  (p=0.003 n=10+10)
CompileOnepass/^a(?:[bcd])+$-6              6.29µs ± 1%     6.32µs ± 1%     ~     (p=0.256 n=10+10)
CompileOnepass/^a((?:[bcd])+)$-6            7.77µs ± 1%     7.79µs ± 1%     ~     (p=0.105 n=10+10)
CompileOnepass/^a(:?b|c)*d$-6               14.0µs ± 1%     13.9µs ± 1%   -0.69%  (p=0.003 n=10+10)
CompileOnepass/^.bc(d|e)*$-6                8.96µs ± 1%     9.06µs ± 1%   +1.20%  (p=0.000 n=10+9)
CompileOnepass/^loooooooooooooooooo...-6     219µs ± 1%      220µs ± 1%   +0.63%  (p=0.006 n=9+10)
[Geo mean]                                  31.6µs          31.1µs        -1.82%

name                                      old speed      new speed       delta
QuoteMetaAll-6                            65.5MB/s ± 2%   64.8MB/s ± 3%     ~     (p=0.315 n=10+10)
QuoteMetaNone-6                            290MB/s ± 1%    290MB/s ± 1%     ~     (p=0.755 n=10+10)
Match/Easy0/32-6                           250MB/s ± 0%    251MB/s ± 1%     ~     (p=0.277 n=8+9)
Match/Easy0/1K-6                          1.81GB/s ± 0%   1.81GB/s ± 0%     ~     (p=0.408 n=8+10)
Match/Easy0/32K-6                         3.52GB/s ± 1%   3.53GB/s ± 1%     ~     (p=0.529 n=10+10)
Match/Easy0/1M-6                          2.28GB/s ± 1%   2.28GB/s ± 1%     ~     (p=0.853 n=10+10)
Match/Easy0/32M-6                         2.24GB/s ± 0%   2.23GB/s ± 0%   -0.76%  (p=0.000 n=9+8)
Match/Easy0i/32-6                         15.2MB/s ± 1%   16.2MB/s ± 0%   +6.43%  (p=0.000 n=10+9)
Match/Easy0i/1K-6                         16.6MB/s ± 0%   17.9MB/s ± 0%   +7.48%  (p=0.000 n=10+9)
Match/Easy0i/32K-6                        11.9MB/s ± 0%   12.0MB/s ± 0%   +1.11%  (p=0.000 n=9+9)
Match/Easy0i/1M-6                         11.9MB/s ± 0%   12.1MB/s ± 1%   +1.31%  (p=0.000 n=8+10)
Match/Easy0i/32M-6                        11.9MB/s ± 0%   12.1MB/s ± 1%   +1.84%  (p=0.000 n=8+10)
Match/Easy1/32-6                           260MB/s ± 1%    258MB/s ± 1%   -0.91%  (p=0.001 n=10+10)
Match/Easy1/1K-6                           601MB/s ± 1%    621MB/s ± 0%   +3.28%  (p=0.000 n=9+10)
Match/Easy1/32K-6                          474MB/s ± 0%    479MB/s ± 1%   +0.96%  (p=0.000 n=8+10)
Match/Easy1/1M-6                           426MB/s ± 1%    433MB/s ± 1%   +1.68%  (p=0.000 n=10+10)
Match/Easy1/32M-6                          428MB/s ± 1%    433MB/s ± 0%   +1.09%  (p=0.000 n=10+9)
Match/Medium/32-6                         15.4MB/s ± 1%   16.7MB/s ± 1%   +8.23%  (p=0.000 n=10+9)
Match/Medium/1K-6                         16.3MB/s ± 1%   17.7MB/s ± 1%   +8.43%  (p=0.000 n=9+10)
Match/Medium/32K-6                        12.5MB/s ± 1%   12.7MB/s ± 1%   +2.15%  (p=0.000 n=10+10)
Match/Medium/1M-6                         12.4MB/s ± 0%   12.7MB/s ± 0%   +2.44%  (p=0.000 n=8+9)
Match/Medium/32M-6                        12.4MB/s ± 0%   12.7MB/s ± 0%   +2.52%  (p=0.000 n=10+9)
Match/Hard/32-6                           9.82MB/s ± 1%  10.73MB/s ± 1%   +9.29%  (p=0.000 n=10+10)
Match/Hard/1K-6                           10.2MB/s ± 0%   11.3MB/s ± 1%  +10.56%  (p=0.000 n=9+10)
Match/Hard/32K-6                          8.58MB/s ± 0%   8.58MB/s ± 1%     ~     (p=0.554 n=8+10)
Match/Hard/1M-6                           8.59MB/s ± 1%   8.53MB/s ± 0%   -0.70%  (p=0.000 n=10+8)
Match/Hard/32M-6                          8.62MB/s ± 1%   8.59MB/s ± 1%     ~     (p=0.098 n=10+10)
Match/Hard1/32-6                          1.77MB/s ± 1%   1.99MB/s ± 1%  +12.40%  (p=0.000 n=10+8)
Match/Hard1/1K-6                          1.81MB/s ± 1%   2.08MB/s ± 1%  +14.55%  (p=0.000 n=10+10)
Match/Hard1/32K-6                         1.74MB/s ± 0%   1.74MB/s ± 0%     ~     (p=0.108 n=9+10)
Match/Hard1/1M-6                          1.74MB/s ± 0%   1.74MB/s ± 1%     ~     (p=1.000 n=9+10)
Match/Hard1/32M-6                         1.75MB/s ± 0%   1.75MB/s ± 1%     ~     (p=0.157 n=9+10)
Match_onepass_regex/32-6                  5.05MB/s ± 0%   5.05MB/s ± 1%     ~     (p=0.262 n=8+10)
Match_onepass_regex/1K-6                  5.02MB/s ± 1%   5.02MB/s ± 1%     ~     (p=0.677 n=9+10)
Match_onepass_regex/32K-6                 5.02MB/s ± 0%   4.99MB/s ± 0%   -0.47%  (p=0.000 n=10+9)
Match_onepass_regex/1M-6                  5.01MB/s ± 0%   5.04MB/s ± 1%   +0.68%  (p=0.017 n=8+10)
Match_onepass_regex/32M-6                 4.99MB/s ± 0%   5.03MB/s ± 1%   +0.74%  (p=0.000 n=10+10)
[Geo mean]                                29.1MB/s        29.8MB/s        +2.44%

go1 data for reference

name                     old time/op    new time/op    delta
BinaryTree17-6              4.39s ± 1%     4.37s ± 0%   -0.58%  (p=0.006 n=9+9)
Fannkuch11-6                5.13s ± 0%     5.18s ± 0%   +0.87%  (p=0.000 n=8+8)
FmtFprintfEmpty-6          74.2ns ± 0%    71.7ns ± 3%   -3.41%  (p=0.000 n=10+10)
FmtFprintfString-6          120ns ± 1%     122ns ± 2%     ~     (p=0.333 n=10+10)
FmtFprintfInt-6             127ns ± 1%     127ns ± 1%     ~     (p=0.809 n=10+10)
FmtFprintfIntInt-6          186ns ± 0%     188ns ± 1%   +1.02%  (p=0.002 n=8+10)
FmtFprintfPrefixedInt-6     223ns ± 1%     222ns ± 2%     ~     (p=0.421 n=10+10)
FmtFprintfFloat-6           374ns ± 0%     376ns ± 1%   +0.43%  (p=0.030 n=8+10)
FmtManyArgs-6               795ns ± 0%     788ns ± 1%   -0.79%  (p=0.000 n=8+9)
GobDecode-6                10.9ms ± 1%    10.9ms ± 0%     ~     (p=0.079 n=10+9)
GobEncode-6                8.60ms ± 1%    8.56ms ± 0%   -0.52%  (p=0.004 n=10+10)
Gzip-6                      378ms ± 1%     386ms ± 1%   +2.28%  (p=0.000 n=10+10)
Gunzip-6                   63.7ms ± 0%    62.3ms ± 0%   -2.22%  (p=0.000 n=9+8)
HTTPClientServer-6          120µs ± 3%     114µs ± 3%   -4.99%  (p=0.000 n=10+10)
JSONEncode-6               20.3ms ± 1%    19.9ms ± 0%   -1.90%  (p=0.000 n=9+10)
JSONDecode-6               84.3ms ± 0%    83.7ms ± 0%   -0.76%  (p=0.000 n=8+8)
Mandelbrot200-6            6.91ms ± 0%    6.89ms ± 0%   -0.31%  (p=0.000 n=9+8)
GoParse-6                  5.49ms ± 0%    5.47ms ± 1%     ~     (p=0.101 n=8+10)
RegexpMatchEasy0_32-6       130ns ± 0%     128ns ± 0%   -1.54%  (p=0.002 n=8+10)
RegexpMatchEasy0_1K-6       322ns ± 1%     322ns ± 0%     ~     (p=0.525 n=10+9)
RegexpMatchEasy1_32-6       124ns ± 0%     124ns ± 0%   -0.32%  (p=0.046 n=8+10)
RegexpMatchEasy1_1K-6       570ns ± 0%     548ns ± 1%   -3.76%  (p=0.000 n=10+10)
RegexpMatchMedium_32-6      196ns ± 0%     183ns ± 1%   -6.61%  (p=0.000 n=8+10)
RegexpMatchMedium_1K-6     64.3µs ± 0%    59.0µs ± 1%   -8.31%  (p=0.000 n=8+10)
RegexpMatchHard_32-6       3.08µs ± 0%    2.80µs ± 0%   -8.96%  (p=0.000 n=8+9)
RegexpMatchHard_1K-6       93.0µs ± 0%    84.5µs ± 1%   -9.17%  (p=0.000 n=8+9)
Revcomp-6                   647ms ± 2%     646ms ± 1%     ~     (p=0.720 n=10+9)
Template-6                 92.3ms ± 0%    91.7ms ± 0%   -0.65%  (p=0.000 n=8+8)
TimeParse-6                 490ns ± 0%     488ns ± 0%   -0.43%  (p=0.000 n=10+10)
TimeFormat-6                513ns ± 0%     513ns ± 1%     ~     (p=0.144 n=9+10)
[Geo mean]                 79.1µs         77.7µs        -1.73%

name                     old speed      new speed      delta
GobDecode-6              70.1MB/s ± 1%  70.3MB/s ± 0%     ~     (p=0.078 n=10+9)
GobEncode-6              89.2MB/s ± 1%  89.7MB/s ± 0%   +0.52%  (p=0.004 n=10+10)
Gzip-6                   51.4MB/s ± 1%  50.2MB/s ± 1%   -2.23%  (p=0.000 n=10+10)
Gunzip-6                  304MB/s ± 0%   311MB/s ± 0%   +2.27%  (p=0.000 n=9+8)
JSONEncode-6             95.8MB/s ± 1%  97.7MB/s ± 0%   +1.93%  (p=0.000 n=9+10)
JSONDecode-6             23.0MB/s ± 0%  23.2MB/s ± 0%   +0.76%  (p=0.000 n=8+8)
GoParse-6                10.6MB/s ± 0%  10.6MB/s ± 1%     ~     (p=0.111 n=8+10)
RegexpMatchEasy0_32-6     244MB/s ± 0%   249MB/s ± 0%   +2.06%  (p=0.000 n=9+10)
RegexpMatchEasy0_1K-6    3.18GB/s ± 1%  3.17GB/s ± 0%     ~     (p=0.211 n=10+9)
RegexpMatchEasy1_32-6     257MB/s ± 0%   258MB/s ± 0%   +0.37%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-6    1.80GB/s ± 0%  1.87GB/s ± 1%   +3.91%  (p=0.000 n=10+10)
RegexpMatchMedium_32-6   5.08MB/s ± 0%  5.43MB/s ± 1%   +7.03%  (p=0.000 n=8+10)
RegexpMatchMedium_1K-6   15.9MB/s ± 0%  17.4MB/s ± 1%   +9.08%  (p=0.000 n=8+10)
RegexpMatchHard_32-6     10.4MB/s ± 0%  11.4MB/s ± 0%   +9.82%  (p=0.000 n=8+9)
RegexpMatchHard_1K-6     11.0MB/s ± 0%  12.1MB/s ± 1%  +10.10%  (p=0.000 n=8+9)
Revcomp-6                 393MB/s ± 2%   394MB/s ± 1%     ~     (p=0.720 n=10+9)
Template-6               21.0MB/s ± 0%  21.2MB/s ± 0%   +0.66%  (p=0.000 n=8+8)
[Geo mean]               74.2MB/s       76.2MB/s        +2.70%

Updates #21851

Change-Id: Ie88455db925f422a828f8528293790726a9c036b
Reviewed-on: https://go-review.googlesource.com/65491
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 19:34:26 +00:00
Daniel Martí
6945c67e10 cmd/compile: merge bytes inline test with the rest
In golang.org/cl/42813, a test was added in the bytes package to check
if a Buffer method was being inlined, using 'go tool nm'.

Now that we have a compiler test that verifies that certain funcs are
inlineable, merge it there. Knowing whether the funcs are inlineable is
also more reliable than whether or not their symbol appears in the
binary, too. For example, under some circumstances, inlineable funcs
can't be inlined, such as if closures are used.

While at it, add a few more bytes.Buffer methods that are currently
inlined and should clearly stay that way.

Updates #21851.

Change-Id: I62066e32ef5542d37908bd64f90bda51276da4de
Reviewed-on: https://go-review.googlesource.com/65658
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-25 06:52:31 +00:00
Daniel Martí
6936671ed1 cmd/compile: clarify adjustctxt inlining comment
The reason why adjustctxt wasn't being inlined was reported as:

	function too complex: cost 92 exceeds budget 80

However, after tweaking the code to be under the budget limit, we see
the real blocker:

	non-leaf function

There is little we can do about this one in particular at the moment.
Create a section with funcs that will need mid-stack inlining to be
inlineable, since this will likely come up again in other cases.

Change-Id: I3a8eb1546b289a060ac896506a007b0496946e84
Reviewed-on: https://go-review.googlesource.com/65650
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-23 20:24:10 +00:00
Daniel Martí
f366379d84 cmd/compile: add more runtime funcs to inline test
This is based from a list that Keith Randall provided in mid-2016. These
are all funcs that, at the time, were important and small enough that
they should be clearly inlined.

The runtime has changed a bit since then. Ctz16 and Ctz8 were removed,
so don't add them. stringtoslicebytetmp was moved to the backend, so
it's no longer a Go function. And itabhash was moved to itabHashFunc.

The only other outlier is adjustctxt, which is not inlineable at the
moment. I've added a TODO and will address it myself in a separate
commit.

While at it, error if any funcs in the input table are duplicated.
They're never useful and typos could lead to unintentionally thinking a
function is inlineable when it actually isn't.

And, since the lists are getting long, start sorting alphabetically.

Finally, rotl_31 is only defined on 64-bit architectures, and the added
runtime/internal/sys funcs are assembly on 386 and thus non-inlineable
in that case.

Updates #21851.

Change-Id: Ib99ab53d777860270e8fd4aefc41adb448f13662
Reviewed-on: https://go-review.googlesource.com/65351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-09-22 17:19:01 +00:00
Daniel Martí
3d741349f5 cmd/compile: collect reasons in inlining test
If we use -gcflags='-m -m', the compiler should give us a reason why a
func couldn't be inlined. Add the extra -m necessary for that extra info
and use it to give better test failures. For example, for the func in
the TODO:

	--- FAIL: TestIntendedInlining (1.53s)
		inl_test.go:104: runtime.nextFreeFast was not inlined: function too complex

We might increase the number of -m flags to get more information at some
later point, such as getting details on how close the func was to the
inlining budget.

Also started using regexes, as the output parsing is getting a bit too
complex for manual string handling.

While at it, also refactored the test to not buffer the entire output
into memory. This is fine in practice, but it won't scale well as we add
more packages or we depend more on the compiler's debugging output.

For example, "go build -a -gcflags='-m -m' std" prints nearly 40MB of
plaintext - and we only need to see the output line by line anyway.

Updates #21851.

Change-Id: I00986ff360eb56e4e9737b65a6be749ef8540643
Reviewed-on: https://go-review.googlesource.com/63810
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-22 10:59:14 +00:00
Ilya Tocar
101fbc2c82 runtime: make nextFreeFast inlinable
https://golang.org/cl/22598 made nextFreeFast inlinable.
But during https://golang.org/cl/63611 it was discovered, that it is no longer inlinable.
Reduce number of statements below inlining threshold to make it inlinable again.
Also update tests, to prevent regressions.
Doesn't reduce readability.

Change-Id: Ia672784dd48ed3b1ab46e390132f1094fe453de5
Reviewed-on: https://go-review.googlesource.com/65030
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2017-09-20 20:27:13 +00:00
Daniel Martí
f351dbfa4d cmd/compile: expand inlining test to multiple pkgs
Rework the test to work with any number of std packages. This was done
to include a few funcs from unicode/utf8. Adding more will be much
simpler too.

While at it, add more runtime funcs by searching for "inlined" or
"inlining" in the git log of its directory. These are: addb, subtractb,
fastrand and noescape.

Updates #21851.

Change-Id: I4fb2bd8aa6a5054218f9b36cb19d897ac533710e
Reviewed-on: https://go-review.googlesource.com/63611
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-09-14 04:49:58 +00:00
Daniel Martí
0d8a3b208c cmd/compile: add TestIntendedInlining from runtime
Move it from the runtime package, as we will soon add more packages and
functions for it to check.

The test used the testEnv func, which cleaned certain environment
variables from a command, so it was moved to internal/testenv under a
more descriptive (and less ambiguous) name. Add a simple godoc to it
too.

For #21851.

Change-Id: I6f39c1f23b45377718355fafe66ffd87047d8ab6
Reviewed-on: https://go-review.googlesource.com/63550
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-09-13 18:10:31 +00:00