Commit graph

67 commits

Author SHA1 Message Date
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164
7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
Matthew Dempsky
80a6fedea0 cmd/compile: add -d=checkptr to validate unsafe.Pointer rules
This CL adds -d=checkptr as a compile-time option for adding
instrumentation to check that Go code is following unsafe.Pointer
safety rules dynamically. In particular, it currently checks two
things:

1. When converting unsafe.Pointer to *T, make sure the resulting
pointer is aligned appropriately for T.

2. When performing pointer arithmetic, if the result points to a Go
heap object, make sure we can find an unsafe.Pointer-typed operand
that pointed into the same object.

These checks are currently disabled for the runtime, and can also be
disabled through a new //go:nocheckptr annotation. The latter is
necessary for functions like strings.noescape, which intentionally
violate safety rules to workaround escape analysis limitations.

Fixes #22218.

Change-Id: If5a51273881d93048f74bcff10a3275c9c91da6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/162237
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-17 00:40:21 +00:00
Keith Randall
36f30ba289 cmd/compile,runtime: generate hash functions only for types which are map keys
Right now we generate hash functions for all types, just in case they
are used as map keys. That's a lot of wasted effort and binary size
for types which will never be used as a map key. Instead, generate
hash functions only for types that we know are map keys.

Just doing that is a bit too simple, since maps with an interface type
as a key might have to hash any concrete key type that implements that
interface. So for that case, implement hashing of such types at
runtime (instead of with generated code). It will be slower, but only
for maps with interface types as keys, and maybe only a bit slower as
the aeshash time probably dominates the dispatch time.

Reorg where we keep the equals and hash functions. Move the hash function
from the key type to the map type, saving a field in every non-map type.
That leaves only one function in the alg structure, so get rid of that and
just keep the equal function in the type descriptor itself.

cmd/go now has 10 generated hash functions, instead of 504. Makes
cmd/go 1.0% smaller. Update #6853.

Speed on non-interface keys is unchanged. Speed on interface keys
is ~20% slower:

name                  old time/op  new time/op  delta
MapInterfaceString-8  23.0ns ±21%  27.6ns ±14%  +20.01%  (p=0.002 n=10+10)
MapInterfacePtr-8     19.4ns ±16%  23.7ns ± 7%  +22.48%   (p=0.000 n=10+8)

Change-Id: I7c2e42292a46b5d4e288aaec4029bdbb01089263
Reviewed-on: https://go-review.googlesource.com/c/go/+/191198
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2019-09-03 20:41:29 +00:00
Keith Randall
2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
Keith Randall
585c9e8412 cmd/compile: implement shifts by signed amounts
Allow shifts by signed amounts. Panic if the shift amount is negative.

TODO: We end up doing two compares per shift, see Ian's comment
https://github.com/golang/go/issues/19113#issuecomment-443241799 that
we could do it with a single comparison in the normal case.

The prove pass mostly handles this code well. For instance, it removes the
<0 check for cases like this:
    if s >= 0 { _ = x << s }
    _ = x << len(a)

This case isn't handled well yet:
    _ = x << (y & 0xf)
I'll do followon CLs for unhandled cases as needed.

Update #19113

R=go1.13

Change-Id: I839a5933d94b54ab04deb9dd5149f32c51c90fa1
Reviewed-on: https://go-review.googlesource.com/c/158719
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-02-15 23:13:09 +00:00
Martin Möhrmann
75798e8ada runtime: make processor capability variable naming platform specific
The current support_XXX variables are specific for the
amd64 and 386 platforms.

Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.

This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.

Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-14 20:30:31 +00:00
Josh Bleecher Snyder
5848b6c9b8 cmd/compile: shrink specialized convT2x call sites
convT2E16 and other specialized type-to-interface routines
accept a type/itab argument and return a complete interface value.
However, we know enough in the routine to do without the type.
And the caller can construct the interface value using the type.

Doing so shrinks the call sites of ten of the specialized convT2x routines.
It also lets us unify the empty and non-empty interface routines.

Cuts 12k off cmd/go.

name                         old time/op  new time/op  delta
ConvT2ESmall-8               2.96ns ± 2%  2.34ns ± 4%  -21.01%  (p=0.000 n=175+189)
ConvT2EUintptr-8             3.00ns ± 3%  2.34ns ± 4%  -22.02%  (p=0.000 n=189+187)
ConvT2ELarge-8               21.3ns ± 7%  21.5ns ± 5%   +1.02%  (p=0.000 n=200+197)
ConvT2ISmall-8               2.99ns ± 4%  2.33ns ± 3%  -21.95%  (p=0.000 n=193+184)
ConvT2IUintptr-8             3.02ns ± 3%  2.33ns ± 3%  -22.82%  (p=0.000 n=198+190)
ConvT2ILarge-8               21.7ns ± 5%  22.2ns ± 4%   +2.31%  (p=0.000 n=199+198)
ConvT2Ezero/zero/16-8        2.96ns ± 2%  2.33ns ± 3%  -21.11%  (p=0.000 n=174+187)
ConvT2Ezero/zero/32-8        2.96ns ± 1%  2.35ns ± 4%  -20.62%  (p=0.000 n=163+193)
ConvT2Ezero/zero/64-8        2.99ns ± 2%  2.34ns ± 4%  -21.78%  (p=0.000 n=183+188)
ConvT2Ezero/zero/str-8       3.27ns ± 3%  2.54ns ± 3%  -22.32%  (p=0.000 n=195+192)
ConvT2Ezero/zero/slice-8     3.46ns ± 4%  2.81ns ± 3%  -18.96%  (p=0.000 n=197+164)
ConvT2Ezero/zero/big-8       88.4ns ±20%  90.0ns ±20%   +1.84%  (p=0.000 n=196+198)
ConvT2Ezero/nonzero/16-8     12.6ns ± 3%  12.3ns ± 3%   -2.34%  (p=0.000 n=167+196)
ConvT2Ezero/nonzero/32-8     12.3ns ± 4%  11.9ns ± 3%   -2.95%  (p=0.000 n=187+193)
ConvT2Ezero/nonzero/64-8     14.2ns ± 6%  13.8ns ± 5%   -2.94%  (p=0.000 n=198+199)
ConvT2Ezero/nonzero/str-8    27.2ns ± 5%  26.8ns ± 5%   -1.33%  (p=0.000 n=200+198)
ConvT2Ezero/nonzero/slice-8  33.3ns ± 8%  33.1ns ± 6%   -0.82%  (p=0.000 n=199+200)
ConvT2Ezero/nonzero/big-8    88.8ns ±22%  90.2ns ±18%   +1.58%  (p=0.000 n=200+199)


Neligible toolspeed impact.

name        old alloc/op      new alloc/op      delta
Template         35.4MB ± 0%       35.3MB ± 0%  -0.06%  (p=0.008 n=5+5)
Unicode          29.1MB ± 0%       29.1MB ± 0%    ~     (p=0.310 n=5+5)
GoTypes           122MB ± 0%        122MB ± 0%  -0.08%  (p=0.008 n=5+5)
Compiler          514MB ± 0%        513MB ± 0%  -0.02%  (p=0.008 n=5+5)
SSA              1.94GB ± 0%       1.94GB ± 0%  -0.01%  (p=0.008 n=5+5)
Flate            24.2MB ± 0%       24.2MB ± 0%    ~     (p=0.548 n=5+5)
GoParser         28.5MB ± 0%       28.5MB ± 0%  -0.05%  (p=0.016 n=5+5)
Reflect          86.3MB ± 0%       86.2MB ± 0%  -0.02%  (p=0.008 n=5+5)
Tar              34.9MB ± 0%       34.9MB ± 0%    ~     (p=0.095 n=5+5)
XML              47.1MB ± 0%       47.1MB ± 0%  -0.05%  (p=0.008 n=5+5)
[Geo mean]       81.0MB            81.0MB       -0.03%

name        old allocs/op     new allocs/op     delta
Template           349k ± 0%         349k ± 0%  -0.08%  (p=0.008 n=5+5)
Unicode            340k ± 0%         340k ± 0%    ~     (p=0.111 n=5+5)
GoTypes           1.28M ± 0%        1.28M ± 0%  -0.09%  (p=0.008 n=5+5)
Compiler          4.92M ± 0%        4.92M ± 0%  -0.08%  (p=0.008 n=5+5)
SSA               15.3M ± 0%        15.3M ± 0%  -0.03%  (p=0.008 n=5+5)
Flate              233k ± 0%         233k ± 0%    ~     (p=0.500 n=5+5)
GoParser           292k ± 0%         292k ± 0%  -0.06%  (p=0.008 n=5+5)
Reflect           1.05M ± 0%        1.05M ± 0%  -0.02%  (p=0.008 n=5+5)
Tar                344k ± 0%         343k ± 0%  -0.06%  (p=0.008 n=5+5)
XML                430k ± 0%         429k ± 0%  -0.08%  (p=0.008 n=5+5)
[Geo mean]         809k              809k       -0.05%

name        old object-bytes  new object-bytes  delta
Template          507kB ± 0%        507kB ± 0%  -0.04%  (p=0.008 n=5+5)
Unicode           225kB ± 0%        225kB ± 0%    ~     (all equal)
GoTypes          1.85MB ± 0%       1.85MB ± 0%  -0.08%  (p=0.008 n=5+5)
Compiler         6.75MB ± 0%       6.75MB ± 0%  +0.01%  (p=0.008 n=5+5)
SSA              21.4MB ± 0%       21.4MB ± 0%  -0.02%  (p=0.008 n=5+5)
Flate             328kB ± 0%        328kB ± 0%  -0.03%  (p=0.008 n=5+5)
GoParser          403kB ± 0%        402kB ± 0%  -0.06%  (p=0.008 n=5+5)
Reflect          1.41MB ± 0%       1.41MB ± 0%  -0.03%  (p=0.008 n=5+5)
Tar               457kB ± 0%        457kB ± 0%  -0.05%  (p=0.008 n=5+5)
XML               601kB ± 0%        600kB ± 0%  -0.16%  (p=0.008 n=5+5)
[Geo mean]       1.05MB            1.04MB       -0.05%


Change-Id: I677a4108c0ecd32617549294036aa84f9214c4fe
Reviewed-on: https://go-review.googlesource.com/c/147360
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2018-11-06 00:02:14 +00:00
Martin Möhrmann
020a18c545 cmd/compile: move slice construction to callers of makeslice
Only return a pointer p to the new slices backing array from makeslice.
Makeslice callers then construct sliceheader{p, len, cap} explictly
instead of makeslice returning the slice.

Reduces go binary size by ~0.2%.
Removes 92 (~3.5%) panicindex calls from go binary.

Change-Id: I29b7c3b5fe8b9dcec96e2c43730575071cfe8a94
Reviewed-on: https://go-review.googlesource.com/c/141822
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-10-29 19:23:00 +00:00
Keith Randall
0e9f8a21f8 runtime,cmd/compile: pass strings and slices to convT2{E,I} by value
When we pass these types by reference, we usually have to allocate
temporaries on the stack, initialize them, then pass their address
to the conversion functions. It's simpler to pass these types
directly by value.

This particularly applies to conversions needed for fmt.Printf
(to interface{} for constructing a [...]interface{}).

func f(a, b, c string) {
     fmt.Printf("%s %s\n", a, b)
     fmt.Printf("%s %s\n", b, c)
}

This function's stack frame shrinks from 200 to 136 bytes, and
its code shrinks from 535 to 453 bytes.

The go binary shrinks 0.3%.

Update #24286

Aside: for this function f, we don't really need to allocate
temporaries for the convT2E function. We could use the address
of a, b, and c directly. That might get similar (or maybe better?)
improvements. I investigated a bit, but it seemed complicated
to do it safely. This change was much easier.

Change-Id: I78cbe51b501fb41e1e324ce4203f0de56a1db82d
Reviewed-on: https://go-review.googlesource.com/c/135377
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-10-14 03:46:51 +00:00
Lynn Boger
9e9ff565cd runtime/race: implement race detector for ppc64le
This adds the support to enable the race detector for ppc64le.

Added runtime/race_ppc64le.s to manage the calls from Go to the
LLVM tsan functions, mostly converting from the Go ABI to the
PPC64 ABI expected by Clang generated code.

Changed racewalk.go to call racefuncenterfp instead of racefuncenter
on ppc64le to allow the caller pc to be obtained in the asm code
before calling the tsan version.

Changed the set up code for racecallbackthunk so it doesn't use
the autogenerated save and restore of the link register since that
sequence uses registers inconsistent with the normal ppc64 ABI.

Made various changes to recognize that race is supported for
ppc64le.

Ensured that tls_g is updated and accessible from race_linux_ppc64le.s
so that the race ctx can be obtained and passed to tsan functions.

This enables the race tests for ppc64le in cmd/dist/test.go and
increases the timeout when running the benchmarks with the -race
option to avoid timing out.

Updates #24354, #23731

Change-Id: Ib97dc7ac313e6313c836dc7d2fb698f9d8fba3ef
Reviewed-on: https://go-review.googlesource.com/107935
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-11 17:45:36 +00:00
Martin Möhrmann
aee71dd70b cmd/compile: optimize map-clearing range idiom
replace map clears of the form:

        for k := range m {
                delete(m, k)
        }

(where m is map with key type that is reflexive for ==)
with a new runtime function that clears the maps backing
array with a memclr and reinitializes the hmap struct.

Map key types that for example contain floats are not
replaced by this optimization since NaN keys cannot
be deleted from maps using delete.

name                           old time/op  new time/op  delta
GoMapClear/Reflexive/1         92.2ns ± 1%  47.1ns ± 2%  -48.89%  (p=0.000 n=9+9)
GoMapClear/Reflexive/10         108ns ± 1%    48ns ± 2%  -55.68%  (p=0.000 n=10+10)
GoMapClear/Reflexive/100        303ns ± 2%   110ns ± 3%  -63.56%  (p=0.000 n=10+10)
GoMapClear/Reflexive/1000      3.58µs ± 3%  1.23µs ± 2%  -65.49%  (p=0.000 n=9+10)
GoMapClear/Reflexive/10000     28.2µs ± 3%  10.3µs ± 2%  -63.55%  (p=0.000 n=9+10)
GoMapClear/NonReflexive/1       121ns ± 2%   124ns ± 7%     ~     (p=0.097 n=10+10)
GoMapClear/NonReflexive/10      137ns ± 2%   139ns ± 3%   +1.53%  (p=0.033 n=10+10)
GoMapClear/NonReflexive/100     331ns ± 3%   334ns ± 2%     ~     (p=0.342 n=10+10)
GoMapClear/NonReflexive/1000   3.64µs ± 3%  3.64µs ± 2%     ~     (p=0.887 n=9+10)
GoMapClear/NonReflexive/10000  28.1µs ± 2%  28.4µs ± 3%     ~     (p=0.247 n=10+10)

Fixes #20138

Change-Id: I181332a8ef434a4f0d89659f492d8711db3f3213
Reviewed-on: https://go-review.googlesource.com/110055
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 21:15:16 +00:00
Martin Möhrmann
b9a59d9f2e cmd/compile: optimize len([]rune(string))
Adds a new runtime function to count runes in a string.
Modifies the compiler to detect the pattern len([]rune(string))
and replaces it with the new rune counting runtime function.

RuneCount/lenruneslice/ASCII                  27.8ns ± 2%  14.5ns ± 3%  -47.70%  (p=0.000 n=10+10)
RuneCount/lenruneslice/Japanese                126ns ± 2%    60ns ± 2%  -52.03%  (p=0.000 n=10+10)
RuneCount/lenruneslice/MixedLength             104ns ± 2%    50ns ± 1%  -51.71%  (p=0.000 n=10+9)

Fixes #24923

Change-Id: Ie9c7e7391a4e2cca675c5cdcc1e5ce7d523948b9
Reviewed-on: https://go-review.googlesource.com/108985
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 05:31:01 +00:00
Martin Möhrmann
a8a60ac2a7 cmd/compile: optimize append(x, make([]T, y)...) slice extension
Changes the compiler to recognize the slice extension pattern

  append(x, make([]T, y)...)

and replace it with growslice and an optional memclr to avoid an allocation for make([]T, y).

Memclr is not called in case growslice already allocated a new cleared backing array
when T contains pointers.

amd64:
name                      old time/op    new time/op    delta
ExtendSlice/IntSlice         103ns ± 4%      57ns ± 4%   -44.55%  (p=0.000 n=18+18)
ExtendSlice/PointerSlice     155ns ± 3%      77ns ± 3%   -49.93%  (p=0.000 n=20+20)
ExtendSlice/NoGrow          50.2ns ± 3%     5.2ns ± 2%   -89.67%  (p=0.000 n=18+18)

name                      old alloc/op   new alloc/op   delta
ExtendSlice/IntSlice         64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice     64.0B ± 0%     32.0B ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow           32.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)

name                      old allocs/op  new allocs/op  delta
ExtendSlice/IntSlice          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/PointerSlice      2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=20+20)
ExtendSlice/NoGrow            1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)

Fixes #21266

Change-Id: Idc3077665f63cbe89762b590c5967a864fd1c07f
Reviewed-on: https://go-review.googlesource.com/109517
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-05-06 04:28:23 +00:00
Matthew Dempsky
6658219b7e runtime: eliminate scase.receivedp
Make selectgo return recvOK as a result parameter instead.

Change-Id: Iffd436371d360bf666b76d4d7503e7c3037a9f1d
Reviewed-on: https://go-review.googlesource.com/37935
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:54 +00:00
Matthew Dempsky
004260afde cmd/compile: open code select{send,recv,default}
Registration now looks like:

        var cases [4]runtime.scases
        var order [8]uint16
	cases[0].kind = caseSend
	cases[0].c = c1
	cases[0].elem = &v1
	if raceenabled || msanenabled {
		selectsetpc(&cases[0])
	}
	cases[1].kind = caseRecv
	cases[1].c = c2
	cases[1].elem = &v2
	if raceenabled || msanenabled {
		selectsetpc(&cases[1])
	}
	...

Change-Id: Ib9bcf426a4797fe4bfd8152ca9e6e08e39a70b48
Reviewed-on: https://go-review.googlesource.com/37934
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:44 +00:00
Matthew Dempsky
3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
ChrisALiles
22ff9521da cmd/compile: pass arguments to convt2E/I integer functions by value
The motivation is avoid generating a pointer to the data being
converted so it can be garbage collected.
The change also slightly reduces binary size by shrinking call sites.

Fixes #24286

Benchmark results:
name                   old time/op  new time/op  delta
ConvT2ESmall-4         2.86ns ± 0%  2.80ns ± 1%  -2.12%  (p=0.000 n=29+28)
ConvT2EUintptr-4       2.88ns ± 1%  2.88ns ± 0%  -0.20%  (p=0.002 n=28+30)
ConvT2ELarge-4         19.6ns ± 0%  20.4ns ± 1%  +4.22%  (p=0.000 n=19+30)
ConvT2ISmall-4         3.01ns ± 0%  2.85ns ± 0%  -5.32%  (p=0.000 n=24+28)
ConvT2IUintptr-4       3.00ns ± 1%  2.87ns ± 0%  -4.44%  (p=0.000 n=29+25)
ConvT2ILarge-4         20.4ns ± 1%  21.3ns ± 1%  +4.41%  (p=0.000 n=30+26)
ConvT2Ezero/zero/16-4  2.84ns ± 1%  2.99ns ± 0%  +5.38%  (p=0.000 n=30+25)
ConvT2Ezero/zero/32-4  2.83ns ± 2%  3.00ns ± 0%  +5.91%  (p=0.004 n=27+3)

Change-Id: I65016ec94c53f97c52113121cab582d0c342b7a8
Reviewed-on: https://go-review.googlesource.com/102636
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-29 15:53:39 +00:00
Matthew Dempsky
b55eedd173 Revert "cmd/compile: cleanup nodpc and nodfp"
This reverts commit dcac984b97.

Reason for revert: broke LR architectures (arm64, ppc64, s390x)

Change-Id: I531d311c9053e81503c8c78d6cf044b318fc828b
Reviewed-on: https://go-review.googlesource.com/99695
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-08 21:23:01 +00:00
Matthew Dempsky
dcac984b97 cmd/compile: cleanup nodpc and nodfp
Instead of creating a new &nodfp expression for every recover() call,
or a new nodpc variable for every function instrumented by the race
detector, this CL introduces two new uintptr-typed pseudo-variables
callerSP and callerPC. These pseudo-variables act just like calls to
the runtime's getcallersp() and getcallerpc() functions.

For consistency, change runtime.gorecover's builtin stub's parameter
type from "*int32" to "uintptr".

Passes toolstash-check, but toolstash-check -race fails because of
register allocator changes.

Change-Id: I985d644653de2dac8b7b03a28829ad04dfd4f358
Reviewed-on: https://go-review.googlesource.com/99416
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:29 +00:00
Austin Clements
2010189407 runtime: remove legacy eager write barrier
Now that the buffered write barrier is implemented for all
architectures, we can remove the old eager write barrier
implementation. This CL removes the implementation from the runtime,
support in the compiler for calling it, and updates some compiler
tests that relied on the old eager barrier support. It also makes sure
that all of the useful comments from the old write barrier
implementation still have a place to live.

Fixes #22460.

Updates #21640 since this fixes the layering concerns of the write
barrier (but not the other things in that issue).

Change-Id: I580f93c152e89607e0a72fe43370237ba97bae74
Reviewed-on: https://go-review.googlesource.com/92705
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:46 +00:00
Keith Randall
48e207d518 cmd/compile: fix mapassign_fast* routines for pointer keys
The signature of the mapassign_fast* routines need to distinguish
the pointerness of their key argument.  If the affected routines
suspend part way through, the object pointed to by the key might
get garbage collected because the key is typed as a uint{32,64}.

This is not a problem for mapaccess or mapdelete because the key
in those situations do not live beyond the call involved.  If the
object referenced by the key is garbage collected prematurely, the
code still works fine.  Even if that object is subsequently reallocated,
it can't be written to the map in time to affect the lookup/delete.

Fixes #22781

Change-Id: I0bbbc5e9883d5ce702faf4e655348be1191ee439
Reviewed-on: https://go-review.googlesource.com/79018
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2017-11-22 04:30:27 +00:00
Martin Möhrmann
fbfc2031a6 cmd/compile: specialize map creation for small hint sizes
Handle make(map[any]any) and make(map[any]any, hint) where
hint <= BUCKETSIZE special to allow for faster map initialization
and to improve binary size by using runtime calls with fewer arguments.

Given hint is smaller or equal to BUCKETSIZE in which case
overLoadFactor(hint, 0)  is false and no buckets would be allocated by makemap:
* If hmap needs to be allocated on the stack then only hmap's hash0
  field needs to be initialized and no call to makemap is needed.
* If hmap needs to be allocated on the heap then a new special
  makehmap function will allocate hmap and intialize hmap's
  hash0 field.

Reduces size of the godoc by ~36kb.

AMD64
name         old time/op    new time/op    delta
NewEmptyMap    16.6ns ± 2%     5.5ns ± 2%  -66.72%  (p=0.000 n=10+10)
NewSmallMap    64.8ns ± 1%    56.5ns ± 1%  -12.75%  (p=0.000 n=9+10)

Updates #6853

Change-Id: I624e90da6775afaa061178e95db8aca674f44e9b
Reviewed-on: https://go-review.googlesource.com/61190
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-11-02 17:03:45 +00:00
Ilya Tocar
94484d8ed5 cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64
This significantly speed-ups Trunc.
Ceil/Floor are using the same instruction, so do them too.

name     old time/op  new time/op  delta
Floor-6  3.33ns ± 1%  3.22ns ± 0%   -3.39%  (p=0.000 n=10+10)
Ceil-6   3.33ns ± 1%  3.22ns ± 0%   -3.16%  (p=0.000 n=10+7)
Trunc-6  4.83ns ± 0%  3.22ns ± 0%  -33.36%  (p=0.000 n=6+8)

Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254
Reviewed-on: https://go-review.googlesource.com/68630
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-31 19:30:54 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Martin Möhrmann
cbc4e5d9c4 cmd/compile: generate makemap calls with int arguments
Where possible generate calls to runtime makemap with int hint argument
during compile time instead of makemap with int64 hint argument.

This eliminates converting the hint argument for calls to makemap with
int64 hint argument for platforms where int64 values do not fit into
an argument of type int.

A similar optimization for makeslice was introduced in CL
golang.org/cl/27851.

386:
name         old time/op    new time/op    delta
NewEmptyMap    53.5ns ± 5%    41.9ns ± 5%  -21.56%  (p=0.000 n=10+10)
NewSmallMap     182ns ± 1%     165ns ± 1%   -8.92%  (p=0.000 n=10+10)

Change-Id: Ibd2b4c57b36f171b173bf7a0602b3a59771e6e44
Reviewed-on: https://go-review.googlesource.com/55142
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-22 20:28:21 +00:00
Martin Möhrmann
3216e0cefa cmd/compile: replace eqstring with memequal
eqstring is only called for strings with equal lengths.
Instead of pushing a pointer and length for each argument string
on the stack we can omit pushing one of the lengths on the stack.

Changing eqstrings signature to eqstring(*uint8, *uint8, int) bool
to implement the above optimization would make it very similar to the
existing memequal(*any, *any, uintptr) bool function.

Since string lengths are positive we can avoid code redundancy and
use memequal instead of using eqstring with an optimized signature.

go command binary size reduced by 4128 bytes on amd64.

name                          old time/op    new time/op    delta
CompareStringEqual              6.03ns ± 1%    5.71ns ± 1%   -5.23%  (p=0.000 n=19+18)
CompareStringIdentical          2.88ns ± 1%    3.22ns ± 7%  +11.86%  (p=0.000 n=20+20)
CompareStringSameLength         4.31ns ± 1%    4.01ns ± 1%   -7.17%  (p=0.000 n=19+19)
CompareStringDifferentLength    0.29ns ± 2%    0.29ns ± 2%     ~     (p=1.000 n=20+20)
CompareStringBigUnaligned       64.3µs ± 2%    64.1µs ± 3%     ~     (p=0.164 n=20+19)
CompareStringBig                61.9µs ± 1%    61.6µs ± 2%   -0.46%  (p=0.033 n=20+19)

Change-Id: Ice15f3b937c981f0d3bc8479a9ea0d10658ac8df
Reviewed-on: https://go-review.googlesource.com/53650
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-22 17:59:02 +00:00
Martin Möhrmann
06a78b5737 cmd/compile: pass stack allocated bucket to makemap inside hmap
name         old time/op    new time/op    delta
NewEmptyMap    53.2ns ± 7%    48.0ns ± 5%  -9.77%  (p=0.000 n=20+20)
NewSmallMap     111ns ± 1%     106ns ± 2%  -3.78%  (p=0.000 n=20+19)

Change-Id: I979d21ab16eae9f6893873becca517db57e054b5
Reviewed-on: https://go-review.googlesource.com/56290
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-22 06:01:59 +00:00
Martin Möhrmann
8a6e51aede cmd/compile: generate makechan calls with int arguments
Where possible generate calls to runtime makechan with int arguments
during compile time instead of makechan with int64 arguments.

This eliminates converting arguments for calls to makechan with
int64 arguments for platforms where int64 values do not fit into
arguments of type int.

A similar optimization for makeslice was introduced in CL
golang.org/cl/27851.

386:
name                old time/op  new time/op  delta
MakeChan/Byte       52.4ns ± 6%  45.0ns ± 1%  -14.14%  (p=0.000 n=10+10)
MakeChan/Int        54.5ns ± 1%  49.1ns ± 1%   -9.87%  (p=0.000 n=10+10)
MakeChan/Ptr         150ns ± 1%   143ns ± 0%   -4.38%  (p=0.000 n=9+7)
MakeChan/Struct/0   49.2ns ± 2%  43.2ns ± 2%  -12.27%  (p=0.000 n=10+10)
MakeChan/Struct/32  81.7ns ± 2%  76.2ns ± 1%   -6.71%  (p=0.000 n=10+10)
MakeChan/Struct/40  88.4ns ± 2%  82.5ns ± 2%   -6.60%  (p=0.000 n=10+10)

AMD64:
name                old time/op  new time/op  delta
MakeChan/Byte       83.4ns ± 8%  80.8ns ± 3%    ~     (p=0.171 n=10+10)
MakeChan/Int         101ns ± 3%   101ns ± 2%    ~     (p=0.412 n=10+10)
MakeChan/Ptr         128ns ± 1%   128ns ± 1%    ~     (p=0.191 n=10+10)
MakeChan/Struct/0   67.6ns ± 3%  68.7ns ± 4%    ~     (p=0.224 n=10+10)
MakeChan/Struct/32   138ns ± 1%   139ns ± 1%    ~     (p=0.185 n=10+9)
MakeChan/Struct/40   154ns ± 1%   154ns ± 1%  -0.55%  (p=0.027 n=10+9)

Change-Id: Ie854cb066007232c5e9f71ea7d6fe27e81a9c050
Reviewed-on: https://go-review.googlesource.com/55140
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-15 05:54:24 +00:00
Keith Randall
5cadc91b3c cmd/compile: intrinsics for math/bits.OnesCount
Popcount instructions on amd64 are not guaranteed to be
present, so we must guard their call.  Rewrite rules can't
generate control flow at the moment, so the intrinsifier
needs to generate that code.

name           old time/op  new time/op  delta
OnesCount-8    2.47ns ± 5%  1.04ns ± 2%  -57.70%  (p=0.000 n=10+10)
OnesCount16-8  1.05ns ± 1%  0.78ns ± 0%  -25.56%    (p=0.000 n=9+8)
OnesCount32-8  1.63ns ± 5%  1.04ns ± 2%  -35.96%  (p=0.000 n=10+10)
OnesCount64-8  2.45ns ± 0%  1.04ns ± 1%  -57.55%   (p=0.000 n=6+10)

Update #18616

Change-Id: I4aff2cc9aa93787898d7b22055fe272a7cf95673
Reviewed-on: https://go-review.googlesource.com/38320
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-04-04 02:40:11 +00:00
Keith Randall
e67d881bc3 cmd/compile: simplify efaceeq and ifaceeq
Clean up code that does interface equality. Avoid doing checks
in efaceeq/ifaceeq that we already did before calling those routines.

No noticeable performance changes for existing benchmarks.

name            old time/op  new time/op  delta
EfaceCmpDiff-8   604ns ± 1%   553ns ± 1%  -8.41%  (p=0.000 n=9+10)

Fixes #18618

Change-Id: I3bd46db82b96494873045bc3300c56400bc582eb
Reviewed-on: https://go-review.googlesource.com/38606
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-03-24 23:03:09 +00:00
Daniel Martí
2e29eb57db runtime: remove unused *chantype parameters
The chanrecv funcs don't use it at all. The chansend ones do, but the
element type is now part of the hchan struct, which is already a
parameter.

hchan can be nil in chansend when sending to a nil channel, so when
instrumenting we must copy to the stack to be able to read the channel
type.

name             old time/op  new time/op  delta
ChanUncontended  6.42µs ± 1%  6.22µs ± 0%  -3.06%  (p=0.000 n=19+18)

Initially found by github.com/mvdan/unparam.

Fixes #19591.

Change-Id: I3a5e8a0082e8445cc3f0074695e3593fd9c88412
Reviewed-on: https://go-review.googlesource.com/38351
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-21 17:10:16 +00:00
Hugues Bruant
5d6b7fcaa1 runtime: add mapdelete_fast*
Add benchmarks for map delete with int32/int64/string key

Benchmark results on darwin/amd64

name                 old time/op  new time/op  delta
MapDelete/Int32/1-8   151ns ± 8%    99ns ± 3%  -34.39%  (p=0.008 n=5+5)
MapDelete/Int32/2-8   128ns ± 2%   111ns ±15%  -13.40%  (p=0.040 n=5+5)
MapDelete/Int32/4-8   128ns ± 5%   114ns ± 2%  -10.82%  (p=0.008 n=5+5)
MapDelete/Int64/1-8   144ns ± 0%   104ns ± 3%  -27.53%  (p=0.016 n=4+5)
MapDelete/Int64/2-8   153ns ± 1%   126ns ± 3%  -17.17%  (p=0.008 n=5+5)
MapDelete/Int64/4-8   178ns ± 3%   136ns ± 2%  -23.60%  (p=0.008 n=5+5)
MapDelete/Str/1-8     187ns ± 3%   171ns ± 3%   -8.54%  (p=0.008 n=5+5)
MapDelete/Str/2-8     221ns ± 3%   206ns ± 4%   -7.18%  (p=0.016 n=5+4)
MapDelete/Str/4-8     256ns ± 5%   232ns ± 2%   -9.36%  (p=0.016 n=4+5)

name                     old time/op    new time/op    delta
BinaryTree17-8              2.78s ± 7%     2.70s ± 1%    ~     (p=0.151 n=5+5)
Fannkuch11-8                3.21s ± 2%     3.19s ± 1%    ~     (p=0.310 n=5+5)
FmtFprintfEmpty-8          49.1ns ± 3%    50.2ns ± 2%    ~     (p=0.095 n=5+5)
FmtFprintfString-8         78.6ns ± 4%    80.2ns ± 5%    ~     (p=0.460 n=5+5)
FmtFprintfInt-8            79.7ns ± 1%    81.0ns ± 3%    ~     (p=0.103 n=5+5)
FmtFprintfIntInt-8          117ns ± 2%     119ns ± 0%    ~     (p=0.079 n=5+4)
FmtFprintfPrefixedInt-8     153ns ± 1%     146ns ± 3%  -4.19%  (p=0.024 n=5+5)
FmtFprintfFloat-8           239ns ± 1%     237ns ± 1%    ~     (p=0.246 n=5+5)
FmtManyArgs-8               506ns ± 2%     509ns ± 2%    ~     (p=0.238 n=5+5)
GobDecode-8                7.06ms ± 4%    6.86ms ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8                6.01ms ± 5%    5.87ms ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                      246ms ± 4%     236ms ± 1%  -4.12%  (p=0.008 n=5+5)
Gunzip-8                   37.7ms ± 4%    37.3ms ± 1%    ~     (p=0.841 n=5+5)
HTTPClientServer-8         64.9µs ± 1%    64.4µs ± 0%  -0.80%  (p=0.032 n=5+4)
JSONEncode-8               16.0ms ± 2%    16.2ms ±11%    ~     (p=0.548 n=5+5)
JSONDecode-8               53.2ms ± 2%    53.1ms ± 4%    ~     (p=1.000 n=5+5)
Mandelbrot200-8            4.33ms ± 2%    4.32ms ± 2%    ~     (p=0.841 n=5+5)
GoParse-8                  3.24ms ± 2%    3.27ms ± 4%    ~     (p=0.690 n=5+5)
RegexpMatchEasy0_32-8      86.2ns ± 1%    85.2ns ± 3%    ~     (p=0.286 n=5+5)
RegexpMatchEasy0_1K-8       198ns ± 2%     199ns ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_32-8      82.6ns ± 2%    81.8ns ± 1%    ~     (p=0.294 n=5+5)
RegexpMatchEasy1_1K-8       359ns ± 2%     354ns ± 1%  -1.39%  (p=0.048 n=5+5)
RegexpMatchMedium_32-8      123ns ± 2%     123ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchMedium_1K-8     38.2µs ± 2%    38.6µs ± 8%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8       1.92µs ± 2%    1.91µs ± 5%    ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-8       57.6µs ± 1%    57.0µs ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                   483ms ± 7%     441ms ± 1%  -8.79%  (p=0.016 n=5+4)
Template-8                 58.0ms ± 1%    58.2ms ± 7%    ~     (p=0.310 n=5+5)
TimeParse-8                 324ns ± 6%     312ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                330ns ± 1%     329ns ± 1%    ~     (p=0.968 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               109MB/s ± 4%   112MB/s ± 1%    ~     (p=0.222 n=5+5)
GobEncode-8               128MB/s ± 5%   131MB/s ± 2%    ~     (p=0.222 n=5+5)
Gzip-8                   78.9MB/s ± 4%  82.3MB/s ± 1%  +4.25%  (p=0.008 n=5+5)
Gunzip-8                  514MB/s ± 4%   521MB/s ± 1%    ~     (p=0.841 n=5+5)
JSONEncode-8              121MB/s ± 2%   120MB/s ±10%    ~     (p=0.548 n=5+5)
JSONDecode-8             36.5MB/s ± 2%  36.6MB/s ± 4%    ~     (p=1.000 n=5+5)
GoParse-8                17.9MB/s ± 2%  17.7MB/s ± 4%    ~     (p=0.730 n=5+5)
RegexpMatchEasy0_32-8     371MB/s ± 1%   375MB/s ± 3%    ~     (p=0.310 n=5+5)
RegexpMatchEasy0_1K-8    5.15GB/s ± 1%  5.13GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     387MB/s ± 2%   391MB/s ± 1%    ~     (p=0.310 n=5+5)
RegexpMatchEasy1_1K-8    2.85GB/s ± 2%  2.89GB/s ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchMedium_32-8   8.07MB/s ± 2%  8.06MB/s ± 1%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   26.8MB/s ± 2%  26.6MB/s ± 7%    ~     (p=0.690 n=5+5)
RegexpMatchHard_32-8     16.7MB/s ± 2%  16.7MB/s ± 5%    ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-8     17.8MB/s ± 1%  18.0MB/s ± 2%    ~     (p=0.310 n=5+5)
Revcomp-8                 527MB/s ± 6%   577MB/s ± 1%  +9.44%  (p=0.016 n=5+4)
Template-8               33.5MB/s ± 1%  33.4MB/s ± 7%    ~     (p=0.310 n=5+5)

Updates #19495

Change-Id: Ib9ece1690813d9b4788455db43d30891e2138df5
Reviewed-on: https://go-review.googlesource.com/38172
Reviewed-by: Hugues Bruant <hugues.bruant@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-21 06:07:24 +00:00
Hugues Bruant
ec091b6af2 runtime: add mapassign_fast*
Add benchmarks for map assignment with int32/int64/string key

Benchmark results on darwin/amd64

name                  old time/op  new time/op  delta
MapAssignInt32_255-8  24.7ns ± 3%  17.4ns ± 2%  -29.75%  (p=0.000 n=10+10)
MapAssignInt32_64k-8  45.5ns ± 4%  37.6ns ± 4%  -17.18%  (p=0.000 n=10+10)
MapAssignInt64_255-8  26.0ns ± 3%  17.9ns ± 4%  -31.03%  (p=0.000 n=10+10)
MapAssignInt64_64k-8  46.9ns ± 5%  38.7ns ± 2%  -17.53%  (p=0.000 n=9+10)
MapAssignStr_255-8    47.8ns ± 3%  24.8ns ± 4%  -48.01%  (p=0.000 n=10+10)
MapAssignStr_64k-8    83.0ns ± 3%  51.9ns ± 3%  -37.45%  (p=0.000 n=10+9)

name                     old time/op    new time/op    delta
BinaryTree17-8              3.11s ±19%     2.78s ± 3%    ~     (p=0.095 n=5+5)
Fannkuch11-8                3.26s ± 1%     3.21s ± 2%    ~     (p=0.056 n=5+5)
FmtFprintfEmpty-8          50.3ns ± 1%    50.8ns ± 2%    ~     (p=0.246 n=5+5)
FmtFprintfString-8         82.7ns ± 4%    80.1ns ± 5%    ~     (p=0.238 n=5+5)
FmtFprintfInt-8            82.6ns ± 2%    81.9ns ± 3%    ~     (p=0.508 n=5+5)
FmtFprintfIntInt-8          124ns ± 4%     121ns ± 3%    ~     (p=0.111 n=5+5)
FmtFprintfPrefixedInt-8     158ns ± 6%     160ns ± 2%    ~     (p=0.341 n=5+5)
FmtFprintfFloat-8           249ns ± 2%     245ns ± 2%    ~     (p=0.095 n=5+5)
FmtManyArgs-8               513ns ± 2%     519ns ± 3%    ~     (p=0.151 n=5+5)
GobDecode-8                7.48ms ±12%    7.11ms ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8                6.25ms ± 1%    6.03ms ± 2%  -3.56%  (p=0.008 n=5+5)
Gzip-8                      252ms ± 4%     252ms ± 4%    ~     (p=1.000 n=5+5)
Gunzip-8                   38.4ms ± 3%    38.6ms ± 2%    ~     (p=0.690 n=5+5)
HTTPClientServer-8         76.9µs ±41%    66.4µs ± 6%    ~     (p=0.310 n=5+5)
JSONEncode-8               16.5ms ± 3%    16.7ms ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8               54.6ms ± 1%    54.3ms ± 2%    ~     (p=0.548 n=5+5)
Mandelbrot200-8            4.45ms ± 3%    4.47ms ± 1%    ~     (p=0.841 n=5+5)
GoParse-8                  3.43ms ± 1%    3.32ms ± 2%  -3.28%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      88.2ns ± 3%    89.4ns ± 2%    ~     (p=0.333 n=5+5)
RegexpMatchEasy0_1K-8       205ns ± 1%     206ns ± 1%    ~     (p=0.905 n=5+5)
RegexpMatchEasy1_32-8      85.1ns ± 1%    85.5ns ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8       365ns ± 1%     371ns ± 9%    ~     (p=1.000 n=5+5)
RegexpMatchMedium_32-8      129ns ± 2%     128ns ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8     39.8µs ± 0%    39.7µs ± 4%    ~     (p=0.730 n=4+5)
RegexpMatchHard_32-8       1.99µs ± 3%    2.05µs ±16%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8       59.3µs ± 1%    60.3µs ± 7%    ~     (p=1.000 n=5+5)
Revcomp-8                   1.36s ±63%     0.52s ± 5%    ~     (p=0.095 n=5+5)
Template-8                 62.6ms ±14%    60.5ms ± 5%    ~     (p=0.690 n=5+5)
TimeParse-8                 330ns ± 2%     324ns ± 2%    ~     (p=0.087 n=5+5)
TimeFormat-8                350ns ± 3%     340ns ± 1%  -2.86%  (p=0.008 n=5+5)

name                     old speed      new speed      delta
GobDecode-8               103MB/s ±11%   108MB/s ± 2%    ~     (p=0.222 n=5+5)
GobEncode-8               123MB/s ± 1%   127MB/s ± 2%  +3.71%  (p=0.008 n=5+5)
Gzip-8                   77.1MB/s ± 4%  76.9MB/s ± 3%    ~     (p=1.000 n=5+5)
Gunzip-8                  505MB/s ± 3%   503MB/s ± 2%    ~     (p=0.690 n=5+5)
JSONEncode-8              118MB/s ± 3%   116MB/s ± 3%    ~     (p=0.421 n=5+5)
JSONDecode-8             35.5MB/s ± 1%  35.8MB/s ± 2%    ~     (p=0.397 n=5+5)
GoParse-8                16.9MB/s ± 1%  17.4MB/s ± 2%  +3.45%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8     363MB/s ± 3%   358MB/s ± 2%    ~     (p=0.421 n=5+5)
RegexpMatchEasy0_1K-8    4.98GB/s ± 1%  4.97GB/s ± 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy1_32-8     376MB/s ± 1%   375MB/s ± 5%    ~     (p=0.690 n=5+5)
RegexpMatchEasy1_1K-8    2.80GB/s ± 1%  2.76GB/s ± 9%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32-8   7.73MB/s ± 1%  7.76MB/s ± 3%    ~     (p=0.730 n=5+5)
RegexpMatchMedium_1K-8   25.8MB/s ± 0%  25.8MB/s ± 4%    ~     (p=0.651 n=4+5)
RegexpMatchHard_32-8     16.1MB/s ± 3%  15.7MB/s ±14%    ~     (p=0.794 n=5+5)
RegexpMatchHard_1K-8     17.3MB/s ± 1%  17.0MB/s ± 7%    ~     (p=0.984 n=5+5)
Revcomp-8                 273MB/s ±83%   488MB/s ± 5%    ~     (p=0.095 n=5+5)
Template-8               31.1MB/s ±13%  32.1MB/s ± 5%    ~     (p=0.690 n=5+5)

Updates #19495

Change-Id: I116e9a2a4594769318b22d736464de8a98499909
Reviewed-on: https://go-review.googlesource.com/38091
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-13 23:43:16 +00:00
Matthew Dempsky
c310c688ff cmd/compile, runtime: simplify multiway select implementation
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.

test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.

Updates #19331.

Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-07 20:14:17 +00:00
Josh Bleecher Snyder
504bc3ed24 cmd/compile, runtime: specialize convT2x, don't alloc for zero vals
Prior to this CL, all runtime conversions
from a concrete value to an interface went
through one of two runtime calls: convT2E or convT2I.
However, in practice, basic types are very common.
Specializing convT2x for those basic types allows
for a more efficient implementation for those types.
For basic scalars and strings, allocation and copying
can use the same methods as normal code.
For pointer-free types, allocation can occur without
zeroing, and copying can take place without GC calls.
For slices, copying is cheaper and simpler.

This CL adds twelve runtime routines:

convT2E16, convT2I16
convT2E32, convT2I32
convT2E64, convT2I64
convT2Estring, convT2Istring
convT2Eslice, convT2Islice
convT2Enoptr, convT2Inoptr

While compiling make.bash, 93% of all convT2x calls
are now to one of these specialized convT2x call.

Within specialized convT2x routines, it is cheap to check
for a zero value, in a way that it is not in general.
When we detect a zero value there, we return a pointer
to zeroVal, rather than allocating.

name                         old time/op  new time/op  delta
ConvT2Ezero/zero/16-8        17.9ns ± 2%   3.0ns ± 3%  -83.20%  (p=0.000 n=56+56)
ConvT2Ezero/zero/32-8        17.8ns ± 2%   3.0ns ± 3%  -83.15%  (p=0.000 n=59+60)
ConvT2Ezero/zero/64-8        20.1ns ± 1%   3.0ns ± 2%  -84.98%  (p=0.000 n=57+57)
ConvT2Ezero/zero/str-8       32.6ns ± 1%   3.0ns ± 4%  -90.70%  (p=0.000 n=59+60)
ConvT2Ezero/zero/slice-8     36.7ns ± 2%   3.0ns ± 2%  -91.78%  (p=0.000 n=59+59)
ConvT2Ezero/zero/big-8       91.9ns ± 2%  85.9ns ± 2%   -6.52%  (p=0.000 n=57+57)
ConvT2Ezero/nonzero/16-8     17.7ns ± 2%  12.7ns ± 3%  -28.38%  (p=0.000 n=55+60)
ConvT2Ezero/nonzero/32-8     17.8ns ± 1%  12.7ns ± 1%  -28.44%  (p=0.000 n=54+57)
ConvT2Ezero/nonzero/64-8     20.0ns ± 1%  15.0ns ± 1%  -24.90%  (p=0.000 n=56+58)
ConvT2Ezero/nonzero/str-8    32.6ns ± 1%  25.7ns ± 1%  -21.17%  (p=0.000 n=58+55)
ConvT2Ezero/nonzero/slice-8  36.8ns ± 2%  30.4ns ± 1%  -17.32%  (p=0.000 n=60+52)
ConvT2Ezero/nonzero/big-8    92.1ns ± 2%  85.9ns ± 2%   -6.70%  (p=0.000 n=57+59)

Benchmarks on a real program (the compiler):

name       old time/op      new time/op      delta
Template        227ms ± 5%       221ms ± 2%  -2.48%  (p=0.000 n=30+26)
Unicode         102ms ± 5%       100ms ± 3%  -1.30%  (p=0.009 n=30+26)
GoTypes         656ms ± 5%       659ms ± 4%    ~     (p=0.208 n=30+30)
Compiler        2.82s ± 2%       2.82s ± 1%    ~     (p=0.614 n=29+27)
Flate           128ms ± 2%       128ms ± 5%    ~     (p=0.783 n=27+28)
GoParser        158ms ± 3%       158ms ± 3%    ~     (p=0.261 n=28+30)
Reflect         408ms ± 7%       401ms ± 3%    ~     (p=0.075 n=30+30)
Tar             123ms ± 6%       121ms ± 8%    ~     (p=0.287 n=29+30)
XML             220ms ± 2%       220ms ± 4%    ~     (p=0.805 n=29+29)

name       old user-ns/op   new user-ns/op   delta
Template   281user-ms ± 4%  279user-ms ± 3%  -0.87%  (p=0.044 n=28+28)
Unicode    142user-ms ± 4%  141user-ms ± 3%  -1.04%  (p=0.015 n=30+27)
GoTypes    884user-ms ± 3%  886user-ms ± 2%    ~     (p=0.532 n=30+30)
Compiler   3.94user-s ± 3%  3.92user-s ± 1%    ~     (p=0.185 n=30+28)
Flate      165user-ms ± 2%  165user-ms ± 4%    ~     (p=0.780 n=27+29)
GoParser   209user-ms ± 2%  208user-ms ± 3%    ~     (p=0.453 n=28+30)
Reflect    533user-ms ± 6%  526user-ms ± 3%    ~     (p=0.057 n=30+30)
Tar        156user-ms ± 6%  154user-ms ± 6%    ~     (p=0.133 n=29+30)
XML        288user-ms ± 4%  288user-ms ± 4%    ~     (p=0.633 n=30+30)

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.9MB ± 0%  -0.11%  (p=0.000 n=29+29)
Unicode        32.6MB ± 0%      32.6MB ± 0%    ~     (p=0.572 n=29+30)
GoTypes         122MB ± 0%       122MB ± 0%  -0.10%  (p=0.000 n=30+30)
Compiler        482MB ± 0%       481MB ± 0%  -0.07%  (p=0.000 n=30+29)
Flate          26.6MB ± 0%      26.6MB ± 0%    ~     (p=0.096 n=30+30)
GoParser       32.7MB ± 0%      32.6MB ± 0%  -0.06%  (p=0.011 n=28+28)
Reflect        84.2MB ± 0%      84.1MB ± 0%  -0.17%  (p=0.000 n=29+30)
Tar            27.7MB ± 0%      27.7MB ± 0%  -0.05%  (p=0.032 n=27+28)
XML            44.7MB ± 0%      44.7MB ± 0%    ~     (p=0.131 n=28+30)

name       old allocs/op    new allocs/op    delta
Template         373k ± 1%        370k ± 1%  -0.76%  (p=0.000 n=30+30)
Unicode          325k ± 1%        325k ± 1%    ~     (p=0.383 n=29+30)
GoTypes         1.16M ± 0%       1.15M ± 0%  -0.75%  (p=0.000 n=29+30)
Compiler        4.15M ± 0%       4.13M ± 0%  -0.59%  (p=0.000 n=30+29)
Flate            238k ± 1%        237k ± 1%  -0.62%  (p=0.000 n=30+30)
GoParser         304k ± 1%        302k ± 1%  -0.64%  (p=0.000 n=30+28)
Reflect         1.00M ± 0%       0.99M ± 0%  -1.10%  (p=0.000 n=29+30)
Tar              245k ± 1%        244k ± 1%  -0.59%  (p=0.000 n=27+29)
XML              391k ± 1%        389k ± 1%  -0.59%  (p=0.000 n=29+30)

Change-Id: Id7f456d690567c2b0a96b0d6d64de8784b6e305f
Reviewed-on: https://go-review.googlesource.com/36476
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-02-28 19:23:33 +00:00
Cherry Zhang
f6fc0dd620 cmd/compile: update signature of runtime.memclr*
runtime.memclr* functions have signatures

func memclrNoHeapPointers(ptr unsafe.Pointer, n uintptr)
func memclrHasPointers(ptr unsafe.Pointer, n uintptr)

Update compiler's copy. Also teach gc/mkbuiltin.go to handle
unsafe.Pointer. The import statement and its support is not
really necessary, but just to make it look like real Go code.

Fixes #19185.

Change-Id: I251d02571fde2716d4727e31e04d56ec04b6f22a
Reviewed-on: https://go-review.googlesource.com/37257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-02-28 19:22:29 +00:00
Ian Lance Taylor
db6e27c38d cmd/compile: update builtin writeBarrier to match runtime
The definition of writeBarrier in the runtime was changed in CL 22855
to include padding. Update the definition built in to the compiler to match.
This doesn't affect the generated code, as the compiler sets the type
to use anyhow, but having them be different seems clearly wrong.

Change-Id: I8eac05bf70a424a0b2338ba5e9e41af231316de0
Reviewed-on: https://go-review.googlesource.com/37377
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-22 01:32:31 +00:00
Keith Randall
5a75d6a08e cmd/compile: optimize non-empty-interface type conversions
When doing i.(T) for non-empty-interface i and concrete type T,
there's no need to read the type out of the itab. Just compare the
itab to the itab we expect for that interface/type pair.

Also optimize type switches by putting the type hash of the
concrete type in the itab. That way we don't need to load the
type pointer out of the itab.

Update #18492

Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce
Reviewed-on: https://go-review.googlesource.com/34810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2017-02-13 18:16:31 +00:00
Josh Bleecher Snyder
2c91bb4c8a cmd/compile: make panicwrap argument-free
When code defines a method on T,
the compiler generates a corresponding wrapper method on *T.
The first thing the wrapper does is check whether
the pointer is nil and if so, call panicwrap.
This is done to provide a useful error message.

The existing implementation gets its information
from arguments set up by the compiler.
However, with some trouble, this information can
be extracted from the name of the wrapper method itself.

Removing the arguments to panicwrap simplifies and
shrinks the wrapper method.
It also means that the call to panicwrap does not
require any stack space.
This enables a further optimization on amd64/x86,
which is to skip the function prologue if nothing
else in the method requires stack space.
This is frequently the case in simple, hot methods,
such as Less and Swap in sort.Interface implementations.

Fixes #19040.

Benchmarks for package sort on amd64:

name                  old time/op  new time/op  delta
SearchWrappers-8       104ns ± 1%   104ns ± 1%    ~     (p=0.286 n=27+27)
SortString1K-8         128µs ± 1%   128µs ± 1%  -0.44%  (p=0.004 n=30+30)
SortString1K_Slice-8   118µs ± 2%   117µs ± 1%    ~     (p=0.106 n=30+30)
StableString1K-8      18.6µs ± 1%  18.6µs ± 1%    ~     (p=0.446 n=28+26)
SortInt1K-8           65.9µs ± 1%  60.7µs ± 1%  -7.96%  (p=0.000 n=28+30)
StableInt1K-8         75.3µs ± 2%  72.8µs ± 1%  -3.41%  (p=0.000 n=30+30)
StableInt1K_Slice-8   57.7µs ± 1%  57.7µs ± 1%    ~     (p=0.515 n=30+30)
SortInt64K-8          6.28ms ± 1%  6.01ms ± 1%  -4.19%  (p=0.000 n=28+28)
SortInt64K_Slice-8    5.04ms ± 1%  5.04ms ± 1%    ~     (p=0.927 n=28+27)
StableInt64K-8        6.65ms ± 1%  6.38ms ± 1%  -3.97%  (p=0.000 n=26+30)
Sort1e2-8             37.9µs ± 1%  37.2µs ± 1%  -1.89%  (p=0.000 n=29+27)
Stable1e2-8           77.0µs ± 1%  74.7µs ± 1%  -3.06%  (p=0.000 n=27+30)
Sort1e4-8             8.21ms ± 2%  7.98ms ± 1%  -2.77%  (p=0.000 n=29+30)
Stable1e4-8           24.8ms ± 1%  24.3ms ± 1%  -2.31%  (p=0.000 n=28+30)
Sort1e6-8              1.27s ± 4%   1.22s ± 1%  -3.42%  (p=0.000 n=30+29)
Stable1e6-8            5.06s ± 1%   4.92s ± 1%  -2.77%  (p=0.000 n=25+29)
[Geo mean]             731µs        714µs       -2.29%

Before/after assembly for sort.(*intPairs).Less follows.
It can be optimized further, but that's for a follow-up CL.

Before:

"".(*intPairs).Less t=1 size=214 args=0x20 locals=0x38
	0x0000 00000 (<autogenerated>:1)	TEXT	"".(*intPairs).Less(SB), $56-32
	0x0000 00000 (<autogenerated>:1)	MOVQ	(TLS), CX
	0x0009 00009 (<autogenerated>:1)	CMPQ	SP, 16(CX)
	0x000d 00013 (<autogenerated>:1)	JLS	204
	0x0013 00019 (<autogenerated>:1)	SUBQ	$56, SP
	0x0017 00023 (<autogenerated>:1)	MOVQ	BP, 48(SP)
	0x001c 00028 (<autogenerated>:1)	LEAQ	48(SP), BP
	0x0021 00033 (<autogenerated>:1)	MOVQ	32(CX), BX
	0x0025 00037 (<autogenerated>:1)	TESTQ	BX, BX
	0x0028 00040 (<autogenerated>:1)	JEQ	55
	0x002a 00042 (<autogenerated>:1)	LEAQ	64(SP), DI
	0x002f 00047 (<autogenerated>:1)	CMPQ	(BX), DI
	0x0032 00050 (<autogenerated>:1)	JNE	55
	0x0034 00052 (<autogenerated>:1)	MOVQ	SP, (BX)
	0x0037 00055 (<autogenerated>:1)	NOP
	0x0037 00055 (<autogenerated>:1)	FUNCDATA	$0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
	0x0037 00055 (<autogenerated>:1)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x0037 00055 (<autogenerated>:1)	MOVQ	""..this+64(FP), AX
	0x003c 00060 (<autogenerated>:1)	TESTQ	AX, AX
	0x003f 00063 (<autogenerated>:1)	JEQ	$0, 135
	0x0041 00065 (<autogenerated>:1)	MOVQ	(AX), CX
	0x0044 00068 (<autogenerated>:1)	MOVQ	8(AX), AX
	0x0048 00072 (<autogenerated>:1)	MOVQ	"".i+72(FP), DX
	0x004d 00077 (<autogenerated>:1)	CMPQ	DX, AX
	0x0050 00080 (<autogenerated>:1)	JCC	$0, 128
	0x0052 00082 (<autogenerated>:1)	SHLQ	$4, DX
	0x0056 00086 (<autogenerated>:1)	MOVQ	(CX)(DX*1), DX
	0x005a 00090 (<autogenerated>:1)	MOVQ	"".j+80(FP), BX
	0x005f 00095 (<autogenerated>:1)	CMPQ	BX, AX
	0x0062 00098 (<autogenerated>:1)	JCC	$0, 128
	0x0064 00100 (<autogenerated>:1)	SHLQ	$4, BX
	0x0068 00104 (<autogenerated>:1)	MOVQ	(CX)(BX*1), AX
	0x006c 00108 (<autogenerated>:1)	CMPQ	DX, AX
	0x006f 00111 (<autogenerated>:1)	SETLT	AL
	0x0072 00114 (<autogenerated>:1)	MOVB	AL, "".~r2+88(FP)
	0x0076 00118 (<autogenerated>:1)	MOVQ	48(SP), BP
	0x007b 00123 (<autogenerated>:1)	ADDQ	$56, SP
	0x007f 00127 (<autogenerated>:1)	RET
	0x0080 00128 (<autogenerated>:1)	PCDATA	$0, $1
	0x0080 00128 (<autogenerated>:1)	CALL	runtime.panicindex(SB)
	0x0085 00133 (<autogenerated>:1)	UNDEF
	0x0087 00135 (<autogenerated>:1)	LEAQ	go.string."sort_test"(SB), AX
	0x008e 00142 (<autogenerated>:1)	MOVQ	AX, (SP)
	0x0092 00146 (<autogenerated>:1)	MOVQ	$9, 8(SP)
	0x009b 00155 (<autogenerated>:1)	LEAQ	go.string."intPairs"(SB), AX
	0x00a2 00162 (<autogenerated>:1)	MOVQ	AX, 16(SP)
	0x00a7 00167 (<autogenerated>:1)	MOVQ	$8, 24(SP)
	0x00b0 00176 (<autogenerated>:1)	LEAQ	go.string."Less"(SB), AX
	0x00b7 00183 (<autogenerated>:1)	MOVQ	AX, 32(SP)
	0x00bc 00188 (<autogenerated>:1)	MOVQ	$4, 40(SP)
	0x00c5 00197 (<autogenerated>:1)	PCDATA	$0, $1
	0x00c5 00197 (<autogenerated>:1)	CALL	runtime.panicwrap(SB)
	0x00ca 00202 (<autogenerated>:1)	UNDEF
	0x00cc 00204 (<autogenerated>:1)	NOP
	0x00cc 00204 (<autogenerated>:1)	PCDATA	$0, $-1
	0x00cc 00204 (<autogenerated>:1)	CALL	runtime.morestack_noctxt(SB)
	0x00d1 00209 (<autogenerated>:1)	JMP	0

After:

"".(*intPairs).Swap t=1 size=147 args=0x18 locals=0x8
	0x0000 00000 (<autogenerated>:1)	TEXT	"".(*intPairs).Swap(SB), $8-24
	0x0000 00000 (<autogenerated>:1)	MOVQ	(TLS), CX
	0x0009 00009 (<autogenerated>:1)	SUBQ	$8, SP
	0x000d 00013 (<autogenerated>:1)	MOVQ	BP, (SP)
	0x0011 00017 (<autogenerated>:1)	LEAQ	(SP), BP
	0x0015 00021 (<autogenerated>:1)	MOVQ	32(CX), BX
	0x0019 00025 (<autogenerated>:1)	TESTQ	BX, BX
	0x001c 00028 (<autogenerated>:1)	JEQ	43
	0x001e 00030 (<autogenerated>:1)	LEAQ	16(SP), DI
	0x0023 00035 (<autogenerated>:1)	CMPQ	(BX), DI
	0x0026 00038 (<autogenerated>:1)	JNE	43
	0x0028 00040 (<autogenerated>:1)	MOVQ	SP, (BX)
	0x002b 00043 (<autogenerated>:1)	NOP
	0x002b 00043 (<autogenerated>:1)	FUNCDATA	$0, gclocals·e6397a44f8e1b6e77d0f200b4fba5269(SB)
	0x002b 00043 (<autogenerated>:1)	FUNCDATA	$1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
	0x002b 00043 (<autogenerated>:1)	MOVQ	""..this+16(FP), AX
	0x0030 00048 (<autogenerated>:1)	TESTQ	AX, AX
	0x0033 00051 (<autogenerated>:1)	JEQ	$0, 140
	0x0035 00053 (<autogenerated>:1)	MOVQ	(AX), CX
	0x0038 00056 (<autogenerated>:1)	MOVQ	8(AX), AX
	0x003c 00060 (<autogenerated>:1)	MOVQ	"".i+24(FP), DX
	0x0041 00065 (<autogenerated>:1)	CMPQ	DX, AX
	0x0044 00068 (<autogenerated>:1)	JCC	$0, 133
	0x0046 00070 (<autogenerated>:1)	SHLQ	$4, DX
	0x004a 00074 (<autogenerated>:1)	MOVQ	8(CX)(DX*1), BX
	0x004f 00079 (<autogenerated>:1)	MOVQ	(CX)(DX*1), SI
	0x0053 00083 (<autogenerated>:1)	MOVQ	"".j+32(FP), DI
	0x0058 00088 (<autogenerated>:1)	CMPQ	DI, AX
	0x005b 00091 (<autogenerated>:1)	JCC	$0, 133
	0x005d 00093 (<autogenerated>:1)	SHLQ	$4, DI
	0x0061 00097 (<autogenerated>:1)	MOVQ	8(CX)(DI*1), AX
	0x0066 00102 (<autogenerated>:1)	MOVQ	(CX)(DI*1), R8
	0x006a 00106 (<autogenerated>:1)	MOVQ	R8, (CX)(DX*1)
	0x006e 00110 (<autogenerated>:1)	MOVQ	AX, 8(CX)(DX*1)
	0x0073 00115 (<autogenerated>:1)	MOVQ	SI, (CX)(DI*1)
	0x0077 00119 (<autogenerated>:1)	MOVQ	BX, 8(CX)(DI*1)
	0x007c 00124 (<autogenerated>:1)	MOVQ	(SP), BP
	0x0080 00128 (<autogenerated>:1)	ADDQ	$8, SP
	0x0084 00132 (<autogenerated>:1)	RET
	0x0085 00133 (<autogenerated>:1)	PCDATA	$0, $1
	0x0085 00133 (<autogenerated>:1)	CALL	runtime.panicindex(SB)
	0x008a 00138 (<autogenerated>:1)	UNDEF
	0x008c 00140 (<autogenerated>:1)	PCDATA	$0, $1
	0x008c 00140 (<autogenerated>:1)	CALL	runtime.panicwrap(SB)
	0x0091 00145 (<autogenerated>:1)	UNDEF

Change-Id: I15bb8435f0690badb868799f313ed8817335efd3
Reviewed-on: https://go-review.googlesource.com/36809
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-11 23:27:35 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes test, plus modifications to test/run.go to deal with
timeout and killing looping test.  Tests broken by the addition
of extra code (branch frequency and live vars) for added
checks turn the check insertion off.

If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop.  Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
Keith Randall
688995d1e9 cmd/compile: do more type conversion inline
The code to do the conversion is smaller than the
call to the runtime.
The 1-result asserts need to call panic if they fail, but that
code is out of line.

The only conversions left in the runtime are those which
might allocate and those which might need to generate an itab.

Given the following types:
  type E interface{}
  type I interface { foo() }
  type I2 iterface { foo(); bar() }
  type Big [10]int
  func (b Big) foo() { ... }

This CL inlines the following conversions:

was assertE2T
  var e E = ...
  b := i.(Big)
was assertE2T2
  var e E = ...
  b, ok := i.(Big)
was assertI2T
  var i I = ...
  b := i.(Big)
was assertI2T2
  var i I = ...
  b, ok := i.(Big)
was assertI2E
  var i I = ...
  e := i.(E)
was assertI2E2
  var i I = ...
  e, ok := i.(E)

These are the remaining runtime calls:

convT2E:
  var b Big = ...
  var e E = b
convT2I:
  var b Big = ...
  var i I = b
convI2I:
  var i2 I2 = ...
  var i I = i2
assertE2I:
  var e E = ...
  i := e.(I)
assertE2I2:
  var e E = ...
  i, ok := e.(I)
assertI2I:
  var i I = ...
  i2 := i.(I2)
assertI2I2:
  var i I = ...
  i2, ok := i.(I2)

Fixes #17405
Fixes #8422

Change-Id: Ida2367bf8ce3cd2c6bb599a1814f1d275afabe21
Reviewed-on: https://go-review.googlesource.com/32313
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-11-02 21:33:03 +00:00
Austin Clements
87e48c5afd runtime, cmd/compile: rename memclr -> memclrNoHeapPointers
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.

This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.

Updates #17503.

Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-28 18:20:33 +00:00
Martin Möhrmann
b679665a18 cmd/compile: move stringtoslicebytetmp to the backend
- removes the runtime function stringtoslicebytetmp
- removes the generation of calls to stringtoslicebytetmp from the frontend
- adds handling of OSTRARRAYBYTETMP in the backend

This reduces binary sizes and avoids function call overhead.

Change-Id: Ib9988d48549cee663b685b4897a483f94727b940
Reviewed-on: https://go-review.googlesource.com/32158
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-28 07:58:47 +00:00
Keith Randall
c78d072c8e Revert "Revert "cmd/compile: inline convI2E""
This reverts commit 7dd9c385f6.

Reason for revert: Reverting the revert, which will re-enable the convI2E optimization.  We originally reverted the convI2E optimization because it was making the builder fail, but the underlying cause was later determined to be unrelated.

Original CL: https://go-review.googlesource.com/31260
Revert CL: https://go-review.googlesource.com/31310
Real bug: https://go-review.googlesource.com/c/25159
Real fix: https://go-review.googlesource.com/c/31316

Change-Id: I17237bb577a23a7675a5caab970ccda71a4124f2
Reviewed-on: https://go-review.googlesource.com/32023
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-25 23:51:34 +00:00
Matthew Dempsky
7dd9c385f6 Revert "cmd/compile: inline convI2E"
This reverts commit 395d36a67d.

Appears to be responsible for builder failures.

Change-Id: Ic6c6307f662767e529060b88704a9f074785d99e
Reviewed-on: https://go-review.googlesource.com/31310
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-10-17 20:37:19 +00:00
Keith Randall
395d36a67d cmd/compile: inline convI2E
It's pretty simple.  For:
  e = (interface{})(i)
Do:
  tmp = i.itab
  if tmp != nil {
    tmp = tmp.typ_ // load type from itab
  }
  e = eface{tmp, i.data}

It is smaller and faster than calling the runtime.

Change-Id: I0ad27f62f4ec0b6cd53bc8530e4da0eae3e67a6c
Reviewed-on: https://go-review.googlesource.com/31260
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-17 19:09:07 +00:00
Martin Möhrmann
d295174030 runtime: speed up non-ASCII rune decoding
Copies utf8 constants and EncodeRune implementation from unicode/utf8.

Adds a new decoderune implementation that is used by the compiler
in code generated for ranging over strings. It does not handle
ASCII runes since these are handled directly before calls to decoderune.

The DecodeRuneInString implementation from unicode/utf8 is not used
since it uses a lookup table that would increase the use of cpu caches.

Adds more tests that check decoding of valid and invalid utf8 sequences.

name                              old time/op  new time/op  delta
RuneIterate/range2/ASCII-4        7.45ns ± 2%  7.45ns ± 1%     ~     (p=0.634 n=16+16)
RuneIterate/range2/Japanese-4     53.5ns ± 1%  49.2ns ± 2%   -8.03%  (p=0.000 n=20+20)
RuneIterate/range2/MixedLength-4  46.3ns ± 1%  41.0ns ± 2%  -11.57%  (p=0.000 n=20+20)

new:
"".decoderune t=1 size=423 args=0x28 locals=0x0
old:
"".charntorune t=1 size=666 args=0x28 locals=0x0

Change-Id: I1df1fdb385bb9ea5e5e71b8818ea2bf5ce62de52
Reviewed-on: https://go-review.googlesource.com/28490
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-10-17 11:25:22 +00:00
Keith Randall
442de98c14 cmd/compile,runtime: redo how map assignments work
To compile:
  m[k] = v
instead of:
  mapassign(maptype, m, &k, &v), do
do:
  *mapassign(maptype, m, &k) = v

mapassign returns a pointer to the value slot in the map.  It is just
like mapaccess except that it will allocate a new slot if k is not
already present in the map.

This makes map accesses faster but potentially larger (codewise).

It is faster because the write into the map is done when the compiler
knows the concrete type, so it can be done with a few store
instructions instead of calling typedmemmove.  We also potentially
avoid stack temporaries to hold v.

The code can be larger when the map has pointers in its value type,
since there is a write barrier call in addition to the mapassign call.
That makes the code at the callsite a bit bigger (go binary is 0.3%
bigger).

This CL is in preparation for doing operations like m[k] += v with
only a single runtime call.  That will roughly double the speed of
such operations.

Update #17133
Update #5147

Change-Id: Ia435f032090a2ed905dac9234e693972fe8c2dc5
Reviewed-on: https://go-review.googlesource.com/30815
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-12 20:41:23 +00:00
Keith Randall
6129f37367 cmd/compile: inline convT2{I,E} when result doesn't escape
No point in calling a function when we can build the interface
using a known type (or itab) and the address of a local.

Get rid of third arg (preallocated stack space) to convT2{I,E}.

Makes go binary smaller by 0.2%

benchmark                   old ns/op     new ns/op     delta
BenchmarkEfaceInteger-8     16.7          10.1          -39.52%

Update #17118
Update #15375

Change-Id: I9724a1f802bfa1e3957bf1856b55558278e198a2
Reviewed-on: https://go-review.googlesource.com/29373
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-19 02:37:08 +00:00