Commit graph

168 commits

Author SHA1 Message Date
Russ Cox
9e0e43d84d [dev.regabi] cmd/compile: remove uses of dummy
Per https://developers.google.com/style/inclusive-documentation,
since we are editing some of this code anyway and it is easier
to put the cleanup in a separate CL.

Change-Id: Ib6b851f43f9cc0a57676564477d4ff22abb1cee5
Reviewed-on: https://go-review.googlesource.com/c/go/+/273106
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 04:35:29 +00:00
eric fang
b2a8317b31 cmd/compile: use desired info when allocating registers for live values
When allocting registers for live values, use desired register if available,
this is helpful for some cases, such as (*entry).delete, which can save a
few of copies.
Besides, this patch allows more debugging information to be printed out.

Test results of compilecmp on Linux/amd64:
name                      old time/op                 new time/op                 delta
Template                    326729362.060000ns +- 3%    329227238.775510ns +- 4%  +0.76%  (p=0.038 n=50+49)
Unicode                     157671860.391304ns +- 6%    156917927.320000ns +- 6%    ~     (p=0.291 n=46+50)
GoTypes                    1065591138.304348ns +- 2%   1063695977.434783ns +- 1%    ~     (p=0.208 n=46+46)
Compiler                   5053424790.760001ns +- 2%   5052729636.551020ns +- 3%    ~     (p=0.908 n=50+49)
SSA                       12392067635.866669ns +- 2%  12319786960.460005ns +- 2%  -0.58%  (p=0.008 n=45+50)
Flate                       212609767.340000ns +- 5%    213011228.085106ns +- 5%    ~     (p=0.685 n=50+47)
GoParser                    266870495.100000ns +- 4%    266962314.280000ns +- 3%    ~     (p=0.975 n=50+50)
Reflect                     660164306.551021ns +- 2%    658284470.729167ns +- 2%    ~     (p=0.069 n=49+48)
Tar                         292805895.720000ns +- 4%    292103626.954545ns +- 2%    ~     (p=0.321 n=50+44)
XML                         386294811.700000ns +- 4%    386665088.820000ns +- 4%    ~     (p=0.786 n=50+50)
LinkCompiler                548495788.659575ns +- 5%    549359489.102041ns +- 4%    ~     (p=0.855 n=47+49)
ExternalLinkCompiler       1810414270.280000ns +- 2%   1806872224.673470ns +- 2%    ~     (p=0.313 n=50+49)
LinkWithoutDebugCompiler    340888843.795918ns +- 5%    340341541.100000ns +- 6%    ~     (p=0.735 n=49+50)
[Geo mean]                   664550174.613777ns          664090221.153575ns       -0.07%

name                      old user-time/op            new user-time/op            delta
Template                    565202800.000000ns +-16%    595351040.000000ns +-16%  +5.33%  (p=0.001 n=50+50)
Unicode                     378444740.000000ns +-14%    373825183.673469ns +-17%    ~     (p=0.458 n=50+49)
GoTypes                    2052073341.463415ns +-12%   2059679864.864865ns +- 7%    ~     (p=0.381 n=41+37)
Compiler                   9913371980.000000ns +-20%   9848836720.000002ns +-19%    ~     (p=0.781 n=50+50)
SSA                       25013846224.489799ns +-17%  24571896183.673466ns +-17%    ~     (p=0.132 n=49+49)
Flate                       314422702.127660ns +-17%    314831666.666667ns +-11%    ~     (p=0.427 n=47+45)
GoParser                    419496060.000000ns +- 9%    417403460.000000ns +-11%    ~     (p=0.512 n=50+50)
Reflect                    1233632469.387755ns +-17%   1193061073.170732ns +-13%  -3.29%  (p=0.030 n=49+41)
Tar                         509855937.500000ns +-10%    508700740.000000ns +-14%    ~     (p=0.890 n=48+50)
XML                         703511425.531915ns +-12%    694007591.836735ns +-11%    ~     (p=0.164 n=47+49)
LinkCompiler                993137687.500000ns +- 6%    991914714.285714ns +- 8%    ~     (p=0.860 n=48+49)
ExternalLinkCompiler       2193851840.000001ns +- 3%   2186672183.673470ns +- 5%    ~     (p=0.320 n=50+49)
LinkWithoutDebugCompiler    420800875.000000ns +-10%    422062640.000000ns +- 9%    ~     (p=0.840 n=48+50)
[Geo mean]                  1145156131.480097ns         1142033233.550961ns       -0.27%

name                      old alloc/op                new alloc/op                delta
Template                                36.3MB +- 0%                36.3MB +- 0%    ~     (p=0.886 n=50+49)
Unicode                                 30.1MB +- 0%                30.1MB +- 0%    ~     (p=0.792 n=50+50)
GoTypes                                  118MB +- 0%                 118MB +- 0%    ~     (p=1.000 n=47+48)
Compiler                                 562MB +- 0%                 562MB +- 0%    ~     (p=0.205 n=50+49)
SSA                                     1.42GB +- 0%                1.42GB +- 0%  -0.12%  (p=0.000 n=50+50)
Flate                                   22.8MB +- 0%                22.8MB +- 0%    ~     (p=0.384 n=50+47)
GoParser                                28.0MB +- 0%                28.0MB +- 0%  -0.02%  (p=0.013 n=50+50)
Reflect                                 78.0MB +- 0%                78.0MB +- 0%    ~     (p=0.384 n=46+48)
Tar                                     34.1MB +- 0%                34.1MB +- 0%    ~     (p=0.072 n=50+50)
XML                                     43.1MB +- 0%                43.1MB +- 0%  -0.04%  (p=0.000 n=49+50)
LinkCompiler                            98.5MB +- 0%                98.5MB +- 0%  +0.01%  (p=0.012 n=50+43)
ExternalLinkCompiler                    89.6MB +- 0%                89.6MB +- 0%    ~     (p=0.762 n=50+50)
LinkWithoutDebugCompiler                56.9MB +- 0%                56.9MB +- 0%    ~     (p=0.268 n=49+48)
[Geo mean]                               77.7MB                      77.7MB       -0.01%

name                      old allocs/op               new allocs/op               delta
Template                                  367k +- 0%                  367k +- 0%  -0.01%  (p=0.002 n=50+49)
Unicode                                   345k +- 0%                  345k +- 0%    ~     (p=0.981 n=50+50)
GoTypes                                  1.28M +- 0%                 1.28M +- 0%  -0.00%  (p=0.002 n=49+50)
Compiler                                 5.39M +- 0%                 5.39M +- 0%  -0.00%  (p=0.000 n=50+50)
SSA                                      13.9M +- 0%                 13.9M +- 0%  +0.01%  (p=0.000 n=50+50)
Flate                                     230k +- 0%                  230k +- 0%    ~     (p=0.815 n=50+50)
GoParser                                  292k +- 0%                  292k +- 0%  -0.01%  (p=0.000 n=50+50)
Reflect                                   977k +- 0%                  977k +- 0%  -0.00%  (p=0.035 n=50+50)
Tar                                       343k +- 0%                  343k +- 0%  -0.01%  (p=0.008 n=48+50)
XML                                       418k +- 0%                  418k +- 0%  -0.01%  (p=0.000 n=50+50)
LinkCompiler                              516k +- 0%                  516k +- 0%  +0.01%  (p=0.002 n=50+48)
ExternalLinkCompiler                      570k +- 0%                  570k +- 0%    ~     (p=0.430 n=46+50)
LinkWithoutDebugCompiler                  169k +- 0%                  169k +- 0%    ~     (p=0.706 n=49+49)
[Geo mean]                                 672k                        672k       -0.00%

name                      old maxRSS/op               new maxRSS/op               delta
Template                                 34.3M +- 5%                 34.7M +- 4%  +1.24%  (p=0.004 n=50+50)
Unicode                                  36.2M +- 5%                 36.1M +- 8%    ~     (p=0.785 n=50+50)
GoTypes                                  75.7M +- 7%                 76.1M +- 6%    ~     (p=0.544 n=50+50)
Compiler                                  304M +- 7%                  304M +- 7%    ~     (p=0.744 n=50+50)
SSA                                       721M +- 6%                  723M +- 7%    ~     (p=0.724 n=49+50)
Flate                                    26.1M +- 3%                 26.1M +- 5%    ~     (p=0.649 n=48+49)
GoParser                                 29.3M +- 5%                 29.3M +- 4%    ~     (p=0.809 n=50+50)
Reflect                                  56.0M +- 6%                 56.3M +- 5%    ~     (p=0.350 n=50+50)
Tar                                      34.1M +- 3%                 33.9M +- 5%    ~     (p=0.121 n=49+50)
XML                                      39.6M +- 5%                 39.9M +- 4%    ~     (p=0.109 n=50+50)
LinkCompiler                              168M +- 1%                  168M +- 1%    ~     (p=0.578 n=49+48)
ExternalLinkCompiler                      179M +- 1%                  179M +- 2%    ~     (p=0.522 n=46+46)
LinkWithoutDebugCompiler                  137M +- 3%                  137M +- 3%    ~     (p=0.463 n=41+50)
[Geo mean]                                79.3M                       79.5M       +0.20%

name                      old text-bytes              new text-bytes              delta
HelloSize                                812kB +- 0%                 811kB +- 0%  -0.05%  (p=0.000 n=50+50)

name                      old data-bytes              new data-bytes              delta
HelloSize                               13.3kB +- 0%                13.3kB +- 0%    ~     (all equal)

name                      old bss-bytes               new bss-bytes               delta
HelloSize                                206kB +- 0%                 206kB +- 0%    ~     (all equal)

name                      old exe-bytes               new exe-bytes               delta
HelloSize                               1.21MB +- 0%                1.21MB +- 0%  +0.02%  (p=0.000 n=50+50)

file      before    after     Δ       %
addr2line 4052949   4052453   -496    -0.012%
api       4948171   4947163   -1008   -0.020%
asm       4888889   4888049   -840    -0.017%
buildid   2617545   2617673   +128    +0.005%
cgo       4521681   4516801   -4880   -0.108%
compile   19139091  19137683  -1408   -0.007%
cover     4843191   4840359   -2832   -0.058%
dist      3473677   3474717   +1040   +0.030%
doc       3821592   3821552   -40     -0.001%
fix       3220587   3220059   -528    -0.016%
link      6587368   6582696   -4672   -0.071%
nm        3999858   3999186   -672    -0.017%
objdump   4409161   4408217   -944    -0.021%
pack      2394038   2393846   -192    -0.008%
pprof     13601271  13602487  +1216   +0.009%
test2json 2645148   2644604   -544    -0.021%
trace     10357878  10356862  -1016   -0.010%
vet       6779482   6778706   -776    -0.011%
total     106301577 106283113 -18464  -0.017%

Change-Id: I63ac6e224e1a4756ddc1bfc4aabbaeb92d7d4273
Reviewed-on: https://go-review.googlesource.com/c/go/+/263599
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-21 19:08:53 +00:00
erifan01
1c0d1f4e97 cmd/compile: optimize regalloc for phi value
When allocating registers for phi value, only the primary predecessor is considered.
Taking into account the allocation status of other predecessors can help reduce
unnecessary copy or spill operations. Many such cases can be found in the standard
library, such as runtime.wirep, moveByType, etc. The test results from benchstat
also show that this change helps reduce the file size.

name                      old time/op       new time/op       delta
Template                        328ms ± 5%        326ms ± 4%    ~     (p=0.254 n=50+47)
Unicode                         156ms ± 7%        158ms ±10%    ~     (p=0.412 n=49+49)
GoTypes                         1.07s ± 3%        1.07s ± 2%    ~     (p=0.664 n=48+49)
Compiler                        4.43s ± 3%        4.44s ± 3%    ~     (p=0.758 n=48+50)
SSA                             10.3s ± 2%        10.4s ± 2%  +0.43%  (p=0.017 n=50+46)
Flate                           208ms ± 9%        209ms ± 7%    ~     (p=0.920 n=49+46)
GoParser                        260ms ± 5%        262ms ± 4%    ~     (p=0.063 n=50+48)
Reflect                         687ms ± 3%        685ms ± 2%    ~     (p=0.459 n=50+48)
Tar                             293ms ± 4%        293ms ± 5%    ~     (p=0.695 n=49+48)
XML                             391ms ± 4%        389ms ± 3%    ~     (p=0.109 n=49+46)
LinkCompiler                    570ms ± 5%        563ms ± 5%  -1.10%  (p=0.006 n=46+47)
ExternalLinkCompiler            1.57s ± 3%        1.56s ± 3%    ~     (p=0.118 n=47+46)
LinkWithoutDebugCompiler        349ms ± 6%        349ms ± 5%    ~     (p=0.726 n=49+47)
[Geo mean]                      645ms             645ms       -0.05%

name                      old user-time/op  new user-time/op  delta
Template                        507ms ±14%        513ms ±14%    ~     (p=0.398 n=48+49)
Unicode                         345ms ±29%        345ms ±38%    ~     (p=0.521 n=47+49)
GoTypes                         1.95s ±16%        1.94s ±19%    ~     (p=0.324 n=50+50)
Compiler                        8.26s ±16%        8.22s ±14%    ~     (p=0.834 n=50+50)
SSA                             19.6s ± 8%        19.2s ±15%    ~     (p=0.056 n=50+50)
Flate                           293ms ± 9%        299ms ±12%    ~     (p=0.057 n=47+50)
GoParser                        388ms ± 9%        387ms ±14%    ~     (p=0.660 n=46+50)
Reflect                         1.15s ±28%        1.12s ±18%    ~     (p=0.648 n=49+48)
Tar                             456ms ±10%        476ms ±15%  +4.48%  (p=0.001 n=46+48)
XML                             648ms ±27%        634ms ±16%    ~     (p=0.685 n=50+46)
LinkCompiler                    1.00s ± 8%        1.00s ± 8%    ~     (p=0.638 n=50+50)
ExternalLinkCompiler            1.96s ± 5%        1.96s ± 5%    ~     (p=0.792 n=50+50)
LinkWithoutDebugCompiler        443ms ±10%        442ms ±11%    ~     (p=0.813 n=50+50)
[Geo mean]                      1.05s             1.05s       -0.09%

name                      old alloc/op      new alloc/op      delta
Template                       36.0MB ± 0%       36.0MB ± 0%    ~     (p=0.599 n=49+50)
Unicode                        29.8MB ± 0%       29.8MB ± 0%    ~     (p=0.739 n=50+50)
GoTypes                         118MB ± 0%        118MB ± 0%    ~     (p=0.436 n=50+50)
Compiler                        562MB ± 0%        562MB ± 0%    ~     (p=0.693 n=50+50)
SSA                            1.42GB ± 0%       1.42GB ± 0%  -0.10%  (p=0.000 n=50+49)
Flate                          22.5MB ± 0%       22.5MB ± 0%    ~     (p=0.429 n=48+49)
GoParser                       27.7MB ± 0%       27.7MB ± 0%    ~     (p=0.705 n=49+48)
Reflect                        77.7MB ± 0%       77.7MB ± 0%  -0.01%  (p=0.043 n=50+50)
Tar                            33.8MB ± 0%       33.8MB ± 0%    ~     (p=0.241 n=49+50)
XML                            42.8MB ± 0%       42.8MB ± 0%    ~     (p=0.677 n=47+49)
LinkCompiler                   98.3MB ± 0%       98.3MB ± 0%    ~     (p=0.157 n=50+50)
ExternalLinkCompiler           89.4MB ± 0%       89.4MB ± 0%    ~     (p=0.683 n=50+50)
LinkWithoutDebugCompiler       56.7MB ± 0%       56.7MB ± 0%    ~     (p=0.155 n=49+49)
[Geo mean]                     77.3MB            77.3MB       -0.01%

name                      old allocs/op     new allocs/op     delta
Template                         367k ± 0%         367k ± 0%    ~     (p=0.863 n=50+50)
Unicode                          345k ± 0%         345k ± 0%    ~     (p=0.744 n=49+49)
GoTypes                         1.28M ± 0%        1.28M ± 0%    ~     (p=0.957 n=48+50)
Compiler                        5.39M ± 0%        5.39M ± 0%  +0.00%  (p=0.012 n=50+49)
SSA                             13.9M ± 0%        13.9M ± 0%  +0.02%  (p=0.000 n=47+49)
Flate                            230k ± 0%         230k ± 0%  -0.01%  (p=0.007 n=47+49)
GoParser                         292k ± 0%         292k ± 0%    ~     (p=0.891 n=50+49)
Reflect                          977k ± 0%         977k ± 0%    ~     (p=0.274 n=50+50)
Tar                              343k ± 0%         343k ± 0%    ~     (p=0.942 n=50+50)
XML                              418k ± 0%         418k ± 0%    ~     (p=0.374 n=50+49)
LinkCompiler                     516k ± 0%         516k ± 0%    ~     (p=0.205 n=49+47)
ExternalLinkCompiler             570k ± 0%         570k ± 0%    ~     (p=0.783 n=49+47)
LinkWithoutDebugCompiler         169k ± 0%         169k ± 0%    ~     (p=0.233 n=50+46)
[Geo mean]                       672k              672k       +0.00%

name                      old maxRSS/op     new maxRSS/op     delta
Template                        34.5M ± 3%        34.4M ± 3%    ~     (p=0.566 n=49+48)
Unicode                         36.0M ± 6%        35.9M ± 6%    ~     (p=0.736 n=50+50)
GoTypes                         75.7M ± 7%        75.4M ± 5%    ~     (p=0.412 n=50+50)
Compiler                         314M ±10%         313M ± 8%    ~     (p=0.708 n=50+50)
SSA                              730M ± 6%         735M ± 6%    ~     (p=0.324 n=50+50)
Flate                           25.8M ± 5%        25.6M ± 6%    ~     (p=0.415 n=49+50)
GoParser                        28.5M ± 3%        28.5M ± 4%    ~     (p=0.977 n=46+50)
Reflect                         57.4M ± 4%        57.2M ± 3%    ~     (p=0.173 n=50+50)
Tar                             33.3M ± 3%        33.2M ± 4%    ~     (p=0.621 n=48+50)
XML                             39.6M ± 5%        39.6M ± 4%    ~     (p=0.997 n=50+50)
LinkCompiler                     168M ± 2%         167M ± 1%    ~     (p=0.072 n=49+45)
ExternalLinkCompiler             179M ± 1%         179M ± 1%    ~     (p=0.147 n=48+50)
LinkWithoutDebugCompiler         136M ± 1%         136M ± 1%    ~     (p=0.789 n=47+49)
[Geo mean]                      79.2M             79.1M       -0.12%

name                      old text-bytes    new text-bytes    delta
HelloSize                       812kB ± 0%        811kB ± 0%  -0.06%  (p=0.000 n=50+50)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.3kB ± 0%       13.3kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       206kB ± 0%        206kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.21MB ± 0%       1.21MB ± 0%  -0.03%  (p=0.000 n=50+50)

file      before    after     Δ       %
addr2line 4057421   4056237   -1184   -0.029%
api       4952451   4946715   -5736   -0.116%
asm       4888993   4888185   -808    -0.017%
buildid   2617705   2616441   -1264   -0.048%
cgo       4521849   4520681   -1168   -0.026%
compile   19143451  19141243  -2208   -0.012%
cover     4847391   4837151   -10240  -0.211%
dist      3473877   3472565   -1312   -0.038%
doc       3821496   3820432   -1064   -0.028%
fix       3220587   3220659   +72     +0.002%
link      6587504   6582576   -4928   -0.075%
nm        4000154   3998690   -1464   -0.037%
objdump   4409449   4407625   -1824   -0.041%
pack      2398086   2393110   -4976   -0.207%
pprof     13599060  13606111  +7051   +0.052%
test2json 2645148   2645692   +544    +0.021%
trace     10355281  10355862  +581    +0.006%
vet       6780026   6779666   -360    -0.005%
total     106319929 106289641 -30288  -0.028%

Change-Id: Ia5399286958c187c8664c769bbddf7bc4c1cae99
Reviewed-on: https://go-review.googlesource.com/c/go/+/263600
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-21 19:08:11 +00:00
Keith Randall
fe2cfb74ba all: drop 387 support
My last 387 CL. So sad ... ... ... ... not!

Fixes #40255

Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-02 00:00:51 +00:00
Keith Randall
5c2c6d3fbf runtime: framepointers are no longer an experiment - hard code them
I think they are no longer experimental status. Might as well promote
them to permanent.

Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-08-27 21:15:47 +00:00
Michael Munday
48403b268b cmd/compile: error if register is reused when setting edge state
When setting the edge state in register allocation we should only
be setting each register once. It is not possible for a register
to hold multiple values at once.

This CL converts the runtime error seen in #38195 into an internal
compiler error (ICE). It is better for the compiler to fail than
generate an incorrect program.

The bug reported in #38195 is now exposed as:

./parserc.go:459:11: internal compiler error: 'yaml_parser_parse_node': R5 is already set (v1074/v1241)

[stack trace]

Updates #38195.

Change-Id: Id95842fd850b95494cbd472b6fd5a55513ecacec
Reviewed-on: https://go-review.googlesource.com/c/go/+/228060
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 19:04:38 +00:00
Michael Munday
382fe3e249 cmd/compile: fix deallocation of live value copies in regalloc
When deallocating the input register to a phi so that the phi
itself could be allocated to that register the code was also
deallocating all copies of that phi input value. Those copies
of the value could still be live and if they were the register
allocator could reuse them incorrectly to hold speculative
copies of other phi inputs. This causes strange bugs.

No test because this is a very obscure scenario that is hard
to replicate but CL 228060 adds an assertion to the compiler
that does trigger when running the std tests on linux/s390x
without this CL applied. Hopefully that assertion will prevent
future regressions.

Fixes #38195.

Change-Id: Id975dadedd731c7bb21933b9ea6b17daaa5c9e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228061
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-14 19:04:32 +00:00
Dan Scales
1b3a1db19f cmd/compile: fix liveness for open-coded defer args for infinite loops
Once defined, a stack slot holding an open-coded defer arg should always be marked
live, since it may be used at any time if there is a panic. These stack slots are
typically kept live naturally by the open-defer code inlined at each return/exit point.
However, we need to do extra work to make sure that they are kept live if a
function has an infinite loop or a panic exit.

For this fix, only in the case of a function that is using open-coded defers, we
compute the set of blocks (most often empty) that cannot reach a return or a
BlockExit (panic) because of an infinite loop. Then, for each block b which
cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg
slot as live, as long as the definition of the defer arg slot dominates block b.

For this change, had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq
(-> IsAncestorEq)

Updates #35277

Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-05 17:19:16 +00:00
Brad Fitzpatrick
07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two if the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Keith Randall
72dc9ab191 cmd/compile: reuse dead register before reusing register holding constant
For commuting ops, check whether the second argument is dead before
checking if the first argument is rematerializeable. Reusing the register
holding a dead value is always best.

Fixes #33580

Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657
Reviewed-on: https://go-review.googlesource.com/c/go/+/199559
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-10-07 15:16:26 +00:00
Michael Munday
9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Richard Musiol
1c50fcf853 cmd/compile: add 32 bit float registers/variables on wasm
Before this change, wasm only used float variables with a size of 64 bit
and applied rounding to 32 bit precision where necessary. This change
adds proper 32 bit float variables.

Reduces the size of pkg/js_wasm by 254 bytes.

Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-19 20:26:22 +00:00
Ainar Garipov
0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
Keith Randall
8a317ebc0f cmd/compile: don't eliminate all registers when restricting to desired ones
We shouldn't mask to desired registers if we haven't masked out all the
forbidden registers yet.  In this path we haven't masked out the nospill
registers yet. If the resulting mask contains only nospill registers, then
allocReg fails.

This can only happen on resultNotInArgs-marked instructions, which exist
only on the ARM64, MIPS, MIPS64, and PPC64 ports.

Maybe there's a better way to handle resultNotInArgs instructions.
But for 1.13, this is a low-risk fix.

Fixes #33355

Change-Id: I1082f78f798d1371bde65c58cc265540480e4fa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/188178
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-08-01 02:15:18 +00:00
Josh Bleecher Snyder
4ae31dc8c5 cmd/compile: re-use regalloc's []valState
Updates #27739: reduces package ssa's allocated space by 3.77%.

maxrss is harder to measure, but using best-of-three-runs
as reported by /usr/bin/time -l, I see ~2% reduction in maxrss.

We still have a long way to go, though; the new maxrss is still 1.1gb.

name        old alloc/op      new alloc/op      delta
Template         38.8MB ± 0%       37.7MB ± 0%  -2.77%  (p=0.008 n=5+5)
Unicode          28.2MB ± 0%       28.1MB ± 0%  -0.20%  (p=0.008 n=5+5)
GoTypes           131MB ± 0%        127MB ± 0%  -2.94%  (p=0.008 n=5+5)
Compiler          606MB ± 0%        587MB ± 0%  -3.21%  (p=0.008 n=5+5)
SSA              2.14GB ± 0%       2.06GB ± 0%  -3.77%  (p=0.008 n=5+5)
Flate            24.0MB ± 0%       23.3MB ± 0%  -3.00%  (p=0.008 n=5+5)
GoParser         28.8MB ± 0%       28.1MB ± 0%  -2.61%  (p=0.008 n=5+5)
Reflect          83.8MB ± 0%       81.5MB ± 0%  -2.71%  (p=0.008 n=5+5)
Tar              36.4MB ± 0%       35.4MB ± 0%  -2.73%  (p=0.008 n=5+5)
XML              47.9MB ± 0%       46.7MB ± 0%  -2.49%  (p=0.008 n=5+5)
[Geo mean]       84.6MB            82.4MB       -2.65%

name        old allocs/op     new allocs/op     delta
Template           379k ± 0%         379k ± 0%  -0.05%  (p=0.008 n=5+5)
Unicode            340k ± 0%         340k ± 0%    ~     (p=0.151 n=5+5)
GoTypes           1.36M ± 0%        1.36M ± 0%  -0.06%  (p=0.008 n=5+5)
Compiler          5.49M ± 0%        5.48M ± 0%  -0.03%  (p=0.008 n=5+5)
SSA               17.5M ± 0%        17.5M ± 0%  -0.03%  (p=0.008 n=5+5)
Flate              235k ± 0%         235k ± 0%  -0.04%  (p=0.008 n=5+5)
GoParser           302k ± 0%         302k ± 0%  -0.04%  (p=0.008 n=5+5)
Reflect            976k ± 0%         975k ± 0%  -0.10%  (p=0.008 n=5+5)
Tar                352k ± 0%         352k ± 0%  -0.06%  (p=0.008 n=5+5)
XML                436k ± 0%         436k ± 0%  -0.03%  (p=0.008 n=5+5)
[Geo mean]         842k              841k       -0.04%

Change-Id: I0ab6631b5a0bb6303c291dcb0367b586a4e584fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/176221
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-10 00:14:40 +00:00
Cherry Zhang
40df9cc606 cmd/compile: make KeepAlive work on stack object
Currently, runtime.KeepAlive applied on a stack object doesn't
actually keeps the stack object alive, and the heap object
referenced from it could be collected. This is because the
address of the stack object is rematerializeable, and we just
ignored KeepAlive on rematerializeable values. This CL fixes it.

Fixes #30476.

Change-Id: Ic1f75ee54ed94ea79bd46a8ddcd9e81d01556d1d
Reviewed-on: https://go-review.googlesource.com/c/164537
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-01 15:36:52 +00:00
Keith Randall
f493e55723 cmd/compile: document regalloc fields
Document what the fields of regalloc mean.
Hopefully will help people understand how the register allocator works.

Change-Id: Ic322ed2019cc839b812740afe8cd2cf0b61da046
Reviewed-on: https://go-review.googlesource.com/137016
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-09-24 19:55:51 +00:00
Iskander Sharipov
bce1f12225 cmd/compile/internal/ssa: use math/bits in countRegs and pickReg
Makes code simpler and faster (at least on x86).

	name               old time/op  new time/op  delta
	CountRegs-8        7.40ns ± 1%  0.59ns ± 0%  -92.02%  (p=0.000 n=9+9)
	PickReg/(1<<0)-8   2.07ns ± 0%  0.37ns ± 0%  -82.13%  (p=0.000 n=9+10)
	PickReg/(1<<16)-8  11.8ns ± 0%   0.4ns ± 0%  -96.86%  (p=0.002 n=8+10)

Change-Id: Ic780b615b75c25b6e7632a0de93b16a8e9ed0f8f
Reviewed-on: https://go-review.googlesource.com/120318
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-21 03:14:39 +00:00
David Chase
31e1c30f55 cmd/compile: do not allow regalloc to LoadReg G register
On architectures where G is stored in a register, it is
possible for a variable to allocated to it, and subsequently
that variable may be spilled and reloaded, for example
because of an intervening call.  If such an allocation
reaches a join point and it is the primary predecessor,
it becomes the target of a reload, which is only usually
right.

Fix: guard all the LoadReg ops, and spill value in the G
register (if any) before merges (in the same way that 387
FP registers are freed between blocks).

Includes test.

Fixes #25504.

Change-Id: I0482a53e20970c7315bf09c0e407ae5bba2fe05d
Reviewed-on: https://go-review.googlesource.com/114695
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-30 16:39:21 +00:00
Richard Musiol
482d241936 cmd/compile: add wasm stack optimization
Go's SSA instructions only operate on registers. For example, an add
instruction would read two registers, do the addition and then write
to a register. WebAssembly's instructions, on the other hand, operate
on the stack. The add instruction first pops two values from the stack,
does the addition, then pushes the result to the stack. To fulfill
Go's semantics, one needs to map Go's single add instruction to
4 WebAssembly instructions:
- Push the value of local variable A to the stack
- Push the value of local variable B to the stack
- Do addition
- Write value from stack to local variable C

Now consider that B was set to the constant 42 before the addition:
- Push constant 42 to the stack
- Write value from stack to local variable B

This works, but is inefficient. Instead, the stack is used directly
by inlining instructions if possible. With inlining it becomes:
- Push the value of local variable A to the stack (add)
- Push constant 42 to the stack (constant)
- Do addition (add)
- Write value from stack to local variable C (add)

Note that the two SSA instructions can not be generated sequentially
anymore, because their WebAssembly instructions are interleaved.

Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4

Updates #18892

Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec
Reviewed-on: https://go-review.googlesource.com/103535
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-23 21:37:59 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00
Lynn Boger
28edaf4584 cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.

This adds new testcases in test/codegen/memcombine.go

Fixes #22496
Updates #24242

Benchmark improvement for encoding/binary:
name                      old time/op    new time/op    delta
ReadSlice1000Int32s-16      11.0µs ± 0%     9.0µs ± 0%  -17.47%  (p=0.029 n=4+4)
ReadStruct-16               2.47µs ± 1%    2.48µs ± 0%   +0.67%  (p=0.114 n=4+4)
ReadInts-16                  642ns ± 1%     630ns ± 1%   -2.02%  (p=0.029 n=4+4)
WriteInts-16                 654ns ± 0%     653ns ± 1%   -0.08%  (p=0.629 n=4+4)
WriteSlice1000Int32s-16     8.75µs ± 0%    8.20µs ± 0%   -6.19%  (p=0.029 n=4+4)
PutUint16-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16                1.85ns ± 0%    0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16    1.03ns ± 0%    0.93ns ± 0%   -9.71%  (p=0.029 n=4+4)
LittleEndianPutUint32-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
LittleEndianPutUint64-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
PutUvarint32-16             43.0ns ± 0%    43.1ns ± 0%   +0.12%  (p=0.429 n=4+4)
PutUvarint64-16              174ns ± 0%     175ns ± 0%   +0.29%  (p=0.429 n=4+4)

Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.

Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00
Michael Munday
5d9c78201f cmd/compile: allow R11 to be allocated on s390x
R11 is only used as a temporary by a very small set of instructions
(DIV, MOD, MULH and extended MVC/XC instructions). By marking these
instructions as clobbering R11 we can allocate R11 in the general
case.

Change-Id: I0d4ffe80e57c164d42a5ea5ef6308756a5b0f742
Reviewed-on: https://go-review.googlesource.com/110255
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-01 15:48:17 +00:00
Josh Bleecher Snyder
3d6647d6f8 cmd/compile: improve regalloc live values debug printing
Before:

live values at end of each block
  b1: v3 v2 v7 avoid=0
  b2: v3 v13 avoid=81
  b3: v19[AX] v3 avoid=81
  b6: avoid=0
  b7: avoid=0
  b5: avoid=0
  b4: v3 v18 avoid=81

After:

live values at end of each block
  b1: v3 v2 v7
  b2: v3 v13 avoid=AX DI
  b3: v19[AX] v3 avoid=AX DI
  b6:
  b7:
  b5:
  b4: v3 v18 avoid=AX DI

Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28
Reviewed-on: https://go-review.googlesource.com/109000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 17:45:48 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
David Chase
b9a365681f cmd/compile: adjust is-statement on Pos's to improve debugging
Stores to auto tmp variables can be hoisted to places
where the line numbers make debugging look "jumpy".
Turning those instructions into ones with is_stmt = 0 in
the DWARF (accomplished by marking ssa nodes with NotStmt)
makes debugging look better while still attributing the
instructions with the correct line number.

The same is true for certain register allocator spills and
reloads.

Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0
Reviewed-on: https://go-review.googlesource.com/98415
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-04 22:14:59 +00:00
Josh Bleecher Snyder
014a9048d4 cmd/compile: prefer to evict a rematerializable register
This resolves a long-standing regalloc TODO:
If you must evict a register, choose to evict a register
containing a rematerializable value, since that value
won't need to be spilled.

Provides very minor performance and size improvements.

name                     old time/op    new time/op    delta
BinaryTree17-8              2.20s ± 3%     2.18s ± 2%  -0.77%  (p=0.000 n=45+49)
Fannkuch11-8                2.14s ± 2%     2.15s ± 2%  +0.73%  (p=0.000 n=43+44)
FmtFprintfEmpty-8          30.6ns ± 4%    30.2ns ± 3%  -1.14%  (p=0.000 n=50+48)
FmtFprintfString-8         54.5ns ± 6%    53.6ns ± 5%  -1.64%  (p=0.001 n=50+48)
FmtFprintfInt-8            58.0ns ± 7%    57.6ns ± 4%    ~     (p=0.220 n=50+50)
FmtFprintfIntInt-8         85.3ns ± 2%    84.8ns ± 3%  -0.62%  (p=0.001 n=44+47)
FmtFprintfPrefixedInt-8    93.9ns ± 6%    93.6ns ± 5%    ~     (p=0.706 n=50+48)
FmtFprintfFloat-8           178ns ± 4%     177ns ± 4%    ~     (p=0.107 n=49+50)
FmtManyArgs-8               376ns ± 4%     374ns ± 3%  -0.58%  (p=0.013 n=45+50)
GobDecode-8                4.77ms ± 2%    4.76ms ± 3%    ~     (p=0.059 n=47+46)
GobEncode-8                4.04ms ± 2%    3.99ms ± 3%  -1.13%  (p=0.000 n=49+49)
Gzip-8                      177ms ± 2%     180ms ± 3%  +1.43%  (p=0.000 n=48+48)
Gunzip-8                   28.5ms ± 6%    28.3ms ± 5%    ~     (p=0.104 n=50+49)
HTTPClientServer-8         72.1µs ± 1%    72.0µs ± 1%  -0.15%  (p=0.042 n=48+42)
JSONEncode-8               9.81ms ± 5%   10.03ms ± 6%  +2.29%  (p=0.000 n=50+49)
JSONDecode-8               39.2ms ± 3%    39.3ms ± 2%    ~     (p=0.095 n=49+49)
Mandelbrot200-8            3.48ms ± 2%    3.46ms ± 2%  -0.80%  (p=0.000 n=47+48)
GoParse-8                  2.54ms ± 3%    2.51ms ± 3%  -1.35%  (p=0.000 n=49+49)
RegexpMatchEasy0_32-8      66.0ns ± 7%    65.7ns ± 8%    ~     (p=0.331 n=50+50)
RegexpMatchEasy0_1K-8       155ns ± 4%     154ns ± 4%    ~     (p=0.986 n=49+50)
RegexpMatchEasy1_32-8      62.6ns ± 8%    62.2ns ± 5%    ~     (p=0.395 n=50+49)
RegexpMatchEasy1_1K-8       260ns ± 5%     255ns ± 3%  -1.92%  (p=0.000 n=49+49)
RegexpMatchMedium_32-8     92.9ns ± 2%    91.8ns ± 2%  -1.25%  (p=0.000 n=46+48)
RegexpMatchMedium_1K-8     27.7µs ± 3%    27.0µs ± 2%  -2.59%  (p=0.000 n=49+49)
RegexpMatchHard_32-8       1.23µs ± 4%    1.21µs ± 2%  -2.16%  (p=0.000 n=49+44)
RegexpMatchHard_1K-8       36.4µs ± 2%    35.7µs ± 2%  -1.87%  (p=0.000 n=48+49)
Revcomp-8                   274ms ± 2%     276ms ± 3%  +0.70%  (p=0.034 n=45+48)
Template-8                 45.1ms ± 8%    45.1ms ± 8%    ~     (p=0.643 n=50+50)
TimeParse-8                 223ns ± 2%     223ns ± 2%    ~     (p=0.401 n=47+47)
TimeFormat-8                245ns ± 2%     246ns ± 3%    ~     (p=0.758 n=49+50)
[Geo mean]                 36.5µs         36.3µs       -0.54%


name        old object-bytes  new object-bytes  delta
Template          480kB ± 0%        480kB ± 0%    ~     (all equal)
Unicode           214kB ± 0%        214kB ± 0%    ~     (all equal)
GoTypes          1.54MB ± 0%       1.54MB ± 0%  -0.03%  (p=0.008 n=5+5)
Compiler         5.75MB ± 0%       5.75MB ± 0%    ~     (all equal)
SSA              14.6MB ± 0%       14.6MB ± 0%  -0.01%  (p=0.008 n=5+5)
Flate             300kB ± 0%        300kB ± 0%  -0.01%  (p=0.008 n=5+5)
GoParser          366kB ± 0%        366kB ± 0%    ~     (all equal)
Reflect          1.20MB ± 0%       1.20MB ± 0%    ~     (all equal)
Tar               413kB ± 0%        413kB ± 0%    ~     (all equal)
XML               529kB ± 0%        528kB ± 0%  -0.13%  (p=0.008 n=5+5)
[Geo mean]        909kB             909kB       -0.02%


Change-Id: I46d37a55197683a98913f35801dc2b0d609653c8
Reviewed-on: https://go-review.googlesource.com/103240
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-29 18:47:18 +00:00
Josh Bleecher Snyder
dafca7de0f cmd/compile: prefer rematerialization to copying
Fixes #24132


name                     old time/op    new time/op    delta
BinaryTree17-8              2.18s ± 2%     2.15s ± 2%  -1.28%  (p=0.000 n=25+26)
Fannkuch11-8                2.16s ± 3%     2.13s ± 3%  -1.54%  (p=0.000 n=27+30)
FmtFprintfEmpty-8          29.9ns ± 3%    29.6ns ± 3%  -1.08%  (p=0.001 n=29+26)
FmtFprintfString-8         53.6ns ± 2%    54.0ns ± 4%    ~     (p=0.193 n=28+29)
FmtFprintfInt-8            56.8ns ± 3%    57.0ns ± 3%    ~     (p=0.330 n=29+29)
FmtFprintfIntInt-8         85.3ns ± 2%    85.8ns ± 3%  +0.56%  (p=0.042 n=30+29)
FmtFprintfPrefixedInt-8    94.1ns ± 5%    99.0ns ± 8%  +5.20%  (p=0.000 n=27+30)
FmtFprintfFloat-8           183ns ± 4%     182ns ± 3%    ~     (p=0.619 n=30+26)
FmtManyArgs-8               369ns ± 2%     369ns ± 2%    ~     (p=0.748 n=27+29)
GobDecode-8                4.78ms ± 2%    4.75ms ± 1%    ~     (p=0.051 n=28+27)
GobEncode-8                4.06ms ± 3%    4.07ms ± 3%    ~     (p=0.781 n=29+30)
Gzip-8                      178ms ± 2%     177ms ± 2%    ~     (p=0.171 n=29+30)
Gunzip-8                   28.2ms ± 7%    28.0ms ± 4%    ~     (p=0.155 n=30+30)
HTTPClientServer-8         71.5µs ± 3%    71.3µs ± 1%    ~     (p=0.913 n=25+27)
JSONEncode-8               9.71ms ± 5%    9.86ms ± 4%  +1.55%  (p=0.015 n=28+30)
JSONDecode-8               38.8ms ± 2%    39.3ms ± 2%  +1.41%  (p=0.000 n=28+29)
Mandelbrot200-8            3.47ms ± 6%    3.44ms ± 3%    ~     (p=0.183 n=28+28)
GoParse-8                  2.55ms ± 2%    2.54ms ± 3%  -0.58%  (p=0.003 n=27+29)
RegexpMatchEasy0_32-8      66.0ns ± 5%    65.3ns ± 4%    ~     (p=0.124 n=30+30)
RegexpMatchEasy0_1K-8       152ns ± 2%     152ns ± 3%    ~     (p=0.881 n=30+30)
RegexpMatchEasy1_32-8      62.9ns ± 9%    62.7ns ± 7%    ~     (p=0.717 n=30+30)
RegexpMatchEasy1_1K-8       263ns ± 3%     263ns ± 4%    ~     (p=0.909 n=30+29)
RegexpMatchMedium_32-8     93.4ns ± 3%    89.3ns ± 2%  -4.32%  (p=0.000 n=29+29)
RegexpMatchMedium_1K-8     27.5µs ± 3%    27.1µs ± 2%  -1.46%  (p=0.000 n=30+27)
RegexpMatchHard_32-8       1.33µs ± 3%    1.31µs ± 3%  -1.50%  (p=0.000 n=27+28)
RegexpMatchHard_1K-8       39.4µs ± 2%    39.1µs ± 2%  -0.54%  (p=0.027 n=28+28)
Revcomp-8                   274ms ± 4%     276ms ± 2%  +0.67%  (p=0.048 n=29+28)
Template-8                 45.1ms ± 5%    44.6ms ± 7%  -1.22%  (p=0.029 n=30+29)
TimeParse-8                 227ns ± 3%     224ns ± 3%  -1.25%  (p=0.000 n=28+27)
TimeFormat-8                248ns ± 3%     245ns ± 3%  -1.33%  (p=0.002 n=30+29)
[Geo mean]                 36.6µs         36.5µs       -0.32%


Change-Id: I24083f0013506b77e2d9da99c40ae2f67803285e
Reviewed-on: https://go-review.googlesource.com/101076
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-28 18:43:49 +00:00
Alberto Donizetti
377a2cb2d2 cmd/compile: reduce allocations in regAllocState.regalloc
name      old time/op       new time/op       delta
Template        281ms ± 2%        282ms ± 3%    ~     (p=0.428 n=19+20)
Unicode         138ms ± 6%        138ms ± 7%    ~     (p=0.813 n=19+20)
GoTypes         901ms ± 2%        895ms ± 2%    ~     (p=0.050 n=19+20)
Compiler        4.25s ± 1%        4.23s ± 1%  -0.31%  (p=0.031 n=19+18)
SSA             9.77s ± 1%        9.78s ± 1%    ~     (p=0.512 n=20+20)
Flate           187ms ± 3%        187ms ± 4%    ~     (p=0.687 n=20+19)
GoParser        224ms ± 4%        222ms ± 3%    ~     (p=0.301 n=20+20)
Reflect         576ms ± 2%        576ms ± 2%    ~     (p=0.620 n=20+20)
Tar             262ms ± 3%        263ms ± 3%    ~     (p=0.599 n=19+18)
XML             322ms ± 4%        322ms ± 2%    ~     (p=0.512 n=20+20)

name      old user-time/op  new user-time/op  delta
Template        403ms ± 3%        399ms ± 5%    ~     (p=0.149 n=17+20)
Unicode         217ms ±12%        217ms ± 9%    ~     (p=0.883 n=20+20)
GoTypes         1.24s ± 3%        1.24s ± 3%    ~     (p=0.718 n=20+20)
Compiler        5.90s ± 3%        5.84s ± 5%    ~     (p=0.217 n=18+20)
SSA             14.0s ± 6%        14.1s ± 5%    ~     (p=0.235 n=19+20)
Flate           253ms ± 6%        254ms ± 5%    ~     (p=0.749 n=20+19)
GoParser        309ms ± 7%        307ms ± 5%    ~     (p=0.398 n=20+20)
Reflect         772ms ± 3%        771ms ± 3%    ~     (p=0.901 n=20+19)
Tar             368ms ± 5%        369ms ± 8%    ~     (p=0.429 n=20+20)
XML             435ms ± 5%        434ms ± 5%    ~     (p=0.841 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.0MB ± 0%       38.9MB ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode        29.0MB ± 0%       29.0MB ± 0%  -0.03%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%        115MB ± 0%  -0.33%  (p=0.000 n=20+20)
Compiler        498MB ± 0%        496MB ± 0%  -0.37%  (p=0.000 n=19+20)
SSA            1.41GB ± 0%       1.40GB ± 0%  -0.24%  (p=0.000 n=20+20)
Flate          25.0MB ± 0%       25.0MB ± 0%  -0.22%  (p=0.000 n=20+19)
GoParser       31.0MB ± 0%       30.9MB ± 0%  -0.23%  (p=0.000 n=20+17)
Reflect        77.1MB ± 0%       77.0MB ± 0%  -0.12%  (p=0.000 n=20+20)
Tar            39.7MB ± 0%       39.6MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            44.9MB ± 0%       44.8MB ± 0%  -0.29%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         386k ± 0%         385k ± 0%  -0.28%  (p=0.000 n=20+20)
Unicode          337k ± 0%         336k ± 0%  -0.07%  (p=0.000 n=20+20)
GoTypes         1.20M ± 0%        1.20M ± 0%  -0.41%  (p=0.000 n=20+20)
Compiler        4.71M ± 0%        4.68M ± 0%  -0.52%  (p=0.000 n=20+20)
SSA             11.7M ± 0%        11.6M ± 0%  -0.31%  (p=0.000 n=20+19)
Flate            238k ± 0%         237k ± 0%  -0.28%  (p=0.000 n=18+20)
GoParser         320k ± 0%         319k ± 0%  -0.34%  (p=0.000 n=20+19)
Reflect          961k ± 0%         959k ± 0%  -0.12%  (p=0.000 n=20+20)
Tar              397k ± 0%         396k ± 0%  -0.23%  (p=0.000 n=20+20)
XML              419k ± 0%         417k ± 0%  -0.39%  (p=0.000 n=20+19)

Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21
Reviewed-on: https://go-review.googlesource.com/102676
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 18:03:39 +00:00
Alberto Donizetti
2ba98f1ae9 cmd/compile: avoid some allocations in regalloc
Compilebench:
name      old time/op       new time/op       delta
Template        283ms ± 3%        281ms ± 4%    ~     (p=0.242 n=20+20)
Unicode         137ms ± 6%        135ms ± 6%    ~     (p=0.194 n=20+19)
GoTypes         890ms ± 2%        883ms ± 1%  -0.74%  (p=0.001 n=19+19)
Compiler        4.21s ± 2%        4.20s ± 2%  -0.40%  (p=0.033 n=20+19)
SSA             9.86s ± 2%        9.68s ± 1%  -1.80%  (p=0.000 n=20+19)
Flate           185ms ± 5%        185ms ± 7%    ~     (p=0.429 n=20+20)
GoParser        222ms ± 3%        222ms ± 4%    ~     (p=0.588 n=19+20)
Reflect         572ms ± 2%        570ms ± 3%    ~     (p=0.113 n=19+20)
Tar             263ms ± 4%        259ms ± 2%  -1.41%  (p=0.013 n=20+20)
XML             321ms ± 2%        321ms ± 4%    ~     (p=0.835 n=20+19)

name      old user-time/op  new user-time/op  delta
Template        400ms ± 5%        405ms ± 5%    ~     (p=0.096 n=20+20)
Unicode         217ms ± 8%        213ms ± 8%    ~     (p=0.242 n=20+20)
GoTypes         1.23s ± 3%        1.22s ± 3%    ~     (p=0.923 n=19+20)
Compiler        5.76s ± 6%        5.81s ± 2%    ~     (p=0.687 n=20+19)
SSA             14.2s ± 4%        14.0s ± 4%    ~     (p=0.121 n=20+20)
Flate           248ms ± 7%        251ms ±10%    ~     (p=0.369 n=20+20)
GoParser        308ms ± 5%        305ms ± 6%    ~     (p=0.336 n=19+20)
Reflect         771ms ± 2%        766ms ± 2%    ~     (p=0.113 n=20+19)
Tar             370ms ± 5%        362ms ± 7%  -2.06%  (p=0.036 n=19+20)
XML             435ms ± 4%        432ms ± 5%    ~     (p=0.369 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.5MB ± 0%       39.4MB ± 0%  -0.20%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.1MB ± 0%    ~     (p=0.064 n=20+20)
GoTypes         117MB ± 0%        117MB ± 0%  -0.17%  (p=0.000 n=20+20)
Compiler        503MB ± 0%        502MB ± 0%  -0.15%  (p=0.000 n=19+19)
SSA            1.42GB ± 0%       1.42GB ± 0%  -0.16%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.3MB ± 0%  -0.19%  (p=0.000 n=20+20)
GoParser       31.4MB ± 0%       31.3MB ± 0%  -0.14%  (p=0.000 n=20+18)
Reflect        78.1MB ± 0%       77.9MB ± 0%  -0.34%  (p=0.000 n=20+19)
Tar            40.1MB ± 0%       40.0MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            45.3MB ± 0%       45.2MB ± 0%  -0.13%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         393k ± 0%         392k ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode          337k ± 0%         337k ± 0%  -0.02%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.22M ± 0%  -0.21%  (p=0.000 n=20+20)
Compiler        4.77M ± 0%        4.76M ± 0%  -0.16%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.8M ± 0%  -0.12%  (p=0.000 n=20+20)
Flate            242k ± 0%         241k ± 0%  -0.20%  (p=0.000 n=20+20)
GoParser         324k ± 0%         324k ± 0%  -0.14%  (p=0.000 n=20+20)
Reflect          985k ± 0%         981k ± 0%  -0.38%  (p=0.000 n=20+20)
Tar              403k ± 0%         402k ± 0%  -0.19%  (p=0.000 n=20+20)
XML              424k ± 0%         424k ± 0%  -0.16%  (p=0.000 n=19+20)

Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-25 18:27:49 +00:00
Ilya Tocar
983dcf70ba cmd/compile/internal/ssa: update regalloc in loops
Currently we don't lift spill out of loop if loop contains call.
However often we have code like this:

for .. {
    if hard_case {
	call()
    }
    // simple case, without call
}

So instead of checking for any call, check for unavoidable call.
For #22698 cases I see:
mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6       760µs ± 1%      725µs ± 1%     -4.56%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%      7.24ms ± 0%     -4.07%  (p=0.000 n=8+7)

There are no significant changes on go1 benchmarks.
But for cases with runtime arch checks, where we call generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)

Also on some runtime benchmarks loops have less loads and higher performance:
runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)

Fixes #22698
Updates #22234

Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-20 21:02:39 +00:00
Daniel Martí
cd2cb6e3f5 cmd/compile: cache sparse maps across ssa passes
This is done for sparse sets already, but it was missing for sparse
maps. Only affects deadstore and regalloc, as they're the only ones that
use sparse maps.

name                 old time/op    new time/op    delta
DSEPass-4               247µs ± 0%     216µs ± 0%  -12.75%  (p=0.008 n=5+5)
DSEPassBlock-4         3.05ms ± 1%    2.87ms ± 1%   -6.02%  (p=0.002 n=6+6)
CSEPass-4              2.30ms ± 0%    2.32ms ± 0%   +0.53%  (p=0.026 n=6+6)
CSEPassBlock-4         23.8ms ± 0%    23.8ms ± 0%     ~     (p=0.931 n=6+5)
DeadcodePass-4         51.7µs ± 1%    51.5µs ± 2%     ~     (p=0.429 n=5+6)
DeadcodePassBlock-4     734µs ± 1%     742µs ± 3%     ~     (p=0.394 n=6+6)
MultiPass-4             152µs ± 0%     149µs ± 2%     ~     (p=0.082 n=5+6)
MultiPassBlock-4       2.67ms ± 1%    2.41ms ± 2%   -9.77%  (p=0.008 n=5+5)

name                 old alloc/op   new alloc/op   delta
DSEPass-4              41.2kB ± 0%     0.1kB ± 0%  -99.68%  (p=0.002 n=6+6)
DSEPassBlock-4          560kB ± 0%       4kB ± 0%  -99.34%  (p=0.026 n=5+6)
CSEPass-4               189kB ± 0%     189kB ± 0%     ~     (all equal)
CSEPassBlock-4         3.10MB ± 0%    3.10MB ± 0%     ~     (p=0.444 n=5+5)
DeadcodePass-4         10.5kB ± 0%    10.5kB ± 0%     ~     (all equal)
DeadcodePassBlock-4     164kB ± 0%     164kB ± 0%     ~     (all equal)
MultiPass-4             240kB ± 0%     199kB ± 0%  -17.06%  (p=0.002 n=6+6)
MultiPassBlock-4       3.60MB ± 0%    2.99MB ± 0%  -17.06%  (p=0.002 n=6+6)

name                 old allocs/op  new allocs/op  delta
DSEPass-4                8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002 n=6+6)
DSEPassBlock-4            240 ± 0%       120 ± 0%  -50.00%  (p=0.002 n=6+6)
CSEPass-4                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
CSEPassBlock-4          1.35k ± 0%     1.35k ± 0%     ~     (all equal)
DeadcodePass-4           3.00 ± 0%      3.00 ± 0%     ~     (all equal)
DeadcodePassBlock-4      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
MultiPass-4              11.0 ± 0%      10.0 ± 0%   -9.09%  (p=0.002 n=6+6)
MultiPassBlock-4          165 ± 0%       150 ± 0%   -9.09%  (p=0.002 n=6+6)

Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
Reviewed-on: https://go-review.googlesource.com/98449
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-15 17:24:39 +00:00
David Chase
c18ff18465 cmd/compile: decouple emitted block order from regalloc block order
While tinkering with different block orders for the preemptible
loop experiment, crashed the register allocator with a "bad"
one (these exist).  Realized that one knob was controlling
two things (register allocation and branch patterns) and
decided that life would be simpler if the two orders were
independent.

Ran some experiments and determined that we have probably,
mostly, been optimizing for register allocation effects, not
branch effects.  Bad block orders for register allocation are
somewhat costly.

This will also allow separate experimentation with perhaps-
better block orders for register allocation.

Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
Reviewed-on: https://go-review.googlesource.com/47314
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-22 03:02:34 +00:00
Heschi Kreinick
2075a9323d cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.

RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.

The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.

Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.

Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.

TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.

Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-02-14 18:29:19 +00:00
Keith Randall
0153a4130d cmd/compile: fix runtime.KeepAlive
KeepAlive needs to introduce a use of the spill of the
value it is keeping alive.  Without that, we don't guarantee
that the spill dominates the KeepAlive.

This bug was probably introduced with the code to move spills
down to the dominator of the restores, instead of always spilling
just after the value itself (CL 34822).

Fixes #22458.

Change-Id: I94955a21960448ffdacc4df775fe1213967b1d4c
Reviewed-on: https://go-review.googlesource.com/74210
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-30 19:55:02 +00:00
David Chase
ca360c3992 cmd/compile: better XPos for rematerialized values and JMPs
This attempts to choose better values for values that are
rematerialized (uses the XPos of the consumer, not the
original) and for unconditional branches (uses the last
assigned XPos in the block).

The JMP branches seem to sometimes end up with a PC in the
destination block, I think because of register movement
or rematerialization that gets placed in predecessor blocks.
This may be acceptable because (eyeball-empirically) that is
often the line number of the target block, so the line number
flow is correct.

Added proper test, that checks both -N -l and regular compilation.
The test is also capable (for gdb, delve soon) of tracking
variable printing based on comments in the source code.

There's substantial room for improvement in debugger behavior.

Updates #21098.

Change-Id: I13abd48a39141583b85576a015f561065819afd0
Reviewed-on: https://go-review.googlesource.com/50610
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-10-07 22:12:36 +00:00
Daniel Martí
ded2c65db3 cmd/compile: simplify a few bits of the code
Remove an unused type, a few redundant returns and replace a few slice
append loops with a single append.

Change-Id: If07248180bae5631b5b152c6051d9635889997d5
Reviewed-on: https://go-review.googlesource.com/66851
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
2017-09-28 20:40:17 +00:00
Lynn Boger
b74b43de68 cmd/compile: request r12 for indirect calls on ppc64le
On ppc64le, functions compiled with -shared expect r12 to
hold the function's address for indirect calls. Previously
this was enforced by generating a move instruction if the
address wasn't already in r12. This change avoids that extra
move by requesting r12 in the CALL ops that do indirect calls.

As a result of adding support for plugins on ppc64le, it was
discovered that there would be more cases where this extra
move was needed, so this seemed like a better solution.

Updates #20756

Change-Id: I6770885a46990f78c6d2902a715dcdaa822192a1
Reviewed-on: https://go-review.googlesource.com/62890
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-11 18:02:33 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Keith Randall
770d8d8207 cmd/compile: free value earlier in nilcheck
When we remove a nil check, add it back to the free Value pool immediately.

Fixes #18732

Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc
Reviewed-on: https://go-review.googlesource.com/58810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-25 06:01:26 +00:00
Keith Randall
bf4d8d3d05 cmd/compile: rename SSA Register.Name to Register.String
Just to get rid of lots of .Name() stutter in printf calls.

Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-17 21:53:08 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Heschi Kreinick
2d57d94ac3 [dev.debug] cmd/compile: track variable decomposition in LocalSlot
When the compiler decomposes a user variable, track its origin so that
it can be recomposed during DWARF generation.

Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936
Reviewed-on: https://go-review.googlesource.com/50878
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-26 18:39:39 +00:00
David Chase
00263a8968 cmd/compile: reduce debugger-worsening line number churn
Reuse block head or preceding instruction's line number for
register allocator's spill, fill, copy, rematerialization
instructionsl; and also for phi, and for no-src-pos
instructions.  Assembler creates same line number tables
for copy-predecessor-line and for no-src-pos,
but copy-predecessor produces better-looking assembly
language output with -S and with GOSSAFUNC, and does not
require changes to tests of existing assembly language.

Split "copyInto" into two cases, one for register allocation,
one for otherwise.  This caused the test score line change
count to increase by one, which may reflect legitimately
useful information preserved.  Without any special treatment
for copyInto, the change count increases by 21 more, from
51 to 72 (i.e., quite a lot).

There is a test; using two naive "scores" for line number
churn, the old numbering is 2x or 4x worse.

Fixes #18902.

Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-10 17:16:44 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Cherry Zhang
8a5175df35 cmd/compile: improve startRegs calculation
In register allocation, we calculate what values are used in
and after the current block. If a value is used only after a
function call, since registers are clobbered in call, we don't
need to mark the value live at the entrance of the block.
Before this CL it is considered live, and unnecessary copy or
load may be generated when resolving merge edge.

Fixes #14761.

On AMD64:
name                      old time/op    new time/op    delta
BinaryTree17-12              2.84s ± 1%     2.81s ± 1%   -1.06%  (p=0.000 n=10+9)
Fannkuch11-12                3.61s ± 0%     3.55s ± 1%   -1.77%  (p=0.000 n=10+9)
FmtFprintfEmpty-12          50.4ns ± 4%    50.0ns ± 1%     ~     (p=0.785 n=9+8)
FmtFprintfString-12         80.0ns ± 3%    78.2ns ± 3%   -2.35%  (p=0.004 n=10+9)
FmtFprintfInt-12            81.3ns ± 4%    81.8ns ± 2%     ~     (p=0.159 n=10+10)
FmtFprintfIntInt-12          120ns ± 4%     118ns ± 2%     ~     (p=0.218 n=10+10)
FmtFprintfPrefixedInt-12     152ns ± 3%     155ns ± 2%   +2.11%  (p=0.026 n=10+10)
FmtFprintfFloat-12           240ns ± 1%     238ns ± 1%   -0.79%  (p=0.005 n=9+9)
FmtManyArgs-12               504ns ± 1%     510ns ± 1%   +1.14%  (p=0.000 n=8+9)
GobDecode-12                7.00ms ± 1%    6.99ms ± 0%     ~     (p=0.497 n=9+10)
GobEncode-12                5.47ms ± 1%    5.48ms ± 1%     ~     (p=0.218 n=10+10)
Gzip-12                      258ms ± 2%     256ms ± 1%   -0.96%  (p=0.043 n=10+9)
Gunzip-12                   38.6ms ± 0%    38.3ms ± 0%   -0.64%  (p=0.000 n=9+8)
HTTPClientServer-12         90.4µs ± 3%    87.2µs ±11%     ~     (p=0.053 n=9+10)
JSONEncode-12               15.6ms ± 0%    15.6ms ± 1%     ~     (p=0.077 n=9+9)
JSONDecode-12               55.1ms ± 1%    54.6ms ± 1%   -0.85%  (p=0.010 n=10+9)
Mandelbrot200-12            4.49ms ± 0%    4.47ms ± 0%   -0.25%  (p=0.000 n=10+8)
GoParse-12                  3.38ms ± 0%    3.37ms ± 1%     ~     (p=0.315 n=8+10)
RegexpMatchEasy0_32-12      82.5ns ± 4%    82.0ns ± 0%     ~     (p=0.164 n=10+8)
RegexpMatchEasy0_1K-12       203ns ± 1%     202ns ± 1%   -0.85%  (p=0.000 n=9+10)
RegexpMatchEasy1_32-12      82.3ns ± 1%    81.1ns ± 0%   -1.39%  (p=0.000 n=10+8)
RegexpMatchEasy1_1K-12       357ns ± 1%     357ns ± 1%     ~     (p=0.697 n=8+9)
RegexpMatchMedium_32-12      125ns ± 2%     126ns ± 2%     ~     (p=0.197 n=10+10)
RegexpMatchMedium_1K-12     39.6µs ± 3%    39.6µs ± 1%     ~     (p=0.971 n=10+10)
RegexpMatchHard_32-12       1.99µs ± 2%    1.99µs ± 4%     ~     (p=0.891 n=10+9)
RegexpMatchHard_1K-12       60.1µs ± 3%    60.4µs ± 3%     ~     (p=0.684 n=10+10)
Revcomp-12                   531ms ± 6%     441ms ± 0%  -16.94%  (p=0.000 n=10+9)
Template-12                 58.9ms ± 1%    58.7ms ± 1%     ~     (p=0.315 n=10+10)
TimeParse-12                 319ns ± 1%     320ns ± 4%     ~     (p=0.215 n=9+9)
TimeFormat-12                345ns ± 0%     333ns ± 1%   -3.36%  (p=0.000 n=9+10)
[Geo mean]                  52.2µs         51.6µs        -1.13%

On ARM64:
name                     old time/op    new time/op    delta
BinaryTree17-8              8.53s ± 0%     8.36s ± 0%   -1.89%  (p=0.000 n=10+10)
Fannkuch11-8                6.15s ± 0%     6.10s ± 0%   -0.67%  (p=0.000 n=10+10)
FmtFprintfEmpty-8           117ns ± 0%     117ns ± 0%     ~     (all equal)
FmtFprintfString-8          192ns ± 0%     192ns ± 0%     ~     (all equal)
FmtFprintfInt-8             198ns ± 0%     198ns ± 0%     ~     (p=0.211 n=10+10)
FmtFprintfIntInt-8          289ns ± 0%     291ns ± 0%   +0.59%  (p=0.000 n=7+10)
FmtFprintfPrefixedInt-8     320ns ± 2%     317ns ± 0%     ~     (p=0.431 n=10+8)
FmtFprintfFloat-8           538ns ± 0%     538ns ± 0%     ~     (all equal)
FmtManyArgs-8              1.17µs ± 1%    1.18µs ± 1%     ~     (p=0.063 n=10+10)
GobDecode-8                17.0ms ± 1%    17.2ms ± 1%   +0.83%  (p=0.000 n=10+10)
GobEncode-8                14.2ms ± 0%    14.1ms ± 1%   -0.78%  (p=0.001 n=9+10)
Gzip-8                      806ms ± 0%     797ms ± 0%   -1.12%  (p=0.000 n=6+9)
Gunzip-8                    131ms ± 0%     130ms ± 0%   -0.51%  (p=0.000 n=10+9)
HTTPClientServer-8          206µs ± 9%     212µs ± 2%     ~     (p=0.829 n=10+8)
JSONEncode-8               40.1ms ± 0%    40.1ms ± 0%     ~     (p=0.136 n=9+9)
JSONDecode-8                157ms ± 0%     151ms ± 0%   -3.32%  (p=0.000 n=9+9)
Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%   -0.05%  (p=0.000 n=9+8)
GoParse-8                  8.43ms ± 0%    8.43ms ± 0%     ~     (p=0.912 n=10+10)
RegexpMatchEasy0_32-8       228ns ± 1%     227ns ± 0%   -0.26%  (p=0.026 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.18%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       258ns ± 1%     250ns ± 0%   -2.83%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-8      2.39µs ± 0%    2.13µs ± 0%  -10.94%  (p=0.000 n=9+9)
RegexpMatchMedium_32-8      352ns ± 0%     351ns ± 0%   -0.29%  (p=0.004 n=9+10)
RegexpMatchMedium_1K-8      104µs ± 0%     105µs ± 0%   +0.58%  (p=0.000 n=8+9)
RegexpMatchHard_32-8       5.84µs ± 0%    5.82µs ± 0%   -0.27%  (p=0.000 n=9+10)
RegexpMatchHard_1K-8        177µs ± 0%     177µs ± 0%   -0.07%  (p=0.000 n=9+9)
Revcomp-8                   1.57s ± 1%     1.50s ± 1%   -4.60%  (p=0.000 n=9+10)
Template-8                  157ms ± 1%     153ms ± 1%   -2.28%  (p=0.000 n=10+9)
TimeParse-8                 779ns ± 1%     770ns ± 1%   -1.18%  (p=0.013 n=10+10)
TimeFormat-8                823ns ± 2%     826ns ± 1%     ~     (p=0.324 n=10+9)
[Geo mean]                  144µs          142µs        -1.45%

Reduce cmd/go text size by 0.5%.

Change-Id: I9288ff983c4a7cf03fc0cb35b9b1750828013117
Reviewed-on: https://go-review.googlesource.com/38457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-29 19:45:32 +00:00
Josh Bleecher Snyder
34975095d0 cmd/compile: provide pos and curfn to temp
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.

Updates #15756
Updates #19683

Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-25 00:09:21 +00:00
Keith Randall
27bc723b51 cmd/compile: initialize loop depths
Regalloc uses loop depths - make sure they are initialized!

Test to make sure we aren't pushing spills into loops.

This fixes a generated-code performance bug introduced with
the better spill placement change:
https://go-review.googlesource.com/c/34822/

Update #19595

Change-Id: Ib9f0da6fb588503518847d7aab51e569fd3fa61e
Reviewed-on: https://go-review.googlesource.com/38434
Reviewed-by: David Chase <drchase@google.com>
2017-03-23 16:41:22 +00:00
Josh Bleecher Snyder
aea3aff669 cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.

This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.

Passes toolstash-check -all. No compiler performance change.

Updates #15756

Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-19 00:21:23 +00:00