Commit graph

55 commits

Author SHA1 Message Date
Ben Shi
27965c1436 cmd/compile: optimize 386's ADDLconstmodifyidx4
This CL optimize ADDLconstmodifyidx4 to INCL/DECL, when the
constant is +1/-1.

1. The total size of pkg/linux_386/ decreases 28 bytes, excluding
cmd/compile.

2. There is no regression in the go1 benchmark test, excluding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.25s ± 2%     3.23s ± 3%  -0.70%  (p=0.040 n=30+30)
Fannkuch11-4                3.50s ± 1%     3.47s ± 1%  -0.68%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          44.6ns ± 3%    44.8ns ± 3%  +0.46%  (p=0.029 n=30+30)
FmtFprintfString-4         79.0ns ± 3%    78.7ns ± 3%    ~     (p=0.053 n=30+30)
FmtFprintfInt-4            89.2ns ± 2%    89.4ns ± 3%    ~     (p=0.665 n=30+29)
FmtFprintfIntInt-4          142ns ± 3%     142ns ± 3%    ~     (p=0.435 n=30+30)
FmtFprintfPrefixedInt-4     182ns ± 2%     182ns ± 2%    ~     (p=0.964 n=30+30)
FmtFprintfFloat-4           407ns ± 3%     411ns ± 4%    ~     (p=0.080 n=30+30)
FmtManyArgs-4               597ns ± 3%     593ns ± 4%    ~     (p=0.222 n=30+30)
GobDecode-4                7.09ms ± 6%    7.07ms ± 7%    ~     (p=0.633 n=30+30)
GobEncode-4                6.81ms ± 9%    6.81ms ± 8%    ~     (p=0.982 n=30+30)
Gzip-4                      398ms ± 4%     400ms ± 6%    ~     (p=0.177 n=30+30)
Gunzip-4                   41.3ms ± 3%    40.6ms ± 4%  -1.71%  (p=0.005 n=30+30)
HTTPClientServer-4         63.4µs ± 3%    63.4µs ± 4%    ~     (p=0.646 n=30+28)
JSONEncode-4               16.0ms ± 3%    16.1ms ± 3%    ~     (p=0.057 n=30+30)
JSONDecode-4               63.3ms ± 8%    63.1ms ± 7%    ~     (p=0.786 n=30+30)
Mandelbrot200-4            5.17ms ± 3%    5.15ms ± 8%    ~     (p=0.654 n=30+30)
GoParse-4                  3.24ms ± 3%    3.23ms ± 2%    ~     (p=0.091 n=30+30)
RegexpMatchEasy0_32-4       103ns ± 4%     103ns ± 4%    ~     (p=0.575 n=30+30)
RegexpMatchEasy0_1K-4       823ns ± 2%     821ns ± 3%    ~     (p=0.827 n=30+30)
RegexpMatchEasy1_32-4       113ns ± 3%     112ns ± 3%    ~     (p=0.076 n=30+30)
RegexpMatchEasy1_1K-4      1.02µs ± 4%    1.01µs ± 5%    ~     (p=0.087 n=30+30)
RegexpMatchMedium_32-4      129ns ± 3%     127ns ± 4%  -1.55%  (p=0.009 n=30+30)
RegexpMatchMedium_1K-4     39.3µs ± 4%    39.7µs ± 3%    ~     (p=0.054 n=30+30)
RegexpMatchHard_32-4       2.15µs ± 4%    2.15µs ± 4%    ~     (p=0.712 n=30+30)
RegexpMatchHard_1K-4       66.0µs ± 3%    65.1µs ± 3%  -1.32%  (p=0.002 n=30+30)
Revcomp-4                   1.85s ± 2%     1.85s ± 3%    ~     (p=0.168 n=30+30)
Template-4                 69.5ms ± 7%    68.9ms ± 6%    ~     (p=0.250 n=28+28)
TimeParse-4                 434ns ± 3%     432ns ± 4%    ~     (p=0.629 n=30+30)
TimeFormat-4                403ns ± 4%     408ns ± 3%  +1.23%  (p=0.019 n=30+29)
[Geo mean]                 65.5µs         65.3µs       -0.20%

name                     old speed      new speed      delta
GobDecode-4               108MB/s ± 6%   109MB/s ± 6%    ~     (p=0.636 n=30+30)
GobEncode-4               113MB/s ±10%   113MB/s ± 9%    ~     (p=0.982 n=30+30)
Gzip-4                   48.8MB/s ± 4%  48.6MB/s ± 5%    ~     (p=0.178 n=30+30)
Gunzip-4                  470MB/s ± 3%   479MB/s ± 4%  +1.72%  (p=0.006 n=30+30)
JSONEncode-4              121MB/s ± 3%   120MB/s ± 3%    ~     (p=0.057 n=30+30)
JSONDecode-4             30.7MB/s ± 8%  30.8MB/s ± 8%    ~     (p=0.784 n=30+30)
GoParse-4                17.9MB/s ± 3%  17.9MB/s ± 2%    ~     (p=0.090 n=30+30)
RegexpMatchEasy0_32-4     309MB/s ± 4%   309MB/s ± 3%    ~     (p=0.530 n=30+30)
RegexpMatchEasy0_1K-4    1.24GB/s ± 2%  1.25GB/s ± 3%    ~     (p=0.976 n=30+30)
RegexpMatchEasy1_32-4     282MB/s ± 3%   284MB/s ± 3%  +0.81%  (p=0.041 n=30+30)
RegexpMatchEasy1_1K-4    1.00GB/s ± 3%  1.01GB/s ± 4%    ~     (p=0.091 n=30+30)
RegexpMatchMedium_32-4   7.71MB/s ± 3%  7.84MB/s ± 4%  +1.71%  (p=0.000 n=30+30)
RegexpMatchMedium_1K-4   26.1MB/s ± 4%  25.8MB/s ± 3%    ~     (p=0.051 n=30+30)
RegexpMatchHard_32-4     14.9MB/s ± 4%  14.9MB/s ± 4%    ~     (p=0.712 n=30+30)
RegexpMatchHard_1K-4     15.5MB/s ± 3%  15.7MB/s ± 3%  +1.34%  (p=0.003 n=30+30)
Revcomp-4                 138MB/s ± 2%   137MB/s ± 3%    ~     (p=0.174 n=30+30)
Template-4               28.0MB/s ± 6%  28.2MB/s ± 6%    ~     (p=0.251 n=28+28)
[Geo mean]               82.3MB/s       82.6MB/s       +0.36%

Change-Id: I389829699ffe9500a013fcf31be58a97e98043e1
Reviewed-on: https://go-review.googlesource.com/c/140701
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-11 01:20:34 +00:00
Ben Shi
3933302550 cmd/compile: add indexed form for several 386 instructions
This CL implements indexed memory operands for the following instructions.
(ADD|SUB|MUL|AND|OR|XOR)Lload -> (ADD|SUB|MUL|AND|OR|XOR)Lloadidx4
(ADD|SUB|AND|OR|XOR)Lmodify -> (ADD|SUB|AND|OR|XOR)Lmodifyidx4
(ADD|AND|OR|XOR)Lconstmodify -> (ADD|AND|OR|XOR)Lconstmodifyidx4

1. The total size of pkg/linux_386/ decreases about 2.5KB, excluding
cmd/compile/ .

2. There is little regression in the go1 benchmark test, excluding noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.25s ± 3%     3.25s ± 3%    ~     (p=0.218 n=40+40)
Fannkuch11-4                3.53s ± 1%     3.53s ± 1%    ~     (p=0.303 n=40+40)
FmtFprintfEmpty-4          44.9ns ± 3%    45.6ns ± 3%  +1.48%  (p=0.030 n=40+36)
FmtFprintfString-4         78.7ns ± 5%    80.1ns ± 7%    ~     (p=0.217 n=36+40)
FmtFprintfInt-4            90.2ns ± 6%    89.8ns ± 5%    ~     (p=0.659 n=40+38)
FmtFprintfIntInt-4          140ns ± 5%     141ns ± 5%  +1.00%  (p=0.027 n=40+40)
FmtFprintfPrefixedInt-4     185ns ± 3%     183ns ± 3%    ~     (p=0.104 n=40+40)
FmtFprintfFloat-4           411ns ± 4%     406ns ± 3%  -1.37%  (p=0.005 n=40+40)
FmtManyArgs-4               590ns ± 4%     598ns ± 4%  +1.35%  (p=0.008 n=40+40)
GobDecode-4                7.16ms ± 5%    7.10ms ± 5%    ~     (p=0.335 n=40+40)
GobEncode-4                6.85ms ± 7%    6.74ms ± 9%    ~     (p=0.058 n=38+40)
Gzip-4                      400ms ± 4%     399ms ± 2%  -0.34%  (p=0.003 n=40+33)
Gunzip-4                   41.4ms ± 3%    41.4ms ± 4%  -0.12%  (p=0.020 n=40+40)
HTTPClientServer-4         64.1µs ± 4%    63.5µs ± 2%  -1.07%  (p=0.000 n=39+37)
JSONEncode-4               15.9ms ± 2%    15.9ms ± 3%    ~     (p=0.103 n=40+40)
JSONDecode-4               62.2ms ± 4%    61.6ms ± 3%  -0.98%  (p=0.006 n=39+40)
Mandelbrot200-4            5.18ms ± 3%    5.14ms ± 4%    ~     (p=0.125 n=40+40)
GoParse-4                  3.29ms ± 2%    3.27ms ± 2%  -0.66%  (p=0.006 n=40+40)
RegexpMatchEasy0_32-4       103ns ± 4%     103ns ± 4%    ~     (p=0.632 n=40+40)
RegexpMatchEasy0_1K-4       830ns ± 3%     828ns ± 3%    ~     (p=0.563 n=40+40)
RegexpMatchEasy1_32-4       113ns ± 4%     113ns ± 4%    ~     (p=0.494 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.03µs ± 4%    ~     (p=0.665 n=40+40)
RegexpMatchMedium_32-4      130ns ± 4%     129ns ± 3%    ~     (p=0.458 n=40+40)
RegexpMatchMedium_1K-4     39.4µs ± 3%    39.7µs ± 3%    ~     (p=0.825 n=40+40)
RegexpMatchHard_32-4       2.16µs ± 4%    2.15µs ± 4%    ~     (p=0.137 n=40+40)
RegexpMatchHard_1K-4       65.2µs ± 3%    65.4µs ± 4%    ~     (p=0.160 n=40+40)
Revcomp-4                   1.87s ± 2%     1.87s ± 1%  +0.17%  (p=0.019 n=33+33)
Template-4                 69.4ms ± 3%    69.8ms ± 3%  +0.60%  (p=0.009 n=40+40)
TimeParse-4                 437ns ± 4%     438ns ± 4%    ~     (p=0.234 n=40+40)
TimeFormat-4                408ns ± 3%     408ns ± 3%    ~     (p=0.904 n=40+40)
[Geo mean]                 65.7µs         65.6µs       -0.08%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 5%   108MB/s ± 5%    ~     (p=0.336 n=40+40)
GobEncode-4               112MB/s ± 6%   114MB/s ± 9%  +1.95%  (p=0.036 n=37+40)
Gzip-4                   48.5MB/s ± 4%  48.6MB/s ± 2%  +0.28%  (p=0.003 n=40+33)
Gunzip-4                  469MB/s ± 4%   469MB/s ± 4%  +0.11%  (p=0.021 n=40+40)
JSONEncode-4              122MB/s ± 2%   122MB/s ± 3%    ~     (p=0.105 n=40+40)
JSONDecode-4             31.2MB/s ± 4%  31.5MB/s ± 4%  +0.99%  (p=0.007 n=39+40)
GoParse-4                17.6MB/s ± 2%  17.7MB/s ± 2%  +0.66%  (p=0.007 n=40+40)
RegexpMatchEasy0_32-4     310MB/s ± 4%   310MB/s ± 4%    ~     (p=0.384 n=40+40)
RegexpMatchEasy0_1K-4    1.23GB/s ± 3%  1.24GB/s ± 3%    ~     (p=0.186 n=40+40)
RegexpMatchEasy1_32-4     283MB/s ± 3%   281MB/s ± 4%    ~     (p=0.855 n=40+40)
RegexpMatchEasy1_1K-4    1.00GB/s ± 4%  1.00GB/s ± 4%    ~     (p=0.665 n=40+40)
RegexpMatchMedium_32-4   7.68MB/s ± 4%  7.73MB/s ± 3%    ~     (p=0.359 n=40+40)
RegexpMatchMedium_1K-4   26.0MB/s ± 3%  25.8MB/s ± 3%    ~     (p=0.825 n=40+40)
RegexpMatchHard_32-4     14.8MB/s ± 3%  14.9MB/s ± 4%    ~     (p=0.136 n=40+40)
RegexpMatchHard_1K-4     15.7MB/s ± 3%  15.7MB/s ± 4%    ~     (p=0.150 n=40+40)
Revcomp-4                 136MB/s ± 1%   136MB/s ± 1%  -0.09%  (p=0.028 n=32+33)
Template-4               28.0MB/s ± 3%  27.8MB/s ± 3%  -0.59%  (p=0.010 n=40+40)
[Geo mean]               82.1MB/s       82.3MB/s       +0.25%

Change-Id: Ifa387a251056678326d3508aa02753b70bf7e5d0
Reviewed-on: https://go-review.googlesource.com/c/140303
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-09 03:55:08 +00:00
Ben Shi
7b8b3f30ed cmd/compile: combine similar code in 386's assembly generator
The indexed MOVload and MOVstore have similar logic, and this CL
combine them together. The total size of pkg/linux_386/cmd/compile/
decreases about 4KB.

Change-Id: I06236a3542aaa3dfc113c49fe4c69d209e018dfe
Reviewed-on: https://go-review.googlesource.com/c/139958
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-06 00:02:16 +00:00
Ben Shi
9a033bf9d3 cmd/compile: optimize 386's assembly generator
The ADDconstmodify has similar logic with other constmodify like
instructions. This CL optimize them to share code via fallthrough.
And the size of pkg/linux_386/cmd/compile/internal/x86.a decreases
about 0.3KB.

Change-Id: Ibdf06228afde875e8fe8e30851b50ca2be513dd9
Reviewed-on: https://go-review.googlesource.com/136398
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-09-20 06:33:58 +00:00
Ben Shi
a0a7e9fc0c cmd/compile: implement "OPC $imm, (mem)" for 386
New read-modify-write operations are introduced in this CL for 386.

1. The total size of pkg/linux_386 decreases about 10KB (excluding
cmd/compile).

2. The go1 benchmark shows little regression.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.32s ± 4%     3.29s ± 2%    ~     (p=0.059 n=30+30)
Fannkuch11-4                3.49s ± 1%     3.46s ± 1%  -0.92%  (p=0.001 n=30+30)
FmtFprintfEmpty-4          47.7ns ± 2%    46.8ns ± 5%  -1.93%  (p=0.011 n=25+30)
FmtFprintfString-4         79.5ns ± 7%    80.2ns ± 3%  +0.89%  (p=0.001 n=28+29)
FmtFprintfInt-4            90.5ns ± 2%    92.1ns ± 2%  +1.82%  (p=0.014 n=22+30)
FmtFprintfIntInt-4          141ns ± 1%     144ns ± 3%  +2.23%  (p=0.013 n=22+30)
FmtFprintfPrefixedInt-4     183ns ± 2%     184ns ± 3%    ~     (p=0.080 n=21+30)
FmtFprintfFloat-4           409ns ± 3%     412ns ± 3%  +0.83%  (p=0.040 n=30+30)
FmtManyArgs-4               597ns ± 6%     607ns ± 4%  +1.71%  (p=0.006 n=30+30)
GobDecode-4                7.21ms ± 5%    7.18ms ± 6%    ~     (p=0.665 n=30+30)
GobEncode-4                7.17ms ± 6%    7.09ms ± 7%    ~     (p=0.117 n=29+30)
Gzip-4                      413ms ± 4%     399ms ± 4%  -3.48%  (p=0.000 n=30+30)
Gunzip-4                   41.3ms ± 4%    41.7ms ± 3%  +1.05%  (p=0.011 n=30+30)
HTTPClientServer-4         63.5µs ± 3%    62.9µs ± 2%  -0.97%  (p=0.017 n=30+27)
JSONEncode-4               20.3ms ± 5%    20.1ms ± 5%  -1.16%  (p=0.004 n=30+30)
JSONDecode-4               66.2ms ± 4%    67.7ms ± 4%  +2.21%  (p=0.000 n=30+30)
Mandelbrot200-4            5.16ms ± 3%    5.18ms ± 3%    ~     (p=0.123 n=30+30)
GoParse-4                  3.23ms ± 2%    3.27ms ± 2%  +1.08%  (p=0.006 n=30+30)
RegexpMatchEasy0_32-4      98.9ns ± 5%    97.1ns ± 4%  -1.83%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4       842ns ± 3%     842ns ± 3%    ~     (p=0.550 n=30+30)
RegexpMatchEasy1_32-4       107ns ± 4%     105ns ± 4%  -1.93%  (p=0.012 n=30+30)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.04µs ± 4%    ~     (p=0.304 n=30+30)
RegexpMatchMedium_32-4      132ns ± 2%     129ns ± 4%  -2.02%  (p=0.000 n=21+30)
RegexpMatchMedium_1K-4     44.1µs ± 4%    43.8µs ± 3%    ~     (p=0.641 n=30+30)
RegexpMatchHard_32-4       2.26µs ± 4%    2.23µs ± 4%  -1.28%  (p=0.023 n=30+30)
RegexpMatchHard_1K-4       68.1µs ± 3%    68.6µs ± 4%    ~     (p=0.089 n=30+30)
Revcomp-4                   1.85s ± 2%     1.84s ± 2%    ~     (p=0.072 n=30+30)
Template-4                 69.2ms ± 3%    68.5ms ± 3%  -1.04%  (p=0.012 n=30+30)
TimeParse-4                 441ns ± 3%     446ns ± 4%  +1.21%  (p=0.001 n=30+30)
TimeFormat-4                415ns ± 3%     415ns ± 3%    ~     (p=0.436 n=30+30)
[Geo mean]                 67.0µs         66.9µs       -0.17%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 5%   107MB/s ± 6%    ~     (p=0.663 n=30+30)
GobEncode-4               107MB/s ± 6%   108MB/s ± 7%    ~     (p=0.117 n=29+30)
Gzip-4                   47.0MB/s ± 4%  48.7MB/s ± 4%  +3.61%  (p=0.000 n=30+30)
Gunzip-4                  470MB/s ± 4%   466MB/s ± 4%  -1.05%  (p=0.011 n=30+30)
JSONEncode-4             95.6MB/s ± 5%  96.7MB/s ± 5%  +1.16%  (p=0.005 n=30+30)
JSONDecode-4             29.3MB/s ± 4%  28.7MB/s ± 4%  -2.17%  (p=0.000 n=30+30)
GoParse-4                17.9MB/s ± 2%  17.7MB/s ± 2%  -1.06%  (p=0.007 n=30+30)
RegexpMatchEasy0_32-4     323MB/s ± 5%   329MB/s ± 4%  +1.93%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4    1.22GB/s ± 3%  1.22GB/s ± 3%    ~     (p=0.496 n=30+30)
RegexpMatchEasy1_32-4     298MB/s ± 4%   303MB/s ± 4%  +1.84%  (p=0.017 n=30+30)
RegexpMatchEasy1_1K-4     995MB/s ± 4%   989MB/s ± 4%    ~     (p=0.307 n=30+30)
RegexpMatchMedium_32-4   7.56MB/s ± 4%  7.74MB/s ± 4%  +2.46%  (p=0.000 n=22+30)
RegexpMatchMedium_1K-4   23.2MB/s ± 4%  23.4MB/s ± 3%    ~     (p=0.651 n=30+30)
RegexpMatchHard_32-4     14.2MB/s ± 4%  14.3MB/s ± 4%  +1.29%  (p=0.021 n=30+30)
RegexpMatchHard_1K-4     15.0MB/s ± 3%  14.9MB/s ± 4%    ~     (p=0.069 n=30+29)
Revcomp-4                 138MB/s ± 2%   138MB/s ± 2%    ~     (p=0.072 n=30+30)
Template-4               28.1MB/s ± 3%  28.4MB/s ± 3%  +1.05%  (p=0.012 n=30+30)
[Geo mean]               79.7MB/s       80.2MB/s       +0.60%

Change-Id: I44a1dfc942c9a385904553c4fe1fa8e509c8aa31
Reviewed-on: https://go-review.googlesource.com/120916
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-22 04:12:42 +00:00
Ben Shi
90f2fa0037 cmd/compile: optimize 386 code with MULLload/DIVSSload/DIVSDload
IMULL/DIVSS/DIVSD all can take the source operand from memory
directly. And this CL implement that optimization.

1. The total size of pkg/linux_386 decreases about 84KB (excluding
cmd/compile).

2. The go1 benchmark shows little regression in total (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              3.29s ± 2%     3.27s ± 4%    ~     (p=0.192 n=30+30)
Fannkuch11-4                3.49s ± 2%     3.54s ± 1%  +1.48%  (p=0.000 n=30+30)
FmtFprintfEmpty-4          45.9ns ± 3%    46.3ns ± 4%  +0.89%  (p=0.037 n=30+30)
FmtFprintfString-4         78.8ns ± 3%    78.7ns ± 4%    ~     (p=0.209 n=30+27)
FmtFprintfInt-4            91.0ns ± 2%    90.3ns ± 2%  -0.82%  (p=0.031 n=30+27)
FmtFprintfIntInt-4          142ns ± 4%     143ns ± 4%    ~     (p=0.136 n=30+30)
FmtFprintfPrefixedInt-4     181ns ± 3%     183ns ± 4%  +1.40%  (p=0.005 n=30+30)
FmtFprintfFloat-4           404ns ± 4%     408ns ± 3%    ~     (p=0.397 n=30+30)
FmtManyArgs-4               601ns ± 3%     609ns ± 5%    ~     (p=0.059 n=30+30)
GobDecode-4                7.21ms ± 5%    7.24ms ± 5%    ~     (p=0.612 n=30+30)
GobEncode-4                6.91ms ± 6%    6.91ms ± 6%    ~     (p=0.797 n=30+30)
Gzip-4                      398ms ± 6%     399ms ± 4%    ~     (p=0.173 n=30+30)
Gunzip-4                   41.7ms ± 3%    41.8ms ± 3%    ~     (p=0.423 n=30+30)
HTTPClientServer-4         62.3µs ± 2%    62.7µs ± 3%    ~     (p=0.085 n=29+30)
JSONEncode-4               21.0ms ± 4%    20.7ms ± 5%  -1.39%  (p=0.014 n=30+30)
JSONDecode-4               66.3ms ± 3%    67.4ms ± 1%  +1.71%  (p=0.003 n=30+24)
Mandelbrot200-4            5.15ms ± 3%    5.16ms ± 3%    ~     (p=0.697 n=30+30)
GoParse-4                  3.24ms ± 3%    3.27ms ± 4%  +0.91%  (p=0.032 n=30+30)
RegexpMatchEasy0_32-4       101ns ± 5%      99ns ± 4%  -1.82%  (p=0.008 n=29+30)
RegexpMatchEasy0_1K-4       848ns ± 4%     841ns ± 2%  -0.77%  (p=0.043 n=30+30)
RegexpMatchEasy1_32-4       106ns ± 6%     106ns ± 3%    ~     (p=0.939 n=29+30)
RegexpMatchEasy1_1K-4      1.02µs ± 3%    1.03µs ± 4%    ~     (p=0.297 n=28+30)
RegexpMatchMedium_32-4      129ns ± 4%     127ns ± 4%    ~     (p=0.073 n=30+30)
RegexpMatchMedium_1K-4     43.9µs ± 3%    43.8µs ± 3%    ~     (p=0.186 n=30+30)
RegexpMatchHard_32-4       2.24µs ± 4%    2.22µs ± 4%    ~     (p=0.332 n=30+29)
RegexpMatchHard_1K-4       68.0µs ± 4%    67.5µs ± 3%    ~     (p=0.290 n=30+30)
Revcomp-4                   1.85s ± 3%     1.85s ± 3%    ~     (p=0.358 n=30+30)
Template-4                 69.6ms ± 3%    70.0ms ± 4%    ~     (p=0.273 n=30+30)
TimeParse-4                 445ns ± 3%     441ns ± 3%    ~     (p=0.494 n=30+30)
TimeFormat-4                412ns ± 3%     412ns ± 6%    ~     (p=0.841 n=30+30)
[Geo mean]                 66.7µs         66.8µs       +0.13%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 5%   106MB/s ± 5%    ~     (p=0.615 n=30+30)
GobEncode-4               111MB/s ± 6%   111MB/s ± 6%    ~     (p=0.790 n=30+30)
Gzip-4                   48.8MB/s ± 6%  48.7MB/s ± 4%    ~     (p=0.167 n=30+30)
Gunzip-4                  465MB/s ± 3%   465MB/s ± 3%    ~     (p=0.420 n=30+30)
JSONEncode-4             92.4MB/s ± 4%  93.7MB/s ± 5%  +1.42%  (p=0.015 n=30+30)
JSONDecode-4             29.3MB/s ± 3%  28.8MB/s ± 1%  -1.72%  (p=0.003 n=30+24)
GoParse-4                17.9MB/s ± 3%  17.7MB/s ± 4%  -0.89%  (p=0.037 n=30+30)
RegexpMatchEasy0_32-4     317MB/s ± 8%   324MB/s ± 4%  +2.14%  (p=0.006 n=30+30)
RegexpMatchEasy0_1K-4    1.21GB/s ± 4%  1.22GB/s ± 2%  +0.77%  (p=0.036 n=30+30)
RegexpMatchEasy1_32-4     298MB/s ± 7%   299MB/s ± 4%    ~     (p=0.511 n=30+30)
RegexpMatchEasy1_1K-4    1.00GB/s ± 3%  1.00GB/s ± 4%    ~     (p=0.304 n=28+30)
RegexpMatchMedium_32-4   7.75MB/s ± 4%  7.82MB/s ± 4%    ~     (p=0.089 n=30+30)
RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%    ~     (p=0.181 n=30+30)
RegexpMatchHard_32-4     14.3MB/s ± 4%  14.4MB/s ± 4%    ~     (p=0.320 n=30+29)
RegexpMatchHard_1K-4     15.1MB/s ± 4%  15.2MB/s ± 3%    ~     (p=0.273 n=30+30)
Revcomp-4                 137MB/s ± 3%   137MB/s ± 3%    ~     (p=0.352 n=30+30)
Template-4               27.9MB/s ± 3%  27.7MB/s ± 4%    ~     (p=0.277 n=30+30)
[Geo mean]               79.9MB/s       80.1MB/s       +0.15%

Change-Id: I97333cd8ddabb3c7c88ca5aa9e14a005b74d306d
Reviewed-on: https://go-review.googlesource.com/120695
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-22 03:38:38 +00:00
Ben Shi
556971316f cmd/compile: optimize 386's comparison
CMPL/CMPW/CMPB can take a memory operand on 386, and this CL
implements that optimization.

1. The total size of pkg/linux_386 decreases about 45KB, excluding
cmd/compile.

2. The go1 benchmark shows a little improvement.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.36s ± 2%     3.37s ± 3%    ~     (p=0.537 n=40+40)
Fannkuch11-4                3.59s ± 1%     3.53s ± 2%  -1.58%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.0ns ± 3%    45.8ns ± 3%    ~     (p=0.249 n=40+40)
FmtFprintfString-4         80.0ns ± 4%    78.8ns ± 3%  -1.49%  (p=0.001 n=40+40)
FmtFprintfInt-4            89.7ns ± 2%    90.3ns ± 2%  +0.74%  (p=0.003 n=40+40)
FmtFprintfIntInt-4          144ns ± 3%     143ns ± 3%  -0.95%  (p=0.003 n=40+40)
FmtFprintfPrefixedInt-4     181ns ± 4%     180ns ± 2%    ~     (p=0.103 n=40+40)
FmtFprintfFloat-4           412ns ± 3%     408ns ± 4%  -0.97%  (p=0.018 n=40+40)
FmtManyArgs-4               607ns ± 4%     605ns ± 4%    ~     (p=0.148 n=40+40)
GobDecode-4                7.19ms ± 4%    7.24ms ± 5%    ~     (p=0.340 n=40+40)
GobEncode-4                7.04ms ± 9%    6.99ms ± 9%    ~     (p=0.289 n=40+40)
Gzip-4                      400ms ± 6%     398ms ± 5%    ~     (p=0.168 n=40+40)
Gunzip-4                   41.2ms ± 3%    41.7ms ± 3%  +1.40%  (p=0.001 n=40+40)
HTTPClientServer-4         62.5µs ± 1%    62.1µs ± 2%  -0.61%  (p=0.000 n=37+37)
JSONEncode-4               20.7ms ± 4%    20.4ms ± 3%  -1.60%  (p=0.000 n=40+40)
JSONDecode-4               69.4ms ± 4%    69.2ms ± 6%    ~     (p=0.177 n=40+40)
Mandelbrot200-4            5.22ms ± 6%    5.21ms ± 3%    ~     (p=0.531 n=40+40)
GoParse-4                  3.29ms ± 3%    3.28ms ± 3%    ~     (p=0.321 n=40+39)
RegexpMatchEasy0_32-4       104ns ± 4%     103ns ± 7%  -0.89%  (p=0.040 n=40+40)
RegexpMatchEasy0_1K-4       852ns ± 3%     853ns ± 2%    ~     (p=0.357 n=40+40)
RegexpMatchEasy1_32-4       113ns ± 8%     113ns ± 3%    ~     (p=0.906 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 4%    1.03µs ± 5%    ~     (p=0.326 n=40+40)
RegexpMatchMedium_32-4      136ns ± 3%     133ns ± 3%  -2.31%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 3%    43.7µs ± 3%    ~     (p=0.053 n=40+40)
RegexpMatchHard_32-4       2.27µs ± 3%    2.26µs ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4       68.0µs ± 3%    68.9µs ± 3%  +1.28%  (p=0.000 n=40+40)
Revcomp-4                   1.86s ± 5%     1.86s ± 2%    ~     (p=0.950 n=40+40)
Template-4                 73.4ms ± 4%    69.9ms ± 7%  -4.78%  (p=0.000 n=40+40)
TimeParse-4                 449ns ± 4%     441ns ± 5%  -1.76%  (p=0.000 n=40+40)
TimeFormat-4                416ns ± 3%     417ns ± 4%    ~     (p=0.304 n=40+40)
[Geo mean]                 67.7µs         67.3µs       -0.55%

name                     old speed      new speed      delta
GobDecode-4               107MB/s ± 4%   106MB/s ± 5%    ~     (p=0.336 n=40+40)
GobEncode-4               109MB/s ± 5%   110MB/s ± 9%    ~     (p=0.142 n=38+40)
Gzip-4                   48.5MB/s ± 5%  48.8MB/s ± 5%    ~     (p=0.172 n=40+40)
Gunzip-4                  472MB/s ± 3%   465MB/s ± 3%  -1.39%  (p=0.001 n=40+40)
JSONEncode-4             93.6MB/s ± 4%  95.1MB/s ± 3%  +1.61%  (p=0.000 n=40+40)
JSONDecode-4             28.0MB/s ± 3%  28.1MB/s ± 6%    ~     (p=0.181 n=40+40)
GoParse-4                17.6MB/s ± 3%  17.7MB/s ± 3%    ~     (p=0.350 n=40+39)
RegexpMatchEasy0_32-4     308MB/s ± 4%   311MB/s ± 6%  +0.96%  (p=0.025 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.20GB/s ± 2%    ~     (p=0.317 n=40+40)
RegexpMatchEasy1_32-4     282MB/s ± 7%   282MB/s ± 3%    ~     (p=0.516 n=40+40)
RegexpMatchEasy1_1K-4     994MB/s ± 4%   991MB/s ± 5%    ~     (p=0.319 n=40+40)
RegexpMatchMedium_32-4   7.31MB/s ± 3%  7.49MB/s ± 3%  +2.46%  (p=0.000 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 3%  23.4MB/s ± 3%    ~     (p=0.052 n=40+40)
RegexpMatchHard_32-4     14.1MB/s ± 3%  14.1MB/s ± 4%    ~     (p=0.391 n=40+40)
RegexpMatchHard_1K-4     15.1MB/s ± 3%  14.9MB/s ± 3%  -1.27%  (p=0.000 n=40+40)
Revcomp-4                 137MB/s ± 5%   137MB/s ± 2%    ~     (p=0.942 n=40+40)
Template-4               26.5MB/s ± 4%  27.8MB/s ± 7%  +5.03%  (p=0.000 n=40+40)
[Geo mean]               78.6MB/s       79.0MB/s       +0.57%

Change-Id: Idcacc6881ef57cd7dc33aa87b711282842b72a53
Reviewed-on: https://go-review.googlesource.com/126618
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-20 14:23:22 +00:00
Keith Randall
f32348669c cmd/compile: rename memory-using operations
Some *mem ops are loads, some are stores, some are modifications.
Replace mem->load for the loads.
Replace mem->store for the stores.
Replace mem->modify for the load-modify-stores.

The only semantic change in this CL is to mark
ADD(Q|L)constmodify (which used to be ADD(Q|L)constmem) as
both a read and a write, instead of just a write. This is arguably
a bug fix, but the bug isn't triggerable at the moment, see CL 112157.

Change-Id: Iccb45aea817b606adb2d712ff99b10ee28e4616a
Reviewed-on: https://go-review.googlesource.com/112159
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-08 19:16:50 +00:00
Ben Shi
098ca846c7 cmd/compile: emit more compact 386 instructions
ADDL/SUBL/ANDL/ORL/XORL can have a memory operand as destination,
and this CL optimize the compiler to emit such instructions on
386 for more compact binary.

Here is test report:
1. The total size of pkg/linux_386/ and pkg/tool/linux_386/ decreases
about 14KB.
(pkg/linux_386/cmd/compile/ and pkg/tool/linux_386/compile are excluded)

2. The go1 benchmark shows little change, excluding ±2% noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              3.34s ± 2%     3.38s ± 2%  +1.27%  (p=0.000 n=40+39)
Fannkuch11-4                3.55s ± 1%     3.51s ± 1%  -1.33%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.3ns ± 3%    46.9ns ± 4%  +1.41%  (p=0.002 n=40+40)
FmtFprintfString-4         80.8ns ± 3%    80.4ns ± 6%  -0.54%  (p=0.044 n=40+40)
FmtFprintfInt-4            93.0ns ± 3%    92.2ns ± 4%  -0.88%  (p=0.007 n=39+40)
FmtFprintfIntInt-4          144ns ± 5%     145ns ± 2%  +0.78%  (p=0.015 n=40+40)
FmtFprintfPrefixedInt-4     184ns ± 2%     182ns ± 2%  -1.06%  (p=0.004 n=40+40)
FmtFprintfFloat-4           415ns ± 4%     419ns ± 4%    ~     (p=0.434 n=40+40)
FmtManyArgs-4               615ns ± 3%     619ns ± 3%    ~     (p=0.100 n=40+40)
GobDecode-4                7.30ms ± 6%    7.36ms ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4                7.10ms ± 6%    7.21ms ± 5%    ~     (p=0.082 n=40+39)
Gzip-4                      364ms ± 3%     362ms ± 6%  -0.71%  (p=0.020 n=40+40)
Gunzip-4                   42.4ms ± 3%    42.2ms ± 3%    ~     (p=0.303 n=40+40)
HTTPClientServer-4         62.9µs ± 1%    62.9µs ± 1%    ~     (p=0.768 n=38+39)
JSONEncode-4               21.4ms ± 4%    21.5ms ± 5%    ~     (p=0.210 n=40+40)
JSONDecode-4               67.7ms ± 3%    67.9ms ± 4%    ~     (p=0.713 n=40+40)
Mandelbrot200-4            5.18ms ± 3%    5.21ms ± 3%  +0.59%  (p=0.021 n=40+40)
GoParse-4                  3.35ms ± 3%    3.34ms ± 2%    ~     (p=0.996 n=40+40)
RegexpMatchEasy0_32-4      98.5ns ± 5%    96.3ns ± 4%  -2.15%  (p=0.001 n=40+40)
RegexpMatchEasy0_1K-4       851ns ± 4%     850ns ± 5%    ~     (p=0.700 n=40+40)
RegexpMatchEasy1_32-4       105ns ± 7%     107ns ± 4%  +1.50%  (p=0.017 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 5%    1.03µs ± 4%    ~     (p=0.992 n=40+40)
RegexpMatchMedium_32-4      130ns ± 6%     128ns ± 4%  -1.66%  (p=0.012 n=40+40)
RegexpMatchMedium_1K-4     44.0µs ± 5%    43.6µs ± 3%    ~     (p=0.704 n=40+40)
RegexpMatchHard_32-4       2.29µs ± 3%    2.23µs ± 4%  -2.38%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4       69.0µs ± 3%    68.1µs ± 3%  -1.28%  (p=0.003 n=40+40)
Revcomp-4                   1.85s ± 2%     1.87s ± 3%  +1.11%  (p=0.000 n=40+40)
Template-4                 69.8ms ± 3%    69.6ms ± 3%    ~     (p=0.125 n=40+40)
TimeParse-4                 442ns ± 5%     440ns ± 3%    ~     (p=0.585 n=40+40)
TimeFormat-4                419ns ± 3%     420ns ± 3%    ~     (p=0.824 n=40+40)
[Geo mean]                 67.3µs         67.2µs       -0.11%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 6%   104MB/s ± 6%    ~     (p=0.074 n=40+40)
GobEncode-4               108MB/s ± 7%   107MB/s ± 5%    ~     (p=0.080 n=40+39)
Gzip-4                   53.3MB/s ± 3%  53.7MB/s ± 6%  +0.73%  (p=0.021 n=40+40)
Gunzip-4                  458MB/s ± 3%   460MB/s ± 3%    ~     (p=0.301 n=40+40)
JSONEncode-4             90.8MB/s ± 4%  90.3MB/s ± 4%    ~     (p=0.213 n=40+40)
JSONDecode-4             28.7MB/s ± 3%  28.6MB/s ± 4%    ~     (p=0.679 n=40+40)
GoParse-4                17.3MB/s ± 3%  17.3MB/s ± 2%    ~     (p=1.000 n=40+40)
RegexpMatchEasy0_32-4     325MB/s ± 5%   333MB/s ± 4%  +2.44%  (p=0.000 n=40+38)
RegexpMatchEasy0_1K-4    1.20GB/s ± 4%  1.21GB/s ± 5%    ~     (p=0.684 n=40+40)
RegexpMatchEasy1_32-4     303MB/s ± 7%   298MB/s ± 4%  -1.52%  (p=0.022 n=40+40)
RegexpMatchEasy1_1K-4     995MB/s ± 5%   996MB/s ± 4%    ~     (p=0.996 n=40+40)
RegexpMatchMedium_32-4   7.67MB/s ± 6%  7.80MB/s ± 4%  +1.68%  (p=0.011 n=40+40)
RegexpMatchMedium_1K-4   23.3MB/s ± 5%  23.5MB/s ± 3%    ~     (p=0.697 n=40+40)
RegexpMatchHard_32-4     14.0MB/s ± 3%  14.3MB/s ± 4%  +2.43%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4     14.8MB/s ± 3%  15.0MB/s ± 3%  +1.30%  (p=0.003 n=40+40)
Revcomp-4                 137MB/s ± 2%   136MB/s ± 3%  -1.10%  (p=0.000 n=40+40)
Template-4               27.8MB/s ± 3%  27.9MB/s ± 3%    ~     (p=0.128 n=40+40)
[Geo mean]               79.6MB/s       79.9MB/s       +0.28%

Change-Id: I02a3efc125dc81e18fc8495eb2bf1bba59ab8733
Reviewed-on: https://go-review.googlesource.com/110157
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-08 06:44:54 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
David Chase
ab48574bab cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch.  If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.

This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless.  The problematic loop is

		for i := 0; i < w; i++ {
			if pw[i] != 0 {
				return true
			}
		}

compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.

Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly.  In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.

Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 17:35:34 +00:00
Ben Shi
20cf5c4962 cmd/compile: optimize 386 binary operations with a memory operand
Some integer/float binary operations of 386 can take a direct memory
operand, which is more efficient than loading to a register.

These CL does this optimization by copying the similar solution
of amd64. And the go1 benchmark shows some inprovements, especially
the test case Template. (excluding noise)

name                     old time/op    new time/op    delta
BinaryTree17-4              3.42s ± 2%     3.40s ± 2%    ~     (p=0.069 n=38+39)
Fannkuch11-4                3.48s ± 1%     3.53s ± 1%  +1.59%  (p=0.000 n=40+40)
FmtFprintfEmpty-4          46.7ns ± 4%    46.3ns ± 3%  -1.03%  (p=0.001 n=40+40)
FmtFprintfString-4         80.1ns ± 3%    80.6ns ± 3%  +0.56%  (p=0.029 n=40+40)
FmtFprintfInt-4            92.4ns ± 2%    92.3ns ± 3%    ~     (p=0.847 n=40+40)
FmtFprintfIntInt-4          147ns ± 3%     144ns ± 3%  -1.87%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4     182ns ± 2%     184ns ± 3%  +0.99%  (p=0.002 n=40+40)
FmtFprintfFloat-4           387ns ± 3%     384ns ± 3%    ~     (p=0.069 n=40+40)
FmtManyArgs-4               619ns ± 3%     616ns ± 3%    ~     (p=0.320 n=40+40)
GobDecode-4                7.28ms ± 6%    7.27ms ± 5%    ~     (p=0.897 n=40+40)
GobEncode-4                7.33ms ± 6%    7.21ms ± 6%  -1.56%  (p=0.022 n=38+40)
Gzip-4                      357ms ± 4%     357ms ± 4%    ~     (p=0.071 n=40+40)
Gunzip-4                   45.3ms ± 3%    45.4ms ± 3%    ~     (p=0.452 n=40+40)
HTTPClientServer-4         63.0µs ± 2%    62.9µs ± 3%    ~     (p=0.760 n=38+39)
JSONEncode-4               22.0ms ± 4%    21.7ms ± 4%  -1.49%  (p=0.000 n=40+40)
JSONDecode-4               67.7ms ± 4%    68.3ms ± 3%  +0.86%  (p=0.039 n=40+40)
Mandelbrot200-4            5.16ms ± 3%    5.17ms ± 3%    ~     (p=0.418 n=40+40)
GoParse-4                  3.30ms ± 2%    3.32ms ± 3%  +0.55%  (p=0.017 n=40+40)
RegexpMatchEasy0_32-4       104ns ± 3%     104ns ± 4%    ~     (p=0.992 n=40+40)
RegexpMatchEasy0_1K-4       852ns ± 3%     851ns ± 2%    ~     (p=0.344 n=40+40)
RegexpMatchEasy1_32-4       113ns ± 4%     113ns ± 5%    ~     (p=0.937 n=40+40)
RegexpMatchEasy1_1K-4      1.03µs ± 5%    1.04µs ± 4%    ~     (p=0.430 n=40+40)
RegexpMatchMedium_32-4      132ns ± 4%     131ns ± 3%  -1.06%  (p=0.027 n=40+40)
RegexpMatchMedium_1K-4     43.0µs ± 3%    43.2µs ± 3%    ~     (p=0.122 n=40+40)
RegexpMatchHard_32-4       2.21µs ± 4%    2.20µs ± 4%    ~     (p=0.146 n=40+40)
RegexpMatchHard_1K-4       67.1µs ± 4%    67.2µs ± 3%    ~     (p=0.859 n=40+40)
Revcomp-4                   1.85s ± 2%     1.85s ± 3%    ~     (p=0.184 n=40+40)
Template-4                 70.1ms ± 4%    67.5ms ± 3%  -3.65%  (p=0.000 n=40+40)
TimeParse-4                 457ns ±16%     439ns ± 4%    ~     (p=0.683 n=40+34)
TimeFormat-4                413ns ± 3%     414ns ± 3%    ~     (p=0.850 n=40+40)
[Geo mean]                 67.5µs         67.3µs       -0.38%

name                     old speed      new speed      delta
GobDecode-4               105MB/s ± 6%   106MB/s ± 5%    ~     (p=0.893 n=40+40)
GobEncode-4               105MB/s ± 6%   107MB/s ± 7%  +1.60%  (p=0.023 n=38+40)
Gzip-4                   54.4MB/s ± 4%  54.5MB/s ± 4%    ~     (p=0.073 n=40+40)
Gunzip-4                  429MB/s ± 3%   428MB/s ± 3%    ~     (p=0.453 n=40+40)
JSONEncode-4             88.3MB/s ± 5%  89.6MB/s ± 4%  +1.51%  (p=0.000 n=40+40)
JSONDecode-4             28.7MB/s ± 4%  28.4MB/s ± 3%  -0.87%  (p=0.039 n=40+40)
GoParse-4                17.6MB/s ± 3%  17.5MB/s ± 3%  -0.55%  (p=0.020 n=40+40)
RegexpMatchEasy0_32-4     308MB/s ± 4%   308MB/s ± 5%    ~     (p=0.988 n=40+40)
RegexpMatchEasy0_1K-4    1.20GB/s ± 3%  1.20GB/s ± 2%    ~     (p=0.329 n=40+40)
RegexpMatchEasy1_32-4     283MB/s ± 4%   283MB/s ± 4%    ~     (p=0.507 n=40+40)
RegexpMatchEasy1_1K-4     991MB/s ± 5%   987MB/s ± 4%    ~     (p=0.446 n=40+40)
RegexpMatchMedium_32-4   7.54MB/s ± 4%  7.63MB/s ± 3%  +1.26%  (p=0.004 n=40+40)
RegexpMatchMedium_1K-4   23.8MB/s ± 3%  23.7MB/s ± 4%    ~     (p=0.121 n=40+40)
RegexpMatchHard_32-4     14.5MB/s ± 4%  14.6MB/s ± 4%    ~     (p=0.145 n=40+40)
RegexpMatchHard_1K-4     15.3MB/s ± 4%  15.2MB/s ± 3%    ~     (p=0.874 n=40+40)
Revcomp-4                 137MB/s ± 2%   137MB/s ± 3%    ~     (p=0.179 n=40+40)
Template-4               27.7MB/s ± 4%  28.7MB/s ± 3%  +3.78%  (p=0.000 n=40+40)
[Geo mean]               78.9MB/s       79.2MB/s       +0.38%

Change-Id: I3ba688c253b665485c1ebdf5a75f4ce82cc3def3
Reviewed-on: https://go-review.googlesource.com/102036
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-05 16:09:32 +00:00
isharipo
85a8d25d53 cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
cmd/asm now supports three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit IMUL3 instruction.

This results in less redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.

Note: these have exactly the same encoding when reg0=reg1:
      IMUL3x $const, reg0, reg1
      IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].

Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12 19:02:36 +00:00
Austin Clements
252f1170e5 runtime: buffered write barrier for 386
Updates #22460.

Change-Id: I3c8e90fd6bcda7e28911036591873d63665aaca7
Reviewed-on: https://go-review.googlesource.com/92696
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13 16:34:15 +00:00
Matthew Dempsky
a5868a47c6 cmd/internal/obj/x86: move MOV->XOR rewriting into compiler
Fixes #20986.

Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3
Reviewed-on: https://go-review.googlesource.com/73072
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-10-24 21:32:17 +00:00
Cherry Zhang
6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
David Chase
6cac100eef cmd/compile: add intrinsic for reading caller's pc
First step towards removing the mandatory argument for
getcallerpc, which solves certain problems for the runtime.
This might also slightly improve performance.

Intrinsic enabled on 386, amd64, amd64p32,
runtime asm implementation removed on those architectures.

Now-superfluous argument remains in getcallerpc signature
(for a future CL; non-386/amd64 asm funcs ignore it).

Added getcallerpc to the "not a real function" test
in dcl.go, that story is a little odd with respect to
unexported functions but that is not this CL.

Fixes #17327.

Change-Id: I5df1ad91f27ee9ac1f0dd88fa48f1329d6306c3e
Reviewed-on: https://go-review.googlesource.com/31851
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-09-22 18:37:03 +00:00
Daniel Martí
99da8730b0 all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones
mid-sentence or after commas.

Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0
Reviewed-on: https://go-review.googlesource.com/57293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-26 15:09:09 +00:00
Josh Bleecher Snyder
a45d6859cb cmd/compile: enforce that MOVXconvert is a no-op on 386 and amd64
Follow-up to CL 58371.

Change-Id: I3d2aaec84ee6db3ef1bd4fcfcaf46cc297c7176b
Reviewed-on: https://go-review.googlesource.com/58610
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-25 19:39:46 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Josh Bleecher Snyder
dae5389d3d Revert "cmd/compile: add Type.MustSize and Type.MustAlignment"
This reverts commit 94d540a4b6.

Reason for revert: prefer something along the lines of CL 42018.

Change-Id: I876fe32e98f37d8d725fe55e0fd0ea429c0198e0
Reviewed-on: https://go-review.googlesource.com/42022
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-28 01:24:13 +00:00
Josh Bleecher Snyder
94d540a4b6 cmd/compile: add Type.MustSize and Type.MustAlignment
Type.Size and Type.Alignment are for the front end:
They calculate size and alignment if needed.

Type.MustSize and Type.MustAlignment are for the back end:
They call Fatal if size and alignment are not already calculated.

Most uses are of MustSize and MustAlignment,
but that's because the back end is newer,
and this API was added to support it.

This CL was mostly generated with sed and selective reversion.
The only mildly interesting bit is the change of the ssa.Type interface
and the supporting ssa dummy types.

Follow-up to review feedback on CL 41970.

Passes toolstash-check.

Change-Id: I0d9b9505e57453dae8fb6a236a07a7a02abd459e
Reviewed-on: https://go-review.googlesource.com/42016
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-27 22:57:57 +00:00
Keith Randall
1e72bf6218 cmd/compile: experiment which clobbers all dead pointer fields
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.

Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860

Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-21 20:19:50 +00:00
Josh Bleecher Snyder
99683483d6 cmd/internal/obj: unify creation of numeric literal syms
This is a straightforward refactoring,
to reduce the scope of upcoming changes.

The symbol size and AttrLocal=true was not
set universally, but it appears not to matter,
since toolstash -cmp is happy.

Passes toolstash-check -all.

Change-Id: I7f8392f939592d3a1bc6f61dec992f5661f42fca
Reviewed-on: https://go-review.googlesource.com/39791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-06 19:01:50 +00:00
Josh Bleecher Snyder
c311488283 cmd/internal/obj: remove Linklookup
It was simply a wrapper around Link.Lookup.
Unwrap everything.

CL prepared using eg with template:

package p

import "cmd/internal/obj"

func before(ctxt *obj.Link, name string, version int) *obj.LSym {
	return obj.Linklookup(ctxt, name, version)
}

func after(ctxt *obj.Link, name string, version int) *obj.LSym {
	return ctxt.Lookup(name, version)
}

Then one comment in cmd/asm/internal/asm/parse.go
was manually updated (and gofmt'ed!),
and func Linklookup deleted.

Passes toolstash-check (as a sanity measure).

Change-Id: Icc4d56b0b2b5c8888d3184c1898c48359ea1e638
Reviewed-on: https://go-review.googlesource.com/39715
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-06 19:01:41 +00:00
Keith Randall
214be5b302 cmd/compile: remove likely bits from generated assembly
We don't need them any more since #15837 was fixed.

Fixes #19718

Change-Id: I13e46c62b321b2c9265f44c977b63bfb23163ca2
Reviewed-on: https://go-review.googlesource.com/38664
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-26 04:40:20 +00:00
Matthew Dempsky
22ea7fc1a9 cmd/compile/internal/gc: make SSAGenFPJump a method of SSAGenState
Change-Id: Ie22a08c93dfcfd4b336e7b158415448dd55b2c11
Reviewed-on: https://go-review.googlesource.com/38407
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-22 17:46:08 +00:00
Josh Bleecher Snyder
0a94daa378 cmd/compile: funnel SSA Prog creation through SSAGenState
Step one in eliminating Prog-related globals.

Passes toolstash-check -all.

Updates #15756

Change-Id: I3b777fb5a7716f2d9da3067fbd94c28ca894a465
Reviewed-on: https://go-review.googlesource.com/38450
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-22 17:18:40 +00:00
Matthew Dempsky
3e2f980e27 cmd/compile: eliminate direct uses of gc.Thearch in backends
This CL changes the GOARCH.Init functions to take gc.Thearch as a
parameter, which gc.Main supplies.

Additionally, the x86 backend is refactored to decide within Init
whether to use the 387 or SSE2 instruction generators, rather than for
each individual SSA Value/Block.

Passes toolstash-check -all.

Change-Id: Ie6305a6cd6f6ab4e89ecbb3cbbaf5ffd57057a24
Reviewed-on: https://go-review.googlesource.com/38301
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-17 22:10:53 +00:00
Matthew Dempsky
118b3fe7bb cmd/compile/internal/gc: refactor ACALL Prog creation
This abstracts creation of ACALL Progs into package gc. The main
benefit of this today is we can refactor away a lot of common
boilerplate code.

Later, once liveness analysis happens on the SSA graph, this will also
provide an easy insertion point for emitting the PCDATA Progs
immediately before call instructions.

Passes toolstash-check -all.

Change-Id: Ia15108ace97201cd84314f1ca916dfeb4f09d61c
Reviewed-on: https://go-review.googlesource.com/38081
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:04:16 +00:00
Matthew Dempsky
08d8d5c986 cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
Passes toolstash-check -all.

Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2
Reviewed-on: https://go-review.googlesource.com/38080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 19:44:36 +00:00
Matthew Dempsky
5e4a958351 cmd/compile: refactor portable SSA Op handling
Several SSA ops will always behave identically regardless of target
architecture, so handle those within gc/ssa.go instead.

Passes toolstash-check -all.

Change-Id: I54d514e80ab86723e44332a5a38e3054cbca8c5d
Reviewed-on: https://go-review.googlesource.com/37931
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-07 21:17:03 +00:00
Matthew Dempsky
02e36f8c87 cmd/compile/internal/ssa: remove Hmul{8,16}{,u} ops
Change-Id: I90865921584ae4bdfb6c220d439b14593d72b6f9
Reviewed-on: https://go-review.googlesource.com/37752
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-03 20:47:36 +00:00
Keith Randall
708ba22a0c cmd/compile: move constant divide strength reduction to SSA rules
Currently the conversion from constant divides to multiplies is mostly
done during the walk pass.  This is suboptimal because SSA can
determine that the value being divided by is constant more often
(e.g. after inlining).

Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
Reviewed-on: https://go-review.googlesource.com/37015
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-17 06:16:44 +00:00
Matthew Dempsky
6a29440dcc cmd/compile/internal/gc: remove more backend Sym uses
Removes all external uses of Linksym and Pkglookup, which are the only
two exported functions that return Syms.

Also add Duffcopy and Duffzero since they're used often enough across
SSA backends.

Passes toolstash -cmp.

Change-Id: I8d3fd048ad5cd676fc46378f09a917569ffc9b2c
Reviewed-on: https://go-review.googlesource.com/36418
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-06 23:25:44 +00:00
Matthew Dempsky
5f374ea8fb cmd/compile/internal/gc: stop exporting *gc.Sym-typed globals
The arch-specific SSA backends now no longer use gc.Sym either.

Passes toolstash -cmp.

Change-Id: Ic13b934b92a1b89b4b79c6c4796ab0a137608163
Reviewed-on: https://go-review.googlesource.com/36416
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-06 22:45:49 +00:00
Matthew Dempsky
87c475c227 cmd/compile/internal/ssa: use obj.LSym instead of gc.Sym
Gc's Sym type represents a package-qualified identifier, which is a
frontend concept and doesn't belong in SSA. Bonus: we can replace some
interface{} types with *obj.LSym.

Passes toolstash -cmp.

Change-Id: I456eb9957207d80f99f6eb9b8eab4a1f3263e9ed
Reviewed-on: https://go-review.googlesource.com/36415
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-06 22:45:34 +00:00
Robert Griesemer
cfd17f51c8 [dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos
This is a mostly mechanical rename followed by manual fixes where necessary.

Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:36:52 +00:00
Robert Griesemer
eab3707d6d [dev.inline] cmd/compile: rename various fields from Lineno to Pos
Various minor adjustments.

Change-Id: Iedfb97989f7bedaa3e9e8993b167e05f162434a7
Reviewed-on: https://go-review.googlesource.com/34136
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:35:18 +00:00
Robert Griesemer
82d0caea2c [dev.inline] cmd/internal/src: make Pos implementation abstract
Adjust cmd/compile accordingly.

This will make it easier to replace the underlying implementation.

Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:31:28 +00:00
shaharko
d391dc260a cmd/internal/obj: Use bitfield for LSym attributes
Reduces the size of LSym struct.

On 32bit: before 84  after 76
On 64bit: before 136 after 128

name       old time/op     new time/op     delta
Template       182ms ± 3%      182ms ± 3%    ~           (p=0.607 n=19+20)
Unicode       93.5ms ± 4%     94.2ms ± 3%    ~           (p=0.141 n=20+19)
GoTypes        608ms ± 1%      605ms ± 2%    ~           (p=0.056 n=20+20)

name       old user-ns/op  new user-ns/op  delta
Template        249M ± 7%       249M ± 4%    ~           (p=0.605 n=18+19)
Unicode         149M ±14%       151M ± 5%    ~           (p=0.724 n=20+17)
GoTypes         855M ± 4%       853M ± 3%    ~           (p=0.537 n=19+19)

name       old alloc/op    new alloc/op    delta
Template      40.3MB ± 0%     40.3MB ± 0%  -0.11%        (p=0.000 n=19+20)
Unicode       33.8MB ± 0%     33.8MB ± 0%  -0.08%        (p=0.000 n=20+20)
GoTypes        119MB ± 0%      119MB ± 0%  -0.10%        (p=0.000 n=19+20)

name       old allocs/op   new allocs/op   delta
Template        383k ± 0%       383k ± 0%    ~           (p=0.703 n=20+20)
Unicode         317k ± 0%       317k ± 0%    ~           (p=0.982 n=19+18)
GoTypes        1.14M ± 0%      1.14M ± 0%    ~           (p=0.086 n=20+20)

Change-Id: Id6ba0db3ecc4503a4e9af3ed0d5884d4366e8bf9
Reviewed-on: https://go-review.googlesource.com/31870
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Shahar Kohanim <skohanim@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-25 20:10:05 +00:00
Matthew Dempsky
8c24bff52b cmd/compile: layout stack frame during SSA
Identify live stack variables during SSA and compute the stack frame
layout earlier so that we can emit instructions with the correct
offsets upfront.

Passes toolstash/buildall.

Change-Id: I191100dba274f1e364a15bdcfdc1d1466cdd1db5
Reviewed-on: https://go-review.googlesource.com/30216
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-04 17:07:36 +00:00
Keith Randall
833ed7c431 cmd/compile: reorganize SSA register numbering
Teach SSA about the cmd/internal/obj/$ARCH register numbering.
It can then return that numbering when requested.  Each architecture
now does not need to know anything about the internal SSA numbering
of registers.

Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d
Reviewed-on: https://go-review.googlesource.com/29249
Reviewed-by: David Chase <drchase@google.com>
2016-09-16 19:01:55 +00:00
Keith Randall
3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Keith Randall
167e381f40 cmd/compile: make ssa compilation unconditional
Rip out the code that allows SSA to be used conditionally.

No longer exists:
 ssa=0 flag
 GOSSAHASH
 GOSSAPKG
 SSATEST

GOSSAFUNC now only controls the printing of the IR/html.

Still need to rip out all of the old backend.  It should no longer be
callable after this CL.

Update #16357

Change-Id: Ib30cc18fba6ca52232c41689ba610b0a94aa74f5
Reviewed-on: https://go-review.googlesource.com/29155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-14 17:38:04 +00:00
Keith Randall
c345a3913f cmd/compile: get rid of BlockCall
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.

Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.

Fixes #15631

Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Michael Pratt
41e1c42028 cmd/compile: refactor out KeepAlive
Reduce the duplication in every arch by moving the code into package gc.

Change-Id: Ia111add8316492571825431ecd4f0154c8792ae1
Reviewed-on: https://go-review.googlesource.com/28481
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-04 04:02:12 +00:00
Keith Randall
320ddcf834 cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64.  There's no reason
to pay the overhead of a call for these.

To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.

Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out.  This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.

benchmark                    old ns/op     new ns/op     delta
BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
BenchmarkAtomicStore64-8     7.54          5.72          -24.14%

TBD (in a different CL): Cas, Or8, ...

Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-25 20:09:04 +00:00
Keith Randall
8f955d3664 [dev.ssa] cmd/compile: fix fp constant loads for 386+PIC
In position-independent 386 code, loading floating-point constants from
the constant pool requires two steps: materializing the address of
the constant pool entry (requires calling a thunk) and then loading
from that address.

Before this CL, the materializing happened implicitly in CX, which
clobbered that register.

Change-Id: Id094e0fb2d3be211089f299e8f7c89c315de0a87
Reviewed-on: https://go-review.googlesource.com/26811
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-08-11 19:52:45 +00:00
Keith Randall
c069bc4996 [dev.ssa] cmd/compile: implement GO386=387
Last part of the 386 SSA port.

Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.

Turn on SSA backend for 386 by default.

Fixes #16358

Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-08-10 17:41:01 +00:00