There are often many values to clobber.
Allow passing them all in at once.
The goal is increased rule readability.
As a bonus, it shrinks cmd/compile by ~97k, almost half a percent.
Package SSA requires 1.2% less memory to compile.
The single-line changes were made via regex,
and the remaining multi-line clobbers were manually combined.
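For illustration, a schematic rule fragment (not one of the actual rules
changed here): a condition that previously chained single-value calls,

	... && clobber(x) && clobber(y) && clobber(z) -> ...

can now pass all the values to a single call:

	... && clobber(x, y, z) -> ...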
Passes toolstash-check -all.
Change-Id: Ib310e9265d3616211f8192c9040b4c8933824d19
Reviewed-on: https://go-review.googlesource.com/c/go/+/220691
Reviewed-by: Michael Munday <mike.munday@ibm.com>
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
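Schematically, the equivalences involved are (written as rewrite rules
purely for illustration; the CL removes the ops rather than adding rules):

	(Greater64 x y) -> (Less64 y x)
	(Geq64 x y) -> (Leq64 y x)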
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This reverts CL 213477.
Reason for revert: tests are failing on linux-mips*-rtrk builders.
Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/220897
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We store 32-bit floating-point constants in a 64-bit field, converting
the 32-bit float to a 64-bit float to store it, and converting it back
to use it.
That works for *almost* all floating-point constants. The exception is
signaling NaNs. The round trip described above means we can't represent
a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA
form. This shouldn't affect any real-world code, as people seldom
constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs
do you get when dividing 0/0?), so when cross-compiling there's a
danger of using the compiler machine's NaN regime for some math, and
the target machine's NaN regime for other math. Better to use the
target machine's NaN regime always.
This has been a bug since 1.10, and there's an easy workaround
(declare a global variable containing the signaling NaN pattern, and
use that as the argument to math.Float32frombits) so we'll fix it in
1.15.
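A minimal sketch of that workaround, using math.Float32frombits
(0x7f800001 is one possible 32-bit signaling NaN pattern):

	// The global keeps the bit pattern out of SSA constant propagation.
	var snanBits uint32 = 0x7f800001

	func signalingNaN32() float32 {
		return math.Float32frombits(snanBits)
	}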
Fixes #36400
Update #36399
Change-Id: Icf155e743281560eda2eed953d19a829552ccfda
Reviewed-on: https://go-review.googlesource.com/c/go/+/213477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
We should panic in this situation. Rewriting to an SSA op just leads
to a compiler panic.
Fixes #36259
Change-Id: I6e0bccbed7dd0fdac7ebae76b98a211947947386
Reviewed-on: https://go-review.googlesource.com/c/go/+/212405
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.
However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], for which the compiler could
not eliminate bounds checks (including with all outstanding
prove CLs).
Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.
This change reduces both sets of bounds checks.
Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.
And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.
The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.
This does increase register pressure, but the reduced bounds
checks more than compensate.
Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.
There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
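The generated prologue and loop now look roughly like this (a schematic
with details elided, not verbatim rulegen output):

	v_1 := v.Args[1]
	v_0 := v.Args[0]
	// ... each rule for this op then uses v_0 and v_1 directly ...
	for _i0 := 0; _i0 <= 1; _i0, v_0, v_1 = _i0+1, v_1, v_0 {
		// try to match with the current ordering of v_0 and v_1
	}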
Passes toolstash-check -all.
name old time/op new time/op delta
Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96)
Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90)
GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94)
Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100)
SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99)
Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96)
GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93)
Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94)
Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95)
XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94)
[Geo mean] 424ms 421ms -0.68%
name old user-time/op new user-time/op delta
Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98)
Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90)
GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93)
Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100)
SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97)
Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92)
GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96)
Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92)
Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98)
XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95)
[Geo mean] 547ms 544ms -0.61%
file before after Δ %
compile 21113112 21109016 -4096 -0.019%
total 131704940 131700844 -4096 -0.003%
Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.
This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.
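Roughly, the matcher for a commutative op now has this shape (a sketch,
not verbatim rulegen output):

	for i := 0; i <= 1; i++ {
		x := v.Args[i]
		y := v.Args[i^1]
		// ... nested loops for inner commutative ops, then the
		// match conditions for this particular argument ordering ...
	}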
I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.
The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.
There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.
This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.
For the entire series ending at this change,
there is negligible compiler performance impact.
The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.
Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We absorbed Not into most integer comparisons but not into pointer
and floating point equality checks.
The new cases trigger more than 300 times during make.bash.
Change-Id: I77c6b31fcacde10da5470b73fc001a19521ce78d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200618
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The rules for extracting the interface data word don't leave
the result typed correctly. If I do i.([1]*int)[0], the result
should have type *int, not [1]*int. Using (IData x) for the result
keeps the typing of the original top-level Value.
I don't think this would ever cause a real codegen bug, but fixing it
at least makes the typing shown in ssa.html more consistent.
Change-Id: I239d821c394e58347639387981b0510d13b2f7b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/204042
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On modern 64-bit CPUs a SHR, SHL, or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However, that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask the high bits instead of an AND has a
shorter encoding and uses one less general purpose register, but is
slower, taking one clock cycle longer, unless register pressure would
otherwise force the AND variant to generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint

func BenchmarkAndHighBits(b *testing.B) {
	x := uint(0)
	for i := 0; i < b.N; i++ {
		x &= 1 << 63
	}
	GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
'go run run.go -all_codegen -v codegen' passes with the following adjustments:
ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
since ORshiftRL generation fusing '>> rc' and '|' interferes
with matching ((x << lc) >> rc) to generate UBFX. Previously
ORshiftLL was created first using the shifts generated for (y & ac).
S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.
Updates #33826
Updates #32781
Change-Id: I5a59f6239660d53c029cd22dfb44ddf39f93a56c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196810
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
First, remove unnecessary "// cond:" lines from the generated files.
This shaves off about ~7k lines.
Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.
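For example, consecutive guards of the form

	if cond1 {
		break
	}
	if cond2 {
		break
	}

become a single, deduplicatable statement:

	if cond1 || cond2 {
		break
	}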
This change is not for readability or simplicity, but rather to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.
While at it, add a -trace flag to rulegen.
Updates #33644.
Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
On modern 64-bit CPUs a SHR, SHL, or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However, that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask the high bits instead of an AND has a
shorter encoding and uses one less general purpose register, but is
slower, taking one clock cycle longer, unless register pressure would
otherwise force the AND variant to generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint

func BenchmarkAndHighBits(b *testing.B) {
	x := uint(0)
	for i := 0; i < b.N; i++ {
		x &= 1 << 63
	}
	GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
'go run run.go -all_codegen -v codegen' passes with the following adjustments:
ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
since ORshiftRL generation fusing '>> rc' and '|' interferes
with matching ((x << lc) >> rc) to generate UBFX. Previously
ORshiftLL was created first using the shifts generated for (y & ac).
S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.
Updates #33826
Updates #32781
Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb
Reviewed-on: https://go-review.googlesource.com/c/go/+/194297
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On modern 64-bit CPUs a SHR, SHL, or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However, that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask the high bits instead of an AND has a
shorter encoding and uses one less general purpose register, but is
slower, taking one clock cycle longer, unless register pressure would
otherwise force the AND variant to generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint

func BenchmarkAndHighBits(b *testing.B) {
	x := uint(0)
	for i := 0; i < b.N; i++ {
		x &= 1 << 63
	}
	GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
Updates #33826
Updates #32781
Change-Id: I862d3587446410c447b9a7265196b57f85358633
Reviewed-on: https://go-review.googlesource.com/c/go/+/191780
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The existing pointer comparison optimizations
don't include pointer arithmetic. Add them.
These rules trigger a few times in std cmd, while compiling:
time.Duration.String
cmd/go/internal/tlog.NodeHash
crypto/tls.ticketKeyFromBytes (3 times)
crypto/elliptic.(*p256Point).p256ScalarMult (15 times!)
crypto/elliptic.initTable
These weird comparisons occur when using the copy builtin,
which does a pointer comparison between src and dst.
This also happens to fix #32454, by optimizing enough
early on that all values can be eliminated.
Fixes #32454
Change-Id: I799d45743350bddd15a295dc1e12f8d03c11d1c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/180940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
EqPtr and NeqPtr are marked as commutative,
so the transformations for rules are already
generated by the preceding two lines.
Change-Id: Ibecba5c8e54d9df00c84e1dae7e5d8cb53eeff43
Reviewed-on: https://go-review.googlesource.com/c/go/+/180939
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The assembly output for x & c == c, where c is a power of 2:
MOVQ "".set+8(SP), AX
ANDQ $8, AX
CMPQ AX, $8
SETEQ "".~r2+24(SP)
With the bitset optimization:
MOVQ "".set+8(SP), AX
BTL $3, AX
SETCS "".~r2+24(SP)
The output is one instruction shorter.
However, there is no speed improvement:
name old time/op new time/op delta
AllBitSet-8 0.35ns ± 0% 0.35ns ± 0% ~ (all equal)
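The source pattern involved is a power-of-two mask test, for example
(an illustrative function matching the $8 constant above):

	func bit3Set(set uint64) bool {
		return set&8 == 8 // compiles to the BTL/SETCS form shown above
	}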
Fixes #31904
Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.
Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.
To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.
We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.
We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.
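A minimal sketch of one such pass (an assumed shape using go/parser,
go/types, and go/importer; the real rulegen code differs in detail):

	fset := token.NewFileSet()
	f, _ := parser.ParseFile(fset, "rewriteAMD64.go", src, 0)
	var unused []token.Pos
	conf := types.Config{
		Importer: importer.Default(),
		// With Error set, type checking reports all errors rather
		// than stopping at the first one.
		Error: func(err error) {
			if e, ok := err.(types.Error); ok &&
				strings.Contains(e.Msg, "declared but not used") {
				unused = append(unused, e.Pos)
			}
		},
	}
	conf.Check("p", fset, []*ast.File{f}, nil)
	// astutil.Apply then deletes the nodes at the collected positions.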
The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.
name old time/op new time/op delta
Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5)
We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:
name old time/op new time/op delta
Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5)
It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.
Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.
Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren, Section 10-17.
This general rule can compute divisibility with one multiplication, an add,
and a compare for odd divisors, plus an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also needed.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.
Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
On amd64, the divisibility check is 30-45% faster.
name old time/op new time/op delta
DivisiblePow2constI64-4 0.83ns ± 1% 0.82ns ± 0% ~ (p=0.079 n=5+4)
DivisibleconstI64-4 2.68ns ± 1% 1.87ns ± 0% -30.33% (p=0.000 n=5+4)
DivisibleWDivconstI64-4 2.69ns ± 1% 2.71ns ± 3% ~ (p=1.000 n=5+5)
DivisiblePow2constI32-4 1.15ns ± 1% 1.15ns ± 0% ~ (p=0.238 n=5+4)
DivisibleconstI32-4 2.24ns ± 1% 1.20ns ± 0% -46.48% (p=0.016 n=5+4)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.683 n=5+5)
DivisiblePow2constI16-4 0.81ns ± 1% 0.82ns ± 1% ~ (p=0.135 n=5+5)
DivisibleconstI16-4 2.11ns ± 2% 1.20ns ± 1% -42.99% (p=0.008 n=5+5)
DivisibleWDivconstI16-4 2.23ns ± 0% 2.27ns ± 2% +1.79% (p=0.029 n=4+4)
DivisiblePow2constI8-4 0.81ns ± 1% 0.81ns ± 1% ~ (p=0.286 n=5+5)
DivisibleconstI8-4 2.13ns ± 3% 1.19ns ± 1% -43.84% (p=0.008 n=5+5)
DivisibleWDivconstI8-4 2.23ns ± 1% 2.25ns ± 1% ~ (p=0.183 n=5+5)
Fixes #30282
Fixes #15806
Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/173998
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren, Section 10-17.
This general rule can compute divisibility with one multiplication and a compare
for odd divisors, plus an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.
Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
Additional rules to lower the generic RotateLeft* ops were also applied.
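A concrete instance of the odd-divisor case, with the constants worked
out by hand for c=7 (an illustrative function, not compiler output):

	// 7*inv == 1 (mod 2^32), so x%7 == 0 <=> x*inv <= floor(MaxUint32/7).
	const inv = 0xb6db6db7

	func divisibleBy7(x uint32) bool {
		return x*inv <= 0xffffffff/7
	}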
On amd64, the divisibility check is 25-50% faster.
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5)
DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5)
DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5)
DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5)
DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5)
DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5)
DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5)
DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4)
DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5)
DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5)
DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5)
DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5)
DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5)
DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5)
DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5)
DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4)
DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5)
DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5)
DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5)
DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5)
A follow-up change will address the signed division case.
Updates #30282
Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made
just by checking the trailing zeroes via a mask x&(c-1) == 0, even for signed
integers. This avoids division fix-ups when just a divisibility check is needed.
To apply this rule, we match on the fixed-up version of the division. This is
necessary because the mod and division rewrite rules are already applied
during the initial opt pass.
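For example (a sketch of the source-level equivalence, not compiler
output):

	// Equivalent to x%8 == 0, even for negative x; no sign fix-up needed.
	func multipleOf8(x int64) bool {
		return x&7 == 0
	}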
The speed-up on amd64 due to the elimination of unnecessary fix-up code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.09ns ± 1% ~ (p=0.730 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.66% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=0.683 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 1% ~ (p=0.968 n=4+5)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 1% -54.97% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.80ns ± 2% ~ (p=0.206 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.54ns ± 0% ~ (p=0.238 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.81ns ± 1% -54.72% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 1.01ns ± 1% ~ (p=0.119 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.54ns ± 2% ~ (p=0.571 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.82ns ± 8% -53.71% (p=0.008 n=5+5)
DivconstU8-4 0.93ns ± 1% 0.93ns ± 1% ~ (p=0.643 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: Iabadbbe369b6e0998c8ce85d038ebc236142e42a
Reviewed-on: https://go-review.googlesource.com/c/go/+/173557
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This reverts CL 168038 (git 68819fb6d2)
Reason for revert: Doesn't work on 32 bit archs.
Change-Id: Idec9098060dc65bc2f774c5383f0477f8eb63a3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/173442
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made
just by checking the trailing zeroes via a mask x&(c-1) == 0, even for signed
integers. This avoids division fix-ups when just a divisibility check is needed.
To apply this rule, the generic divisibility rewrite A%B = A-(A/B*B) is disabled
in the "opt" pass; this does not affect the generated code, as that rule is
applied later.
The speed-up on amd64 due to the elimination of unnecessary fix-up code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.07ns ± 0% ~ (p=0.079 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.55% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 0% ~ (all equal)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 4% -54.75% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.78ns ± 1% ~ (p=1.000 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.53ns ± 0% ~ (p=0.333 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.79ns ± 1% -55.39% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 0.99ns ± 1% ~ (p=0.730 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.53ns ± 0% ~ (p=0.714 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.80ns ± 0% -55.06% (p=0.000 n=5+4)
DivconstU8-4 0.93ns ± 1% 0.95ns ± 1% +1.72% (p=0.024 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: I0d284863774b1bc8c4ce87443bbaec6103e14ef4
Reviewed-on: https://go-review.googlesource.com/c/go/+/168038
Reviewed-by: Keith Randall <khr@golang.org>
A lot of the naked for loops begin like:
for {
	v := b.Control
	if v.Op != OpConstBool {
		break
	}
	...
	return true
}
Instead, write them out in a more compact and readable way:
for v.Op == OpConstBool {
	...
	return true
}
This requires the addition of two bytes.Buffer writers, as these let us
make decisions based on future pieces of generated code. This probably
makes rulegen slightly slower, but that's not noticeable; the code
generation still takes ~3.5s on my laptop, excluding build time.
The "v := b.Control" declaration can be moved to the top of each
function. Even though the rules can modify b.Control when firing, they
also make the function return, so v can't be used again.
While at it, remove three unnecessary lines from the top of each
rewriteBlock func.
In total, this results in ~4k lines removed from the generated code, and
a slight improvement in readability.
Change-Id: I317e4c6a4842c64df506f4513375475fad2aeec5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167399
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This rewrite rule triggers only once, in math/big.quotToFloat64,
as part of converting a uint64 to a float64.
Nevertheless, it is cheap; let's add it.
Change-Id: I3ed4a197a559110fec1bc04b3a8abb4c7fcc2c89
Reviewed-on: https://go-review.googlesource.com/c/go/+/167500
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We know that a & 31 is non-negative for all a, signed or not.
We can avoid checking that, and avoid writing out an
unreachable call to panicshift.
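For example (an illustrative function with a signed shift count s):

	func shiftMasked(x int64, s int64) int64 {
		return x << (s & 31) // s&31 is provably non-negative: no panicshift call
	}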
Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
This reduces the number of extra bounds check hints we need to
insert. For example, rather than producing:
_ = v.Args[2]
x := v.Args[0]
y := v.Args[1]
z := v.Args[2]
We now produce:
z := v.Args[2]
x := v.Args[0]
y := v.Args[1]
This gets rid of about 7000 lines of code from the rewrite rules.
Change-Id: I1291cf0f82e8d035a6d65bce7dee6cedee04cbcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/167397
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The sheer length of the generated rules files makes my
editor and git client unhappy.
This change is a small step towards shortening them.
We recognize a few magic variables during rulegen: b, config, fe, typ.
Of these, only b appears prone to false positives.
By tightening the heuristic and fixing one case in MIPS.rules,
we can make the heuristic strict enough that it has no failures.
That allows us to remove the hedge assignments to _,
eliminating 3000 pointless lines of code.
Change-Id: I080cde5db28c8277cb3fd9ddcd829306c9a27785
Reviewed-on: https://go-review.googlesource.com/c/go/+/166979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Make sure the argument to memmove is of pointer type before we try to
get the element type.
This has been noticed for code that uses unsafe+linkname so it can
call runtime.memmove. Probably not the best thing to allow, but the
code is out there and we'd rather not break it unnecessarily.
Fixes #30061
Change-Id: I334a8453f2e293959fd742044c43fbe93f0b3d31
Reviewed-on: https://go-review.googlesource.com/c/160826
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This makes it easier to track names of function arguments
for debugging purposes.
Change-Id: Ic34856fe0b910005e1c7bc051d769d489a4b158e
Reviewed-on: https://go-review.googlesource.com/c/150098
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We have an existing optimization that recognizes
memory moves of the form A -> B -> C and converts
them into A -> C, in the hopes that the store to
B will end up being dead and thus eliminated.
However, when A, B, and C are large types,
the front end sometimes emits VarDef ops for the moves.
This change adds an optimization to match that pattern.
This required changing an old compiler test.
The test assumed that a temporary was required
to deal with a large return value.
With this optimization in place, that temporary
ended up being eliminated.
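A source-level sketch of the kind of code involved (illustrative; the
rule itself matches Move/VarDef ops in SSA form, and whether a given
function triggers it depends on the front end):

	type T [1024]byte // large, non-SSA-able type

	func g() (t T) { return }

	func f(p *T) {
		// The result of g may be copied into a temporary and then
		// into *p; the new rule can collapse the two moves into one.
		*p = g()
	}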
Triggers 649 times during 'go build -a std cmd'.
Cuts 16k off cmd/go.
name old object-bytes new object-bytes delta
Template 507kB ± 0% 507kB ± 0% -0.15% (p=0.008 n=5+5)
Unicode 225kB ± 0% 225kB ± 0% ~ (all equal)
GoTypes 1.85MB ± 0% 1.85MB ± 0% ~ (all equal)
Flate 328kB ± 0% 328kB ± 0% ~ (all equal)
GoParser 402kB ± 0% 402kB ± 0% -0.00% (p=0.008 n=5+5)
Reflect 1.41MB ± 0% 1.41MB ± 0% -0.20% (p=0.008 n=5+5)
Tar 458kB ± 0% 458kB ± 0% ~ (all equal)
XML 601kB ± 0% 599kB ± 0% -0.21% (p=0.008 n=5+5)
Change-Id: I9b5f25c8663a0b772ad1ee51fa61f74b74d26dd3
Reviewed-on: https://go-review.googlesource.com/c/143479
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
This optimization is not sound if A, B, or C
might overlap with each other.
Thanks to Michael Munday for pointing this
out during the review of CL 143479.
This reduces the number of times this optimization
triggers during make.bash from 386 to 74.
This is unfortunate, but I don't see an obvious way around it,
short of souping up the disjointness analysis.
name old object-bytes new object-bytes delta
Template 507kB ± 0% 507kB ± 0% +0.13% (p=0.008 n=5+5)
Unicode 225kB ± 0% 225kB ± 0% ~ (all equal)
GoTypes 1.85MB ± 0% 1.85MB ± 0% +0.02% (p=0.008 n=5+5)
Flate 328kB ± 0% 328kB ± 0% ~ (all equal)
GoParser 402kB ± 0% 402kB ± 0% ~ (all equal)
Reflect 1.41MB ± 0% 1.41MB ± 0% ~ (all equal)
Tar 457kB ± 0% 458kB ± 0% +0.20% (p=0.008 n=5+5)
XML 600kB ± 0% 601kB ± 0% +0.03% (p=0.008 n=5+5)
Change-Id: Ida408cb627145ba9faf473a78606f050c2f3f51c
Reviewed-on: https://go-review.googlesource.com/c/145208
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
During development and debugging, I often want to
write noteRule(fmt.Sprintf(...)), and end up
manually adding the import to the generated code.
Let's just make it always available instead.
Change-Id: I1e2d47c98ba056e1b5da42e35fb6ad26f1d9cc3d
Reviewed-on: https://go-review.googlesource.com/c/145207
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
The goal of this change is to move work from walk to SSA,
and simplify things along the way.
This is hard to accomplish cleanly with small incremental changes,
so this large commit message aims to provide a roadmap to the diff.
High level description:
Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
ascompatte gathered variadic arguments into a slice.
It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
ascompatte was called multiple times to handle the receiver in a method call.
reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
adjustargs then made extra stack space for go/defer args as needed.
Node to SSA construction evaluated all the statements in n.List,
and issued the function call, assuming that the stack was correctly constructed.
Intrinsic calls had to dig around inside n.List to extract the arguments,
since intrinsics don't use the stack to make function calls.
This change moves stack construction to the SSA construction phase.
ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
It does not, however, make any assignments to stack slots.
Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
(It would be better to use Ninit instead of List; future work.)
During SSA construction, after doing all the temporary assignments in n.List,
the function arguments are assigned to stack slots by
constructing the appropriate SSA Value, using (*state).storeArg.
SSA construction also now handles adjustments for go/defer args.
This change also simplifies intrinsic calls, since we no longer need to undo walk's work.
Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.
Generated code differences:
There were a few optimizations applied along the way, the old way.
f(g()) was rewritten to do a block copy of function results to function arguments.
And reorder1 avoided introducing the final "save the stack" temporary in n.List.
The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.
SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
The exception was when the temporary's type was not SSA-able;
in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
with the autotmp never read or used again.
This change introduces a new rewrite rule to detect such pointless double Moves
and collapse them into a single Move.
This is actually more powerful than the original optimization,
since the original optimization relied on the imprecise Node.HasCall calculation.
The other significant difference in the generated code is that the stack is now constructed
completely in SP-offset order. Prior to this change, the stack was constructed somewhat
haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
then other arguments, then the method receiver, then the defer/go args.
SP-offset is probably a good default order. See future work.
There are a few minor object file size changes as a result of this change.
I investigated some regressions in early versions of this change.
One regression (in archive/tar) was the addition of a single CMPQ instruction,
which would be eliminated were this TODO from flagalloc to be done:
// TODO: Remove original instructions if they are never used.
One regression (in text/template) was an ADDQconstmodify that is now
a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
in the order in which arguments are written. The argument change
order can also now be luckier, so this appears to be a wash.
All in all, though there will be minor winners and losers,
this change appears to be performance neutral.
Future work:
Move loading the result of function calls to SSA construction; eliminate OINDREGSP.
Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
Among other benefits, this would make it easier to transition to a new calling convention.
This would require rethinking the handling of stack conflicts and is non-trivial.
Figure out some clean way to indicate that stack construction Stores/Moves
do not alias each other, so that subsequent passes may do things like
CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
This would allow us to eliminate the minor text/template regression.
Possibly make assignments to stack slots not treated as statements by DWARF.
Compiler benchmarks:
name old time/op new time/op delta
Template 182ms ± 2% 179ms ± 2% -1.69% (p=0.000 n=47+48)
Unicode 86.3ms ± 5% 85.1ms ± 4% -1.36% (p=0.001 n=50+50)
GoTypes 646ms ± 1% 642ms ± 1% -0.63% (p=0.000 n=49+48)
Compiler 2.89s ± 1% 2.86s ± 2% -1.36% (p=0.000 n=48+50)
SSA 8.47s ± 1% 8.37s ± 2% -1.22% (p=0.000 n=47+50)
Flate 122ms ± 2% 121ms ± 2% -0.66% (p=0.000 n=47+45)
GoParser 147ms ± 2% 146ms ± 2% -0.53% (p=0.006 n=46+49)
Reflect 406ms ± 2% 403ms ± 2% -0.76% (p=0.000 n=48+43)
Tar 162ms ± 3% 162ms ± 4% ~ (p=0.191 n=46+50)
XML 223ms ± 2% 222ms ± 2% -0.37% (p=0.031 n=45+49)
[Geo mean] 382ms 378ms -0.89%
name old user-time/op new user-time/op delta
Template 219ms ± 3% 216ms ± 3% -1.56% (p=0.000 n=50+48)
Unicode 109ms ± 6% 109ms ± 5% ~ (p=0.190 n=50+49)
GoTypes 836ms ± 2% 828ms ± 2% -0.96% (p=0.000 n=49+48)
Compiler 3.87s ± 2% 3.80s ± 1% -1.81% (p=0.000 n=49+46)
SSA 12.0s ± 1% 11.8s ± 1% -2.01% (p=0.000 n=48+50)
Flate 142ms ± 3% 141ms ± 3% -0.85% (p=0.003 n=50+48)
GoParser 178ms ± 4% 175ms ± 4% -1.66% (p=0.000 n=48+46)
Reflect 520ms ± 2% 512ms ± 2% -1.44% (p=0.000 n=45+48)
Tar 200ms ± 3% 198ms ± 4% -0.61% (p=0.037 n=47+50)
XML 277ms ± 3% 275ms ± 3% -0.85% (p=0.000 n=49+48)
[Geo mean] 482ms 476ms -1.23%
name old alloc/op new alloc/op delta
Template 36.1MB ± 0% 35.3MB ± 0% -2.18% (p=0.008 n=5+5)
Unicode 29.8MB ± 0% 29.3MB ± 0% -1.58% (p=0.008 n=5+5)
GoTypes 125MB ± 0% 123MB ± 0% -2.13% (p=0.008 n=5+5)
Compiler 531MB ± 0% 513MB ± 0% -3.40% (p=0.008 n=5+5)
SSA 2.00GB ± 0% 1.93GB ± 0% -3.34% (p=0.008 n=5+5)
Flate 24.5MB ± 0% 24.3MB ± 0% -1.18% (p=0.008 n=5+5)
GoParser 29.4MB ± 0% 28.7MB ± 0% -2.34% (p=0.008 n=5+5)
Reflect 87.1MB ± 0% 86.0MB ± 0% -1.33% (p=0.008 n=5+5)
Tar 35.3MB ± 0% 34.8MB ± 0% -1.44% (p=0.008 n=5+5)
XML 47.9MB ± 0% 47.1MB ± 0% -1.86% (p=0.008 n=5+5)
[Geo mean] 82.8MB 81.1MB -2.08%
name old allocs/op new allocs/op delta
Template 352k ± 0% 347k ± 0% -1.32% (p=0.008 n=5+5)
Unicode 342k ± 0% 339k ± 0% -0.66% (p=0.008 n=5+5)
GoTypes 1.29M ± 0% 1.27M ± 0% -1.30% (p=0.008 n=5+5)
Compiler 4.98M ± 0% 4.87M ± 0% -2.14% (p=0.008 n=5+5)
SSA 15.7M ± 0% 15.2M ± 0% -2.86% (p=0.008 n=5+5)
Flate 233k ± 0% 231k ± 0% -0.83% (p=0.008 n=5+5)
GoParser 296k ± 0% 291k ± 0% -1.54% (p=0.016 n=5+4)
Reflect 1.05M ± 0% 1.04M ± 0% -0.65% (p=0.008 n=5+5)
Tar 343k ± 0% 339k ± 0% -0.97% (p=0.008 n=5+5)
XML 432k ± 0% 426k ± 0% -1.19% (p=0.008 n=5+5)
[Geo mean] 815k 804k -1.35%
name old object-bytes new object-bytes delta
Template 505kB ± 0% 505kB ± 0% -0.01% (p=0.008 n=5+5)
Unicode 224kB ± 0% 224kB ± 0% ~ (all equal)
GoTypes 1.82MB ± 0% 1.83MB ± 0% +0.06% (p=0.008 n=5+5)
Flate 324kB ± 0% 324kB ± 0% +0.00% (p=0.008 n=5+5)
GoParser 402kB ± 0% 402kB ± 0% +0.04% (p=0.008 n=5+5)
Reflect 1.39MB ± 0% 1.39MB ± 0% -0.01% (p=0.008 n=5+5)
Tar 449kB ± 0% 449kB ± 0% -0.02% (p=0.008 n=5+5)
XML 598kB ± 0% 597kB ± 0% -0.05% (p=0.008 n=5+5)
Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Do []byte(string) conversions more efficiently when the string
is a constant. Instead of calling stringtobyteslice, allocate
just the space we need and encode the initialization directly.
[]byte("foo") rewrites to the following pseudocode:
var s [3]byte // on heap or stack, depending on whether b escapes
s = *(*[3]byte)(&"foo"[0]) // initialize s from the string
b = s[:]
which generates this assembly:
0x001d 00029 (tmp1.go:9) LEAQ type.[3]uint8(SB), AX
0x0024 00036 (tmp1.go:9) MOVQ AX, (SP)
0x0028 00040 (tmp1.go:9) CALL runtime.newobject(SB)
0x002d 00045 (tmp1.go:9) MOVQ 8(SP), AX
0x0032 00050 (tmp1.go:9) MOVBLZX go.string."foo"+2(SB), CX
0x0039 00057 (tmp1.go:9) MOVWLZX go.string."foo"(SB), DX
0x0040 00064 (tmp1.go:9) MOVW DX, (AX)
0x0043 00067 (tmp1.go:9) MOVB CL, 2(AX)
// Then the slice is b = {AX, 3, 3}
The generated code is still not optimal, as it still does load/store
from read-only memory instead of constant stores. Next CL...
Updates #26498
Fixes #10170
Change-Id: I4b990b19f9a308f60c8f4f148934acffefe0a5bd
Reviewed-on: https://go-review.googlesource.com/c/140698
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In some optimization rules the type of generated OffPtr was
incorrectly set to the type of the pointee, instead of the
pointer. When the OffPtr value is spilled, this may generate
a spill of the wrong type, e.g. a floating point spill of an
integer (pointer) value. On Wasm, this leads to invalid
bytecode.
Fixes#27961.
Change-Id: I5d464847eb900ed90794105c0013a1a7330756cc
Reviewed-on: https://go-review.googlesource.com/c/139257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
That optimization is not valid if x == -0.
The test is a bit tricky because 0 == -0. We distinguish
0 from -0 with 1/0 == inf, 1/-0 == -inf.
This has been a bug since CL 24790 in Go 1.8. Probably doesn't
warrant a backport.
Fixes#27718
Note: the optimization x-0 -> x is actually valid.
But it's probably best to take it out, so as to not confuse readers.
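A demonstration of the problem, assuming the invalid rule is the float
x+0 -> x form (as the note about x-0 suggests; uses math and fmt):

	negZero := math.Copysign(0, -1)
	fmt.Println(1 / negZero)       // -Inf
	fmt.Println(1 / (negZero + 0)) // +Inf: -0 + 0 is +0, so x+0 -> x changes the result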
Change-Id: I99f16a93b45f7406ec8053c2dc759a13eba035fa
Reviewed-on: https://go-review.googlesource.com/135701
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Use the new custom truncate/extension code when storing or extracting
float32 values from AuxInts to avoid the value being changed by the
host platform's floating point conversion instructions (e.g. sNaN ->
qNaN).
Updates #27516.
Change-Id: Id39650f1431ef74af088c895cf4738ea5fa87974
Reviewed-on: https://go-review.googlesource.com/134855
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Signalling NaNs were being converted to quiet NaNs during constant
propagation through integer <-> float store-to-load forwarding.
This occurs because we store float32 constants as float64
values and CPU hardware 'quietens' NaNs during conversion between
the two.
Eventually we want to move to using float32 values to store float32
constants, however this will be a big change since both the compiler
and the assembler expect float64 values. So for now this is a small
change that will fix the immediate issue.
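A small demonstration of the quietening (hardware-dependent; 0x7f800001
is a signaling NaN pattern, and the conversion typically sets the quiet
bit):

	bits := uint32(0x7f800001)                 // signaling NaN pattern
	f64 := float64(math.Float32frombits(bits)) // hardware conversion typically quietens
	fmt.Printf("%#x\n", math.Float64bits(f64)) // usually 0x7ff8000020000000 (quiet bit set)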
Fixes#27193.
Change-Id: Iac54bd8c13abe26f9396712bc71f9b396f842724
Reviewed-on: https://go-review.googlesource.com/132956
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
A signed right shift before an unsigned right shift by register width-1
(which extracts the sign bit) is superfluous.
Trigger counts during ./make.bash:
0 (Rsh8U (Rsh8 x _) 7 ) -> (Rsh8U x 7 )
0 (Rsh16U (Rsh16 x _) 15 ) -> (Rsh16U x 15)
2 (Rsh32U (Rsh32 x _) 31 ) -> (Rsh32U x 31)
251 (Rsh64U (Rsh64 x _) 63 ) -> (Rsh64U x 63)
Changes the instructions generated on AMD64 for x / 2 where
x is a signed integer from:
MOVQ AX, CX
SARQ $63, AX
SHRQ $63, AX
ADDQ CX, AX
SARQ $1, AX
to:
MOVQ AX, CX
SHRQ $63, AX
ADDQ CX, AX
SARQ $1, AX
Change-Id: I86321ae8fc9dc24b8fa9eb80aa5c7299eff8c9dc
Reviewed-on: https://go-review.googlesource.com/115956
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>