Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Josh Bleecher Snyder	79bb41aed6	cmd/compile: convert reassociation optimizations to typed aux, part two Passes toolstash-check. Change-Id: Ia8fad6973983eebe6d78d9dd8de8c99b8edcecdb Reviewed-on: https://go-review.googlesource.com/c/go/+/229684 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-24 23:06:28 +00:00
Josh Bleecher Snyder	67a8660b5a	cmd/compile: CSE the RHS of rewrite rules Keep track of all expressions encountered while generating a rewrite result, and re-use them whenever possible. Named expressions may still be used for clarity when desired. Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82 Reviewed-on: https://go-review.googlesource.com/c/go/+/229800 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2020-04-24 16:44:20 +00:00
Cuong Manh Le	7f8fda3c0b	cmd/compile: use proper magnitude for (x>>c) & uppermask = 0 This is followup of CL 228860, which rewrite shift rules to use typed aux. That CL introduced nlz* functions, to refactor left shift rules. While at it, we realize there's a bug in old rules with both right/left shift rules, but only fix for left shift rules only. This CL fixes the bug for right shift rules. Passes toolstash-check. Change-Id: Id8f2158b1b66c9e87f3fdeaa7ae3e35dc0666f8b Reviewed-on: https://go-review.googlesource.com/c/go/+/229137 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-21 03:45:26 +00:00
Cuong Manh Le	0f14c2a042	cmd/compile: rewrite shift rules to use typed aux fields Passes toolstash-check. Change-Id: I02e78591fe46e19a43dc36913baef0338a014a3d Reviewed-on: https://go-review.googlesource.com/c/go/+/228860 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-21 03:13:22 +00:00
Josh Bleecher Snyder	50b11318fe	cmd/compile: use oneBit instead of isPowerOfTwo in bit optimization This optimization works on any integer with exactly one bit set. This is identical to being a power of two, except in the most negative number. Use oneBit instead. The rule now triggers in a few more places in std+cmd, in packages encoding/asn1, crypto/elliptic, and vendor/golang.org/x/crypto/cryptobyte. This change obviates the need for CL 222479 by doing this optimization consistently in the compiler. Change-Id: I983c6235290fdc634fda5e11b10f1f8ce041272f Reviewed-on: https://go-review.googlesource.com/c/go/+/229124 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2020-04-21 00:38:34 +00:00
Josh Bleecher Snyder	12665b9a06	cmd/compile: convert two generic rules to be typed Prelude to changing the rules. Passes toolstash-check. Change-Id: I22fead7f74d2cf97bb3fbeb22741125b42914c43 Reviewed-on: https://go-review.googlesource.com/c/go/+/229123 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2020-04-21 00:38:14 +00:00
Cuong Manh Le	885099d155	cmd/compile: rewrite integer range rules to use typed aux fields Passes toolstash-check. Change-Id: I2752e4df211294112d502a59c3b9988e00d25aae Reviewed-on: https://go-review.googlesource.com/c/go/+/228857 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-19 10:52:23 +00:00
Keith Randall	2f84caebe3	cmd/compile: rewrite some AMD64 rules to use typed aux fields Surprisingly many rules needed no modification. Use wrapper functions for aux like we did for auxint. Simplifies things a bit. Change-Id: I2e852e77f1585dcb306a976ab9335f1ac5b4a770 Reviewed-on: https://go-review.googlesource.com/c/go/+/227961 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <mike.munday@ibm.com>	2020-04-12 23:46:56 +00:00
Keith Randall	7580937524	cmd/compile: move more generic rewrites to the typed version Change-Id: I22d0644710d12c7efc509fd2a15789e2e073e6a3 Reviewed-on: https://go-review.googlesource.com/c/go/+/227869 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-04-12 19:41:35 +00:00
Keith Randall	a1b802bde7	cmd/compile: move some generic rules to strongly typed Move a lot of the constant folding rules to use strongly typed AuxInt fields. We need more than a cast to convert AuxInt to, e.g., float32. Make conversion functions for converting back and forth. Change-Id: Ia3d95ee3583ee2179a10938e20210a7617358c88 Reviewed-on: https://go-review.googlesource.com/c/go/+/227866 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-04-11 15:49:38 +00:00
Keith Randall	83e288f3db	cmd/compile: prevent constant folding of +/- when result is NaN Missed as part of CL 221790. It isn't just * and / that can make NaNs. Update #36400 Fixes #38359 Change-Id: I3fa562f772fe03b510793a6dc0cf6189c0c3e652 Reviewed-on: https://go-review.googlesource.com/c/go/+/227860 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2020-04-10 19:32:41 +00:00
Keith Randall	ea7126fe14	cmd/compile: use a Sym type instead of interface{} for symbolic offsets Will help with strongly typed rewrite rules. Change-Id: Ifbf316a49f4081322b3b8f13bc962713437d9aba Reviewed-on: https://go-review.googlesource.com/c/go/+/227785 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2020-04-10 16:24:46 +00:00
Keith Randall	28157b3292	cmd/compile: start implementing strongly typed aux and auxint fields Right now the Aux and AuxInt fields of ssa.Values are typed as interface{} and int64, respectively. Each rule that uses these values must cast them to the type they actually are (*obj.LSym, or int32, or ValAndOff, etc.), use them, and then cast them back to interface{} or int64. We know for each opcode what the types of the Aux and AuxInt fields should be. So let's modify the rule generator to declare the types to be what we know they should be, autoconverting to and from the generic types for us. That way we can make the rules more type safe. It's difficult to make a single CL for this, so I've coopted the "=>" token to indicate a rule that is strongly typed. "->" rules are processed as before. That will let us migrate a few rules at a time in separate CLs. Hopefully we can reach a state where all rules are strongly typed and we can drop the distinction. This CL changes just a few rules to get a feel for what this transition would look like. I've decided not to put explicit types in the rules. I think it makes the rules somewhat clearer, but definitely more verbose. In particular, the passthrough rules that don't modify the fields in question are verbose for no real reason. Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d Reviewed-on: https://go-review.googlesource.com/c/go/+/190197 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2020-04-09 21:18:55 +00:00
Michael Munday	bfd569fcb0	cmd/compile: delete the floating point Greater and Geq ops Extend CL 220417 (which removed the integer Greater and Geq ops) to floating point comparisons. Greater and Geq can always be implemented using Less and Leq. Fixes #37316. Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7 Reviewed-on: https://go-review.googlesource.com/c/go/+/222397 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-04-07 19:55:05 +00:00
Josh Bleecher Snyder	82253ddc7a	cmd/compile: constant fold CtzNN Change-Id: I3ecd2c7ed3c8ae35c2bb9562aed09f7ade5c8cdd Reviewed-on: https://go-review.googlesource.com/c/go/+/221609 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-31 22:36:54 +00:00
Keith Randall	af7eafd150	cmd/compile: convert 386 port to use addressing modes pass (take 2) Retrying CL 222782, with a fix that will hopefully stop the random crashing. The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly. Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds. For example, strconv.trim used to have this code: MOVZX -0x1(BX)(DX1), BP CMPL $0x30, AL After CL 222782, it had this code: LEAL 0(BX)(DX1), BP CMPB $0x30, -0x1(BP) An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved. It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues. This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL. This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL. Update #37881 Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f Reviewed-on: https://go-review.googlesource.com/c/go/+/225798 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-27 18:54:45 +00:00
Keith Randall	cd9fd640db	cmd/compile: don't allow NaNs in floating-point constant ops Trying this CL again, with a fixed test that allows platforms to disagree on the exact behavior of converting NaNs. We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. Update #36400 Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb Reviewed-on: https://go-review.googlesource.com/c/go/+/221790 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-03-04 04:49:54 +00:00
Michael Munday	e37cc29863	cmd/compile: optimize integer-in-range checks This CL incorporates code from CL 201206 by Josh Bleecher Snyder (thanks Josh). This CL restores the integer-in-range optimizations in the SSA backend. The fuse pass is enhanced to detect inequalities that could be merged and fuse their associated blocks while the generic rules optimize them into a single unsigned comparison. For example, the inequality `x >= 0 && x < 10` will now be optimized to `unsigned(x) < 10`. Overall has a fairly positive impact on binary sizes. name old time/op new time/op delta Template 192ms ± 1% 192ms ± 1% ~ (p=0.757 n=17+18) Unicode 76.6ms ± 2% 76.5ms ± 2% ~ (p=0.603 n=19+19) GoTypes 694ms ± 1% 693ms ± 1% ~ (p=0.569 n=19+20) Compiler 3.26s ± 0% 3.27s ± 0% +0.25% (p=0.000 n=20+20) SSA 7.41s ± 0% 7.49s ± 0% +1.10% (p=0.000 n=17+19) Flate 120ms ± 1% 120ms ± 1% +0.38% (p=0.003 n=19+19) GoParser 152ms ± 1% 152ms ± 1% ~ (p=0.061 n=17+19) Reflect 422ms ± 1% 425ms ± 2% +0.76% (p=0.001 n=18+20) Tar 167ms ± 1% 167ms ± 0% ~ (p=0.730 n=18+19) XML 233ms ± 4% 231ms ± 1% ~ (p=0.752 n=20+17) LinkCompiler 927ms ± 8% 928ms ± 8% ~ (p=0.857 n=19+20) ExternalLinkCompiler 1.81s ± 2% 1.81s ± 2% ~ (p=0.513 n=19+20) LinkWithoutDebugCompiler 556ms ±10% 583ms ±13% +4.95% (p=0.007 n=20+20) [Geo mean] 478ms 481ms +0.52% name old user-time/op new user-time/op delta Template 270ms ± 5% 269ms ± 7% ~ (p=0.925 n=20+20) Unicode 134ms ± 7% 131ms ±14% ~ (p=0.593 n=18+20) GoTypes 981ms ± 3% 987ms ± 2% +0.63% (p=0.049 n=19+18) Compiler 4.50s ± 2% 4.50s ± 1% ~ (p=0.588 n=19+20) SSA 10.6s ± 2% 10.6s ± 1% ~ (p=0.141 n=20+19) Flate 164ms ± 8% 165ms ±10% ~ (p=0.738 n=20+20) GoParser 202ms ± 5% 203ms ± 6% ~ (p=0.820 n=20+20) Reflect 587ms ± 6% 597ms ± 3% ~ (p=0.087 n=20+18) Tar 230ms ± 6% 228ms ± 8% ~ (p=0.569 n=19+20) XML 311ms ± 6% 314ms ± 5% ~ (p=0.369 n=20+20) LinkCompiler 878ms ± 8% 887ms ± 7% ~ (p=0.289 n=20+20) ExternalLinkCompiler 1.60s ± 7% 1.60s ± 7% ~ (p=0.820 n=20+20) LinkWithoutDebugCompiler 498ms ±12% 489ms ±11% ~ (p=0.398 n=20+20) [Geo mean] 611ms 611ms +0.05% name old alloc/op new alloc/op delta Template 36.1MB ± 0% 36.0MB ± 0% -0.32% (p=0.000 n=20+20) Unicode 28.3MB ± 0% 28.3MB ± 0% -0.03% (p=0.000 n=19+20) GoTypes 121MB ± 0% 121MB ± 0% ~ (p=0.226 n=16+20) Compiler 563MB ± 0% 563MB ± 0% ~ (p=0.166 n=20+19) SSA 1.32GB ± 0% 1.33GB ± 0% +0.88% (p=0.000 n=20+19) Flate 22.7MB ± 0% 22.7MB ± 0% -0.02% (p=0.033 n=19+20) GoParser 27.9MB ± 0% 27.9MB ± 0% -0.02% (p=0.001 n=20+20) Reflect 78.3MB ± 0% 78.2MB ± 0% -0.01% (p=0.019 n=20+20) Tar 34.0MB ± 0% 34.0MB ± 0% -0.04% (p=0.000 n=20+20) XML 43.9MB ± 0% 43.9MB ± 0% -0.07% (p=0.000 n=20+19) LinkCompiler 205MB ± 0% 205MB ± 0% +0.44% (p=0.000 n=20+18) ExternalLinkCompiler 223MB ± 0% 223MB ± 0% +0.03% (p=0.000 n=20+20) LinkWithoutDebugCompiler 139MB ± 0% 142MB ± 0% +1.75% (p=0.000 n=20+20) [Geo mean] 93.7MB 93.9MB +0.20% name old allocs/op new allocs/op delta Template 363k ± 0% 361k ± 0% -0.58% (p=0.000 n=20+19) Unicode 329k ± 0% 329k ± 0% -0.06% (p=0.000 n=19+20) GoTypes 1.28M ± 0% 1.28M ± 0% -0.01% (p=0.000 n=20+20) Compiler 5.40M ± 0% 5.40M ± 0% -0.01% (p=0.000 n=20+20) SSA 12.7M ± 0% 12.8M ± 0% +0.80% (p=0.000 n=20+20) Flate 228k ± 0% 228k ± 0% ~ (p=0.194 n=20+20) GoParser 295k ± 0% 295k ± 0% -0.04% (p=0.000 n=20+20) Reflect 949k ± 0% 949k ± 0% -0.01% (p=0.000 n=20+20) Tar 337k ± 0% 337k ± 0% -0.06% (p=0.000 n=20+20) XML 418k ± 0% 417k ± 0% -0.17% (p=0.000 n=20+20) LinkCompiler 553k ± 0% 554k ± 0% +0.22% (p=0.000 n=20+19) ExternalLinkCompiler 1.52M ± 0% 1.52M ± 0% +0.27% (p=0.000 n=20+20) LinkWithoutDebugCompiler 186k ± 0% 186k ± 0% +0.06% (p=0.000 n=20+20) [Geo mean] 723k 723k +0.03% name old text-bytes new text-bytes delta HelloSize 828kB ± 0% 828kB ± 0% -0.01% (p=0.000 n=20+20) name old data-bytes new data-bytes delta HelloSize 13.4kB ± 0% 13.4kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 180kB ± 0% 180kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.23MB ± 0% 1.23MB ± 0% -0.33% (p=0.000 n=20+20) file before after Δ % addr2line 4320075 4311883 -8192 -0.190% asm 5191932 5187836 -4096 -0.079% buildid 2835338 2831242 -4096 -0.144% compile 20531717 20569099 +37382 +0.182% cover 5322511 5318415 -4096 -0.077% dist 3723749 3719653 -4096 -0.110% doc 4743515 4739419 -4096 -0.086% fix 3413960 3409864 -4096 -0.120% link 6690119 6686023 -4096 -0.061% nm 4269616 4265520 -4096 -0.096% pprof 14942189 14929901 -12288 -0.082% trace 11807164 11790780 -16384 -0.139% vet 8384104 8388200 +4096 +0.049% go 15339076 15334980 -4096 -0.027% total 132258257 132226007 -32250 -0.024% Fixes #30645. Change-Id: If551ac5996097f3685870d083151b5843170aab0 Reviewed-on: https://go-review.googlesource.com/c/go/+/165998 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-03 14:30:26 +00:00
Josh Bleecher Snyder	a2bff7c296	cmd/compile: make pre-elimination of rulegen bounds checks more precise In cases in which we had a named value whose args were all _, like this rule from ARM.rules: (MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x) We previously inserted _ = x.Args[1] even though it is unnecessary. This change eliminates this pointless bounds check. And in other cases, we now check bounds just as far as strictly necessary. No significant movement on any compiler metrics. Just nicer (and less) code. Passes toolstash-check -all. Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a Reviewed-on: https://go-review.googlesource.com/c/go/+/221781 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-02 17:40:11 +00:00
Josh Bleecher Snyder	5e4da0adac	cmd/compile: add streamlined Block Reset+AddControl routines For use in rewrite rules. Shrinks cmd/compile: compile 20082104 19967416 -114688 -0.571% Passes toolstash-check -all. Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30 Reviewed-on: https://go-review.googlesource.com/c/go/+/221780 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-02 17:40:00 +00:00
Josh Bleecher Snyder	d7c073ecbf	cmd/compile: add specialized Value reset for OpCopy This: * Simplifies and shortens the generated code for rewrite rules. * Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile. * Removes the stmt boundary code wrangling from Value.reset, in favor of doing it in the one place where it actually does some work, namely the writebarrier pass. (This was ascertained by inspecting the code for cases in which notStmtBoundary values were generated.) Passes toolstash-check -all. Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07 Reviewed-on: https://go-review.googlesource.com/c/go/+/221421 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2020-03-02 16:24:47 +00:00
Josh Bleecher Snyder	7913f7dfcf	cmd/compile: add specialized AddArgN functions for rewrite rules This shrinks the compiler without impacting performance. (The performance-sensitive part of rewrite rules is the non-match case.) Passes toolstash-check -all. Executable size: file before after Δ % compile 20356168 20163960 -192208 -0.944% total 115599376 115407168 -192208 -0.166% Text size: file before after Δ % cmd/compile/internal/ssa.s 3928309 3778774 -149535 -3.807% total 18862943 18713408 -149535 -0.793% Memory allocated compiling package SSA: SSA 12.7M ± 0% 12.5M ± 0% -1.74% (p=0.008 n=5+5) Compiler speed impact: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 2% ~ (p=0.832 n=49+49) Unicode 82.8ms ± 2% 83.2ms ± 2% +0.44% (p=0.022 n=46+49) GoTypes 726ms ± 1% 728ms ± 2% ~ (p=0.076 n=46+48) Compiler 3.39s ± 2% 3.40s ± 2% ~ (p=0.633 n=48+49) SSA 7.71s ± 1% 7.65s ± 1% -0.78% (p=0.000 n=45+44) Flate 134ms ± 1% 134ms ± 1% ~ (p=0.195 n=50+49) GoParser 167ms ± 1% 167ms ± 1% ~ (p=0.390 n=47+47) Reflect 453ms ± 3% 452ms ± 2% ~ (p=0.492 n=48+49) Tar 184ms ± 3% 184ms ± 2% ~ (p=0.862 n=50+48) XML 248ms ± 2% 248ms ± 2% ~ (p=0.096 n=49+47) [Geo mean] 415ms 415ms -0.03% name old user-time/op new user-time/op delta Template 273ms ± 1% 273ms ± 2% ~ (p=0.711 n=48+48) Unicode 117ms ± 6% 117ms ± 5% ~ (p=0.633 n=50+50) GoTypes 972ms ± 2% 974ms ± 1% +0.29% (p=0.016 n=47+49) Compiler 4.46s ± 6% 4.51s ± 6% ~ (p=0.093 n=50+50) SSA 10.4s ± 1% 10.3s ± 2% -0.94% (p=0.000 n=45+50) Flate 166ms ± 2% 167ms ± 2% ~ (p=0.148 n=49+48) GoParser 202ms ± 1% 202ms ± 2% -0.28% (p=0.014 n=47+49) Reflect 594ms ± 2% 594ms ± 2% ~ (p=0.717 n=48+49) Tar 224ms ± 2% 224ms ± 2% ~ (p=0.805 n=50+49) XML 311ms ± 1% 310ms ± 1% ~ (p=0.177 n=49+48) [Geo mean] 537ms 537ms +0.01% Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a Reviewed-on: https://go-review.googlesource.com/c/go/+/221237 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-01 15:27:58 +00:00
Josh Bleecher Snyder	74f898360d	cmd/compile: constant fold SSA bool to int conversions Shaves off a few instructions here and there. file before after Δ % go/types.s 322118 321851 -267 -0.083% go/internal/gcimporter.s 34937 34909 -28 -0.080% go/internal/gccgoimporter.s 56493 56474 -19 -0.034% cmd/compile/internal/ssa.s 3926994 3927177 +183 +0.005% total 18862670 18862539 -131 -0.001% Change-Id: I724f32317b946b5138224808f85709d9c097a247 Reviewed-on: https://go-review.googlesource.com/c/go/+/221428 Reviewed-by: Keith Randall <khr@golang.org>	2020-02-29 17:02:40 +00:00
Josh Bleecher Snyder	390c096ee9	cmd/compile: make clobber variadic There are often many values to clobber. Allow passing them all in at once. The goal is increased rule readability. As a bonus, it shrinks cmd/compile by ~97k, almost half a percent. Package SSA requires 1.2% less memory to compile. The single-line changes were make via regex, and the remaining multi-line clobbers were manually combined. Passes toolstash-check -all. Change-Id: Ib310e9265d3616211f8192c9040b4c8933824d19 Reviewed-on: https://go-review.googlesource.com/c/go/+/220691 Reviewed-by: Michael Munday <mike.munday@ibm.com>	2020-02-26 18:59:58 +00:00
Michael Munday	cb74dcc172	cmd/compile: remove Greater* and Geq* generic integer ops The generic Greater and Geq ops can always be replaced with the Less and Leq ops. This CL therefore removes them. This simplifies the compiler since it reduces the number of operations that need handling in both code and in rewrite rules. This will be especially true when adding control flow optimizations such as the integer-in-range optimizations in CL 165998. Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040 Reviewed-on: https://go-review.googlesource.com/c/go/+/220417 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-26 13:11:53 +00:00
Bryan C. Mills	a9f1ea4a83	Revert "cmd/compile: don't allow NaNs in floating-point constant ops" This reverts CL 213477. Reason for revert: tests are failing on linux-mips*-rtrk builders. Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0 Reviewed-on: https://go-review.googlesource.com/c/go/+/220897 Run-TryBot: Bryan C. Mills <bcmills@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2020-02-25 15:49:19 +00:00
Keith Randall	2aa7c6c548	cmd/compile: don't allow NaNs in floating-point constant ops We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for almost all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit. To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code). Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always. This has been a bug since 1.10, and there's an easy workaround (declare a global varaible containing the signaling NaN pattern, and use that as the argument to math.Float32frombits) so we'll fix it in 1.15. Fixes #36400 Update #36399 Change-Id: Icf155e743281560eda2eed953d19a829552ccfda Reviewed-on: https://go-review.googlesource.com/c/go/+/213477 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-02-25 02:21:53 +00:00
Keith Randall	bc98e35b53	cmd/compile: avoid memmove -> SSA move rewrite when size is negative We should panic in this situation. Rewriting to a SSA op just leads to a compiler panic. Fixes #36259 Change-Id: I6e0bccbed7dd0fdac7ebae76b98a211947947386 Reviewed-on: https://go-review.googlesource.com/c/go/+/212405 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2020-02-24 20:23:14 +00:00
Josh Bleecher Snyder	6dd11bcb35	cmd/compile: remove chunking of rewrite rules We added chunking of rewrite rules to speed up compiling package SSA. This series of changes has significantly shrunk the number of rewrite rules, and they are no longer being added nearly as fast. Now that we are sharing v.Args across multiple rewrite rules, there is additional benefit to having more rules in a single function. Removing chunking now has an incidental impact on compiling package SSA, marginally speeds up other compilation, shrinks the cmd/compile binary, and simplifies the code. name old time/op new time/op delta Template 211ms ± 2% 210ms ± 2% -0.50% (p=0.000 n=91+97) Unicode 81.9ms ± 3% 81.8ms ± 3% ~ (p=0.179 n=96+91) GoTypes 731ms ± 2% 731ms ± 1% ~ (p=0.442 n=94+96) Compiler 3.43s ± 2% 3.41s ± 2% -0.36% (p=0.001 n=98+94) SSA 8.30s ± 2% 8.32s ± 2% +0.19% (p=0.034 n=94+95) Flate 135ms ± 2% 134ms ± 1% -0.30% (p=0.006 n=98+94) GoParser 167ms ± 1% 167ms ± 1% -0.22% (p=0.001 n=92+94) Reflect 453ms ± 2% 453ms ± 3% ~ (p=0.306 n=98+97) Tar 184ms ± 2% 183ms ± 2% -0.31% (p=0.012 n=94+94) XML 249ms ± 2% 248ms ± 1% -0.26% (p=0.002 n=96+92) [Geo mean] 419ms 418ms -0.21% name old user-time/op new user-time/op delta Template 273ms ± 2% 272ms ± 2% -0.46% (p=0.000 n=93+96) Unicode 116ms ± 4% 117ms ± 4% ~ (p=0.433 n=98+98) GoTypes 977ms ± 2% 977ms ± 1% ~ (p=0.971 n=92+99) Compiler 4.56s ± 6% 4.53s ± 6% ~ (p=0.081 n=100+100) SSA 11.1s ± 2% 11.1s ± 2% ~ (p=0.064 n=99+96) Flate 167ms ± 2% 167ms ± 1% -0.24% (p=0.004 n=95+96) GoParser 203ms ± 1% 203ms ± 2% -0.14% (p=0.049 n=96+97) Reflect 595ms ± 2% 595ms ± 2% ~ (p=0.544 n=95+92) Tar 225ms ± 2% 224ms ± 2% ~ (p=0.562 n=99+99) XML 312ms ± 2% 311ms ± 1% ~ (p=0.050 n=97+93) [Geo mean] 543ms 542ms -0.13% Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa Reviewed-on: https://go-review.googlesource.com/c/go/+/216220 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-21 02:29:11 +00:00
Josh Bleecher Snyder	a3f234c706	cmd/compile: reduce bounds checks in generated rewrite rules CL 213703 converted generated rewrite rules for commutative ops to use loops instead of duplicated code. However, it loaded args using expressions like v.Args[i] and v.Args[i^1], which the compiler could not eliminate bounds for (including with all outstanding prove CLs). Also, given a series of separate rewrite rules for the same op, we generated bounds checks for every rewrite rule, even though we were repeatedly loading the same set of args. This change reduces both sets of bounds checks. Instead of loading v.Args[i] and v.Args[i^1] for commutative loops, we now preload v.Args[0] and v.Args[1] into local variables, and then swap them (as needed) in the commutative loop post statement. And we now load all top level v.Args into local variables at the beginning of every rewrite rule function. The second optimization is the more significant, but the first helps a little, and they play together nicely from the perspective of generating the code. This does increase register pressure, but the reduced bounds checks more than compensate. Note that the vast majority of rewrite rules evaluated are not applied, so the prologue is the most important part of the rewrite rules. There is one subtle aspect to the new generated code. Because the top level v.Args are shared across rewrite rules, and rule evaluation can swap v_0 and v_1, v_0 and v_1 can end up being swapped from one rule to the next. That is OK, because any time a rule does not get applied, they will have been swapped exactly twice. Passes toolstash-check -all. name old time/op new time/op delta Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96) Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90) GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94) Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100) SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99) Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96) GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93) Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94) Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95) XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94) [Geo mean] 424ms 421ms -0.68% name old user-time/op new user-time/op delta Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98) Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90) GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93) Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100) SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97) Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92) GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96) Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92) Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98) XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95) [Geo mean] 547ms 544ms -0.61% file before after Δ % compile 21113112 21109016 -4096 -0.019% total 131704940 131700844 -4096 -0.003% Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956 Reviewed-on: https://go-review.googlesource.com/c/go/+/216218 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-21 00:55:58 +00:00
Josh Bleecher Snyder	bd6d78ef37	cmd/compile: use loops to handle commutative ops in rules Prior to this change, we generated additional rules at rulegen time for all possible combinations of args to commutative ops. This is simple and works well, but leads to lots of generated rules. This in turn has increased the size of the compiler, made it hard to compile package ssa on small machines, and provided a disincentive to mark some ops as commutative. This change reworks how we handle commutative ops. Instead of generating a rule per argument permutation, we generate a series of nested loops, one for each commutative op. Each loop tries both possible argument orderings. I also considered attempting to canonicalize the inputs to the rewrite rules. However, because either or both arguments might be nothing more than an identifier, and because there can be arbitrary conditions to evaluate during matching, I did not see how to proceed. The duplicate rule detection now sorts arguments to commutative ops, so that it can detect commutative-only duplicates. There may be further optimizations to the new generated code. In particular, we may not be removing as many bounds checks as before; I have not investigated deeply. If more work here is needed, we could do it with more hints or with improvements to the prove pass. This change has almost no impact on the generated code. It does not pass toolstash-check, however. In a handful of functions, for reasons I do not understand, there are minor position changes. For the entire series ending at this change, there is negligible compiler performance impact. The compiler binary shrinks by about 15%, and package ssa shrinks by about 25%. Package ssa also compiles ~25% faster with ~25% less memory. Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4 Reviewed-on: https://go-review.googlesource.com/c/go/+/213703 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-20 17:34:07 +00:00
Josh Bleecher Snyder	49f8d45994	cmd/compile: delete duplicate rules Add logic during rulegen to detect exact duplicates (after applying commutativity), and clean up existing duplicates. Change-Id: I7179f40fc48e236c74b74f429ec9f0f100026530 Reviewed-on: https://go-review.googlesource.com/c/go/+/213699 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Keith Randall <khr@golang.org>	2020-02-20 05:04:39 +00:00
Brian Kessler	6b1d5471b9	cmd/compile: add signed indivisibility by power of 2 rules Commit `44343c777c` (CL 173557) added rules for handling divisibility checks for powers of 2 for signed integers, x%c ==0. This change adds the complementary indivisibility rules, x%c != 0. Fixes #34166 Change-Id: I87379e30af7aff633371acca82db2397da9b2c07 Reviewed-on: https://go-review.googlesource.com/c/go/+/194219 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-11-07 16:30:46 +00:00
Michael Munday	bb7890b85a	cmd/compile: absorb more Not ops into Neq* and Eq* ops We absorbed Not into most integer comparisons but not into pointer and floating point equality checks. The new cases trigger more than 300 times during make.bash. Change-Id: I77c6b31fcacde10da5470b73fc001a19521ce78d Reviewed-on: https://go-review.googlesource.com/c/go/+/200618 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-11-04 20:35:52 +00:00
Keith Randall	70331a31ed	cmd/compile: fix typing of IData opcodes The rules for extracting the interface data word don't leave the result typed correctly. If I do i.([1]int)[0], the result should have type int, not [1]*int. Using (IData x) for the result keeps the typing of the original top-level Value. I don't think this would ever cause a real codegen bug, bug fixing it at least makes the typing shown in ssa.html more consistent. Change-Id: I239d821c394e58347639387981b0510d13b2f7b7 Reviewed-on: https://go-review.googlesource.com/c/go/+/204042 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-10-29 21:21:09 +00:00
Michael Munday	c7d81bc086	cmd/compile: reduce amount of code generated for block rewrite rules Add a Reset method to blocks that allows us to reduce the amount of code we generate for block rewrite rules. Thanks to Cherry for suggesting a similar fix to this in CL 196557. Compilebench result: name old time/op new time/op delta Template 211ms ± 1% 211ms ± 1% -0.30% (p=0.028 n=19+20) Unicode 83.7ms ± 3% 83.0ms ± 2% -0.79% (p=0.029 n=18+19) GoTypes 757ms ± 1% 755ms ± 1% -0.31% (p=0.034 n=19+19) Compiler 3.51s ± 1% 3.50s ± 1% -0.20% (p=0.013 n=18+18) SSA 11.7s ± 1% 11.7s ± 1% -0.38% (p=0.000 n=19+19) Flate 131ms ± 1% 130ms ± 1% -0.32% (p=0.024 n=18+18) GoParser 162ms ± 1% 162ms ± 1% ~ (p=0.059 n=20+18) Reflect 471ms ± 0% 470ms ± 0% -0.24% (p=0.045 n=20+17) Tar 187ms ± 1% 186ms ± 1% ~ (p=0.157 n=20+20) XML 255ms ± 1% 255ms ± 1% ~ (p=0.461 n=19+20) LinkCompiler 754ms ± 2% 755ms ± 2% ~ (p=0.919 n=17+17) ExternalLinkCompiler 2.82s ±16% 2.37s ±10% -15.94% (p=0.000 n=20+20) LinkWithoutDebugCompiler 439ms ± 4% 442ms ± 6% ~ (p=0.461 n=18+19) StdCmd 25.8s ± 2% 25.5s ± 1% -0.95% (p=0.000 n=20+20) name old user-time/op new user-time/op delta Template 240ms ± 8% 238ms ± 7% ~ (p=0.301 n=20+20) Unicode 107ms ±18% 104ms ±13% ~ (p=0.149 n=20+20) GoTypes 883ms ± 3% 888ms ± 2% ~ (p=0.211 n=20+20) Compiler 4.22s ± 1% 4.20s ± 1% ~ (p=0.077 n=20+18) SSA 14.1s ± 1% 14.1s ± 2% ~ (p=0.192 n=20+20) Flate 145ms ±10% 148ms ± 5% ~ (p=0.126 n=20+18) GoParser 186ms ± 7% 186ms ± 7% ~ (p=0.779 n=20+20) Reflect 538ms ± 3% 541ms ± 3% ~ (p=0.192 n=20+20) Tar 218ms ± 4% 217ms ± 6% ~ (p=0.835 n=19+20) XML 298ms ± 5% 298ms ± 5% ~ (p=0.749 n=19+20) LinkCompiler 818ms ± 5% 825ms ± 8% ~ (p=0.461 n=20+20) ExternalLinkCompiler 1.55s ± 4% 1.53s ± 5% ~ (p=0.063 n=20+18) LinkWithoutDebugCompiler 460ms ±12% 460ms ± 7% ~ (p=0.925 n=20+20) name old object-bytes new object-bytes delta Template 554kB ± 0% 554kB ± 0% ~ (all equal) Unicode 215kB ± 0% 215kB ± 0% ~ (all equal) GoTypes 2.01MB ± 0% 2.01MB ± 0% ~ (all equal) Compiler 7.97MB ± 0% 7.97MB ± 0% +0.00% (p=0.000 n=20+20) SSA 26.8MB ± 0% 26.9MB ± 0% +0.27% (p=0.000 n=20+20) Flate 340kB ± 0% 340kB ± 0% ~ (all equal) GoParser 434kB ± 0% 434kB ± 0% ~ (all equal) Reflect 1.34MB ± 0% 1.34MB ± 0% ~ (all equal) Tar 480kB ± 0% 480kB ± 0% ~ (all equal) XML 622kB ± 0% 622kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 20.4kB ± 0% 20.4kB ± 0% ~ (all equal) Unicode 8.21kB ± 0% 8.21kB ± 0% ~ (all equal) GoTypes 36.6kB ± 0% 36.6kB ± 0% ~ (all equal) Compiler 115kB ± 0% 115kB ± 0% +0.08% (p=0.000 n=20+20) SSA 141kB ± 0% 141kB ± 0% +0.07% (p=0.000 n=20+20) Flate 5.11kB ± 0% 5.11kB ± 0% ~ (all equal) GoParser 8.93kB ± 0% 8.93kB ± 0% ~ (all equal) Reflect 11.8kB ± 0% 11.8kB ± 0% ~ (all equal) Tar 10.9kB ± 0% 10.9kB ± 0% ~ (all equal) XML 17.4kB ± 0% 17.4kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 742kB ± 0% 742kB ± 0% ~ (all equal) CmdGoSize 10.7MB ± 0% 10.7MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.10MB ± 0% 1.10MB ± 0% ~ (all equal) CmdGoSize 14.9MB ± 0% 14.9MB ± 0% ~ (all equal) Change-Id: Ic89a8e62423b3d9fd9391159e0663acf450803b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/198419 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Daniel Martí <mvdan@mvdan.cc>	2019-10-07 09:19:12 +00:00
Michael Munday	9c2e7e8bed	cmd/compile: allow multiple SSA block control values Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value. Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block. This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL. Passes toolstash-check -all. Results of compilebench: name old time/op new time/op delta Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20) Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18) GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18) Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18) SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18) Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20) GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20) Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20) Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20) XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18) LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19) ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20) LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20) StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20) name old user-time/op new user-time/op delta Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20) Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20) GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19) Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18) SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18) Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18) GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20) Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18) Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20) XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20) LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20) ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19) LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20) name old object-bytes new object-bytes delta Template 559kB ± 0% 559kB ± 0% ~ (all equal) Unicode 216kB ± 0% 216kB ± 0% ~ (all equal) GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal) Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20) SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20) Flate 343kB ± 0% 343kB ± 0% ~ (all equal) GoParser 441kB ± 0% 441kB ± 0% ~ (all equal) Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal) Tar 487kB ± 0% 487kB ± 0% ~ (all equal) XML 632kB ± 0% 632kB ± 0% ~ (all equal) name old export-bytes new export-bytes delta Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal) Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal) GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal) Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20) SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20) Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal) GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal) Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal) Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal) XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal) name old text-bytes new text-bytes delta HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal) CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal) CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal) CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal) CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal) Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f Reviewed-on: https://go-review.googlesource.com/c/go/+/196557 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-02 09:56:36 +00:00
Martin Möhrmann	f41451e7eb	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc \| y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '\|' interferes with matching ((x << lc) >> rc) to generate UBFX. Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I5a59f6239660d53c029cd22dfb44ddf39f93a56c Reviewed-on: https://go-review.googlesource.com/c/go/+/196810 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-24 20:30:59 +00:00
Daniel Martí	870080752d	cmd/compile: reduce rulegen's output by 200 KiB First, renove unnecessary "// cond:" lines from the generated files. This shaves off about ~7k lines. Second, join "if cond { break }" statements via "\|\|", which allows us to deduplicate a large number of them. This shaves off another ~25k lines. This change is not for readability or simplicity; but rather, to avoid unnecessary verbosity that makes the generated files larger. All in all, git reports that the generated files overall weigh ~200KiB less, or about 2.7% less. While at it, add a -trace flag to rulegen. Updates #33644. Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a Reviewed-on: https://go-review.googlesource.com/c/go/+/196498 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-23 16:36:10 +00:00
Bryan C. Mills	34fe8295c5	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts CL 194297. Reason for revert: introduced register allocation failures on PPC64LE builders. Updates #33826 Updates #32781 Updates #34468 Change-Id: I7d0b55df8cdf8e7d2277f1814299b083c2692e48 Reviewed-on: https://go-review.googlesource.com/c/go/+/196957 Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2019-09-23 15:20:12 +00:00
Martin Möhrmann	4e2b84ffc5	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): 'go run run.go -all_codegen -v codegen' passes with following adjustments: ARM64: The BFXIL pattern ((x << lc) >> rc \| y & ac) needed adjustment since ORshiftRL generation fusing '>> rc' and '\|' interferes with matching ((x << lc) >> rc) to generate UBFX. Previously ORshiftLL was created first using the shifts generated for (y & ac). S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs. Updates #33826 Updates #32781 Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb Reviewed-on: https://go-review.googlesource.com/c/go/+/194297 Run-TryBot: Martin Möhrmann <martisch@uos.de> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-09-21 18:00:13 +00:00
Martin Möhrmann	5bb59b6d16	Revert "compile: prefer an AND instead of SHR+SHL instructions" This reverts commit `9ec7074a94`. Reason for revert: broke s390x (copysign, abs) and arm64 (bitfield) tests. Change-Id: I16c1b389c062e8c4aa5de079f1d46c9b25b0db52 Reviewed-on: https://go-review.googlesource.com/c/go/+/193850 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Agniva De Sarker <agniva.quicksilver@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-09-09 07:33:25 +00:00
Martin Möhrmann	9ec7074a94	compile: prefer an AND instead of SHR+SHL instructions On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute. A pair of shifts that operate on the same register will take 2 cycles and needs to wait for the input register value to be available. Large constants used to mask the high bits of a register with an AND instruction can not be encoded as an immediate in the AND instruction on amd64 and therefore need to be loaded into a register with a MOV instruction. However that MOV instruction is not dependent on the output register and on many CPUs does not compete with the AND or shift instructions for execution ports. Using a pair of shifts to mask high bits instead of an AND to mask high bits of a register has a shorter encoding and uses one less general purpose register but is slower due to taking one clock cycle longer if there is no register pressure that would make the AND variant need to generate a spill. For example the instructions emitted for (x & 1 << 63) before this CL are: 48c1ea3f SHRQ $0x3f, DX 48c1e23f SHLQ $0x3f, DX after this CL the instructions are the same as GCC and LLVM use: 48b80000000000000080 MOVQ $0x8000000000000000, AX 4821d0 ANDQ DX, AX Some platforms such as arm64 already have SSA optimization rules to fuse two shift instructions back into an AND. Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark: var GlobalU uint func BenchmarkAndHighBits(b *testing.B) { x := uint(0) for i := 0; i < b.N; i++ { x &= 1 << 63 } GlobalU = x } amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz: name old time/op new time/op delta AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25): Updates #33826 Updates #32781 Change-Id: I862d3587446410c447b9a7265196b57f85358633 Reviewed-on: https://go-review.googlesource.com/c/go/+/191780 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-09-09 06:49:17 +00:00
Josh Bleecher Snyder	9675f81928	cmd/compile: add more Neg/Com optimizations This is a grab-bag of minor optimizations. While we're here, document the c != -(1<<31) constraints better (#31888). file before after Δ % go 14669924 14665828 -4096 -0.028% asm 4867088 4858896 -8192 -0.168% compile 23988320 23984224 -4096 -0.017% cover 5210856 5206760 -4096 -0.079% link 6084376 6080280 -4096 -0.067% total 132181084 132156508 -24576 -0.019% file before after Δ % archive/tar.a 516708 516702 -6 -0.001% bufio.a 182200 181974 -226 -0.124% bytes.a 217624 216890 -734 -0.337% cmd/compile/internal/gc.a 8865412 8865228 -184 -0.002% cmd/compile/internal/ssa.a 29921002 29933976 +12974 +0.043% cmd/go/internal/modfetch/codehost.a 530602 530430 -172 -0.032% cmd/go/internal/modfetch.a 679664 679578 -86 -0.013% cmd/go/internal/modfile.a 411102 410928 -174 -0.042% cmd/go/internal/test.a 315218 315126 -92 -0.029% cmd/go/internal/tlog.a 183242 183256 +14 +0.008% cmd/go/internal/txtar.a 23148 23060 -88 -0.380% cmd/internal/bio.a 132064 132060 -4 -0.003% cmd/internal/buildid.a 107174 107172 -2 -0.002% cmd/internal/edit.a 33208 33354 +146 +0.440% cmd/internal/obj/arm.a 416488 416432 -56 -0.013% cmd/internal/obj/arm64.a 2772626 2772622 -4 -0.000% cmd/internal/obj/x86.a 923186 923114 -72 -0.008% cmd/internal/obj.a 679834 679836 +2 +0.000% cmd/internal/objfile.a 358374 358372 -2 -0.001% cmd/internal/test2json.a 67482 67434 -48 -0.071% cmd/link/internal/ld.a 2836280 2836110 -170 -0.006% cmd/link/internal/loadpe.a 148234 147736 -498 -0.336% cmd/link/internal/objfile.a 144534 144434 -100 -0.069% cmd/link/internal/ppc64.a 170876 170382 -494 -0.289% cmd/vendor/github.com/google/pprof/internal/elfexec.a 49896 49892 -4 -0.008% cmd/vendor/github.com/google/pprof/internal/graph.a 437478 437404 -74 -0.017% cmd/vendor/github.com/google/pprof/profile.a 902040 902044 +4 +0.000% cmd/vendor/github.com/ianlancetaylor/demangle.a 1217856 1217854 -2 -0.000% cmd/vendor/golang.org/x/arch/x86/x86asm.a 561332 560684 -648 -0.115% cmd/vendor/golang.org/x/crypto/ssh/terminal.a 153788 153784 -4 -0.003% cmd/vendor/golang.org/x/sys/unix.a 1043894 1043814 -80 -0.008% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.a 288458 288414 -44 -0.015% compress/flate.a 369024 368132 -892 -0.242% crypto/aes.a 109058 108968 -90 -0.083% crypto/cipher.a 150410 150544 +134 +0.089% crypto/elliptic.a 323572 323758 +186 +0.057% crypto/md5.a 50868 50788 -80 -0.157% crypto/rsa.a 195292 195214 -78 -0.040% crypto/sha1.a 70936 70858 -78 -0.110% crypto/sha256.a 75316 75236 -80 -0.106% crypto/sha512.a 84846 84768 -78 -0.092% crypto/subtle.a 6520 6514 -6 -0.092% crypto/tls.a 1654916 1654852 -64 -0.004% crypto/x509.a 888674 888638 -36 -0.004% database/sql.a 730280 730198 -82 -0.011% debug/gosym.a 184936 184862 -74 -0.040% debug/macho.a 272138 272136 -2 -0.001% debug/plan9obj.a 78444 78368 -76 -0.097% encoding/base64.a 82126 81882 -244 -0.297% encoding/binary.a 187196 187150 -46 -0.025% encoding/gob.a 897868 897870 +2 +0.000% encoding/json.a 659934 659832 -102 -0.015% encoding/pem.a 59138 58870 -268 -0.453% encoding/xml.a 694054 693300 -754 -0.109% fmt.a 484518 484196 -322 -0.066% go/format.a 33962 33994 +32 +0.094% go/printer.a 437132 437134 +2 +0.000% go/scanner.a 141774 141772 -2 -0.001% go/token.a 125130 125126 -4 -0.003% go/types.a 2192086 2191994 -92 -0.004% html/template.a 599038 598770 -268 -0.045% html.a 184842 184710 -132 -0.071% image/draw.a 129592 129238 -354 -0.273% image/gif.a 171824 171716 -108 -0.063% image/internal/imageutil.a 20282 19272 -1010 -4.980% image/jpeg.a 275608 275114 -494 -0.179% image/png.a 343416 343620 +204 +0.059% image.a 362244 362210 -34 -0.009% index/suffixarray.a 113040 112954 -86 -0.076% internal/trace.a 518972 518838 -134 -0.026% math/big.a 1012670 1012354 -316 -0.031% math.a 219338 219334 -4 -0.002% mime/multipart.a 178854 178502 -352 -0.197% mime/quotedprintable.a 49226 48936 -290 -0.589% net/http/cgi.a 172328 172324 -4 -0.002% net/http.a 4000180 3999732 -448 -0.011% net.a 1858330 1858252 -78 -0.004% path/filepath.a 107496 107498 +2 +0.002% reflect.a 1439776 1439994 +218 +0.015% regexp/syntax.a 459430 459432 +2 +0.000% regexp.a 416394 416400 +6 +0.001% runtime/debug.a 42106 42100 -6 -0.014% runtime/pprof/internal/profile.a 608718 608720 +2 +0.000% runtime/pprof.a 355474 355476 +2 +0.001% runtime.a 3555748 3555796 +48 +0.001% strconv.a 294432 294410 -22 -0.007% strings.a 292148 292090 -58 -0.020% syscall.a 859682 859470 -212 -0.025% text/tabwriter.a 65614 65148 -466 -0.710% vendor/golang.org/x/crypto/chacha20poly1305.a 126736 126728 -8 -0.006% vendor/golang.org/x/crypto/cryptobyte.a 269112 269114 +2 +0.001% vendor/golang.org/x/crypto/internal/chacha20.a 61842 61262 -580 -0.938% vendor/golang.org/x/crypto/poly1305.a 47410 47404 -6 -0.013% vendor/golang.org/x/net/dns/dnsmessage.a 628700 628012 -688 -0.109% vendor/golang.org/x/net/idna.a 237678 237826 +148 +0.062% vendor/golang.org/x/net/route.a 187852 187458 -394 -0.210% vendor/golang.org/x/sys/unix.a 1022426 1022348 -78 -0.008% vendor/golang.org/x/text/transform.a 117954 118104 +150 +0.127% vendor/golang.org/x/text/unicode/bidi.a 291398 291404 +6 +0.002% vendor/golang.org/x/text/unicode/norm.a 534640 534540 -100 -0.019% total 128945190 128945128 -62 -0.000% Change-Id: I346dc31356d5ef7774b824cf202169610bd26432 Reviewed-on: https://go-review.googlesource.com/c/go/+/175778 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-29 19:55:20 +00:00
Josh Bleecher Snyder	b8cbcacabe	cmd/compile: optimize more pointer comparisons The existing pointer comparison optimizations don't include pointer arithmetic. Add them. These rules trigger a few times in std cmd, while compiling: time.Duration.String cmd/go/internal/tlog.NodeHash crypto/tls.ticketKeyFromBytes (3 times) crypto/elliptic.(*p256Point).p256ScalarMult (15 times!) crypto/elliptic.initTable These weird comparisons occur when using the copy builtin, which does a pointer comparison between src and dst. This also happens to fix #32454, by optimizing enough early on that all values can be eliminated. Fixes #32454 Change-Id: I799d45743350bddd15a295dc1e12f8d03c11d1c6 Reviewed-on: https://go-review.googlesource.com/c/go/+/180940 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-29 19:35:18 +00:00
Josh Bleecher Snyder	abda0a6a92	cmd/compile: remove redundant rules EqPtr and NeqPtr are marked as commutative, so the transformations for rules are already generated by the preceding two lines. Change-Id: Ibecba5c8e54d9df00c84e1dae7e5d8cb53eeff43 Reviewed-on: https://go-review.googlesource.com/c/go/+/180939 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-29 19:32:42 +00:00
LE Manh Cuong	c5f142fa9f	cmd/compile: optimize bitset tests The assembly output for x & c == c, where c is power of 2: MOVQ "".set+8(SP), AX ANDQ $8, AX CMPQ AX, $8 SETEQ "".~r2+24(SP) With optimization using bitset: MOVQ "".set+8(SP), AX BTL $3, AX SETCS "".~r2+24(SP) output less than 1 instruction. However, there is no speed improvement: name old time/op new time/op delta AllBitSet-8 0.35ns ± 0% 0.35ns ± 0% ~ (all equal) Fixes #31904 Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/175957 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-27 18:01:16 +00:00
Daniel Martí	79dee788ec	cmd/compile: teach rulegen to remove unused decls First, add cpu and memory profiling flags, as these are useful to see where rulegen is spending its time. It now takes many seconds to run on a recent laptop, so we have to keep an eye on what it's doing. Second, stop writing '_ = var' lines to keep imports and variables used at all times. Now that rulegen removes all such unused names, they're unnecessary. To perform the removal, lean on go/types to first detect what names are unused. We can configure it to give us all the type-checking errors in a file, so we can collect all "declared but not used" errors in a single pass. We then use astutil.Apply to remove the relevant nodes based on the line information from each unused error. This allows us to apply the changes without having to do extra parser+printer roundtrips to plaintext, which are far too expensive. We need to do multiple such passes, as removing an unused variable declaration might then make another declaration unused. Two passes are enough to clean every file at the moment, so add a limit of three passes for now to avoid eating cpu uncontrollably by accident. The resulting performance of the changes above is a ~30% loss across the table, since go/types is fairly expensive. The numbers were obtained with 'benchcmd Rulegen go run *.go', which involves compiling rulegen itself, but that seems reflective of how the program is used. name old time/op new time/op delta Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5) We can live with a bit more resource usage, but the time/op getting close to 10s isn't good. To win that back, introduce concurrency in main.go. This further increases resource usage a bit, but the real time on this quad-core laptop is greatly reduced. The final benchstat is as follows: name old time/op new time/op delta Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5) name old sys-time/op new sys-time/op delta Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5) name old peak-RSS-bytes new peak-RSS-bytes delta Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5) It might be possible to reduce the cpu or memory usage in the future, such as configuring go/types to do less work, or taking shortcuts to avoid having to run it many times. For now, ~2x cpu and ~4x memory usage seems like a fair trade for a faster and better rulegen. Finally, we can remove the old code that tried to remove some unused variables in a hacky and unmaintainable way. Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6 Reviewed-on: https://go-review.googlesource.com/c/go/+/189798 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-08-27 17:04:18 +00:00
Brian Kessler	4d9dd35806	cmd/compile: add signed divisibility rules "Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication, and add and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also needed. So, we must match on the expanded form of (x/c) in the expression x == c(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. On amd64, the divisibility check is 30-45% faster. name old time/op new time/op delta` DivisiblePow2constI64-4 0.83ns ± 1% 0.82ns ± 0% ~ (p=0.079 n=5+4) DivisibleconstI64-4 2.68ns ± 1% 1.87ns ± 0% -30.33% (p=0.000 n=5+4) DivisibleWDivconstI64-4 2.69ns ± 1% 2.71ns ± 3% ~ (p=1.000 n=5+5) DivisiblePow2constI32-4 1.15ns ± 1% 1.15ns ± 0% ~ (p=0.238 n=5+4) DivisibleconstI32-4 2.24ns ± 1% 1.20ns ± 0% -46.48% (p=0.016 n=5+4) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.683 n=5+5) DivisiblePow2constI16-4 0.81ns ± 1% 0.82ns ± 1% ~ (p=0.135 n=5+5) DivisibleconstI16-4 2.11ns ± 2% 1.20ns ± 1% -42.99% (p=0.008 n=5+5) DivisibleWDivconstI16-4 2.23ns ± 0% 2.27ns ± 2% +1.79% (p=0.029 n=4+4) DivisiblePow2constI8-4 0.81ns ± 1% 0.81ns ± 1% ~ (p=0.286 n=5+5) DivisibleconstI8-4 2.13ns ± 3% 1.19ns ± 1% -43.84% (p=0.008 n=5+5) DivisibleWDivconstI8-4 2.23ns ± 1% 2.25ns ± 1% ~ (p=0.183 n=5+5) Fixes #30282 Fixes #15806 Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/173998 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-04-30 22:02:07 +00:00
Brian Kessler	a28a942768	cmd/compile: add unsigned divisibility rules "Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also available. So, we must match on the expanded form of (x/c) in the expression x == c(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. Additional rules to lower the generic RotateLeft* ops were also applied. On amd64, the divisibility check is 25-50% faster. name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5) DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5) DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5) DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5) DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5) DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5) DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5) DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5) DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4) DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5) DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5) DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5) DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5) DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5) DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5) DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5) DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4) DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5) DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5) DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5) DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5) A follow-up change will address the signed division case. Updates #30282 Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403 Reviewed-on: https://go-review.googlesource.com/c/go/+/168037 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2019-04-27 20:46:46 +00:00

1 2 3 4 5 ...

254 commits