Commit graph

4 commits

Author SHA1 Message Date
Jes Cok
a5a249d6a6 all: eliminate unnecessary type conversions
Found by github.com/mdempsky/unconvert

Change-Id: I88ce10390a49ba768a4deaa0df9057c93c1164de
GitHub-Last-Rev: 3b0f7e8f74
GitHub-Pull-Request: golang/go#75974
Reviewed-on: https://go-review.googlesource.com/c/go/+/712940
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-10-21 08:13:08 -07:00
Brian Kessler
4d9dd35806 cmd/compile: add signed divisibility rules
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17

This general rule can compute divisibilty by one multiplication, and add
and a compare for odd divisors and an additional rotate for even divisors.

To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt".  This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also needed.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.

Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.

On amd64, the divisibility check is 30-45% faster.

name                     old time/op  new time/op  delta`
DivisiblePow2constI64-4  0.83ns ± 1%  0.82ns ± 0%     ~     (p=0.079 n=5+4)
DivisibleconstI64-4      2.68ns ± 1%  1.87ns ± 0%  -30.33%  (p=0.000 n=5+4)
DivisibleWDivconstI64-4  2.69ns ± 1%  2.71ns ± 3%     ~     (p=1.000 n=5+5)
DivisiblePow2constI32-4  1.15ns ± 1%  1.15ns ± 0%     ~     (p=0.238 n=5+4)
DivisibleconstI32-4      2.24ns ± 1%  1.20ns ± 0%  -46.48%  (p=0.016 n=5+4)
DivisibleWDivconstI32-4  2.27ns ± 1%  2.27ns ± 1%     ~     (p=0.683 n=5+5)
DivisiblePow2constI16-4  0.81ns ± 1%  0.82ns ± 1%     ~     (p=0.135 n=5+5)
DivisibleconstI16-4      2.11ns ± 2%  1.20ns ± 1%  -42.99%  (p=0.008 n=5+5)
DivisibleWDivconstI16-4  2.23ns ± 0%  2.27ns ± 2%   +1.79%  (p=0.029 n=4+4)
DivisiblePow2constI8-4   0.81ns ± 1%  0.81ns ± 1%     ~     (p=0.286 n=5+5)
DivisibleconstI8-4       2.13ns ± 3%  1.19ns ± 1%  -43.84%  (p=0.008 n=5+5)
DivisibleWDivconstI8-4   2.23ns ± 1%  2.25ns ± 1%     ~     (p=0.183 n=5+5)

Fixes #30282
Fixes #15806

Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/173998
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-04-30 22:02:07 +00:00
Brian Kessler
a28a942768 cmd/compile: add unsigned divisibility rules
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17

This general rule can compute divisibilty by one multiplication and a compare
for odd divisors and an additional rotate for even divisors.

To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt".  This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.

Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.

Additional rules to lower the generic RotateLeft* ops were also applied.

On amd64, the divisibility check is 25-50% faster.

name                     old time/op  new time/op  delta
DivconstI64-4            2.08ns ± 0%  2.08ns ± 1%     ~     (p=0.881 n=5+5)
DivisibleconstI64-4      2.67ns ± 0%  2.67ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleWDivconstI64-4  2.67ns ± 0%  2.67ns ± 0%     ~     (p=0.683 n=5+5)
DivconstU64-4            2.08ns ± 1%  2.08ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleconstU64-4      2.77ns ± 1%  1.55ns ± 2%  -43.90%  (p=0.008 n=5+5)
DivisibleWDivconstU64-4  2.99ns ± 1%  2.99ns ± 1%     ~     (p=1.000 n=5+5)
DivconstI32-4            1.53ns ± 2%  1.53ns ± 0%     ~     (p=1.000 n=5+5)
DivisibleconstI32-4      2.23ns ± 0%  2.25ns ± 3%     ~     (p=0.167 n=5+5)
DivisibleWDivconstI32-4  2.27ns ± 1%  2.27ns ± 1%     ~     (p=0.429 n=5+5)
DivconstU32-4            1.78ns ± 0%  1.78ns ± 1%     ~     (p=1.000 n=4+5)
DivisibleconstU32-4      2.52ns ± 2%  1.26ns ± 0%  -49.96%  (p=0.000 n=5+4)
DivisibleWDivconstU32-4  2.63ns ± 0%  2.85ns ±10%   +8.29%  (p=0.016 n=4+5)
DivconstI16-4            1.54ns ± 0%  1.54ns ± 0%     ~     (p=0.333 n=4+5)
DivisibleconstI16-4      2.10ns ± 0%  2.10ns ± 1%     ~     (p=0.571 n=4+5)
DivisibleWDivconstI16-4  2.22ns ± 0%  2.23ns ± 1%     ~     (p=0.556 n=4+5)
DivconstU16-4            1.09ns ± 0%  1.01ns ± 1%   -7.74%  (p=0.000 n=4+5)
DivisibleconstU16-4      1.83ns ± 0%  1.26ns ± 0%  -31.52%  (p=0.008 n=5+5)
DivisibleWDivconstU16-4  1.88ns ± 0%  1.89ns ± 1%     ~     (p=0.365 n=5+5)
DivconstI8-4             1.54ns ± 1%  1.54ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleconstI8-4       2.10ns ± 0%  2.11ns ± 0%     ~     (p=0.238 n=5+4)
DivisibleWDivconstI8-4   2.22ns ± 0%  2.23ns ± 2%     ~     (p=0.762 n=5+5)
DivconstU8-4             0.92ns ± 1%  0.94ns ± 1%   +2.65%  (p=0.008 n=5+5)
DivisibleconstU8-4       1.66ns ± 0%  1.26ns ± 1%  -24.28%  (p=0.008 n=5+5)
DivisibleWDivconstU8-4   1.79ns ± 0%  1.80ns ± 1%     ~     (p=0.079 n=4+5)

A follow-up change will address the signed division case.

Updates #30282

Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-04-27 20:46:46 +00:00
Keith Randall
708ba22a0c cmd/compile: move constant divide strength reduction to SSA rules
Currently the conversion from constant divides to multiplies is mostly
done during the walk pass.  This is suboptimal because SSA can
determine that the value being divided by is constant more often
(e.g. after inlining).

Change-Id: If1a9b993edd71be37396b9167f77da271966f85f
Reviewed-on: https://go-review.googlesource.com/37015
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-17 06:16:44 +00:00