cmd/compile: make prove understand div, mod better

This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)

The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.
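
For example, a divisibility check like the following (a sketch, not
taken from the CL's tests) is now matched while the div is still a
Div op:

	// isMultipleOf5 reports whether x is divisible by 5.
	// The divisible pass turns the x%5 == 0 check into the
	// multiply/rotate/compare form from divisible.rules rather
	// than a real division.
	func isMultipleOf5(x uint32) bool {
		return x%5 == 0
	}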

Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.
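
For example (a sketch of the kind of code that benefits, not taken
from the CL's tests):

	// len(x) is known to be non-negative, so prove marks the signed
	// division as unsigned and the compiler emits a plain right
	// shift instead of the signed rounding fixup.
	func half(x []byte) int {
		return len(x) / 2
	}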

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds
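
The code at that site is, roughly (a sketch, not the actual
mfinal.go source):

	// i is a loop index, so prove knows i >= 0; together with the
	// non-panicking mod, 0 <= i%len(x) < len(x), which makes the
	// Div64 unsigned and the index in bounds.
	func sumWrapped(x []int, n int) int {
		s := 0
		for i := 0; i < n; i++ {
			s += x[i%len(x)]
		}
		return s
	}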

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better
and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, when computing the length, opt can now do:

	(Sub64 (Add (Mul 8 i) 8) (Mul 8 i)) ->
	(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
	(Add 8 (Mul 8 (Sub i i))) ->
	(Add 8 (Mul 8 0)) ->
	(Add 8 0) ->
	8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)).
Leaving the multiply as Mul enables that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.
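
A sketch of the kind of code this helps, in the spirit of the
dct.go loops (not the actual source):

	func sumRow(x []float32, i int) float32 {
		row := x[8*i : 8*i+8 : 8*i+8]
		// The row length folds to the constant 8, so the checks
		// for row[0] ... row[7] (0 < 8, 1 < 8, ...) constant-fold
		// away as well.
		return row[0] + row[1] + row[2] + row[3] +
			row[4] + row[5] + row[6] + row[7]
	}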

Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.

The divisibility optimizations were not triggering reliably before.
Fixing that leads to a very large improvement in some cases, like
DivisiblePow2constI64 and DivisibleconstI64 on 64-bit systems
and DivisibleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before
is that they triggered incorrectly for x/3, x%3 even though the
rules are written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because, in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should always be enabled. In any event, this CL makes it possible
to study that.
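
For reference, the shape of code the DivisibleWDivconst benchmarks
exercise is roughly this (a sketch; the real benchmark bodies may
differ):

	// Both the quotient and the divisibility result are needed,
	// so div.Uses != 1 and the rules leave the check as
	// x == (x/7)*7 (a remultiply) rather than emitting the
	// separate multiply/rotate/compare divisibility test.
	func quotRem7(x int64) (int64, bool) {
		q := x / 7
		return q, x%7 == 0
	}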

benchmark \ host                          s7  linux-amd64      mac  linux-arm64  linux-ppc64le  linux-386  s7:GOARCH=386  linux-arm
                                     vs base      vs base  vs base      vs base        vs base    vs base        vs base    vs base
LoadAdd                                    ~            ~        ~            ~              ~     -1.59%              ~          ~
ExtShift                                   ~            ~  -42.14%       +0.10%              ~     +1.44%         +5.66%     +8.50%
Modify                                     ~            ~        ~            ~              ~          ~              ~     -1.53%
MullImm                                    ~            ~        ~            ~              ~    +37.90%        -21.87%     +3.05%
ConstModify                                ~            ~        ~            ~        -49.14%          ~              ~          ~
BitSet                                     ~            ~        ~            ~        -15.86%    -14.57%         +6.44%     +0.06%
BitClear                                   ~            ~        ~            ~              ~     +1.78%         +3.50%     +0.06%
BitToggle                                  ~            ~        ~            ~              ~    -16.09%         +2.91%          ~
BitSetConst                                ~            ~        ~            ~              ~          ~              ~     -0.49%
BitClearConst                              ~            ~        ~            ~        -28.29%          ~              ~     -0.40%
BitToggleConst                             ~            ~        ~       +8.89%        -31.19%          ~              ~     -0.77%
MulNeg                                     ~            ~        ~            ~              ~          ~              ~          ~
Mul2Neg                                    ~            ~   -4.83%            ~              ~    -13.75%         -5.92%          ~
DivconstI64                                ~            ~        ~            ~              ~    -30.12%              ~     +0.50%
ModconstI64                                ~            ~   -9.94%       -4.63%              ~     +3.15%              ~     +5.32%
DivisiblePow2constI64                -34.49%      -12.58%        ~            ~        -12.25%          ~              ~          ~
DivisibleconstI64                    -24.69%      -25.06%   -0.40%       -2.27%        -42.61%     -3.31%              ~     +1.63%
DivisibleWDivconstI64                      ~            ~        ~            ~              ~    -17.55%              ~     -0.60%
DivconstU64/3                              ~            ~        ~            ~              ~     +1.51%              ~          ~
DivconstU64/5                              ~            ~        ~            ~              ~          ~              ~          ~
DivconstU64/37                             ~            ~   -0.18%            ~              ~     +2.70%              ~          ~
DivconstU64/1234567                        ~            ~        ~            ~              ~          ~              ~     +0.12%
ModconstU64                                ~            ~        ~       -0.24%              ~     -5.10%         -1.07%     -1.56%
DivisibleconstU64                          ~            ~        ~            ~              ~    -29.01%        -59.13%    -50.72%
DivisibleWDivconstU64                      ~            ~  -12.18%      -18.88%              ~     -5.50%         -3.91%     +5.17%
DivconstI32                                ~            ~   -0.48%            ~        -34.69%    +89.01%         -6.01%    -16.67%
ModconstI32                                ~       +2.95%   -0.33%            ~              ~     -2.98%         -5.40%     -8.30%
DivisiblePow2constI32                      ~            ~        ~            ~              ~          ~              ~    -16.22%
DivisibleconstI32                          ~            ~        ~            ~              ~    -37.27%        -47.75%    -25.03%
DivisibleWDivconstI32                -11.59%       +5.22%  -12.99%      -23.83%              ~    +45.95%         -7.03%    -10.01%
DivconstU32                                ~            ~        ~            ~              ~    +74.71%         +4.81%          ~
ModconstU32                                ~            ~   +0.53%       +0.18%              ~    +51.16%              ~          ~
DivisibleconstU32                          ~            ~        ~       -0.62%              ~     -4.25%              ~          ~
DivisibleWDivconstU32                 -2.77%       +5.56%  +11.12%       -5.15%              ~    +48.70%        +25.11%     -4.07%
DivconstI16                           -6.06%            ~   -0.33%       +0.22%              ~          ~         -9.68%     +5.47%
ModconstI16                                ~            ~   +4.44%       +2.82%              ~          ~              ~     +5.06%
DivisiblePow2constI16                      ~            ~        ~            ~              ~          ~              ~     -0.17%
DivisibleconstI16                          ~            ~   -0.23%            ~              ~          ~         +4.60%     +6.64%
DivisibleWDivconstI16                 -1.44%       -0.43%  +13.48%       -5.76%              ~     +1.62%        -23.15%     -9.06%
DivconstU16                           +1.61%            ~   -0.35%       -0.47%              ~          ~        +15.59%          ~
ModconstU16                                ~            ~        ~            ~              ~     -0.72%              ~    +14.23%
DivisibleconstU16                          ~            ~   -0.05%       +3.00%              ~          ~              ~     +5.06%
DivisibleWDivconstU16                +52.10%       +0.75%  +17.28%       +4.79%              ~    -37.39%         +5.28%     -9.06%
DivconstI8                                 ~            ~   -0.34%       -0.96%              ~          ~         -9.20%          ~
ModconstI8                            +2.29%            ~   +4.38%       +2.96%              ~          ~              ~          ~
DivisiblePow2constI8                       ~            ~        ~            ~              ~          ~              ~          ~
DivisibleconstI8                           ~            ~        ~            ~              ~          ~         +6.04%          ~
DivisibleWDivconstI8                 -26.44%       +1.69%  +17.03%       +4.05%              ~    +32.48%        -24.90%          ~
DivconstU8                            -4.50%      +14.06%   -0.28%            ~              ~          ~         +4.16%     +0.88%
ModconstU8                                 ~            ~  +25.84%       -0.64%              ~          ~              ~          ~
DivisibleconstU8                           ~            ~   -5.70%            ~              ~          ~              ~          ~
DivisibleWDivconstU8                 +49.55%       +9.07%        ~       +4.03%        +53.87%    -40.03%        +39.72%     -3.01%
Mul2                                       ~            ~        ~            ~              ~          ~              ~          ~
MulNeg2                                    ~            ~        ~            ~        -11.73%          ~              ~     -0.02%
EfaceInteger                               ~            ~        ~            ~              ~    +18.11%              ~     +2.53%
TypeAssert                           +33.90%       +2.86%        ~            ~              ~     -1.07%         -5.29%     -1.04%
Div64UnsignedSmall                         ~            ~        ~            ~              ~          ~              ~          ~
Div64Small                                 ~            ~        ~            ~              ~     -0.88%              ~     +2.39%
Div64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +0.35%
Div64SmallNegDividend                      ~            ~        ~            ~              ~     -0.84%              ~     +3.57%
Div64SmallNegBoth                          ~            ~        ~            ~              ~     -0.86%              ~     +3.55%
Div64Unsigned                              ~            ~        ~            ~              ~          ~              ~     -0.11%
Div64                                      ~            ~        ~            ~              ~          ~              ~     +0.11%
Div64NegDivisor                            ~            ~        ~            ~              ~     -1.29%              ~          ~
Div64NegDividend                           ~            ~        ~            ~              ~     -1.44%              ~          ~
Div64NegBoth                               ~            ~        ~            ~              ~          ~              ~     +0.28%
Mod64UnsignedSmall                         ~            ~        ~            ~              ~     +0.48%              ~     +0.93%
Mod64Small                                 ~            ~        ~            ~              ~          ~              ~          ~
Mod64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +1.44%
Mod64SmallNegDividend                      ~            ~        ~            ~              ~     +0.22%              ~     +1.37%
Mod64SmallNegBoth                          ~            ~        ~            ~              ~          ~              ~     -2.22%
Mod64Unsigned                              ~            ~        ~            ~              ~     -0.95%              ~     +0.11%
Mod64                                      ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegDivisor                            ~            ~        ~            ~              ~          ~              ~     -0.02%
Mod64NegDividend                           ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegBoth                               ~            ~        ~            ~              ~          ~              ~     -0.02%
MulconstI32/3                              ~            ~        ~      -25.00%              ~          ~              ~    +47.37%
MulconstI32/5                              ~            ~        ~      +33.28%              ~          ~              ~    +32.21%
MulconstI32/12                             ~            ~        ~       -2.13%              ~          ~              ~     -0.02%
MulconstI32/120                            ~            ~        ~       +2.93%              ~          ~              ~     -0.03%
MulconstI32/-120                           ~            ~        ~       -2.17%              ~          ~              ~     -0.03%
MulconstI32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstI32/65538                          ~            ~        ~            ~              ~    -33.38%              ~     +0.04%
MulconstI64/3                              ~            ~        ~      +33.35%              ~     -0.37%              ~     -0.13%
MulconstI64/5                              ~            ~        ~      -25.00%              ~     -0.34%              ~          ~
MulconstI64/12                             ~            ~        ~       +2.13%              ~    +11.62%              ~     +2.30%
MulconstI64/120                            ~            ~        ~       -1.98%              ~          ~              ~          ~
MulconstI64/-120                           ~            ~        ~       +0.75%              ~          ~              ~          ~
MulconstI64/65537                          ~            ~        ~            ~              ~     +5.61%              ~          ~
MulconstI64/65538                          ~            ~        ~            ~              ~     +5.25%              ~          ~
MulconstU32/3                              ~       +0.81%        ~      +33.39%              ~    +77.92%              ~    -32.31%
MulconstU32/5                              ~            ~        ~      -24.97%              ~    +77.92%              ~    -24.47%
MulconstU32/12                             ~            ~        ~       +2.06%              ~          ~              ~     +0.03%
MulconstU32/120                            ~            ~        ~       -2.74%              ~          ~              ~     +0.03%
MulconstU32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstU32/65538                          ~            ~        ~            ~              ~    -33.42%              ~     -0.03%
MulconstU64/3                              ~            ~        ~      +33.33%              ~     -0.28%              ~     +1.22%
MulconstU64/5                              ~            ~        ~      -25.00%              ~          ~              ~     -0.64%
MulconstU64/12                             ~            ~        ~       +2.30%              ~    +11.59%              ~     +0.14%
MulconstU64/120                            ~            ~        ~       -2.82%              ~          ~              ~     +0.04%
MulconstU64/65537                          ~       +0.37%        ~            ~              ~     +5.58%              ~          ~
MulconstU64/65538                          ~            ~        ~            ~              ~     +5.16%              ~          ~
ShiftArithmeticRight                       ~            ~        ~            ~              ~    -10.81%              ~     +0.31%
Switch8Predictable                   +14.69%            ~        ~            ~              ~    -24.85%              ~          ~
Switch8Unpredictable                       ~       -0.58%   -3.80%            ~              ~    -11.78%              ~     -0.79%
Switch32Predictable                  -10.33%      +17.89%        ~            ~              ~     +5.76%              ~          ~
Switch32Unpredictable                 -3.15%       +1.19%   +9.42%            ~              ~    -10.30%         -5.09%     +0.44%
SwitchStringPredictable              +70.88%      +20.48%        ~            ~              ~     +2.39%              ~     +0.31%
SwitchStringUnpredictable                  ~       +3.91%   -5.06%       -0.98%              ~     +0.61%         +2.03%          ~
SwitchTypePredictable               +146.58%       -1.10%        ~      -12.45%              ~     -0.46%         -3.81%          ~
SwitchTypeUnpredictable               +0.46%       -0.83%        ~       +4.18%              ~     +0.43%              ~     +0.62%
SwitchInterfaceTypePredictable       -13.41%      -10.13%  +11.03%            ~              ~     -4.38%              ~     +0.75%
SwitchInterfaceTypeUnpredictable      -6.37%       -2.14%        ~       -3.21%              ~     -4.20%              ~     +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Russ Cox 2025-10-22 22:22:51 -04:00 committed by Gopher Robot
parent 915c1839fe
commit 9bbda7c99d
25 changed files with 6190 additions and 4205 deletions


@ -4,7 +4,7 @@
// This file contains rules to decompose builtin compound types
// (complex,string,slice,interface) into their constituent
// types. These rules work together with the decomposeBuiltIn
// types. These rules work together with the decomposeBuiltin
// pass which handles phis of these types.
(Store {t} _ _ mem) && t.Size() == 0 => mem


@ -3,7 +3,7 @@
// license that can be found in the LICENSE file.
// This file contains rules to decompose [u]int64 types on 32-bit
// architectures. These rules work together with the decomposeBuiltIn
// architectures. These rules work together with the decomposeBuiltin
// pass which handles phis of these types.
(Int64Hi (Int64Make hi _)) => hi
@ -217,11 +217,32 @@
(Rsh8x64 x y) => (Rsh8x32 x (Or32 <typ.UInt32> (Zeromask (Int64Hi y)) (Int64Lo y)))
(Rsh8Ux64 x y) => (Rsh8Ux32 x (Or32 <typ.UInt32> (Zeromask (Int64Hi y)) (Int64Lo y)))
(RotateLeft64 x (Int64Make hi lo)) => (RotateLeft64 x lo)
(RotateLeft32 x (Int64Make hi lo)) => (RotateLeft32 x lo)
(RotateLeft16 x (Int64Make hi lo)) => (RotateLeft16 x lo)
(RotateLeft8 x (Int64Make hi lo)) => (RotateLeft8 x lo)
// RotateLeft64 by constant, for use in divmod.
(RotateLeft64 <t> x (Const(64|32|16|8) [c])) && c&63 == 0 => x
(RotateLeft64 <t> x (Const(64|32|16|8) [c])) && c&63 == 32 => (Int64Make <t> (Int64Lo x) (Int64Hi x))
(RotateLeft64 <t> x (Const(64|32|16|8) [c])) && 0 < c&63 && c&63 < 32 =>
(Int64Make <t>
(Or32 <typ.UInt32>
(Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)]))
(Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)])))
(Or32 <typ.UInt32>
(Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)]))
(Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
(RotateLeft64 <t> x (Const(64|32|16|8) [c])) && 32 < c&63 && c&63 < 64 =>
(Int64Make <t>
(Or32 <typ.UInt32>
(Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)]))
(Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)])))
(Or32 <typ.UInt32>
(Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)]))
(Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
// Clean up constants a little
(Or32 <typ.UInt32> (Zeromask (Const32 [c])) y) && c == 0 => y
(Or32 <typ.UInt32> (Zeromask (Const32 [c])) y) && c != 0 => (Const32 <typ.UInt32> [-1])
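
For intuition, the 0 < c&63 < 32 case above amounts to the following
32-bit computation, written as a hypothetical Go helper (not part of
the CL):

	// rotl64via32 rotates the 64-bit value hi:lo left by k bits,
	// 0 < k < 32, using only 32-bit shifts and ors, mirroring the
	// Int64Make/Or32/Lsh32x32/Rsh32Ux32 rule above.
	func rotl64via32(hi, lo uint32, k uint) (newHi, newLo uint32) {
		return hi<<k | lo>>(32-k), lo<<k | hi>>(32-k)
	}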


@ -0,0 +1,167 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Divisibility checks (x%c == 0 or x%c != 0) convert to multiply, rotate, compare.
// The opt pass rewrote x%c to x-(x/c)*c
// and then also rewrote x-(x/c)*c == 0 to x == (x/c)*c.
// If x/c is being used for a division already (div.Uses != 1)
// then we leave the expression alone.
//
// See ../magic.go for a detailed description of these algorithms.
// See test/codegen/divmod.go for tests.
// See divmod.rules for other division rules that run after these.
// Divisibility by unsigned or signed power of two.
(Eq(8|16|32|64) x (Mul(8|16|32|64) <t> (Div(8|16|32|64)u x (Const(8|16|32|64) [c])) (Const(8|16|32|64) [c])))
&& x.Op != OpConst64 && isPowerOfTwo(c) =>
(Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [c-1])) (Const(8|16|32|64) <t> [0]))
(Eq(8|16|32|64) x (Mul(8|16|32|64) <t> (Div(8|16|32|64) x (Const(8|16|32|64) [c])) (Const(8|16|32|64) [c])))
&& x.Op != OpConst64 && isPowerOfTwo(c) =>
(Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [c-1])) (Const(8|16|32|64) <t> [0]))
(Neq(8|16|32|64) x (Mul(8|16|32|64) <t> (Div(8|16|32|64)u x (Const(8|16|32|64) [c])) (Const(8|16|32|64) [c])))
&& x.Op != OpConst64 && isPowerOfTwo(c) =>
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [c-1])) (Const(8|16|32|64) <t> [0]))
(Neq(8|16|32|64) x (Mul(8|16|32|64) <t> (Div(8|16|32|64) x (Const(8|16|32|64) [c])) (Const(8|16|32|64) [c])))
&& x.Op != OpConst64 && isPowerOfTwo(c) =>
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [c-1])) (Const(8|16|32|64) <t> [0]))
// Divisibility by unsigned.
(Eq8 x (Mul8 <t> div:(Div8u x (Const8 [c])) (Const8 [c])))
&& div.Uses == 1
&& x.Op != OpConst8 && udivisibleOK8(c) =>
(Leq8U
(RotateLeft8 <t>
(Mul8 <t> x (Const8 <t> [int8(udivisible8(c).m)]))
(Const8 <t> [int8(8 - udivisible8(c).k)]))
(Const8 <t> [int8(udivisible8(c).max)]))
(Neq8 x (Mul8 <t> div:(Div8u x (Const8 [c])) (Const8 [c])))
&& div.Uses == 1
&& x.Op != OpConst8 && udivisibleOK8(c) =>
(Less8U
(Const8 <t> [int8(udivisible8(c).max)])
(RotateLeft8 <t>
(Mul8 <t> x (Const8 <t> [int8(udivisible8(c).m)]))
(Const8 <t> [int8(8 - udivisible8(c).k)])))
(Eq16 x (Mul16 <t> div:(Div16u x (Const16 [c])) (Const16 [c])))
&& div.Uses == 1
&& x.Op != OpConst16 && udivisibleOK16(c) =>
(Leq16U
(RotateLeft16 <t>
(Mul16 <t> x (Const16 <t> [int16(udivisible16(c).m)]))
(Const16 <t> [int16(16 - udivisible16(c).k)]))
(Const16 <t> [int16(udivisible16(c).max)]))
(Neq16 x (Mul16 <t> div:(Div16u x (Const16 [c])) (Const16 [c])))
&& div.Uses == 1
&& x.Op != OpConst16 && udivisibleOK16(c) =>
(Less16U
(Const16 <t> [int16(udivisible16(c).max)])
(RotateLeft16 <t>
(Mul16 <t> x (Const16 <t> [int16(udivisible16(c).m)]))
(Const16 <t> [int16(16 - udivisible16(c).k)])))
(Eq32 x (Mul32 <t> div:(Div32u x (Const32 [c])) (Const32 [c])))
&& div.Uses == 1
&& x.Op != OpConst32 && udivisibleOK32(c) =>
(Leq32U
(RotateLeft32 <t>
(Mul32 <t> x (Const32 <t> [int32(udivisible32(c).m)]))
(Const32 <t> [int32(32 - udivisible32(c).k)]))
(Const32 <t> [int32(udivisible32(c).max)]))
(Neq32 x (Mul32 <t> div:(Div32u x (Const32 [c])) (Const32 [c])))
&& div.Uses == 1
&& x.Op != OpConst32 && udivisibleOK32(c) =>
(Less32U
(Const32 <t> [int32(udivisible32(c).max)])
(RotateLeft32 <t>
(Mul32 <t> x (Const32 <t> [int32(udivisible32(c).m)]))
(Const32 <t> [int32(32 - udivisible32(c).k)])))
(Eq64 x (Mul64 <t> div:(Div64u x (Const64 [c])) (Const64 [c])))
&& div.Uses == 1
&& x.Op != OpConst64 && udivisibleOK64(c) =>
(Leq64U
(RotateLeft64 <t>
(Mul64 <t> x (Const64 <t> [int64(udivisible64(c).m)]))
(Const64 <t> [int64(64 - udivisible64(c).k)]))
(Const64 <t> [int64(udivisible64(c).max)]))
(Neq64 x (Mul64 <t> div:(Div64u x (Const64 [c])) (Const64 [c])))
&& div.Uses == 1
&& x.Op != OpConst64 && udivisibleOK64(c) =>
(Less64U
(Const64 <t> [int64(udivisible64(c).max)])
(RotateLeft64 <t>
(Mul64 <t> x (Const64 <t> [int64(udivisible64(c).m)]))
(Const64 <t> [int64(64 - udivisible64(c).k)])))
// Divisibility by signed.
(Eq8 x (Mul8 <t> div:(Div8 x (Const8 [c])) (Const8 [c])))
&& div.Uses == 1
&& x.Op != OpConst8 && sdivisibleOK8(c) =>
(Leq8U
(RotateLeft8 <t>
(Add8 <t> (Mul8 <t> x (Const8 <t> [int8(sdivisible8(c).m)]))
(Const8 <t> [int8(sdivisible8(c).a)]))
(Const8 <t> [int8(8 - sdivisible8(c).k)]))
(Const8 <t> [int8(sdivisible8(c).max)]))
(Neq8 x (Mul8 <t> div:(Div8 x (Const8 [c])) (Const8 [c])))
&& div.Uses == 1
&& x.Op != OpConst8 && sdivisibleOK8(c) =>
(Less8U
(Const8 <t> [int8(sdivisible8(c).max)])
(RotateLeft8 <t>
(Add8 <t> (Mul8 <t> x (Const8 <t> [int8(sdivisible8(c).m)]))
(Const8 <t> [int8(sdivisible8(c).a)]))
(Const8 <t> [int8(8 - sdivisible8(c).k)])))
(Eq16 x (Mul16 <t> div:(Div16 x (Const16 [c])) (Const16 [c])))
&& div.Uses == 1
&& x.Op != OpConst16 && sdivisibleOK16(c) =>
(Leq16U
(RotateLeft16 <t>
(Add16 <t> (Mul16 <t> x (Const16 <t> [int16(sdivisible16(c).m)]))
(Const16 <t> [int16(sdivisible16(c).a)]))
(Const16 <t> [int16(16 - sdivisible16(c).k)]))
(Const16 <t> [int16(sdivisible16(c).max)]))
(Neq16 x (Mul16 <t> div:(Div16 x (Const16 [c])) (Const16 [c])))
&& div.Uses == 1
&& x.Op != OpConst16 && sdivisibleOK16(c) =>
(Less16U
(Const16 <t> [int16(sdivisible16(c).max)])
(RotateLeft16 <t>
(Add16 <t> (Mul16 <t> x (Const16 <t> [int16(sdivisible16(c).m)]))
(Const16 <t> [int16(sdivisible16(c).a)]))
(Const16 <t> [int16(16 - sdivisible16(c).k)])))
(Eq32 x (Mul32 <t> div:(Div32 x (Const32 [c])) (Const32 [c])))
&& div.Uses == 1
&& x.Op != OpConst32 && sdivisibleOK32(c) =>
(Leq32U
(RotateLeft32 <t>
(Add32 <t> (Mul32 <t> x (Const32 <t> [int32(sdivisible32(c).m)]))
(Const32 <t> [int32(sdivisible32(c).a)]))
(Const32 <t> [int32(32 - sdivisible32(c).k)]))
(Const32 <t> [int32(sdivisible32(c).max)]))
(Neq32 x (Mul32 <t> div:(Div32 x (Const32 [c])) (Const32 [c])))
&& div.Uses == 1
&& x.Op != OpConst32 && sdivisibleOK32(c) =>
(Less32U
(Const32 <t> [int32(sdivisible32(c).max)])
(RotateLeft32 <t>
(Add32 <t> (Mul32 <t> x (Const32 <t> [int32(sdivisible32(c).m)]))
(Const32 <t> [int32(sdivisible32(c).a)]))
(Const32 <t> [int32(32 - sdivisible32(c).k)])))
(Eq64 x (Mul64 <t> div:(Div64 x (Const64 [c])) (Const64 [c])))
&& div.Uses == 1
&& x.Op != OpConst64 && sdivisibleOK64(c) =>
(Leq64U
(RotateLeft64 <t>
(Add64 <t> (Mul64 <t> x (Const64 <t> [int64(sdivisible64(c).m)]))
(Const64 <t> [int64(sdivisible64(c).a)]))
(Const64 <t> [int64(64 - sdivisible64(c).k)]))
(Const64 <t> [int64(sdivisible64(c).max)]))
(Neq64 x (Mul64 <t> div:(Div64 x (Const64 [c])) (Const64 [c])))
&& div.Uses == 1
&& x.Op != OpConst64 && sdivisibleOK64(c) =>
(Less64U
(Const64 <t> [int64(sdivisible64(c).max)])
(RotateLeft64 <t>
(Add64 <t> (Mul64 <t> x (Const64 <t> [int64(sdivisible64(c).m)]))
(Const64 <t> [int64(sdivisible64(c).a)]))
(Const64 <t> [int64(64 - sdivisible64(c).k)])))
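
As a concrete instance of the multiply-and-compare form these rules
produce, a divisibility-by-5 check looks roughly like the sketch
below; the constants are worked out by hand from the
multiplicative-inverse view and are illustrative, not read out of
magic.go:

	// For odd c, x%c == 0 exactly when x*m (mod 2^32) <= max, where
	// m is the inverse of c mod 2^32 and max = (2^32-1)/c; even
	// divisors additionally need the rotate the rules emit.
	// For c = 5: m = 0xCCCCCCCD (5*m == 1 mod 2^32), max = 0x33333333.
	func divisibleBy5(x uint32) bool {
		const m, max = 0xCCCCCCCD, 0x33333333
		return x*m <= max
	}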


@ -0,0 +1,18 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
var divisibleOps = []opData{}
var divisibleBlocks = []blockData{}
func init() {
archs = append(archs, arch{
name: "divisible",
ops: divisibleOps,
blocks: divisibleBlocks,
generic: true,
})
}


@ -0,0 +1,288 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Lowering of mul, div, and mod operations.
// Runs after prove, so that prove can analyze div and mod ops
// directly instead of these obscured expansions,
// but before decompose builtin, so that 32-bit systems
// can still lower 64-bit ops to 32-bit ones.
//
// See ../magic.go for a detailed description of these algorithms.
// See test/codegen/divmod.go for tests.
// Unsigned div and mod by power of 2 handled in generic.rules.
// (The equivalent unsigned right shift and mask are simple enough for prove to analyze.)
// Signed divide by power of 2.
// n / c = n >> log(c) if n >= 0
// = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c)) (first shift signed, second shift unsigned).
(Div8 <t> n (Const8 [c])) && isPowerOfTwo(c) =>
(Rsh8x64
(Add8 <t> n (Rsh8Ux64 <t> (Rsh8x64 <t> n (Const64 <typ.UInt64> [ 7])) (Const64 <typ.UInt64> [int64( 8-log8(c))])))
(Const64 <typ.UInt64> [int64(log8(c))]))
(Div16 <t> n (Const16 [c])) && isPowerOfTwo(c) =>
(Rsh16x64
(Add16 <t> n (Rsh16Ux64 <t> (Rsh16x64 <t> n (Const64 <typ.UInt64> [15])) (Const64 <typ.UInt64> [int64(16-log16(c))])))
(Const64 <typ.UInt64> [int64(log16(c))]))
(Div32 <t> n (Const32 [c])) && isPowerOfTwo(c) =>
(Rsh32x64
(Add32 <t> n (Rsh32Ux64 <t> (Rsh32x64 <t> n (Const64 <typ.UInt64> [31])) (Const64 <typ.UInt64> [int64(32-log32(c))])))
(Const64 <typ.UInt64> [int64(log32(c))]))
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) =>
(Rsh64x64
(Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [int64(64-log64(c))])))
(Const64 <typ.UInt64> [int64(log64(c))]))
// Divide, not a power of 2, by strength reduction to double-width multiply and shift.
//
// umagicN(c) computes m, s such that N-bit unsigned divide
// x/c = (x*((1<<N)+m))>>N>>s = ((x*m)>>N+x)>>s
// where the multiplies are unsigned.
// Note that the returned m is always N+1 bits; umagicN omits the high 1<<N bit.
// The difficult part is implementing the 2N+1-bit multiply,
// since in general we have only a 2N-bit multiply available.
//
// smagic(c) computes m, s such that N-bit signed divide
// x/c = (x*m)>>N>>s - bool2int(x < 0).
// Here m is an unsigned N-bit number but x is signed.
//
// In general the division cases are:
//
// 1. A signed divide where 2N ≤ the register size.
// This form can use the signed algorithm directly.
//
// 2. A signed divide where m is even.
// This form can use a signed double-width multiply with m/2,
// shifting by s-1.
//
// 3. A signed divide where m is odd.
// This form can use x*m = ((x*(m-2^N))>>N+x) with a signed multiply.
// Since intN(m) is m-2^N < 0, the product and x have different signs,
// so there can be no overflow on the addition.
//
// 4. An unsigned divide where we know x < 1<<(N-1).
// This form can use the signed algorithm without the bool2int fixup,
// and since we know the product is only 2N-1 bits, we can use an
// unsigned multiply to obtain the high N bits directly, regardless
// of whether m is odd or even.
//
// 5. An unsigned divide where 2N+1 ≤ the register size.
// This form uses the unsigned algorithm with an explicit (1<<N)+m.
//
// 6. An unsigned divide where the N+1-bit m is even.
// This form can use an N-bit m/2 instead and shift one less bit.
//
// 7. An unsigned divide where m is odd but c is even.
// This form can shift once and then divide by (c/2) instead.
// The magic number m for c is ⌈2^k/c⌉, so we can use
// (m+1)/2 = ⌈2^k/(c/2)⌉ instead.
//
// 8. An unsigned divide on systems with an avg instruction.
// We noted above that (x*((1<<N)+m))>>N>>s = ((x*m)>>N+x)>>s.
// Let hi = (x*m)>>N, so we want (hi+x) >> s = avg(hi, x) >> (s-1).
//
// 9. Unsigned 64-bit divide by 16-bit constant on 32-bit systems.
// Use long division with 16-bit digits.
//
// Note: All systems have Hmul and Avg except for wasm, and the
// wasm JITs may well apply all these optimizations already anyway,
// so it may be worth looking into avoiding this pass entirely on wasm
// and dropping all the useAvg useHmul uncertainty.
// Case 1. Signed divides where 2N ≤ register size.
(Div8 <t> x (Const8 [c])) && smagicOK8(c) =>
(Sub8 <t>
(Rsh32x64 <t>
(Mul32 <typ.UInt32> (SignExt8to32 x) (Const32 <typ.UInt32> [int32(smagic8(c).m)]))
(Const64 <typ.UInt64> [8 + smagic8(c).s]))
(Rsh32x64 <t> (SignExt8to32 x) (Const64 <typ.UInt64> [31])))
(Div16 <t> x (Const16 [c])) && smagicOK16(c) =>
(Sub16 <t>
(Rsh32x64 <t>
(Mul32 <typ.UInt32> (SignExt16to32 x) (Const32 <typ.UInt32> [int32(smagic16(c).m)]))
(Const64 <typ.UInt64> [16 + smagic16(c).s]))
(Rsh32x64 <t> (SignExt16to32 x) (Const64 <typ.UInt64> [31])))
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 8 =>
(Sub32 <t>
(Rsh64x64 <t>
(Mul64 <typ.UInt64> (SignExt32to64 x) (Const64 <typ.UInt64> [int64(smagic32(c).m)]))
(Const64 <typ.UInt64> [32 + smagic32(c).s]))
(Rsh64x64 <t> (SignExt32to64 x) (Const64 <typ.UInt64> [63])))
// Case 2. Signed divides where m is even.
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 == 0 && config.useHmul =>
(Sub32 <t>
(Rsh32x64 <t>
(Hmul32 <t> x (Const32 <typ.UInt32> [int32(smagic32(c).m/2)]))
(Const64 <typ.UInt64> [smagic32(c).s - 1]))
(Rsh32x64 <t> x (Const64 <typ.UInt64> [31])))
(Div64 <t> x (Const64 [c])) && smagicOK64(c) && config.RegSize == 8 && smagic64(c).m&1 == 0 && config.useHmul =>
(Sub64 <t>
(Rsh64x64 <t>
(Hmul64 <t> x (Const64 <typ.UInt64> [int64(smagic64(c).m/2)]))
(Const64 <typ.UInt64> [smagic64(c).s - 1]))
(Rsh64x64 <t> x (Const64 <typ.UInt64> [63])))
// Case 3. Signed divides where m is odd.
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 != 0 && config.useHmul =>
(Sub32 <t>
(Rsh32x64 <t>
(Add32 <t> x (Hmul32 <t> x (Const32 <typ.UInt32> [int32(smagic32(c).m)])))
(Const64 <typ.UInt64> [smagic32(c).s]))
(Rsh32x64 <t> x (Const64 <typ.UInt64> [31])))
(Div64 <t> x (Const64 [c])) && smagicOK64(c) && config.RegSize == 8 && smagic64(c).m&1 != 0 && config.useHmul =>
(Sub64 <t>
(Rsh64x64 <t>
(Add64 <t> x (Hmul64 <t> x (Const64 <typ.UInt64> [int64(smagic64(c).m)])))
(Const64 <typ.UInt64> [smagic64(c).s]))
(Rsh64x64 <t> x (Const64 <typ.UInt64> [63])))
// Case 4. Unsigned divide where x < 1<<(N-1).
// Skip Div8u since case 5's handling is just as good.
(Div16u <t> x (Const16 [c])) && t.IsSigned() && smagicOK16(c) =>
(Rsh32Ux64 <t>
(Mul32 <typ.UInt32> (SignExt16to32 x) (Const32 <typ.UInt32> [int32(smagic16(c).m)]))
(Const64 <typ.UInt64> [16 + smagic16(c).s]))
(Div32u <t> x (Const32 [c])) && t.IsSigned() && smagicOK32(c) && config.RegSize == 8 =>
(Rsh64Ux64 <t>
(Mul64 <typ.UInt64> (SignExt32to64 x) (Const64 <typ.UInt64> [int64(smagic32(c).m)]))
(Const64 <typ.UInt64> [32 + smagic32(c).s]))
(Div32u <t> x (Const32 [c])) && t.IsSigned() && smagicOK32(c) && config.RegSize == 4 && config.useHmul =>
(Rsh32Ux64 <t>
(Hmul32u <typ.UInt32> x (Const32 <typ.UInt32> [int32(smagic32(c).m)]))
(Const64 <typ.UInt64> [smagic32(c).s]))
(Div64u <t> x (Const64 [c])) && t.IsSigned() && smagicOK64(c) && config.RegSize == 8 && config.useHmul =>
(Rsh64Ux64 <t>
(Hmul64u <typ.UInt64> x (Const64 <typ.UInt64> [int64(smagic64(c).m)]))
(Const64 <typ.UInt64> [smagic64(c).s]))
// Case 5. Unsigned divide where 2N+1 ≤ register size.
(Div8u <t> x (Const8 [c])) && umagicOK8(c) =>
(Trunc32to8 <t>
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32> (ZeroExt8to32 x) (Const32 <typ.UInt32> [int32(1<<8 + umagic8(c).m)]))
(Const64 <typ.UInt64> [8 + umagic8(c).s])))
(Div16u <t> x (Const16 [c])) && umagicOK16(c) && config.RegSize == 8 =>
(Trunc64to16 <t>
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64> (ZeroExt16to64 x) (Const64 <typ.UInt64> [int64(1<<16 + umagic16(c).m)]))
(Const64 <typ.UInt64> [16 + umagic16(c).s])))
// Case 6. Unsigned divide where m is even.
(Div16u <t> x (Const16 [c])) && umagicOK16(c) && umagic16(c).m&1 == 0 =>
(Trunc32to16 <t>
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32> (ZeroExt16to32 x) (Const32 <typ.UInt32> [int32(1<<15 + umagic16(c).m/2)]))
(Const64 <typ.UInt64> [16 + umagic16(c).s - 1])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && umagic32(c).m&1 == 0 && config.RegSize == 8 =>
(Trunc64to32 <t>
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [int64(1<<31 + umagic32(c).m/2)]))
(Const64 <typ.UInt64> [32 + umagic32(c).s - 1])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && umagic32(c).m&1 == 0 && config.RegSize == 4 && config.useHmul =>
(Rsh32Ux64 <t>
(Hmul32u <typ.UInt32> x (Const32 <typ.UInt32> [int32(1<<31 + umagic32(c).m/2)]))
(Const64 <typ.UInt64> [umagic32(c).s - 1]))
(Div64u <t> x (Const64 [c])) && umagicOK64(c) && umagic64(c).m&1 == 0 && config.RegSize == 8 && config.useHmul =>
(Rsh64Ux64 <t>
(Hmul64u <typ.UInt64> x (Const64 <typ.UInt64> [int64(1<<63 + umagic64(c).m/2)]))
(Const64 <typ.UInt64> [umagic64(c).s - 1]))
// Case 7. Unsigned divide where c is even.
(Div16u <t> x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && c&1 == 0 =>
(Trunc32to16 <t>
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32>
(Rsh32Ux64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [1]))
(Const32 <typ.UInt32> [int32(1<<15 + (umagic16(c).m+1)/2)]))
(Const64 <typ.UInt64> [16 + umagic16(c).s - 2])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && c&1 == 0 =>
(Trunc64to32 <t>
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Rsh64Ux64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [1]))
(Const64 <typ.UInt64> [int64(1<<31 + (umagic32(c).m+1)/2)]))
(Const64 <typ.UInt64> [32 + umagic32(c).s - 2])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && c&1 == 0 && config.useHmul =>
(Rsh32Ux64 <t>
(Hmul32u <typ.UInt32>
(Rsh32Ux64 <typ.UInt32> x (Const64 <typ.UInt64> [1]))
(Const32 <typ.UInt32> [int32(1<<31 + (umagic32(c).m+1)/2)]))
(Const64 <typ.UInt64> [umagic32(c).s - 2]))
(Div64u <t> x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && c&1 == 0 && config.useHmul =>
(Rsh64Ux64 <t>
(Hmul64u <typ.UInt64>
(Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [1]))
(Const64 <typ.UInt64> [int64(1<<63 + (umagic64(c).m+1)/2)]))
(Const64 <typ.UInt64> [umagic64(c).s - 2]))
// Case 8. Unsigned divide on systems with avg.
(Div16u <t> x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && config.useAvg =>
(Trunc32to16 <t>
(Rsh32Ux64 <typ.UInt32>
(Avg32u
(Lsh32x64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [16]))
(Mul32 <typ.UInt32> (ZeroExt16to32 x) (Const32 <typ.UInt32> [int32(umagic16(c).m)])))
(Const64 <typ.UInt64> [16 + umagic16(c).s - 1])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && config.useAvg =>
(Trunc64to32 <t>
(Rsh64Ux64 <typ.UInt64>
(Avg64u
(Lsh64x64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [32]))
(Mul64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt32> [int64(umagic32(c).m)])))
(Const64 <typ.UInt64> [32 + umagic32(c).s - 1])))
(Div32u <t> x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && config.useAvg && config.useHmul =>
(Rsh32Ux64 <t>
(Avg32u x (Hmul32u <typ.UInt32> x (Const32 <typ.UInt32> [int32(umagic32(c).m)])))
(Const64 <typ.UInt64> [umagic32(c).s - 1]))
(Div64u <t> x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && config.useAvg && config.useHmul =>
(Rsh64Ux64 <t>
(Avg64u x (Hmul64u <typ.UInt64> x (Const64 <typ.UInt64> [int64(umagic64(c).m)])))
(Const64 <typ.UInt64> [umagic64(c).s - 1]))
// Case 9. For unsigned 64-bit divides on 32-bit machines,
// if the constant fits in 16 bits (so that the last term
// fits in 32 bits), convert to three 32-bit divides by a constant.
//
// If 1<<32 = Q * c + R
// and x = hi << 32 + lo
//
// Then x = (hi/c*c + hi%c) << 32 + lo
// = hi/c*c<<32 + hi%c<<32 + lo
// = hi/c*c<<32 + (hi%c)*(Q*c+R) + lo/c*c + lo%c
// = hi/c*c<<32 + (hi%c)*Q*c + lo/c*c + (hi%c*R+lo%c)
// and x / c = (hi/c)<<32 + (hi%c)*Q + lo/c + (hi%c*R+lo%c)/c
(Div64u x (Const64 [c])) && c > 0 && c <= 0xFFFF && umagicOK32(int32(c)) && config.RegSize == 4 && config.useHmul =>
(Add64
(Add64 <typ.UInt64>
(Add64 <typ.UInt64>
(Lsh64x64 <typ.UInt64>
(ZeroExt32to64
(Div32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)])))
(Const64 <typ.UInt64> [32]))
(ZeroExt32to64 (Div32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))))
(Mul64 <typ.UInt64>
(ZeroExt32to64 <typ.UInt64>
(Mod32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)])))
(Const64 <typ.UInt64> [int64((1<<32)/c)])))
(ZeroExt32to64
(Div32u <typ.UInt32>
(Add32 <typ.UInt32>
(Mod32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))
(Mul32 <typ.UInt32>
(Mod32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)]))
(Const32 <typ.UInt32> [int32((1<<32)%c)])))
(Const32 <typ.UInt32> [int32(c)]))))
// Repeated from generic.rules, for expanding the expression above
// (which can then be further expanded to handle the nested Div32u).
(Mod32u <t> x (Const32 [c])) && x.Op != OpConst32 && c > 0 && umagicOK32(c)
=> (Sub32 x (Mul32 <t> (Div32u <t> x (Const32 <t> [c])) (Const32 <t> [c])))
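
A small worked instance of the umagicN identity quoted at the top of
this file, for N=32 and c=3 (m and s are computed by hand from that
formula, not read out of magic.go):

	// x/c = ((x*m)>>N + x) >> s, with (1<<N)+m = ceil(2^(N+s)/c).
	// Taking s = 2 gives (1<<32)+m = ceil(2^34/3), so m = 0x55555556.
	// The add is done in 64 bits here; the rules above use the
	// Avg/Hmul forms to avoid needing the extra bit.
	func div3(x uint32) uint32 {
		const m, s = uint64(0x55555556), 2
		t := (uint64(x) * m) >> 32 // (x*m)>>N
		return uint32((t + uint64(x)) >> s)
	}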


@ -0,0 +1,18 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
var divmodOps = []opData{}
var divmodBlocks = []blockData{}
func init() {
archs = append(archs, arch{
name: "divmod",
ops: divmodOps,
blocks: divmodBlocks,
generic: true,
})
}


@ -199,16 +199,6 @@
(And(8|16|32|64) <t> (Com(8|16|32|64) x) (Com(8|16|32|64) y)) => (Com(8|16|32|64) (Or(8|16|32|64) <t> x y))
(Or(8|16|32|64) <t> (Com(8|16|32|64) x) (Com(8|16|32|64) y)) => (Com(8|16|32|64) (And(8|16|32|64) <t> x y))
// Convert multiplication by a power of two to a shift.
(Mul8 <t> n (Const8 [c])) && isPowerOfTwo(c) => (Lsh8x64 <t> n (Const64 <typ.UInt64> [log8(c)]))
(Mul16 <t> n (Const16 [c])) && isPowerOfTwo(c) => (Lsh16x64 <t> n (Const64 <typ.UInt64> [log16(c)]))
(Mul32 <t> n (Const32 [c])) && isPowerOfTwo(c) => (Lsh32x64 <t> n (Const64 <typ.UInt64> [log32(c)]))
(Mul64 <t> n (Const64 [c])) && isPowerOfTwo(c) => (Lsh64x64 <t> n (Const64 <typ.UInt64> [log64(c)]))
(Mul8 <t> n (Const8 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg8 (Lsh8x64 <t> n (Const64 <typ.UInt64> [log8(-c)])))
(Mul16 <t> n (Const16 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg16 (Lsh16x64 <t> n (Const64 <typ.UInt64> [log16(-c)])))
(Mul32 <t> n (Const32 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg32 (Lsh32x64 <t> n (Const64 <typ.UInt64> [log32(-c)])))
(Mul64 <t> n (Const64 [c])) && t.IsSigned() && isPowerOfTwo(-c) => (Neg64 (Lsh64x64 <t> n (Const64 <typ.UInt64> [log64(-c)])))
(Mod8 (Const8 [c]) (Const8 [d])) && d != 0 => (Const8 [c % d])
(Mod16 (Const16 [c]) (Const16 [d])) && d != 0 => (Const16 [c % d])
(Mod32 (Const32 [c]) (Const32 [d])) && d != 0 => (Const32 [c % d])
@ -380,13 +370,15 @@
// Distribute multiplication c * (d+x) -> c*d + c*x. Useful for:
// a[i].b = ...; a[i+1].b = ...
(Mul64 (Const64 <t> [c]) (Add64 <t> (Const64 <t> [d]) x)) =>
// The !isPowerOfTwo is a kludge to keep a[i+1] using an index by a multiply,
// which turns into an index by a shift, which can use a shifted operand on ARM systems.
(Mul64 (Const64 <t> [c]) (Add64 <t> (Const64 <t> [d]) x)) && !isPowerOfTwo(c) =>
(Add64 (Const64 <t> [c*d]) (Mul64 <t> (Const64 <t> [c]) x))
(Mul32 (Const32 <t> [c]) (Add32 <t> (Const32 <t> [d]) x)) =>
(Mul32 (Const32 <t> [c]) (Add32 <t> (Const32 <t> [d]) x)) && !isPowerOfTwo(c) =>
(Add32 (Const32 <t> [c*d]) (Mul32 <t> (Const32 <t> [c]) x))
(Mul16 (Const16 <t> [c]) (Add16 <t> (Const16 <t> [d]) x)) =>
(Mul16 (Const16 <t> [c]) (Add16 <t> (Const16 <t> [d]) x)) && !isPowerOfTwo(c) =>
(Add16 (Const16 <t> [c*d]) (Mul16 <t> (Const16 <t> [c]) x))
(Mul8 (Const8 <t> [c]) (Add8 <t> (Const8 <t> [d]) x)) =>
(Mul8 (Const8 <t> [c]) (Add8 <t> (Const8 <t> [d]) x)) && !isPowerOfTwo(c) =>
(Add8 (Const8 <t> [c*d]) (Mul8 <t> (Const8 <t> [c]) x))
// Rewrite x*y ± x*z to x*(y±z)
@ -1034,176 +1026,9 @@
// We must ensure that no intermediate computations are invalid pointers.
(Convert a:(Add(64|32) (Add(64|32) (Convert ptr mem) off1) off2) mem) => (AddPtr ptr (Add(64|32) <a.Type> off1 off2))
// strength reduction of divide by a constant.
// See ../magic.go for a detailed description of these algorithms.
// Unsigned divide by power of 2. Strength reduce to a shift.
(Div8u n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8u(uint8(c))]))
(Div16u n (Const16 [c])) && isUnsignedPowerOfTwo(uint16(c)) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16u(uint16(c))]))
(Div32u n (Const32 [c])) && isUnsignedPowerOfTwo(uint32(c)) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32u(uint32(c))]))
(Div64u n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64u(uint64(c))]))
// Signed non-negative divide by power of 2.
(Div8 n (Const8 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8(c)]))
(Div16 n (Const16 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16(c)]))
(Div32 n (Const32 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32(c)]))
(Div64 n (Const64 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64(c)]))
(Div64 n (Const64 [-1<<63])) && isNonNegative(n) => (Const64 [0])
// Unsigned divide, not a power of 2. Strength reduce to a multiply.
// For 8-bit divides, we just do a direct 9-bit by 8-bit multiply.
(Div8u x (Const8 [c])) && umagicOK8(c) =>
(Trunc32to8
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(1<<8+umagic8(c).m)])
(ZeroExt8to32 x))
(Const64 <typ.UInt64> [8+umagic8(c).s])))
// For 16-bit divides on 64-bit machines, we do a direct 17-bit by 16-bit multiply.
(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 8 =>
(Trunc64to16
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(1<<16+umagic16(c).m)])
(ZeroExt16to64 x))
(Const64 <typ.UInt64> [16+umagic16(c).s])))
// For 16-bit divides on 32-bit machines
(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && umagic16(c).m&1 == 0 =>
(Trunc32to16
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(1<<15+umagic16(c).m/2)])
(ZeroExt16to32 x))
(Const64 <typ.UInt64> [16+umagic16(c).s-1])))
(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && c&1 == 0 =>
(Trunc32to16
(Rsh32Ux64 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(1<<15+(umagic16(c).m+1)/2)])
(Rsh32Ux64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [1])))
(Const64 <typ.UInt64> [16+umagic16(c).s-2])))
(Div16u x (Const16 [c])) && umagicOK16(c) && config.RegSize == 4 && config.useAvg =>
(Trunc32to16
(Rsh32Ux64 <typ.UInt32>
(Avg32u
(Lsh32x64 <typ.UInt32> (ZeroExt16to32 x) (Const64 <typ.UInt64> [16]))
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(umagic16(c).m)])
(ZeroExt16to32 x)))
(Const64 <typ.UInt64> [16+umagic16(c).s-1])))
// For 32-bit divides on 32-bit machines
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && umagic32(c).m&1 == 0 && config.useHmul =>
(Rsh32Ux64 <typ.UInt32>
(Hmul32u <typ.UInt32>
(Const32 <typ.UInt32> [int32(1<<31+umagic32(c).m/2)])
x)
(Const64 <typ.UInt64> [umagic32(c).s-1]))
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && c&1 == 0 && config.useHmul =>
(Rsh32Ux64 <typ.UInt32>
(Hmul32u <typ.UInt32>
(Const32 <typ.UInt32> [int32(1<<31+(umagic32(c).m+1)/2)])
(Rsh32Ux64 <typ.UInt32> x (Const64 <typ.UInt64> [1])))
(Const64 <typ.UInt64> [umagic32(c).s-2]))
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 4 && config.useAvg && config.useHmul =>
(Rsh32Ux64 <typ.UInt32>
(Avg32u
x
(Hmul32u <typ.UInt32>
(Const32 <typ.UInt32> [int32(umagic32(c).m)])
x))
(Const64 <typ.UInt64> [umagic32(c).s-1]))
// For 32-bit divides on 64-bit machines
// We'll use a regular (non-hi) multiply for this case.
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && umagic32(c).m&1 == 0 =>
(Trunc64to32
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(1<<31+umagic32(c).m/2)])
(ZeroExt32to64 x))
(Const64 <typ.UInt64> [32+umagic32(c).s-1])))
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && c&1 == 0 =>
(Trunc64to32
(Rsh64Ux64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(1<<31+(umagic32(c).m+1)/2)])
(Rsh64Ux64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [1])))
(Const64 <typ.UInt64> [32+umagic32(c).s-2])))
(Div32u x (Const32 [c])) && umagicOK32(c) && config.RegSize == 8 && config.useAvg =>
(Trunc64to32
(Rsh64Ux64 <typ.UInt64>
(Avg64u
(Lsh64x64 <typ.UInt64> (ZeroExt32to64 x) (Const64 <typ.UInt64> [32]))
(Mul64 <typ.UInt64>
(Const64 <typ.UInt32> [int64(umagic32(c).m)])
(ZeroExt32to64 x)))
(Const64 <typ.UInt64> [32+umagic32(c).s-1])))
// For unsigned 64-bit divides on 32-bit machines,
// if the constant fits in 16 bits (so that the last term
// fits in 32 bits), convert to three 32-bit divides by a constant.
//
// If 1<<32 = Q * c + R
// and x = hi << 32 + lo
//
// Then x = (hi/c*c + hi%c) << 32 + lo
// = hi/c*c<<32 + hi%c<<32 + lo
// = hi/c*c<<32 + (hi%c)*(Q*c+R) + lo/c*c + lo%c
// = hi/c*c<<32 + (hi%c)*Q*c + lo/c*c + (hi%c*R+lo%c)
// and x / c = (hi/c)<<32 + (hi%c)*Q + lo/c + (hi%c*R+lo%c)/c
(Div64u x (Const64 [c])) && c > 0 && c <= 0xFFFF && umagicOK32(int32(c)) && config.RegSize == 4 && config.useHmul =>
(Add64
(Add64 <typ.UInt64>
(Add64 <typ.UInt64>
(Lsh64x64 <typ.UInt64>
(ZeroExt32to64
(Div32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)])))
(Const64 <typ.UInt64> [32]))
(ZeroExt32to64 (Div32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))))
(Mul64 <typ.UInt64>
(ZeroExt32to64 <typ.UInt64>
(Mod32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)])))
(Const64 <typ.UInt64> [int64((1<<32)/c)])))
(ZeroExt32to64
(Div32u <typ.UInt32>
(Add32 <typ.UInt32>
(Mod32u <typ.UInt32> (Trunc64to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(c)]))
(Mul32 <typ.UInt32>
(Mod32u <typ.UInt32>
(Trunc64to32 <typ.UInt32> (Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [32])))
(Const32 <typ.UInt32> [int32(c)]))
(Const32 <typ.UInt32> [int32((1<<32)%c)])))
(Const32 <typ.UInt32> [int32(c)]))))
// For 64-bit divides on 64-bit machines
// (64-bit divides on 32-bit machines are lowered to a runtime call by the walk pass.)
(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && umagic64(c).m&1 == 0 && config.useHmul =>
(Rsh64Ux64 <typ.UInt64>
(Hmul64u <typ.UInt64>
(Const64 <typ.UInt64> [int64(1<<63+umagic64(c).m/2)])
x)
(Const64 <typ.UInt64> [umagic64(c).s-1]))
(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && c&1 == 0 && config.useHmul =>
(Rsh64Ux64 <typ.UInt64>
(Hmul64u <typ.UInt64>
(Const64 <typ.UInt64> [int64(1<<63+(umagic64(c).m+1)/2)])
(Rsh64Ux64 <typ.UInt64> x (Const64 <typ.UInt64> [1])))
(Const64 <typ.UInt64> [umagic64(c).s-2]))
(Div64u x (Const64 [c])) && umagicOK64(c) && config.RegSize == 8 && config.useAvg && config.useHmul =>
(Rsh64Ux64 <typ.UInt64>
(Avg64u
x
(Hmul64u <typ.UInt64>
(Const64 <typ.UInt64> [int64(umagic64(c).m)])
x))
(Const64 <typ.UInt64> [umagic64(c).s-1]))
// Simplification of divisions.
// Only trivial, easily analyzed (by prove) rewrites here.
// Strength reduction of div to mul is delayed to divmod.rules.
// Signed divide by a negative constant. Rewrite to divide by a positive constant.
(Div8 <t> n (Const8 [c])) && c < 0 && c != -1<<7 => (Neg8 (Div8 <t> n (Const8 <t> [-c])))
@ -1214,107 +1039,41 @@
// Dividing by the most-negative number. Result is always 0 except
// if the input is also the most-negative number.
// We can detect that using the sign bit of x & -x.
(Div64 x (Const64 [-1<<63])) && isNonNegative(x) => (Const64 [0])
(Div8 <t> x (Const8 [-1<<7 ])) => (Rsh8Ux64 (And8 <t> x (Neg8 <t> x)) (Const64 <typ.UInt64> [7 ]))
(Div16 <t> x (Const16 [-1<<15])) => (Rsh16Ux64 (And16 <t> x (Neg16 <t> x)) (Const64 <typ.UInt64> [15]))
(Div32 <t> x (Const32 [-1<<31])) => (Rsh32Ux64 (And32 <t> x (Neg32 <t> x)) (Const64 <typ.UInt64> [31]))
(Div64 <t> x (Const64 [-1<<63])) => (Rsh64Ux64 (And64 <t> x (Neg64 <t> x)) (Const64 <typ.UInt64> [63]))
// Signed divide by power of 2.
// n / c = n >> log(c) if n >= 0
// = (n+c-1) >> log(c) if n < 0
// We conditionally add c-1 by adding n>>63>>(64-log(c)) (first shift signed, second shift unsigned).
(Div8 <t> n (Const8 [c])) && isPowerOfTwo(c) =>
(Rsh8x64
(Add8 <t> n (Rsh8Ux64 <t> (Rsh8x64 <t> n (Const64 <typ.UInt64> [ 7])) (Const64 <typ.UInt64> [int64( 8-log8(c))])))
(Const64 <typ.UInt64> [int64(log8(c))]))
(Div16 <t> n (Const16 [c])) && isPowerOfTwo(c) =>
(Rsh16x64
(Add16 <t> n (Rsh16Ux64 <t> (Rsh16x64 <t> n (Const64 <typ.UInt64> [15])) (Const64 <typ.UInt64> [int64(16-log16(c))])))
(Const64 <typ.UInt64> [int64(log16(c))]))
(Div32 <t> n (Const32 [c])) && isPowerOfTwo(c) =>
(Rsh32x64
(Add32 <t> n (Rsh32Ux64 <t> (Rsh32x64 <t> n (Const64 <typ.UInt64> [31])) (Const64 <typ.UInt64> [int64(32-log32(c))])))
(Const64 <typ.UInt64> [int64(log32(c))]))
(Div64 <t> n (Const64 [c])) && isPowerOfTwo(c) =>
(Rsh64x64
(Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [int64(64-log64(c))])))
(Const64 <typ.UInt64> [int64(log64(c))]))
// Unsigned divide by power of 2. Strength reduce to a shift.
(Div8u n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (Rsh8Ux64 n (Const64 <typ.UInt64> [log8u(uint8(c))]))
(Div16u n (Const16 [c])) && isUnsignedPowerOfTwo(uint16(c)) => (Rsh16Ux64 n (Const64 <typ.UInt64> [log16u(uint16(c))]))
(Div32u n (Const32 [c])) && isUnsignedPowerOfTwo(uint32(c)) => (Rsh32Ux64 n (Const64 <typ.UInt64> [log32u(uint32(c))]))
(Div64u n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (Rsh64Ux64 n (Const64 <typ.UInt64> [log64u(uint64(c))]))
// Signed divide, not a power of 2. Strength reduce to a multiply.
(Div8 <t> x (Const8 [c])) && smagicOK8(c) =>
(Sub8 <t>
(Rsh32x64 <t>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(smagic8(c).m)])
(SignExt8to32 x))
(Const64 <typ.UInt64> [8+smagic8(c).s]))
(Rsh32x64 <t>
(SignExt8to32 x)
(Const64 <typ.UInt64> [31])))
(Div16 <t> x (Const16 [c])) && smagicOK16(c) =>
(Sub16 <t>
(Rsh32x64 <t>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(smagic16(c).m)])
(SignExt16to32 x))
(Const64 <typ.UInt64> [16+smagic16(c).s]))
(Rsh32x64 <t>
(SignExt16to32 x)
(Const64 <typ.UInt64> [31])))
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 8 =>
(Sub32 <t>
(Rsh64x64 <t>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(smagic32(c).m)])
(SignExt32to64 x))
(Const64 <typ.UInt64> [32+smagic32(c).s]))
(Rsh64x64 <t>
(SignExt32to64 x)
(Const64 <typ.UInt64> [63])))
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 == 0 && config.useHmul =>
(Sub32 <t>
(Rsh32x64 <t>
(Hmul32 <t>
(Const32 <typ.UInt32> [int32(smagic32(c).m/2)])
x)
(Const64 <typ.UInt64> [smagic32(c).s-1]))
(Rsh32x64 <t>
x
(Const64 <typ.UInt64> [31])))
(Div32 <t> x (Const32 [c])) && smagicOK32(c) && config.RegSize == 4 && smagic32(c).m&1 != 0 && config.useHmul =>
(Sub32 <t>
(Rsh32x64 <t>
(Add32 <t>
(Hmul32 <t>
(Const32 <typ.UInt32> [int32(smagic32(c).m)])
x)
x)
(Const64 <typ.UInt64> [smagic32(c).s]))
(Rsh32x64 <t>
x
(Const64 <typ.UInt64> [31])))
(Div64 <t> x (Const64 [c])) && smagicOK64(c) && smagic64(c).m&1 == 0 && config.useHmul =>
(Sub64 <t>
(Rsh64x64 <t>
(Hmul64 <t>
(Const64 <typ.UInt64> [int64(smagic64(c).m/2)])
x)
(Const64 <typ.UInt64> [smagic64(c).s-1]))
(Rsh64x64 <t>
x
(Const64 <typ.UInt64> [63])))
(Div64 <t> x (Const64 [c])) && smagicOK64(c) && smagic64(c).m&1 != 0 && config.useHmul =>
(Sub64 <t>
(Rsh64x64 <t>
(Add64 <t>
(Hmul64 <t>
(Const64 <typ.UInt64> [int64(smagic64(c).m)])
x)
x)
(Const64 <typ.UInt64> [smagic64(c).s]))
(Rsh64x64 <t>
x
(Const64 <typ.UInt64> [63])))
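To make the signed magic shape concrete, here is the Div32 rule (RegSize == 8 form) instantiated by hand for c = 7; the constant is ceil(2^34/7), computed inline rather than through the compiler's smagic32 helper.

	func div32by7(x int32) int32 {
		const s = 2                     // floor(log2 7)
		const m = (1<<(32+s) + 6) / 7   // ceil(2^(32+s)/7) = 2454267027
		q := (int64(x) * m) >> (32 + s) // floor(m*x / 2^34)
		q -= int64(x) >> 63             // adds 1 when x < 0: converts floor to truncation
		return int32(q)                 // equals x / 7 for every int32 x
	}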
// Strength reduce multiplication by a power of two to a shift.
// Excluded from early opt so that prove can recognize mod
// by the x - (x/d)*d pattern.
// (Runs during "middle opt" and "late opt".)
(Mul8 <t> x (Const8 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
(Lsh8x64 <t> x (Const64 <typ.UInt64> [log8(c)]))
(Mul16 <t> x (Const16 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
(Lsh16x64 <t> x (Const64 <typ.UInt64> [log16(c)]))
(Mul32 <t> x (Const32 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
(Lsh32x64 <t> x (Const64 <typ.UInt64> [log32(c)]))
(Mul64 <t> x (Const64 [c])) && isPowerOfTwo(c) && v.Block.Func.pass.name != "opt" =>
(Lsh64x64 <t> x (Const64 <typ.UInt64> [log64(c)]))
(Mul8 <t> x (Const8 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
(Neg8 (Lsh8x64 <t> x (Const64 <typ.UInt64> [log8(-c)])))
(Mul16 <t> x (Const16 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
(Neg16 (Lsh16x64 <t> x (Const64 <typ.UInt64> [log16(-c)])))
(Mul32 <t> x (Const32 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
(Neg32 (Lsh32x64 <t> x (Const64 <typ.UInt64> [log32(-c)])))
(Mul64 <t> x (Const64 [c])) && t.IsSigned() && isPowerOfTwo(-c) && v.Block.Func.pass.name != "opt" =>
(Neg64 (Lsh64x64 <t> x (Const64 <typ.UInt64> [log64(-c)])))
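The reason for the pass.name guard, spelled out as a sketch (not compiler code): after the mod has been expanded, prove still needs to see the multiply as a multiply.

	func mod8shape(x int64) int64 {
		q := x / 8     // stays a Div64 until the divmod pass
		return x - q*8 // detectMod in prove.go matches this Sub(x, Mul(Div(x, 8), 8)) shape;
		               // turning the Mul into a shift in the first opt would break the match
	}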
// Strength reduction of mod to div.
// Strength reduction of div to mul is delayed to genericlateopt.rules.
// Unsigned mod by power of 2 constant.
(Mod8u <t> n (Const8 [c])) && isUnsignedPowerOfTwo(uint8(c)) => (And8 n (Const8 <t> [c-1]))
@ -1323,6 +1082,7 @@
(Mod64u <t> n (Const64 [c])) && isUnsignedPowerOfTwo(uint64(c)) => (And64 n (Const64 <t> [c-1]))
// Signed non-negative mod by power of 2 constant.
// TODO: Replace ModN with ModNu in prove.
(Mod8 <t> n (Const8 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And8 n (Const8 <t> [c-1]))
(Mod16 <t> n (Const16 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And16 n (Const16 <t> [c-1]))
(Mod32 <t> n (Const32 [c])) && isNonNegative(n) && isPowerOfTwo(c) => (And32 n (Const32 <t> [c-1]))
@ -1355,7 +1115,9 @@
(Mod64u <t> x (Const64 [c])) && x.Op != OpConst64 && c > 0 && umagicOK64(c)
=> (Sub64 x (Mul64 <t> (Div64u <t> x (Const64 <t> [c])) (Const64 <t> [c])))
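The arithmetic behind these rules is just the remainder identity, kept in this shape so that prove and the later divmod pass can still see the division (quick illustration):

	// x % c == x - (x/c)*c, e.g. 37 % 10 == 37 - (37/10)*10 == 37 - 30 == 7.
	func mod64u(x, c uint64) uint64 {
		return x - (x/c)*c
	}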
// For architectures without rotates on less than 32-bits, promote these checks to 32-bit.
// Set up for mod->mul+rot optimization in genericlateopt.rules.
// For architectures without rotates on less than 32-bits, promote to 32-bit.
// TODO: Also != 0 case?
(Eq8 (Mod8u x (Const8 [c])) (Const8 [0])) && x.Op != OpConst8 && udivisibleOK8(c) && !hasSmallRotate(config) =>
(Eq32 (Mod32u <typ.UInt32> (ZeroExt8to32 <typ.UInt32> x) (Const32 <typ.UInt32> [int32(uint8(c))])) (Const32 <typ.UInt32> [0]))
(Eq16 (Mod16u x (Const16 [c])) (Const16 [0])) && x.Op != OpConst16 && udivisibleOK16(c) && !hasSmallRotate(config) =>
@ -1365,557 +1127,6 @@
(Eq16 (Mod16 x (Const16 [c])) (Const16 [0])) && x.Op != OpConst16 && sdivisibleOK16(c) && !hasSmallRotate(config) =>
(Eq32 (Mod32 <typ.Int32> (SignExt16to32 <typ.Int32> x) (Const32 <typ.Int32> [int32(c)])) (Const32 <typ.Int32> [0]))
// Divisibility checks x%c == 0 convert to multiply and rotate.
// Note, x%c == 0 is rewritten as x == c*(x/c) during the opt pass
// where (x/c) is performed using multiplication with magic constants.
// To rewrite x%c == 0 requires pattern matching the rewritten expression
// and checking that the division by the same constant wasn't already calculated.
// This check is made by counting uses of the magic constant multiplication.
// Note that if there were an intermediate opt pass, this rule could be applied
// directly on the Div op and magic division rewrites could be delayed to late opt.
// Unsigned divisibility checks convert to multiply and rotate.
(Eq8 x (Mul8 (Const8 [c])
(Trunc32to8
(Rsh32Ux64
mul:(Mul32
(Const32 [m])
(ZeroExt8to32 x))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(1<<8+umagic8(c).m) && s == 8+umagic8(c).s
&& x.Op != OpConst8 && udivisibleOK8(c)
=> (Leq8U
(RotateLeft8 <typ.UInt8>
(Mul8 <typ.UInt8>
(Const8 <typ.UInt8> [int8(udivisible8(c).m)])
x)
(Const8 <typ.UInt8> [int8(8-udivisible8(c).k)])
)
(Const8 <typ.UInt8> [int8(udivisible8(c).max)])
)
(Eq16 x (Mul16 (Const16 [c])
(Trunc64to16
(Rsh64Ux64
mul:(Mul64
(Const64 [m])
(ZeroExt16to64 x))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(1<<16+umagic16(c).m) && s == 16+umagic16(c).s
&& x.Op != OpConst16 && udivisibleOK16(c)
=> (Leq16U
(RotateLeft16 <typ.UInt16>
(Mul16 <typ.UInt16>
(Const16 <typ.UInt16> [int16(udivisible16(c).m)])
x)
(Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
)
(Const16 <typ.UInt16> [int16(udivisible16(c).max)])
)
(Eq16 x (Mul16 (Const16 [c])
(Trunc32to16
(Rsh32Ux64
mul:(Mul32
(Const32 [m])
(ZeroExt16to32 x))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(1<<15+umagic16(c).m/2) && s == 16+umagic16(c).s-1
&& x.Op != OpConst16 && udivisibleOK16(c)
=> (Leq16U
(RotateLeft16 <typ.UInt16>
(Mul16 <typ.UInt16>
(Const16 <typ.UInt16> [int16(udivisible16(c).m)])
x)
(Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
)
(Const16 <typ.UInt16> [int16(udivisible16(c).max)])
)
(Eq16 x (Mul16 (Const16 [c])
(Trunc32to16
(Rsh32Ux64
mul:(Mul32
(Const32 [m])
(Rsh32Ux64 (ZeroExt16to32 x) (Const64 [1])))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(1<<15+(umagic16(c).m+1)/2) && s == 16+umagic16(c).s-2
&& x.Op != OpConst16 && udivisibleOK16(c)
=> (Leq16U
(RotateLeft16 <typ.UInt16>
(Mul16 <typ.UInt16>
(Const16 <typ.UInt16> [int16(udivisible16(c).m)])
x)
(Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
)
(Const16 <typ.UInt16> [int16(udivisible16(c).max)])
)
(Eq16 x (Mul16 (Const16 [c])
(Trunc32to16
(Rsh32Ux64
(Avg32u
(Lsh32x64 (ZeroExt16to32 x) (Const64 [16]))
mul:(Mul32
(Const32 [m])
(ZeroExt16to32 x)))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(umagic16(c).m) && s == 16+umagic16(c).s-1
&& x.Op != OpConst16 && udivisibleOK16(c)
=> (Leq16U
(RotateLeft16 <typ.UInt16>
(Mul16 <typ.UInt16>
(Const16 <typ.UInt16> [int16(udivisible16(c).m)])
x)
(Const16 <typ.UInt16> [int16(16-udivisible16(c).k)])
)
(Const16 <typ.UInt16> [int16(udivisible16(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Rsh32Ux64
mul:(Hmul32u
(Const32 [m])
x)
(Const64 [s]))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(1<<31+umagic32(c).m/2) && s == umagic32(c).s-1
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Rsh32Ux64
mul:(Hmul32u
(Const32 <typ.UInt32> [m])
(Rsh32Ux64 x (Const64 [1])))
(Const64 [s]))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(1<<31+(umagic32(c).m+1)/2) && s == umagic32(c).s-2
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Rsh32Ux64
(Avg32u
x
mul:(Hmul32u
(Const32 [m])
x))
(Const64 [s]))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(umagic32(c).m) && s == umagic32(c).s-1
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Trunc64to32
(Rsh64Ux64
mul:(Mul64
(Const64 [m])
(ZeroExt32to64 x))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(1<<31+umagic32(c).m/2) && s == 32+umagic32(c).s-1
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Trunc64to32
(Rsh64Ux64
mul:(Mul64
(Const64 [m])
(Rsh64Ux64 (ZeroExt32to64 x) (Const64 [1])))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(1<<31+(umagic32(c).m+1)/2) && s == 32+umagic32(c).s-2
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Trunc64to32
(Rsh64Ux64
(Avg64u
(Lsh64x64 (ZeroExt32to64 x) (Const64 [32]))
mul:(Mul64
(Const64 [m])
(ZeroExt32to64 x)))
(Const64 [s])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(umagic32(c).m) && s == 32+umagic32(c).s-1
&& x.Op != OpConst32 && udivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(udivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(32-udivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(udivisible32(c).max)])
)
(Eq64 x (Mul64 (Const64 [c])
(Rsh64Ux64
mul:(Hmul64u
(Const64 [m])
x)
(Const64 [s]))
)
) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(1<<63+umagic64(c).m/2) && s == umagic64(c).s-1
&& x.Op != OpConst64 && udivisibleOK64(c)
=> (Leq64U
(RotateLeft64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(udivisible64(c).m)])
x)
(Const64 <typ.UInt64> [64-udivisible64(c).k])
)
(Const64 <typ.UInt64> [int64(udivisible64(c).max)])
)
(Eq64 x (Mul64 (Const64 [c])
(Rsh64Ux64
mul:(Hmul64u
(Const64 [m])
(Rsh64Ux64 x (Const64 [1])))
(Const64 [s]))
)
) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(1<<63+(umagic64(c).m+1)/2) && s == umagic64(c).s-2
&& x.Op != OpConst64 && udivisibleOK64(c)
=> (Leq64U
(RotateLeft64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(udivisible64(c).m)])
x)
(Const64 <typ.UInt64> [64-udivisible64(c).k])
)
(Const64 <typ.UInt64> [int64(udivisible64(c).max)])
)
(Eq64 x (Mul64 (Const64 [c])
(Rsh64Ux64
(Avg64u
x
mul:(Hmul64u
(Const64 [m])
x))
(Const64 [s]))
)
) && v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(umagic64(c).m) && s == umagic64(c).s-1
&& x.Op != OpConst64 && udivisibleOK64(c)
=> (Leq64U
(RotateLeft64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(udivisible64(c).m)])
x)
(Const64 <typ.UInt64> [64-udivisible64(c).k])
)
(Const64 <typ.UInt64> [int64(udivisible64(c).max)])
)
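For readers following the unsigned rules just above: the target form is the classic multiply-by-modular-inverse-and-rotate divisibility test. Below is a self-contained uint64 sketch with my own helper names and inverse computation, not the compiler's udivisible64 fields, though the final comparison matches the Leq64U/RotateLeft64/Mul64 result above.

	import "math/bits"

	// x is divisible by c (c > 0) iff rotating x * inverse(oddPart(c)) right by
	// trailingZeros(c) lands at or below floor((2^64-1)/c).
	func divisibleBy(x, c uint64) bool {
		k := bits.TrailingZeros64(c)
		d := c >> k // odd part of c
		inv := d    // Newton iteration for the inverse of d mod 2^64:
		for i := 0; i < 5; i++ {
			inv *= 2 - d*inv // doubles the number of correct low bits each time
		}
		max := ^uint64(0) / c
		return bits.RotateLeft64(x*inv, -k) <= max
	}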
// Signed divisibility checks convert to multiply, add and rotate.
(Eq8 x (Mul8 (Const8 [c])
(Sub8
(Rsh32x64
mul:(Mul32
(Const32 [m])
(SignExt8to32 x))
(Const64 [s]))
(Rsh32x64
(SignExt8to32 x)
(Const64 [31])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(smagic8(c).m) && s == 8+smagic8(c).s
&& x.Op != OpConst8 && sdivisibleOK8(c)
=> (Leq8U
(RotateLeft8 <typ.UInt8>
(Add8 <typ.UInt8>
(Mul8 <typ.UInt8>
(Const8 <typ.UInt8> [int8(sdivisible8(c).m)])
x)
(Const8 <typ.UInt8> [int8(sdivisible8(c).a)])
)
(Const8 <typ.UInt8> [int8(8-sdivisible8(c).k)])
)
(Const8 <typ.UInt8> [int8(sdivisible8(c).max)])
)
(Eq16 x (Mul16 (Const16 [c])
(Sub16
(Rsh32x64
mul:(Mul32
(Const32 [m])
(SignExt16to32 x))
(Const64 [s]))
(Rsh32x64
(SignExt16to32 x)
(Const64 [31])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(smagic16(c).m) && s == 16+smagic16(c).s
&& x.Op != OpConst16 && sdivisibleOK16(c)
=> (Leq16U
(RotateLeft16 <typ.UInt16>
(Add16 <typ.UInt16>
(Mul16 <typ.UInt16>
(Const16 <typ.UInt16> [int16(sdivisible16(c).m)])
x)
(Const16 <typ.UInt16> [int16(sdivisible16(c).a)])
)
(Const16 <typ.UInt16> [int16(16-sdivisible16(c).k)])
)
(Const16 <typ.UInt16> [int16(sdivisible16(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Sub32
(Rsh64x64
mul:(Mul64
(Const64 [m])
(SignExt32to64 x))
(Const64 [s]))
(Rsh64x64
(SignExt32to64 x)
(Const64 [63])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(smagic32(c).m) && s == 32+smagic32(c).s
&& x.Op != OpConst32 && sdivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Add32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(sdivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(sdivisible32(c).a)])
)
(Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(sdivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Sub32
(Rsh32x64
mul:(Hmul32
(Const32 [m])
x)
(Const64 [s]))
(Rsh32x64
x
(Const64 [31])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(smagic32(c).m/2) && s == smagic32(c).s-1
&& x.Op != OpConst32 && sdivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Add32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(sdivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(sdivisible32(c).a)])
)
(Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(sdivisible32(c).max)])
)
(Eq32 x (Mul32 (Const32 [c])
(Sub32
(Rsh32x64
(Add32
mul:(Hmul32
(Const32 [m])
x)
x)
(Const64 [s]))
(Rsh32x64
x
(Const64 [31])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int32(smagic32(c).m) && s == smagic32(c).s
&& x.Op != OpConst32 && sdivisibleOK32(c)
=> (Leq32U
(RotateLeft32 <typ.UInt32>
(Add32 <typ.UInt32>
(Mul32 <typ.UInt32>
(Const32 <typ.UInt32> [int32(sdivisible32(c).m)])
x)
(Const32 <typ.UInt32> [int32(sdivisible32(c).a)])
)
(Const32 <typ.UInt32> [int32(32-sdivisible32(c).k)])
)
(Const32 <typ.UInt32> [int32(sdivisible32(c).max)])
)
(Eq64 x (Mul64 (Const64 [c])
(Sub64
(Rsh64x64
mul:(Hmul64
(Const64 [m])
x)
(Const64 [s]))
(Rsh64x64
x
(Const64 [63])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(smagic64(c).m/2) && s == smagic64(c).s-1
&& x.Op != OpConst64 && sdivisibleOK64(c)
=> (Leq64U
(RotateLeft64 <typ.UInt64>
(Add64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(sdivisible64(c).m)])
x)
(Const64 <typ.UInt64> [int64(sdivisible64(c).a)])
)
(Const64 <typ.UInt64> [64-sdivisible64(c).k])
)
(Const64 <typ.UInt64> [int64(sdivisible64(c).max)])
)
(Eq64 x (Mul64 (Const64 [c])
(Sub64
(Rsh64x64
(Add64
mul:(Hmul64
(Const64 [m])
x)
x)
(Const64 [s]))
(Rsh64x64
x
(Const64 [63])))
)
)
&& v.Block.Func.pass.name != "opt" && mul.Uses == 1
&& m == int64(smagic64(c).m) && s == smagic64(c).s
&& x.Op != OpConst64 && sdivisibleOK64(c)
=> (Leq64U
(RotateLeft64 <typ.UInt64>
(Add64 <typ.UInt64>
(Mul64 <typ.UInt64>
(Const64 <typ.UInt64> [int64(sdivisible64(c).m)])
x)
(Const64 <typ.UInt64> [int64(sdivisible64(c).a)])
)
(Const64 <typ.UInt64> [64-sdivisible64(c).k])
)
(Const64 <typ.UInt64> [int64(sdivisible64(c).max)])
)
// Divisibility checks for signed integers by a power-of-two constant reduce to a simple mask.
// However, we must match against the rewritten n%c == 0 -> n - c*(n/c) == 0 -> n == c*(n/c)
// where n/c contains fixup code to handle signed n.
((Eq8|Neq8) n (Lsh8x64
(Rsh8x64
(Add8 <t> n (Rsh8Ux64 <t> (Rsh8x64 <t> n (Const64 <typ.UInt64> [ 7])) (Const64 <typ.UInt64> [kbar])))
(Const64 <typ.UInt64> [k]))
(Const64 <typ.UInt64> [k]))
) && k > 0 && k < 7 && kbar == 8 - k
=> ((Eq8|Neq8) (And8 <t> n (Const8 <t> [1<<uint(k)-1])) (Const8 <t> [0]))
((Eq16|Neq16) n (Lsh16x64
(Rsh16x64
(Add16 <t> n (Rsh16Ux64 <t> (Rsh16x64 <t> n (Const64 <typ.UInt64> [15])) (Const64 <typ.UInt64> [kbar])))
(Const64 <typ.UInt64> [k]))
(Const64 <typ.UInt64> [k]))
) && k > 0 && k < 15 && kbar == 16 - k
=> ((Eq16|Neq16) (And16 <t> n (Const16 <t> [1<<uint(k)-1])) (Const16 <t> [0]))
((Eq32|Neq32) n (Lsh32x64
(Rsh32x64
(Add32 <t> n (Rsh32Ux64 <t> (Rsh32x64 <t> n (Const64 <typ.UInt64> [31])) (Const64 <typ.UInt64> [kbar])))
(Const64 <typ.UInt64> [k]))
(Const64 <typ.UInt64> [k]))
) && k > 0 && k < 31 && kbar == 32 - k
=> ((Eq32|Neq32) (And32 <t> n (Const32 <t> [1<<uint(k)-1])) (Const32 <t> [0]))
((Eq64|Neq64) n (Lsh64x64
(Rsh64x64
(Add64 <t> n (Rsh64Ux64 <t> (Rsh64x64 <t> n (Const64 <typ.UInt64> [63])) (Const64 <typ.UInt64> [kbar])))
(Const64 <typ.UInt64> [k]))
(Const64 <typ.UInt64> [k]))
) && k > 0 && k < 63 && kbar == 64 - k
=> ((Eq64|Neq64) (And64 <t> n (Const64 <t> [1<<uint(k)-1])) (Const64 <t> [0]))
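And the signed power-of-two case really is just a mask, since in two's complement the remainder mod 2^k depends only on the low k bits (illustration):

	func divisibleByPow2(n int64, k uint) bool {
		return n&(1<<k-1) == 0 // equivalent to n % (1<<k) == 0 for signed n as well
	}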
(Eq(8|16|32|64) s:(Sub(8|16|32|64) x y) (Const(8|16|32|64) [0])) && s.Uses == 1 => (Eq(8|16|32|64) x y)
(Neq(8|16|32|64) s:(Sub(8|16|32|64) x y) (Const(8|16|32|64) [0])) && s.Uses == 1 => (Neq(8|16|32|64) x y)
@ -1925,6 +1136,20 @@
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && oneBit(y)
=> (Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))
// Mark newly generated bounded shifts as bounded, for opt passes after prove.
(Lsh64x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Lsh64x(8|16|32|64) [true] x con)
(Rsh64x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Rsh64x(8|16|32|64) [true] x con)
(Rsh64Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 64 => (Rsh64Ux(8|16|32|64) [true] x con)
(Lsh32x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Lsh32x(8|16|32|64) [true] x con)
(Rsh32x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Rsh32x(8|16|32|64) [true] x con)
(Rsh32Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 32 => (Rsh32Ux(8|16|32|64) [true] x con)
(Lsh16x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Lsh16x(8|16|32|64) [true] x con)
(Rsh16x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Rsh16x(8|16|32|64) [true] x con)
(Rsh16Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 16 => (Rsh16Ux(8|16|32|64) [true] x con)
(Lsh8x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 8 => (Lsh8x(8|16|32|64) [true] x con)
(Rsh8x(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 8 => (Rsh8x(8|16|32|64) [true] x con)
(Rsh8Ux(8|16|32|64) [false] x con:(Const(8|16|32|64) [c])) && 0 < c && c < 8 => (Rsh8Ux(8|16|32|64) [true] x con)
// Reassociate expressions involving
// constants such that constants come first,
// exposing obvious constant-folding opportunities.


@ -461,7 +461,7 @@ var passes = [...]pass{
{name: "short circuit", fn: shortcircuit},
{name: "decompose user", fn: decomposeUser, required: true},
{name: "pre-opt deadcode", fn: deadcode},
{name: "opt", fn: opt, required: true}, // NB: some generic rules know the name of the opt pass. TODO: split required rules and optimizing rules
{name: "opt", fn: opt, required: true},
{name: "zero arg cse", fn: zcse, required: true}, // required to merge OpSB values
{name: "opt deadcode", fn: deadcode, required: true}, // remove any blocks orphaned during opt
{name: "generic cse", fn: cse},
@ -469,12 +469,15 @@ var passes = [...]pass{
{name: "gcse deadcode", fn: deadcode, required: true}, // clean out after cse and phiopt
{name: "nilcheckelim", fn: nilcheckelim},
{name: "prove", fn: prove},
{name: "divisible", fn: divisible, required: true},
{name: "divmod", fn: divmod, required: true},
{name: "middle opt", fn: opt, required: true},
{name: "early fuse", fn: fuseEarly},
{name: "expand calls", fn: expandCalls, required: true},
{name: "decompose builtin", fn: postExpandCallsDecompose, required: true},
{name: "softfloat", fn: softfloat, required: true},
{name: "branchelim", fn: branchelim},
{name: "late opt", fn: opt, required: true}, // TODO: split required rules and optimizing rules
{name: "late opt", fn: opt, required: true},
{name: "dead auto elim", fn: elimDeadAutosGeneric},
{name: "sccp", fn: sccp},
{name: "generic deadcode", fn: deadcode, required: true}, // remove dead stores, which otherwise mess up store chain
@ -529,6 +532,12 @@ var passOrder = [...]constraint{
{"generic cse", "prove"},
// deadcode after prove to eliminate all new dead blocks.
{"prove", "generic deadcode"},
// divisible after prove to let prove analyze div and mod
{"prove", "divisible"},
// divmod after divisible to avoid rewriting subexpressions of the ones that divisible will handle
{"divisible", "divmod"},
// divmod before decompose builtin to handle 64-bit on 32-bit systems
{"divmod", "decompose builtin"},
// common-subexpression before dead-store elim, so that we recognize
// when two address expressions are the same.
{"generic cse", "dse"},
@ -538,7 +547,7 @@ var passOrder = [...]constraint{
{"nilcheckelim", "generic deadcode"},
// nilcheckelim generates sequences of plain basic blocks
{"nilcheckelim", "late fuse"},
// nilcheckelim relies on opt to rewrite user nil checks
// nilcheckelim relies on the first opt to rewrite user nil checks
{"opt", "nilcheckelim"},
// tighten will be most effective when as many values have been removed as possible
{"generic deadcode", "tighten"},


@ -13,14 +13,14 @@ import (
// decompose converts phi ops on compound builtin types into phi
// ops on simple types, then invokes rewrite rules to decompose
// other ops on those types.
func decomposeBuiltIn(f *Func) {
func decomposeBuiltin(f *Func) {
// Decompose phis
for _, b := range f.Blocks {
for _, v := range b.Values {
if v.Op != OpPhi {
continue
}
decomposeBuiltInPhi(v)
decomposeBuiltinPhi(v)
}
}
@ -121,7 +121,7 @@ func maybeAppend2(f *Func, ss []*LocalSlot, s1, s2 *LocalSlot) []*LocalSlot {
return maybeAppend(f, maybeAppend(f, ss, s1), s2)
}
func decomposeBuiltInPhi(v *Value) {
func decomposeBuiltinPhi(v *Value) {
switch {
case v.Type.IsInteger() && v.Type.Size() > v.Block.Func.Config.RegSize:
decomposeInt64Phi(v)


@ -15,7 +15,7 @@ import (
func postExpandCallsDecompose(f *Func) {
decomposeUser(f) // redo user decompose to cleanup after expand calls
decomposeBuiltIn(f) // handles both regular decomposition and cleanup.
decomposeBuiltin(f) // handles both regular decomposition and cleanup.
}
func expandCalls(f *Func) {


@ -8,3 +8,11 @@ package ssa
func opt(f *Func) {
applyRewrite(f, rewriteBlockgeneric, rewriteValuegeneric, removeDeadValues)
}
func divisible(f *Func) {
applyRewrite(f, rewriteBlockdivisible, rewriteValuedivisible, removeDeadValues)
}
func divmod(f *Func) {
applyRewrite(f, rewriteBlockdivmod, rewriteValuedivmod, removeDeadValues)
}


@ -1946,7 +1946,7 @@ func (ft *factsTable) flowLimit(v *Value) bool {
a := ft.limits[v.Args[0].ID]
b := ft.limits[v.Args[1].ID]
sub := ft.newLimit(v, a.sub(b, uint(v.Type.Size())*8))
mod := ft.detectSignedMod(v)
mod := ft.detectMod(v)
inferred := ft.detectSliceLenRelation(v)
return sub || mod || inferred
case OpNeg64, OpNeg32, OpNeg16, OpNeg8:
@ -1984,6 +1984,10 @@ func (ft *factsTable) flowLimit(v *Value) bool {
lim = lim.unsignedMax(a.umax / b.umin)
}
return ft.newLimit(v, lim)
case OpMod64, OpMod32, OpMod16, OpMod8:
return ft.modLimit(true, v, v.Args[0], v.Args[1])
case OpMod64u, OpMod32u, OpMod16u, OpMod8u:
return ft.modLimit(false, v, v.Args[0], v.Args[1])
case OpPhi:
// Compute the union of all the input phis.
@ -2008,32 +2012,6 @@ func (ft *factsTable) flowLimit(v *Value) bool {
return false
}
// See if we can get any facts because v is the result of signed mod by a constant.
// The mod operation has already been rewritten, so we have to try and reconstruct it.
//
// x % d
//
// is rewritten as
//
// x - (x / d) * d
//
// furthermore, the divide itself gets rewritten. If d is a power of 2 (d == 1<<k), we do
//
// (x / d) * d = ((x + adj) >> k) << k
// = (x + adj) & (-1<<k)
//
// with adj being an adjustment in case x is negative (see below).
// if d is not a power of 2, we do
//
// x / d = ... TODO ...
func (ft *factsTable) detectSignedMod(v *Value) bool {
if ft.detectSignedModByPowerOfTwo(v) {
return true
}
// TODO: non-powers-of-2
return false
}
// detectSliceLenRelation matches the pattern where
// 1. v := slicelen - index, OR v := slicecap - index
// AND
@ -2095,102 +2073,64 @@ func (ft *factsTable) detectSliceLenRelation(v *Value) (inferred bool) {
return inferred
}
func (ft *factsTable) detectSignedModByPowerOfTwo(v *Value) bool {
// We're looking for:
//
// x % d ==
// x - (x / d) * d
//
// which for d a power of 2, d == 1<<k, is done as
//
// x - ((x + (x>>(w-1))>>>(w-k)) & (-1<<k))
//
// w = bit width of x.
// (>> = signed shift, >>> = unsigned shift).
// See ./_gen/generic.rules, search for "Signed divide by power of 2".
var w int64
var addOp, andOp, constOp, sshiftOp, ushiftOp Op
// x%d has been rewritten to x - (x/d)*d.
func (ft *factsTable) detectMod(v *Value) bool {
var opDiv, opDivU, opMul, opConst Op
switch v.Op {
case OpSub64:
w = 64
addOp = OpAdd64
andOp = OpAnd64
constOp = OpConst64
sshiftOp = OpRsh64x64
ushiftOp = OpRsh64Ux64
opDiv = OpDiv64
opDivU = OpDiv64u
opMul = OpMul64
opConst = OpConst64
case OpSub32:
w = 32
addOp = OpAdd32
andOp = OpAnd32
constOp = OpConst32
sshiftOp = OpRsh32x64
ushiftOp = OpRsh32Ux64
opDiv = OpDiv32
opDivU = OpDiv32u
opMul = OpMul32
opConst = OpConst32
case OpSub16:
w = 16
addOp = OpAdd16
andOp = OpAnd16
constOp = OpConst16
sshiftOp = OpRsh16x64
ushiftOp = OpRsh16Ux64
opDiv = OpDiv16
opDivU = OpDiv16u
opMul = OpMul16
opConst = OpConst16
case OpSub8:
w = 8
addOp = OpAdd8
andOp = OpAnd8
constOp = OpConst8
sshiftOp = OpRsh8x64
ushiftOp = OpRsh8Ux64
default:
return false
opDiv = OpDiv8
opDivU = OpDiv8u
opMul = OpMul8
opConst = OpConst8
}
x := v.Args[0]
and := v.Args[1]
if and.Op != andOp {
mul := v.Args[1]
if mul.Op != opMul {
return false
}
var add, mask *Value
if and.Args[0].Op == addOp && and.Args[1].Op == constOp {
add = and.Args[0]
mask = and.Args[1]
} else if and.Args[1].Op == addOp && and.Args[0].Op == constOp {
add = and.Args[1]
mask = and.Args[0]
} else {
return false
}
var ushift *Value
if add.Args[0] == x {
ushift = add.Args[1]
} else if add.Args[1] == x {
ushift = add.Args[0]
} else {
return false
}
if ushift.Op != ushiftOp {
return false
}
if ushift.Args[1].Op != OpConst64 {
return false
}
k := w - ushift.Args[1].AuxInt // Now we know k!
d := int64(1) << k // divisor
sshift := ushift.Args[0]
if sshift.Op != sshiftOp {
return false
}
if sshift.Args[0] != x {
return false
}
if sshift.Args[1].Op != OpConst64 || sshift.Args[1].AuxInt != w-1 {
return false
}
if mask.AuxInt != -d {
div, con := mul.Args[0], mul.Args[1]
if div.Op == opConst {
div, con = con, div
}
if con.Op != opConst || (div.Op != opDiv && div.Op != opDivU) || div.Args[0] != v.Args[0] || div.Args[1].Op != opConst || div.Args[1].AuxInt != con.AuxInt {
return false
}
return ft.modLimit(div.Op == opDiv, v, v.Args[0], con)
}
// All looks ok. x % d is at most +/- d-1.
return ft.signedMinMax(v, -d+1, d-1)
// modLimit sets v with facts derived from v = p % q.
func (ft *factsTable) modLimit(signed bool, v, p, q *Value) bool {
a := ft.limits[p.ID]
b := ft.limits[q.ID]
if signed {
if a.min < 0 && b.min > 0 {
return ft.signedMinMax(v, -(b.max - 1), b.max-1)
}
if !(a.nonnegative() && b.nonnegative()) {
// TODO: we could handle signed limits but I didn't bother.
return false
}
if a.min >= 0 && b.min > 0 {
ft.setNonNegative(v)
}
}
// Underflow in the arithmetic below is ok; it gives MaxUint64, which does nothing to the limit.
return ft.unsignedMax(v, min(a.umax, b.umax-1))
}
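A worked instance of the unsigned bound modLimit derives, mirroring the modbound1 test added further down (names here are illustrative): for d = u % 100 the limit is min(umax(u), 100-1) = 99, so an index of d*2+1 is at most 199 and a bounds check against a 200-element array can be dropped.

	var table [200]int

	func f(u uint64) int {
		d := u % 100        // modLimit: d <= 99
		return table[d*2+1] // 2*99+1 = 199 < 200, so prove removes the bounds check
	}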
// getBranch returns the range restrictions added by p
@ -2468,6 +2408,10 @@ func addLocalFacts(ft *factsTable, b *Block) {
// TODO: investigate how to always add facts without much slowdown, see issue #57959
//ft.update(b, v, v.Args[0], unsigned, gt|eq)
//ft.update(b, v, v.Args[1], unsigned, gt|eq)
case OpDiv64, OpDiv32, OpDiv16, OpDiv8:
if ft.isNonNegative(v.Args[0]) && ft.isNonNegative(v.Args[1]) {
ft.update(b, v, v.Args[0], unsigned, lt|eq)
}
case OpDiv64u, OpDiv32u, OpDiv16u, OpDiv8u,
OpRsh8Ux64, OpRsh8Ux32, OpRsh8Ux16, OpRsh8Ux8,
OpRsh16Ux64, OpRsh16Ux32, OpRsh16Ux16, OpRsh16Ux8,
@ -2510,10 +2454,7 @@ func addLocalFacts(ft *factsTable, b *Block) {
}
ft.update(b, v, v.Args[0], unsigned, lt|eq)
case OpMod64, OpMod32, OpMod16, OpMod8:
a := ft.limits[v.Args[0].ID]
b := ft.limits[v.Args[1].ID]
if !(a.nonnegative() && b.nonnegative()) {
// TODO: we could handle signed limits but I didn't bother.
if !ft.isNonNegative(v.Args[0]) || !ft.isNonNegative(v.Args[1]) {
break
}
fallthrough
@ -2631,14 +2572,30 @@ func addLocalFactsPhi(ft *factsTable, v *Value) {
ft.update(b, v, y, dom, rel)
}
var ctzNonZeroOp = map[Op]Op{OpCtz8: OpCtz8NonZero, OpCtz16: OpCtz16NonZero, OpCtz32: OpCtz32NonZero, OpCtz64: OpCtz64NonZero}
var ctzNonZeroOp = map[Op]Op{
OpCtz8: OpCtz8NonZero,
OpCtz16: OpCtz16NonZero,
OpCtz32: OpCtz32NonZero,
OpCtz64: OpCtz64NonZero,
}
var mostNegativeDividend = map[Op]int64{
OpDiv16: -1 << 15,
OpMod16: -1 << 15,
OpDiv32: -1 << 31,
OpMod32: -1 << 31,
OpDiv64: -1 << 63,
OpMod64: -1 << 63}
OpMod64: -1 << 63,
}
var unsignedOp = map[Op]Op{
OpDiv8: OpDiv8u,
OpDiv16: OpDiv16u,
OpDiv32: OpDiv32u,
OpDiv64: OpDiv64u,
OpMod8: OpMod8u,
OpMod16: OpMod16u,
OpMod32: OpMod32u,
OpMod64: OpMod64u,
}
var bytesizeToConst = [...]Op{
8 / 8: OpConst8,
@ -2746,34 +2703,51 @@ func simplifyBlock(sdom SparseTree, ft *factsTable, b *Block) {
b.Func.Warnl(v.Pos, "Proved %v bounded", v.Op)
}
}
case OpDiv16, OpDiv32, OpDiv64, OpMod16, OpMod32, OpMod64:
// On amd64 and 386 fix-up code can be avoided if we know
// the divisor is not -1 or the dividend > MinIntNN.
// Don't modify AuxInt on other architectures,
// as that can interfere with CSE.
// TODO: add other architectures?
if b.Func.Config.arch != "386" && b.Func.Config.arch != "amd64" {
case OpDiv8, OpDiv16, OpDiv32, OpDiv64, OpMod8, OpMod16, OpMod32, OpMod64:
p, q := ft.limits[v.Args[0].ID], ft.limits[v.Args[1].ID] // p/q
if p.nonnegative() && q.nonnegative() {
if b.Func.pass.debug > 0 {
b.Func.Warnl(v.Pos, "Proved %v is unsigned", v.Op)
}
v.Op = unsignedOp[v.Op]
v.AuxInt = 0
break
}
divr := v.Args[1]
divrLim := ft.limits[divr.ID]
divd := v.Args[0]
divdLim := ft.limits[divd.ID]
if divrLim.max < -1 || divrLim.min > -1 || divdLim.min > mostNegativeDividend[v.Op] {
// Fixup code can be avoided on x86 if we know
// the divisor is not -1 or the dividend > MinIntNN.
if v.Op != OpDiv8 && v.Op != OpMod8 && (q.max < -1 || q.min > -1 || p.min > mostNegativeDividend[v.Op]) {
// See DivisionNeedsFixUp in rewrite.go.
// v.AuxInt = 1 means we have proved both that the divisor is not -1
// and that the dividend is not the most negative integer,
// v.AuxInt = 1 means we have proved that the divisor is not -1
// or that the dividend is not the most negative integer,
// so we do not need to add fix-up code.
v.AuxInt = 1
if b.Func.pass.debug > 0 {
b.Func.Warnl(v.Pos, "Proved %v does not need fix-up", v.Op)
}
// Only usable on amd64 and 386, and only for ≥ 16-bit ops.
// Don't modify AuxInt on other architectures, as that can interfere with CSE.
// (Print the debug info above always, so that test/prove.go can be
// checked on non-x86 systems.)
// TODO: add other architectures?
if b.Func.Config.arch == "386" || b.Func.Config.arch == "amd64" {
v.AuxInt = 1
}
}
case OpMul64, OpMul32, OpMul16, OpMul8:
if vl := ft.limits[v.ID]; vl.min == vl.max || vl.umin == vl.umax {
// v is going to be constant folded away; don't "optimize" it.
break
}
x := v.Args[0]
xl := ft.limits[x.ID]
y := v.Args[1]
yl := ft.limits[y.ID]
if xl.umin == xl.umax && isPowerOfTwo(int64(xl.umin)) ||
xl.min == xl.max && isPowerOfTwo(xl.min) ||
yl.umin == yl.umax && isPowerOfTwo(int64(yl.umin)) ||
yl.min == yl.max && isPowerOfTwo(yl.min) {
// 0,1 * a power of two is better done as a shift
break
}
switch xOne, yOne := xl.umax <= 1, yl.umax <= 1; {
case xOne && yOne:
v.Op = bytesizeToAnd[v.Type.Size()]
@ -2807,6 +2781,7 @@ func simplifyBlock(sdom SparseTree, ft *factsTable, b *Block) {
}
}
}
// Fold provable constant results.
// Helps in cases where we reuse a value after branching on its equality.
for i, arg := range v.Args {


@ -57,11 +57,15 @@ func applyRewrite(f *Func, rb blockRewriter, rv valueRewriter, deadcode deadValu
var iters int
var states map[string]bool
for {
if debug > 1 {
fmt.Printf("%s: iter %d\n", f.pass.name, iters)
}
change := false
deadChange := false
for _, b := range f.Blocks {
var b0 *Block
if debug > 1 {
fmt.Printf("%s: start block\n", f.pass.name)
b0 = new(Block)
*b0 = *b
b0.Succs = append([]Edge{}, b.Succs...) // make a new copy, not aliasing
@ -79,6 +83,9 @@ func applyRewrite(f *Func, rb blockRewriter, rv valueRewriter, deadcode deadValu
}
}
for j, v := range b.Values {
if debug > 1 {
fmt.Printf("%s: consider %v\n", f.pass.name, v.LongString())
}
var v0 *Value
if debug > 1 {
v0 = new(Value)
@ -1260,10 +1267,8 @@ func logRule(s string) {
}
ruleFile = w
}
_, err := fmt.Fprintln(ruleFile, s)
if err != nil {
panic(err)
}
// Ignore errors in case of multiple processes fighting over the file.
fmt.Fprintln(ruleFile, s)
}
var ruleFile io.Writer


@ -1310,6 +1310,8 @@ func rewriteValuedec64_OpRotateLeft32(v *Value) bool {
func rewriteValuedec64_OpRotateLeft64(v *Value) bool {
v_1 := v.Args[1]
v_0 := v.Args[0]
b := v.Block
typ := &b.Func.Config.Types
// match: (RotateLeft64 x (Int64Make hi lo))
// result: (RotateLeft64 x lo)
for {
@ -1322,6 +1324,458 @@ func rewriteValuedec64_OpRotateLeft64(v *Value) bool {
v.AddArg2(x, lo)
return true
}
// match: (RotateLeft64 <t> x (Const64 [c]))
// cond: c&63 == 0
// result: x
for {
x := v_0
if v_1.Op != OpConst64 {
break
}
c := auxIntToInt64(v_1.AuxInt)
if !(c&63 == 0) {
break
}
v.copyOf(x)
return true
}
// match: (RotateLeft64 <t> x (Const32 [c]))
// cond: c&63 == 0
// result: x
for {
x := v_0
if v_1.Op != OpConst32 {
break
}
c := auxIntToInt32(v_1.AuxInt)
if !(c&63 == 0) {
break
}
v.copyOf(x)
return true
}
// match: (RotateLeft64 <t> x (Const16 [c]))
// cond: c&63 == 0
// result: x
for {
x := v_0
if v_1.Op != OpConst16 {
break
}
c := auxIntToInt16(v_1.AuxInt)
if !(c&63 == 0) {
break
}
v.copyOf(x)
return true
}
// match: (RotateLeft64 <t> x (Const8 [c]))
// cond: c&63 == 0
// result: x
for {
x := v_0
if v_1.Op != OpConst8 {
break
}
c := auxIntToInt8(v_1.AuxInt)
if !(c&63 == 0) {
break
}
v.copyOf(x)
return true
}
// match: (RotateLeft64 <t> x (Const64 [c]))
// cond: c&63 == 32
// result: (Int64Make <t> (Int64Lo x) (Int64Hi x))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst64 {
break
}
c := auxIntToInt64(v_1.AuxInt)
if !(c&63 == 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v0.AddArg(x)
v1 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v1.AddArg(x)
v.AddArg2(v0, v1)
return true
}
// match: (RotateLeft64 <t> x (Const32 [c]))
// cond: c&63 == 32
// result: (Int64Make <t> (Int64Lo x) (Int64Hi x))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst32 {
break
}
c := auxIntToInt32(v_1.AuxInt)
if !(c&63 == 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v0.AddArg(x)
v1 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v1.AddArg(x)
v.AddArg2(v0, v1)
return true
}
// match: (RotateLeft64 <t> x (Const16 [c]))
// cond: c&63 == 32
// result: (Int64Make <t> (Int64Lo x) (Int64Hi x))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst16 {
break
}
c := auxIntToInt16(v_1.AuxInt)
if !(c&63 == 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v0.AddArg(x)
v1 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v1.AddArg(x)
v.AddArg2(v0, v1)
return true
}
// match: (RotateLeft64 <t> x (Const8 [c]))
// cond: c&63 == 32
// result: (Int64Make <t> (Int64Lo x) (Int64Hi x))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst8 {
break
}
c := auxIntToInt8(v_1.AuxInt)
if !(c&63 == 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v0.AddArg(x)
v1 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v1.AddArg(x)
v.AddArg2(v0, v1)
return true
}
// match: (RotateLeft64 <t> x (Const64 [c]))
// cond: 0 < c&63 && c&63 < 32
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst64 {
break
}
c := auxIntToInt64(v_1.AuxInt)
if !(0 < c&63 && c&63 < 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const32 [c]))
// cond: 0 < c&63 && c&63 < 32
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst32 {
break
}
c := auxIntToInt32(v_1.AuxInt)
if !(0 < c&63 && c&63 < 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const16 [c]))
// cond: 0 < c&63 && c&63 < 32
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst16 {
break
}
c := auxIntToInt16(v_1.AuxInt)
if !(0 < c&63 && c&63 < 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const8 [c]))
// cond: 0 < c&63 && c&63 < 32
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst8 {
break
}
c := auxIntToInt8(v_1.AuxInt)
if !(0 < c&63 && c&63 < 32) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const64 [c]))
// cond: 32 < c&63 && c&63 < 64
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst64 {
break
}
c := auxIntToInt64(v_1.AuxInt)
if !(32 < c&63 && c&63 < 64) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const32 [c]))
// cond: 32 < c&63 && c&63 < 64
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst32 {
break
}
c := auxIntToInt32(v_1.AuxInt)
if !(32 < c&63 && c&63 < 64) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const16 [c]))
// cond: 32 < c&63 && c&63 < 64
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst16 {
break
}
c := auxIntToInt16(v_1.AuxInt)
if !(32 < c&63 && c&63 < 64) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
// match: (RotateLeft64 <t> x (Const8 [c]))
// cond: 32 < c&63 && c&63 < 64
// result: (Int64Make <t> (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(32-c&31)]))) (Or32 <typ.UInt32> (Lsh32x32 <typ.UInt32> (Int64Hi x) (Const32 <typ.UInt32> [int32(c&31)])) (Rsh32Ux32 <typ.UInt32> (Int64Lo x) (Const32 <typ.UInt32> [int32(32-c&31)]))))
for {
t := v.Type
x := v_0
if v_1.Op != OpConst8 {
break
}
c := auxIntToInt8(v_1.AuxInt)
if !(32 < c&63 && c&63 < 64) {
break
}
v.reset(OpInt64Make)
v.Type = t
v0 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v1 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v2 := b.NewValue0(v.Pos, OpInt64Lo, typ.UInt32)
v2.AddArg(x)
v3 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v3.AuxInt = int32ToAuxInt(int32(c & 31))
v1.AddArg2(v2, v3)
v4 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v5 := b.NewValue0(v.Pos, OpInt64Hi, typ.UInt32)
v5.AddArg(x)
v6 := b.NewValue0(v.Pos, OpConst32, typ.UInt32)
v6.AuxInt = int32ToAuxInt(int32(32 - c&31))
v4.AddArg2(v5, v6)
v0.AddArg2(v1, v4)
v7 := b.NewValue0(v.Pos, OpOr32, typ.UInt32)
v8 := b.NewValue0(v.Pos, OpLsh32x32, typ.UInt32)
v8.AddArg2(v5, v3)
v9 := b.NewValue0(v.Pos, OpRsh32Ux32, typ.UInt32)
v9.AddArg2(v2, v6)
v7.AddArg2(v8, v9)
v.AddArg2(v0, v7)
return true
}
return false
}
func rewriteValuedec64_OpRotateLeft8(v *Value) bool {

(Three file diffs suppressed because they are too large.)


@ -73,7 +73,7 @@ func softfloat(f *Func) {
if newInt64 && f.Config.RegSize == 4 {
// On 32bit arch, decompose Uint64 introduced in the switch above.
decomposeBuiltIn(f)
decomposeBuiltin(f)
applyRewrite(f, rewriteBlockdec64, rewriteValuedec64, removeDeadValues)
}


@ -1390,11 +1390,17 @@ func div19_int64(n int64) bool {
return n%19 == 0
}
var (
// These have to be global to avoid getting constant-folded in the function body:
// as locals, prove can see that they are actually constants.
sixU, nineteenU uint64 = 6, 19
sixS, nineteenS int64 = 6, 19
)
// testDivisibility confirms that the rewrite rules for x%c == 0 with c constant are correct.
func testDivisibility(t *testing.T) {
// unsigned tests
// test an even and an odd divisor
var sixU, nineteenU uint64 = 6, 19
// test all inputs for uint8, uint16
for i := uint64(0); i <= math.MaxUint16; i++ {
if i <= math.MaxUint8 {
@ -1402,7 +1408,7 @@ func testDivisibility(t *testing.T) {
t.Errorf("div6_uint8(%d) = %v want %v", i, got, want)
}
if want, got := uint8(i)%uint8(nineteenU) == 0, div19_uint8(uint8(i)); got != want {
t.Errorf("div6_uint19(%d) = %v want %v", i, got, want)
t.Errorf("div19_uint8(%d) = %v want %v", i, got, want)
}
}
if want, got := uint16(i)%uint16(sixU) == 0, div6_uint16(uint16(i)); got != want {
@ -1450,7 +1456,6 @@ func testDivisibility(t *testing.T) {
// signed tests
// test an even and an odd divisor
var sixS, nineteenS int64 = 6, 19
// test all inputs for int8, int16
for i := int64(math.MinInt16); i <= math.MaxInt16; i++ {
if math.MinInt8 <= i && i <= math.MaxInt8 {
@ -1458,7 +1463,7 @@ func testDivisibility(t *testing.T) {
t.Errorf("div6_int8(%d) = %v want %v", i, got, want)
}
if want, got := int8(i)%int8(nineteenS) == 0, div19_int8(int8(i)); got != want {
t.Errorf("div6_int19(%d) = %v want %v", i, got, want)
t.Errorf("div19_int8(%d) = %v want %v", i, got, want)
}
}
if want, got := int16(i)%int16(sixS) == 0, div6_int16(int16(i)); got != want {


@ -26,7 +26,7 @@ func f1(a [256]int, i int) {
var j int
useInt(a[i]) // ERROR "Found IsInBounds$"
j = i % 256
useInt(a[j]) // ERROR "Found IsInBounds$"
useInt(a[j])
j = i & 255
useInt(a[j])
j = i & 17


@ -10,9 +10,7 @@ package codegen
// simplifications and optimizations on integer types.
// For codegen tests on float types, see floats.go.
// ----------------- //
// Addition //
// ----------------- //
// Addition
func AddLargeConst(a uint64, out []uint64) {
// ppc64x/power10:"ADD [$]4294967296,"
@ -56,9 +54,7 @@ func AddLargeConst2(a int, out []int) {
out[0] = a + 0x10000
}
// ----------------- //
// Subtraction //
// ----------------- //
// Subtraction
var ef int
@ -90,58 +86,58 @@ func SubMem(arr []int, b, c, d int) int {
func SubFromConst(a int) int {
// ppc64x: `SUBC R[0-9]+,\s[$]40,\sR`
// riscv64: "ADDI \\$-40" "NEG"
// riscv64: "ADDI [$]-40" "NEG"
b := 40 - a
return b
}
func SubFromConstNeg(a int) int {
// arm64: "ADD \\$40"
// loong64: "ADDV[U] \\$40"
// mips: "ADD[U] \\$40"
// mips64: "ADDV[U] \\$40"
// arm64: "ADD [$]40"
// loong64: "ADDV[U] [$]40"
// mips: "ADD[U] [$]40"
// mips64: "ADDV[U] [$]40"
// ppc64x: `ADD [$]40,\sR[0-9]+,\sR`
// riscv64: "ADDI \\$40" -"NEG"
// riscv64: "ADDI [$]40" -"NEG"
c := 40 - (-a)
return c
}
func SubSubFromConst(a int) int {
// arm64: "ADD \\$20"
// loong64: "ADDV[U] \\$20"
// mips: "ADD[U] \\$20"
// mips64: "ADDV[U] \\$20"
// arm64: "ADD [$]20"
// loong64: "ADDV[U] [$]20"
// mips: "ADD[U] [$]20"
// mips64: "ADDV[U] [$]20"
// ppc64x: `ADD [$]20,\sR[0-9]+,\sR`
// riscv64: "ADDI \\$20" -"NEG"
// riscv64: "ADDI [$]20" -"NEG"
c := 40 - (20 - a)
return c
}
func AddSubFromConst(a int) int {
// ppc64x: `SUBC R[0-9]+,\s[$]60,\sR`
// riscv64: "ADDI \\$-60" "NEG"
// riscv64: "ADDI [$]-60" "NEG"
c := 40 + (20 - a)
return c
}
func NegSubFromConst(a int) int {
// arm64: "SUB \\$20"
// loong64: "ADDV[U] \\$-20"
// mips: "ADD[U] \\$-20"
// mips64: "ADDV[U] \\$-20"
// arm64: "SUB [$]20"
// loong64: "ADDV[U] [$]-20"
// mips: "ADD[U] [$]-20"
// mips64: "ADDV[U] [$]-20"
// ppc64x: `ADD [$]-20,\sR[0-9]+,\sR`
// riscv64: "ADDI \\$-20"
// riscv64: "ADDI [$]-20"
c := -(20 - a)
return c
}
func NegAddFromConstNeg(a int) int {
// arm64: "SUB \\$40" "NEG"
// loong64: "ADDV[U] \\$-40" "SUBV"
// mips: "ADD[U] \\$-40" "SUB"
// mips64: "ADDV[U] \\$-40" "SUBV"
// arm64: "SUB [$]40" "NEG"
// loong64: "ADDV[U] [$]-40" "SUBV"
// mips: "ADD[U] [$]-40" "SUB"
// mips64: "ADDV[U] [$]-40" "SUBV"
// ppc64x: `SUBC R[0-9]+,\s[$]40,\sR`
// riscv64: "ADDI \\$-40" "NEG"
// riscv64: "ADDI [$]-40" "NEG"
c := -(-40 + a)
return c
}
@ -361,16 +357,16 @@ func Pow2Divs(n1 uint, n2 int) (uint, int) {
// Check that constant divisions get turned into MULs
func ConstDivs(n1 uint, n2 int) (uint, int) {
// amd64:"MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
// 386:"MOVL [$]-252645135" "MULL" -"DIVL"
// arm64:`MOVD`,`UMULH`,-`DIV`
// arm:`MOVW`,`MUL`,-`.*udiv`
// amd64: "MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
// 386: "MOVL [$]-252645135" "MULL" -"DIVL"
// arm64: `MOVD`,`UMULH`,-`DIV`
// arm: `MOVW`,`MUL`,-`.*udiv`
a := n1 / 17 // unsigned
// amd64:"MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
// 386:"IMULL" -"IDIVL"
// arm64:`SMULH`,-`DIV`
// arm:`MOVW`,`MUL`,-`.*udiv`
// amd64: "MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
// 386: "IMULL" "SARL [$]4," "SARL [$]31," "SUBL" -".*DIV"
// arm64: `SMULH` -`DIV`
// arm: `MOVW` `MUL` -`.*udiv`
b := n2 / 17 // signed
return a, b
@ -421,16 +417,16 @@ func Pow2DivisibleSigned(n1, n2 int) (bool, bool) {
// Check that constant modulo divs get turned into MULs
func ConstMods(n1 uint, n2 int) (uint, int) {
// amd64:"MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
// 386:"MOVL [$]-252645135" "MULL" -"DIVL"
// arm64:`MOVD`,`UMULH`,-`DIV`
// arm:`MOVW`,`MUL`,-`.*udiv`
// amd64: "MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
// 386: "MOVL [$]-252645135" "MULL" -".*DIVL"
// arm64: `MOVD` `UMULH` -`DIV`
// arm: `MOVW` `MUL` -`.*udiv`
a := n1 % 17 // unsigned
// amd64:"MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
// 386: "IMULL" -"IDIVL"
// arm64:`SMULH`,-`DIV`
// arm:`MOVW`,`MUL`,-`.*udiv`
// amd64: "MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
// 386: "IMULL" "SARL [$]4," "SARL [$]31," "SUBL" "SHLL [$]4," "SUBL" -".*DIV"
// arm64: `SMULH` -`DIV`
// arm: `MOVW` `MUL` -`.*udiv`
b := n2 % 17 // signed
return a, b
@ -675,12 +671,13 @@ func addSpecial(a, b, c uint32) (uint32, uint32, uint32) {
}
// Divide -> shift rules usually require fixup for negative inputs.
// If the input is non-negative, make sure the fixup is eliminated.
// If the input is non-negative, make sure the unsigned form is generated.
func divInt(v int64) int64 {
if v < 0 {
return 0
// amd64:`SARQ.*63,`, `SHRQ.*56,`, `SARQ.*8,`
return v / 256
}
// amd64:-`.*SARQ.*63,`, -".*SHRQ", ".*SARQ.*[$]9,"
// amd64:-`.*SARQ`, `SHRQ.*9,`
return v / 512
}
@ -721,9 +718,7 @@ func constantFold3(i, j int) int {
return r
}
// ----------------- //
// Integer Min/Max //
// ----------------- //
// Integer Min/Max
func Int64Min(a, b int64) int64 {
// amd64: "CMPQ" "CMOVQLT"

test/codegen/divmod.go (new file, 1115 lines; diff suppressed because it is too large)


@ -1,6 +1,6 @@
// errorcheck -0 -d=ssa/prove/debug=1
//go:build amd64
//go:build amd64 || arm64
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
@ -1018,21 +1018,21 @@ func divShiftClean(n int) int {
if n < 0 {
return n
}
return n / int(8) // ERROR "Proved Rsh64x64 shifts to zero"
return n / int(8) // ERROR "Proved Div64 is unsigned$"
}
func divShiftClean64(n int64) int64 {
if n < 0 {
return n
}
return n / int64(16) // ERROR "Proved Rsh64x64 shifts to zero"
return n / int64(16) // ERROR "Proved Div64 is unsigned$"
}
func divShiftClean32(n int32) int32 {
if n < 0 {
return n
}
return n / int32(16) // ERROR "Proved Rsh32x64 shifts to zero"
return n / int32(16) // ERROR "Proved Div32 is unsigned$"
}
// Bounds check elimination
@ -1112,7 +1112,7 @@ func modu2(x, y uint) int {
}
func issue57077(s []int) (left, right []int) {
middle := len(s) / 2
middle := len(s) / 2 // ERROR "Proved Div64 is unsigned$"
left = s[:middle] // ERROR "Proved IsSliceInBounds$"
right = s[middle:] // ERROR "Proved IsSliceInBounds$"
return
@ -1501,7 +1501,7 @@ func mod64sPositiveWithSmallerDividendMax(a, b int64, ensureBothBranchesCouldHap
a = min(a, 0xff)
b = min(b, 0xfff)
z := a % b // ERROR "Proved Mod64 does not need fix-up$"
z := a % b // ERROR "Proved Mod64 is unsigned$"
if ensureBothBranchesCouldHappen {
if z > 0xff { // ERROR "Disproved Less64$"
@ -1521,7 +1521,7 @@ func mod64sPositiveWithSmallerDivisorMax(a, b int64, ensureBothBranchesCouldHapp
a = min(a, 0xfff)
b = min(b, 0xff)
z := a % b // ERROR "Proved Mod64 does not need fix-up$"
z := a % b // ERROR "Proved Mod64 is unsigned$"
if ensureBothBranchesCouldHappen {
if z > 0xff-1 { // ERROR "Disproved Less64$"
@ -1541,7 +1541,7 @@ func mod64sPositiveWithIdenticalMax(a, b int64, ensureBothBranchesCouldHappen bo
a = min(a, 0xfff)
b = min(b, 0xfff)
z := a % b // ERROR "Proved Mod64 does not need fix-up$"
z := a % b // ERROR "Proved Mod64 is unsigned$"
if ensureBothBranchesCouldHappen {
if z > 0xfff-1 { // ERROR "Disproved Less64$"
@ -1586,7 +1586,7 @@ func div64s(a, b int64, ensureAllBranchesCouldHappen func() bool) int64 {
b = min(b, 0xff)
b = max(b, 0xf)
z := a / b // ERROR "(Proved Div64 does not need fix-up|Proved Neq64)$"
z := a / b // ERROR "Proved Div64 is unsigned|Proved Neq64"
if ensureAllBranchesCouldHappen() && z > 0xffff/0xf { // ERROR "Disproved Less64$"
return 42
@ -2507,6 +2507,7 @@ func mulIntoAnd(a, b uint) uint {
}
return a * b // ERROR "Rewrote Mul v[0-9]+ into And$"
}
func mulIntoCondSelect(a, b uint) uint {
if a > 1 {
return 0
@ -2514,6 +2515,75 @@ func mulIntoCondSelect(a, b uint) uint {
return a * b // ERROR "Rewrote Mul v[0-9]+ into CondSelect"
}
func div7pos(x int32) bool {
if x > 0 {
return x%7 == 0 // ERROR "Proved Div32 is unsigned"
}
return false
}
func div2pos(x []int) int {
return len(x) / 2 // ERROR "Proved Div64 is unsigned"
}
func div3pos(x []int) int {
return len(x) / 3 // ERROR "Proved Div64 is unsigned"
}
var len200 [200]int
func modbound1(u uint64) int {
s := 0
for u > 0 {
var d uint64
u, d = u/100, u%100
s += len200[d*2+1] // ERROR "Proved IsInBounds"
}
return s
}
func modbound2(p *[10]int, x uint) int {
return p[x%9+1] // ERROR "Proved IsInBounds"
}
func shiftbound(x int) int {
return 1 << (x % 11) // ERROR "Proved Lsh(32x32|64x64) bounded" "Proved Div64 does not need fix-up"
}
func shiftbound2(x int) int {
return 1 << (x % 8) // ERROR "Proved Lsh(32x32|64x64) bounded" "Proved Div64 does not need fix-up"
}
func rangebound1(x []int) int {
s := 0
for i := range 1000 { // ERROR "Induction variable"
if i < len(x) {
s += x[i] // ERROR "Proved IsInBounds"
}
}
return s
}
func rangebound2(x []int) int {
s := 0
if len(x) > 0 {
for i := range 1000 { // ERROR "Induction variable"
s += x[i%len(x)] // ERROR "Proved Mod64 is unsigned" "Proved Neq64" "Proved IsInBounds"
}
}
return s
}
func swapbound(v []int) {
for i := 0; i < len(v)/2; i++ { // ERROR "Proved Div64 is unsigned|Induction variable"
v[i], // ERROR "Proved IsInBounds"
v[len(v)-1-i] = // ERROR "Proved IsInBounds"
v[len(v)-1-i],
v[i] // ERROR "Proved IsInBounds"
}
}
//go:noinline
func useInt(a int) {
}


@ -1,6 +1,6 @@
// errorcheck -0 -d=ssa/prove/debug=2
//go:build amd64
//go:build amd64 || arm64
// Copyright 2022 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
@ -17,7 +17,7 @@ func f0i(x int) int {
return x + 5 // ERROR "Proved.+is constant 0$" "Proved.+is constant 5$" "x\+d >=? w"
}
return x / 2
return x + 1
}
func f0u(x uint) uint {
@ -29,5 +29,5 @@ func f0u(x uint) uint {
return x + 5 // ERROR "Proved.+is constant 0$" "Proved.+is constant 5$" "x\+d >=? w"
}
return x / 2
return x + 1
}


@ -1,6 +1,6 @@
// errorcheck -0 -d=ssa/prove/debug=1
//go:build amd64
//go:build amd64 || arm64
package main