cmd/compile: combine x*n + y*n into (x+y)*n

There are a few cases where this can be useful. Apart from the obvious (and silly) 100*n + 200*n where we generate one IMUL instead of two, consider: 15*n + 31*n Currently, the compiler strength-reduces both imuls, generating: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 MOVQ AX, CX 0x0008 00008 SHLQ $4, AX 0x000c 00012 SUBQ CX, AX 0x000f 00015 MOVQ CX, DX 0x0012 00018 SHLQ $5, CX 0x0016 00022 SUBQ DX, CX 0x0019 00025 ADDQ CX, AX 0x001c 00028 MOVQ AX, "".~r1+16(SP) 0x0021 00033 RET But combining the imuls is both faster and shorter: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $46, AX 0x0009 00009 MOVQ AX, "".~r1+16(SP) 0x000e 00014 RET even without strength-reduction. Moreover, consider: 5*n + 7*(n+1) + 11*(n+2) We already have a rule that rewrites 7(n+1) into 7n+7, so the generated code (without imuls merging) looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 LEAQ (AX)(AX*4), CX 0x0009 00009 MOVQ AX, DX 0x000c 00012 NEGQ AX 0x000f 00015 LEAQ (AX)(DX*8), AX 0x0013 00019 ADDQ CX, AX 0x0016 00022 LEAQ (DX)(CX*2), CX 0x001a 00026 LEAQ 29(AX)(CX*1), AX 0x001f 00031 MOVQ AX, "".~r1+16(SP) But with imuls merging, the 5n, 7n and 11n factors get merged, and the generated code looks like this: 0x0000 00000 MOVQ "".n+8(SP), AX 0x0005 00005 IMULQ $23, AX 0x0009 00009 ADDQ $29, AX 0x000d 00013 MOVQ AX, "".~r1+16(SP) 0x0012 00018 RET Which is both faster and shorter; that's also the exact same code that clang and the intel c compiler generate for the above expression. Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9 Reviewed-on: https://go-review.googlesource.com/55143 Reviewed-by: Keith Randall <khr@golang.org>
2025-12-08 06:10:04 +00:00 · 2017-08-14 11:44:09 +02:00 · 2017-08-14 11:44:09 +02:00 · a0453a180f
commit a0453a180f
parent e70fae8a64
4 changed files with 1149 additions and 44 deletions
--- a/src/cmd/compile/internal/gc/asm_test.go
+++ b/src/cmd/compile/internal/gc/asm_test.go
@ -741,6 +741,29 @@ var linuxAMD64Tests = []*asmTest{
 		}`,
 		[]string{"\tPOPCNTQ\t", "support_popcnt"},
 	},
+	// multiplication merging tests
+	{
+		`
+		func mul1(n int) int {
+			return 15*n + 31*n
+		}`,
+		[]string{"\tIMULQ\t[$]46"}, // 46*n
+	},
+	{
+		`
+		func mul2(n int) int {
+			return 5*n + 7*(n+1) + 11*(n+2)
+		}`,
+		[]string{"\tIMULQ\t[$]23", "\tADDQ\t[$]29"}, // 23*n + 29
+	},
+	{
+		`
+		func mul3(a, n int) int {
+			return a*n + 19*n
+		}`,
+		[]string{"\tADDQ\t[$]19", "\tIMULQ"}, // (a+19)*n
+	},
+
 	// see issue 19595.
 	// We want to merge load+op in f58, but not in f59.
 	{
@ -928,6 +951,21 @@ var linux386Tests = []*asmTest{
 		`,
 		[]string{"\tMOVL\t\\(.*\\)\\(.*\\*1\\),"},
 	},
+	// multiplication merging tests
+	{
+		`
+		func mul1(n int) int {
+			return 9*n + 14*n
+		}`,
+		[]string{"\tIMULL\t[$]23"}, // 23*n
+	},
+	{
+		`
+		func mul2(a, n int) int {
+			return 19*a + a*n
+		}`,
+		[]string{"\tADDL\t[$]19", "\tIMULL"}, // (n+19)*a
+	},
 }

 var linuxS390XTests = []*asmTest{