cmd/compile: combine x*n + y*n into (x+y)*n

There are a few cases where this can be useful. Apart from the obvious
(and silly)

  100*n + 200*n

where we generate one IMUL instead of two, consider:

  15*n + 31*n

Currently, the compiler strength-reduces both imuls, generating:

    0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	MOVQ	AX, CX
	0x0008 00008 	SHLQ	$4, AX
	0x000c 00012 	SUBQ	CX, AX
	0x000f 00015 	MOVQ	CX, DX
	0x0012 00018 	SHLQ	$5, CX
	0x0016 00022 	SUBQ	DX, CX
	0x0019 00025 	ADDQ	CX, AX
	0x001c 00028 	MOVQ	AX, "".~r1+16(SP)
	0x0021 00033 	RET

But combining the imuls is both faster and shorter:

	0x0000 00000	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$46, AX
	0x0009 00009	MOVQ	AX, "".~r1+16(SP)
	0x000e 00014 	RET

even without strength-reduction.

Moreover, consider:

  5*n + 7*(n+1) + 11*(n+2)

We already have a rule that rewrites 7(n+1) into 7n+7, so the
generated code (without imuls merging) looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	LEAQ	(AX)(AX*4), CX
	0x0009 00009 	MOVQ	AX, DX
	0x000c 00012 	NEGQ	AX
	0x000f 00015 	LEAQ	(AX)(DX*8), AX
	0x0013 00019 	ADDQ	CX, AX
	0x0016 00022 	LEAQ	(DX)(CX*2), CX
	0x001a 00026 	LEAQ	29(AX)(CX*1), AX
	0x001f 00031 	MOVQ	AX, "".~r1+16(SP)

But with imuls merging, the 5n, 7n and 11n factors get merged, and the
generated code looks like this:

	0x0000 00000 	MOVQ	"".n+8(SP), AX
	0x0005 00005 	IMULQ	$23, AX
	0x0009 00009 	ADDQ	$29, AX
	0x000d 00013 	MOVQ	AX, "".~r1+16(SP)
	0x0012 00018 	RET

Which is both faster and shorter; that's also the exact same code that
clang and the intel c compiler generate for the above expression.

Change-Id: Ib4d5503f05d2f2efe31a1be14e2fe6cac33730a9
Reviewed-on: https://go-review.googlesource.com/55143
Reviewed-by: Keith Randall <khr@golang.org>
This commit is contained in:
Alberto Donizetti 2017-08-14 11:44:09 +02:00
parent e70fae8a64
commit a0453a180f
4 changed files with 1149 additions and 44 deletions

View file

@ -741,6 +741,29 @@ var linuxAMD64Tests = []*asmTest{
}`,
[]string{"\tPOPCNTQ\t", "support_popcnt"},
},
// multiplication merging tests
{
`
func mul1(n int) int {
return 15*n + 31*n
}`,
[]string{"\tIMULQ\t[$]46"}, // 46*n
},
{
`
func mul2(n int) int {
return 5*n + 7*(n+1) + 11*(n+2)
}`,
[]string{"\tIMULQ\t[$]23", "\tADDQ\t[$]29"}, // 23*n + 29
},
{
`
func mul3(a, n int) int {
return a*n + 19*n
}`,
[]string{"\tADDQ\t[$]19", "\tIMULQ"}, // (a+19)*n
},
// see issue 19595.
// We want to merge load+op in f58, but not in f59.
{
@ -928,6 +951,21 @@ var linux386Tests = []*asmTest{
`,
[]string{"\tMOVL\t\\(.*\\)\\(.*\\*1\\),"},
},
// multiplication merging tests
{
`
func mul1(n int) int {
return 9*n + 14*n
}`,
[]string{"\tIMULL\t[$]23"}, // 23*n
},
{
`
func mul2(a, n int) int {
return 19*a + a*n
}`,
[]string{"\tADDL\t[$]19", "\tIMULL"}, // (n+19)*a
},
}
var linuxS390XTests = []*asmTest{