Commit graph

109 commits

Author SHA1 Message Date
Russ Cox
1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:

	(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
	(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
	(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:

	(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
	(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
	(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test

benchmark \ host                      local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                    vs base      vs base  vs base    vs base        vs base
DivconstI64                               ~            ~        ~    -49.66%        -21.02%
ModconstI64                               ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64                     ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64                         ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64                     ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                             ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                             ~            ~        ~    -14.10%         -3.54%
DivconstU64/37                       -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567                       ~            ~        ~    -61.55%        -56.93%
ModconstU64                               ~            ~        ~     -6.25%              ~
DivisibleconstU64                         ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64                     ~            ~        ~     +4.23%         +2.56%

pkg: math/bits

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~

pkg: strconv

benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%

pkg: crypto/internal/fips140/nistec/fiat

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%

(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)

pkg: crypto/internal/fips140/edwards25519

benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-30 08:04:20 -07:00
Keith Randall
af03343f93 cmd/compile: fix bounds check report
For constant-index, variable length situations.

Inadvertant shadowing of the yVal variable. Oops.

Fixes #75327

Change-Id: I3403066fc39b7664222a3098cf0f22b5761ea66a
Reviewed-on: https://go-review.googlesource.com/c/go/+/702015
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-09 07:37:40 -07:00
Keith Randall
7dccd6395c cmd/compile: move arm32 over to new bounds check strategy
Change-Id: I529edd805875a4833cabcf4692f0c6d4163b07d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/682398
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-29 10:46:49 -07:00
David Chase
c2ae5c7443 cmd/compile, runtime: use PC of deferreturn for panic transfer
this removes the old conditional-on-register-value
handshake from the deferproc/deferprocstack logic.

The "line" for the recovery-exit frame itself (not the defers
that it runs) is the closing brace of the function.

Reduces code size slightly (e.g. go command is 0.2% smaller)

Sample output showing effect of this change, also what sort of
code it requires to observe the effect:
```
package main

import "os"

func main() {
	g(len(os.Args) - 1)           // stack[0]
}

var gi int
var pi *int = &gi

//go:noinline
func g(i int) {
	switch i {
	case 0:
		defer func() {
			println("g0", i)
			q()                  // stack[2] if i == 0
		}()
		for j := *pi; j < 1; j++ {
			defer func() {
				println("recover0", recover().(string))
			}()
		}
	default:
		for j := *pi; j < 1; j++ {
			defer func() {
				println("g1", i)
				q()              // stack[2] if i == 1
			}()
		}
		defer func() {
			println("recover1", recover().(string))
		}()
	}
	p()
}                                // stack[1] (deferreturn)

//go:noinline
func p() {
	panic("p()")
}

//go:noinline
func q() {
	panic("q()")                 // stack[3]
}

/* Sample output for "./foo foo":
recover1 p()
g1 1
panic: q()

goroutine 1 [running]:
main.q()
	.../main.go:46 +0x2c
main.g.func3()
	.../main.go:29 +0x48
main.g(0x1?)
	.../main.go:37 +0x68
main.main()
	.../main.go:6 +0x28
*/
```

Change-Id: Ie39ea62ecc244213500380ea06d44024cadc2317
Reviewed-on: https://go-review.googlesource.com/c/go/+/650795
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-25 08:35:38 -08:00
Ludi Rehak
ee6b34797b all: add floating point option for ARM targets
This change introduces new options to set the floating point
mode on ARM targets. The GOARM version number can optionally be
followed by ',hardfloat' or ',softfloat' to select whether to
use hardware instructions or software emulation for floating
point computations, respectively. For example,
GOARM=7,softfloat.

Previously, software floating point support was limited to
GOARM=5. With these options, software floating point is now
extended to all ARM versions, including GOARM=6 and 7. This
change also extends hardware floating point to GOARM=5.

GOARM=5 defaults to softfloat and GOARM=6 and 7 default to
hardfloat.

For #61588

Change-Id: I23dc86fbd0733b262004a2ed001e1032cf371e94
Reviewed-on: https://go-review.googlesource.com/c/go/+/514907
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-11-20 17:19:36 +00:00
ruinan
cf624a6127 cmd/asm: refine some APIs related to Prog.RestArgs[]
Before this CL, we use GetFrom3&SetFrom3 to get or set a source operand
which not fit into Prog.Reg. Those APIs operate the first element in
Prog.RestArgs without checking the type so they're fragile to break if
we have more than one different type of operands in the slice, which
will be a common case in Arm64.

This CL deprecates & renames some APIs related to Prog.RestArgs to make
those APIs more reasonable and robust than before.

Change-Id: I70d56edc1f23ccfffbcd6df34844e2cef2288432
Reviewed-on: https://go-review.googlesource.com/c/go/+/493355
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2023-05-24 01:26:58 +00:00
Keith Randall
21d82e6ac8 cmd/compile: batch write barrier calls
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.

Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24 00:21:13 +00:00
cui fliter
b2faff18ce all: add missing periods in comments
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2022-11-18 17:59:44 +00:00
Austin Clements
5f625de4d0 cmd/compile,cmd/internal/obj: replace Ctxt.FixedFrameSize method with Arch field
And delete now-unused FixedFrameSize methods.

Change-Id: Id257e1647dbeb4eb4ab866c53744010c4efeb953
Reviewed-on: https://go-review.googlesource.com/c/go/+/400819
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-19 15:59:22 +00:00
Cherry Mui
ccfc41eee0 cmd/compile: check out-of-range shifts on ARM and ARM64
When encoding ARM or ARM64 shifted register operand, check that
the shift is in range.

Change-Id: If0014933bfd0a1b8eaaa01e0220a6eeb17ab9f40
Reviewed-on: https://go-review.googlesource.com/c/go/+/351530
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-22 18:32:21 +00:00
Cherry Mui
c10b980220 cmd/compile: restore tail call for method wrappers
For certain type of method wrappers we used to generate a tail
call. That was disabled in CL 307234 when register ABI is used,
because with the current IR it was difficult to generate a tail
call with the arguments in the right places. The problem was that
the IR does not contain a CALL-like node with arguments; instead,
it contains an OAS node that adjusts the receiver, than an
OTAILCALL node that just contains the target, but no argument
(with the assumption that the OAS node will put the adjusted
receiver in the right place). With register ABI, putting
arguments in registers are done in SSA. The assignment (OAS)
doesn't put the receiver in register.

This CL changes the IR of a tail call to take an actual OCALL
node. Specifically, a tail call is represented as

OTAILCALL (OCALL target args...)

This way, the call target and args are connected through the OCALL
node. So the call can be analyzed in SSA and the args can be passed
in the right places.

(Alternatively, we could have OTAILCALL node directly take the
target and the args, without the OCALL node. Using an OCALL node is
convenient as there are existing code that processes OCALL nodes
which do not need to be changed. Also, a tail call is similar to
ORETURN (OCALL target args...), except it doesn't preserve the
frame. I did the former but I'm open to change.)

The SSA representation is similar. Previously, the IR lowers to
a Store the receiver then a BlockRetJmp which jumps to the target
(without putting the arg in register). Now we use a TailCall op,
which takes the target and the args. The call expansion pass and
the register allocator handles TailCall pretty much like a
StaticCall, and it will do the right ABI analysis and put the args
in the right places. (Args other than the receiver are already in
the right places. For register args it generates no code for them.
For stack args currently it generates a self copy. I'll work on
optimize that out.) BlockRetJmp is still used, signaling it is a
tail call. The actual call is made in the TailCall op so
BlockRetJmp generates no code (we could use BlockExit if we like).

This slightly reduces binary size:
              old        new
cmd/go     14003088   13953936
cmd/link    6275552    6271456

Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-09-17 22:59:44 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Cherry Zhang
6ae3b70ef2 cmd/compile: add clobberdeadreg mode
When -clobberdeadreg flag is set, the compiler inserts code that
clobbers integer registers at call sites. This may be helpful for
debugging register ABI.

Only implemented on AMD64 for now.

Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899
Reviewed-on: https://go-review.googlesource.com/c/go/+/302809
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-03-19 23:21:21 +00:00
fanzha02
2b50ab2aee cmd/compile: optimize single-precision floating point square root
Add generic rule to rewrite the single-precision square root expression
with one single-precision instruction. The optimization will reduce two
times of precision converting between double-precision and single-precision.

On arm64 flatform.

previous:
  FCVTSD F0, F0
  FSQRTD F0, F0
  FCVTDS F0, F0

optimized:
  FSQRTS S0, S0

And this patch adds the test case to check the correctness.

This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)

Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-02 06:38:07 +00:00
Josh Bleecher Snyder
a61524d103 cmd/internal/obj: add Prog.SetFrom3{Reg,Const}
These are the the most common uses, and they reduce line noise.

I don't love adding new deprecated APIs,
but since they're trivial wrappers,
it'll be very easy to update them along with the rest.
No functional changes; passes toolstash-check.

Change-Id: I691a8175cfef9081180e463c63f326376af3f3a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/296009
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-02-25 23:19:16 +00:00
Josh Bleecher Snyder
4ebb6f5110 cmd/compile: automate resultInArg0 register checks
No functional changes; passes toolstash-check.
No measureable performance changes.

Change-Id: I2629f73d4a3cc56d80f512f33cf57cf41d8f15d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/296010
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-02-25 18:57:20 +00:00
Russ Cox
6c34d2f420 [dev.regabi] cmd/compile: split out package ssagen [generated]
[git-generate]

cd src/cmd/compile/internal/gc
rf '
	# maxOpenDefers is declared in ssa.go but used only by walk.
	mv maxOpenDefers walk.go

	# gc.Arch -> ssagen.Arch
	# It is not as nice but will do for now.
	mv Arch ArchInfo
	mv thearch Arch
	mv Arch ArchInfo arch.go

	# Pull dwarf out of pgen.go.
	mv debuginfo declPos createDwarfVars preInliningDcls \
		createSimpleVars createSimpleVar \
		createComplexVars createComplexVar \
		dwarf.go

	# Pull high-level compilation out of pgen.go,
	# leaving only the SSA code.
	mv compilequeue funccompile compile compilenow \
		compileFunctions isInlinableButNotInlined \
		initLSym \
		compile.go

	mv BoundsCheckFunc GCWriteBarrierReg ssa.go
	mv largeStack largeStackFrames CheckLargeStacks pgen.go

	# All that is left in dcl.go is the nowritebarrierrecCheck
	mv dcl.go nowb.go

	# Export API and unexport non-API.
	mv initssaconfig InitConfig
	mv isIntrinsicCall IsIntrinsicCall
	mv ssaDumpInline DumpInline
	mv initSSATables InitTables
	mv initSSAEnv InitEnv
	mv compileSSA Compile
	mv stackOffset StackOffset
	mv canSSAType TypeOK
	mv SSAGenState State
	mv FwdRefAux fwdRefAux

	mv cgoSymABIs CgoSymABIs
	mv readSymABIs ReadSymABIs
	mv initLSym InitLSym
	mv useABIWrapGen symabiDefs CgoSymABIs ReadSymABIs InitLSym selectLSym makeABIWrapper setupTextLSym abi.go

	mv arch.go abi.go nowb.go phi.go pgen.go pgen_test.go ssa.go cmd/compile/internal/ssagen
'
rm go.go gsubr.go

Change-Id: I47fad6cbf1d1e583fd9139003a08401d7cd048a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/279476
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 06:39:29 +00:00
Russ Cox
0ced54062e [dev.regabi] cmd/compile: split out package objw [generated]
Object file writing routines are used not just at the end
of the compilation but also during static data layout in walk.
Split them into their own package.

[git-generate]

cd src/cmd/compile/internal/gc
rf '
	# Move bit vector to new package bitvec
	mv bvec.n bvec.N
	mv bvec.b bvec.B
	mv bvec BitVec
	mv bvalloc New
	mv bvbulkalloc NewBulk
	mv bulkBvec.next bulkBvec.Next
	mv bulkBvec Bulk
	mv H0 h0
	mv Hp hp

	# Leave bvecSet and bitmap hashes behind - not needed as broadly.
	mv bvecSet.extractUniqe bvecSet.extractUnique
	mv h0 bvecSet bvecSet.grow bvecSet.add \
		bvecSet.extractUnique hashbitmap bvset.go

	mv bv.go cmd/compile/internal/bitvec

	ex . ../arm ../arm64 ../mips ../mips64 ../ppc64 ../s390x ../riscv64 {
		import "cmd/internal/obj"
		var a *obj.Addr
		var i int64
		Addrconst(a, i) -> a.SetConst(i)
		var p, to *obj.Prog
		Patch(p, to) -> p.To.SetTarget(to)
	}
	rm Addrconst Patch

	# Move object-writing API to new package objw
	mv duint8 Objw_Uint8
	mv duint16 Objw_Uint16
	mv duint32 Objw_Uint32
	mv duintptr Objw_Uintptr
	mv duintxx Objw_UintN
	mv dsymptr Objw_SymPtr
	mv dsymptrOff Objw_SymPtrOff
	mv dsymptrWeakOff Objw_SymPtrWeakOff
	mv ggloblsym Objw_Global
	mv dbvec Objw_BitVec
	mv newProgs NewProgs
	mv Progs.clearp Progs.Clear
	mv Progs.settext Progs.SetText
	mv Progs.next Progs.Next
	mv Progs.pc Progs.PC
	mv Progs.pos Progs.Pos
	mv Progs.curfn Progs.CurFunc
	mv Progs.progcache Progs.Cache
	mv Progs.cacheidx Progs.CacheIndex
	mv Progs.nextLive Progs.NextLive
	mv Progs.prevLive Progs.PrevLive
	mv Progs.Appendpp Progs.Append
	mv LivenessIndex.stackMapIndex LivenessIndex.StackMapIndex
	mv LivenessIndex.isUnsafePoint LivenessIndex.IsUnsafePoint

	mv Objw_Uint8 Objw_Uint16 Objw_Uint32 Objw_Uintptr Objw_UintN \
		Objw_SymPtr Objw_SymPtrOff Objw_SymPtrWeakOff Objw_Global \
		Objw_BitVec \
		objw.go
	mv sharedProgArray NewProgs Progs \
		LivenessIndex StackMapDontCare \
		LivenessDontCare LivenessIndex.StackMapValid \
		Progs.NewProg Progs.Flush Progs.Free Progs.Prog Progs.Clear Progs.Append Progs.SetText \
		prog.go
	mv prog.go objw.go cmd/compile/internal/objw

	# Move ggloblnod to obj with the rest of the non-objw higher-level writing.
	mv ggloblnod obj.go
'

cd ../objw
rf '
	mv Objw_Uint8 Uint8
	mv Objw_Uint16 Uint16
	mv Objw_Uint32 Uint32
	mv Objw_Uintptr Uintptr
	mv Objw_UintN UintN
	mv Objw_SymPtr SymPtr
	mv Objw_SymPtrOff SymPtrOff
	mv Objw_SymPtrWeakOff SymPtrWeakOff
	mv Objw_Global Global
	mv Objw_BitVec BitVec
'

Change-Id: I2b87085aa788564fb322e9c55bddd73347b4d5fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/279310
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 06:38:47 +00:00
Russ Cox
65c4c6dfb2 [dev.regabi] cmd/compile: group known symbols, packages, names [generated]
There are a handful of pre-computed magic symbols known by
package gc, and we need a place to store them.

If we keep them together, the need for type *ir.Name means that
package ir is the lowest package in the import hierarchy that they
can go in. And package ir needs gopkg for methodSymSuffix
(in a later CL), so they can't go any higher either, at least not all together.
So package ir it is.

Rather than dump them all into the top-level package ir
namespace, however, we introduce global structs, Syms, Pkgs, and Names,
and make the known symbols, packages, and names fields of those.

[git-generate]
cd src/cmd/compile/internal/gc

rf '
	add go.go:$ \
		// Names holds known names. \
		var Names struct{} \
		\
		// Syms holds known symbols. \
		var Syms struct {} \
		\
		// Pkgs holds known packages. \
		var Pkgs struct {} \

	mv staticuint64s Names.Staticuint64s
	mv zerobase Names.Zerobase

	mv assertE2I Syms.AssertE2I
	mv assertE2I2 Syms.AssertE2I2
	mv assertI2I Syms.AssertI2I
	mv assertI2I2 Syms.AssertI2I2
	mv deferproc Syms.Deferproc
	mv deferprocStack Syms.DeferprocStack
	mv Deferreturn Syms.Deferreturn
	mv Duffcopy Syms.Duffcopy
	mv Duffzero Syms.Duffzero
	mv gcWriteBarrier Syms.GCWriteBarrier
	mv goschedguarded Syms.Goschedguarded
	mv growslice Syms.Growslice
	mv msanread Syms.Msanread
	mv msanwrite Syms.Msanwrite
	mv msanmove Syms.Msanmove
	mv newobject Syms.Newobject
	mv newproc Syms.Newproc
	mv panicdivide Syms.Panicdivide
	mv panicshift Syms.Panicshift
	mv panicdottypeE Syms.PanicdottypeE
	mv panicdottypeI Syms.PanicdottypeI
	mv panicnildottype Syms.Panicnildottype
	mv panicoverflow Syms.Panicoverflow
	mv raceread Syms.Raceread
	mv racereadrange Syms.Racereadrange
	mv racewrite Syms.Racewrite
	mv racewriterange Syms.Racewriterange
	mv SigPanic Syms.SigPanic
	mv typedmemclr Syms.Typedmemclr
	mv typedmemmove Syms.Typedmemmove
	mv Udiv Syms.Udiv
	mv writeBarrier Syms.WriteBarrier
	mv zerobaseSym Syms.Zerobase
	mv arm64HasATOMICS Syms.ARM64HasATOMICS
	mv armHasVFPv4 Syms.ARMHasVFPv4
	mv x86HasFMA Syms.X86HasFMA
	mv x86HasPOPCNT Syms.X86HasPOPCNT
	mv x86HasSSE41 Syms.X86HasSSE41
	mv WasmDiv Syms.WasmDiv
	mv WasmMove Syms.WasmMove
	mv WasmZero Syms.WasmZero
	mv WasmTruncS Syms.WasmTruncS
	mv WasmTruncU Syms.WasmTruncU

	mv gopkg Pkgs.Go
	mv itabpkg Pkgs.Itab
	mv itablinkpkg Pkgs.Itablink
	mv mappkg Pkgs.Map
	mv msanpkg Pkgs.Msan
	mv racepkg Pkgs.Race
	mv Runtimepkg Pkgs.Runtime
	mv trackpkg Pkgs.Track
	mv unsafepkg Pkgs.Unsafe

	mv Names Syms Pkgs symtab.go
	mv symtab.go cmd/compile/internal/ir
'

Change-Id: Ic143862148569a3bcde8e70b26d75421aa2d00f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/279235
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-12-23 06:38:07 +00:00
Matthew Dempsky
6db970e20a [dev.regabi] cmd/compile: rewrite Aux uses of ir.Node to *ir.Name [generated]
Now that the only remaining ir.Node implementation that is stored
(directly) into ssa.Aux, we can rewrite all of the conversions between
ir.Node and ssa.Aux to use *ir.Name instead.

rf doesn't have a way to rewrite the type switch case clauses, so we
just use sed instead. There's only a handful, and they're the only
times that "case ir.Node" appears anyway.

The next CL will move the tag method declarations so that ir.Node no
longer implements ssa.Aux.

Passes buildall w/ toolstash -cmp.

Updates #42982.

[git-generate]
cd src/cmd/compile/internal
sed -i -e 's/case ir.Node/case *ir.Name/' gc/plive.go */ssa.go

cd ssa
rf '
ex . ../gc {
  import "cmd/compile/internal/ir"

  var v *Value
  v.Aux.(ir.Node) -> v.Aux.(*ir.Name)

  var n ir.Node
  var asAux func(Aux)
  strict n        # only match ir.Node-typed expressions; not *ir.Name
  implicit asAux  # match implicit assignments to ssa.Aux
  asAux(n)        -> n.(*ir.Name)
}
'

Change-Id: I3206ef5f12a7cfa37c5fecc67a1ca02ea4d52b32
Reviewed-on: https://go-review.googlesource.com/c/go/+/275789
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2020-12-08 01:46:57 +00:00
Russ Cox
41f3af9d04 [dev.regabi] cmd/compile: replace *Node type with an interface Node [generated]
The plan is to introduce a Node interface that replaces the old *Node pointer-to-struct.

The previous CL defined an interface INode modeling a *Node.

This CL:
 - Changes all references outside internal/ir to use INode,
   along with many references inside internal/ir as well.
 - Renames Node to node.
 - Renames INode to Node

So now ir.Node is an interface implemented by *ir.node, which is otherwise inaccessible,
and the code outside package ir is now (clearly) using only the interface.

The usual rule is never to redefine an existing name with a new meaning,
so that old code that hasn't been updated gets a "unknown name" error
instead of more mysterious errors or silent misbehavior. That rule would
caution against replacing Node-the-struct with Node-the-interface,
as in this CL, because code that says *Node would now be using a pointer
to an interface. But this CL is being landed at the same time as another that
moves Node from gc to ir. So the net effect is to replace *gc.Node with ir.Node,
which does follow the rule: any lingering references to gc.Node will be told
it's gone, not silently start using pointers to interfaces. So the rule is followed
by the CL sequence, just not this specific CL.

Overall, the loss of inlining caused by using interfaces cuts the compiler speed
by about 6%, a not insignificant amount. However, as we convert the representation
to concrete structs that are not the giant Node over the next weeks, that speed
should come back as more of the compiler starts operating directly on concrete types
and the memory taken up by the graph of Nodes drops due to the more precise
structs. Honestly, I was expecting worse.

% benchstat bench.old bench.new
name                      old time/op       new time/op       delta
Template                        168ms ± 4%        182ms ± 2%   +8.34%  (p=0.000 n=9+9)
Unicode                        72.2ms ±10%       82.5ms ± 6%  +14.38%  (p=0.000 n=9+9)
GoTypes                         563ms ± 8%        598ms ± 2%   +6.14%  (p=0.006 n=9+9)
Compiler                        2.89s ± 4%        3.04s ± 2%   +5.37%  (p=0.000 n=10+9)
SSA                             6.45s ± 4%        7.25s ± 5%  +12.41%  (p=0.000 n=9+10)
Flate                           105ms ± 2%        115ms ± 1%   +9.66%  (p=0.000 n=10+8)
GoParser                        144ms ±10%        152ms ± 2%   +5.79%  (p=0.011 n=9+8)
Reflect                         345ms ± 9%        370ms ± 4%   +7.28%  (p=0.001 n=10+9)
Tar                             149ms ± 9%        161ms ± 5%   +8.05%  (p=0.001 n=10+9)
XML                             190ms ± 3%        209ms ± 2%   +9.54%  (p=0.000 n=9+8)
LinkCompiler                    327ms ± 2%        325ms ± 2%     ~     (p=0.382 n=8+8)
ExternalLinkCompiler            1.77s ± 4%        1.73s ± 6%     ~     (p=0.113 n=9+10)
LinkWithoutDebugCompiler        214ms ± 4%        211ms ± 2%     ~     (p=0.360 n=10+8)
StdCmd                          14.8s ± 3%        15.9s ± 1%   +6.98%  (p=0.000 n=10+9)
[Geo mean]                      480ms             510ms        +6.31%

name                      old user-time/op  new user-time/op  delta
Template                        223ms ± 3%        237ms ± 3%   +6.16%  (p=0.000 n=9+10)
Unicode                         103ms ± 6%        113ms ± 3%   +9.53%  (p=0.000 n=9+9)
GoTypes                         758ms ± 8%        800ms ± 2%   +5.55%  (p=0.003 n=10+9)
Compiler                        3.95s ± 2%        4.12s ± 2%   +4.34%  (p=0.000 n=10+9)
SSA                             9.43s ± 1%        9.74s ± 4%   +3.25%  (p=0.000 n=8+10)
Flate                           132ms ± 2%        141ms ± 2%   +6.89%  (p=0.000 n=9+9)
GoParser                        177ms ± 9%        183ms ± 4%     ~     (p=0.050 n=9+9)
Reflect                         467ms ±10%        495ms ± 7%   +6.17%  (p=0.029 n=10+10)
Tar                             183ms ± 9%        197ms ± 5%   +7.92%  (p=0.001 n=10+10)
XML                             249ms ± 5%        268ms ± 4%   +7.82%  (p=0.000 n=10+9)
LinkCompiler                    544ms ± 5%        544ms ± 6%     ~     (p=0.863 n=9+9)
ExternalLinkCompiler            1.79s ± 4%        1.75s ± 6%     ~     (p=0.075 n=10+10)
LinkWithoutDebugCompiler        248ms ± 6%        246ms ± 2%     ~     (p=0.965 n=10+8)
[Geo mean]                      483ms             504ms        +4.41%

[git-generate]
cd src/cmd/compile/internal/ir
: # We need to do the conversion in multiple steps, so we introduce
: # a temporary type alias that will start out meaning the pointer-to-struct
: # and then change to mean the interface.
rf '
	mv Node OldNode

	add node.go \
		type Node = *OldNode
'

: # It should work to do this ex in ir, but it misses test files, due to a bug in rf.
: # Run the command in gc to handle gc's tests, and then again in ssa for ssa's tests.
cd ../gc
rf '
        ex .  ../arm ../riscv64 ../arm64 ../mips64 ../ppc64 ../mips ../wasm {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'
cd ../ssa
rf '
        ex {
                import "cmd/compile/internal/ir"
                *ir.OldNode -> ir.Node
        }
'

: # Back in ir, finish conversion clumsily with sed,
: # because type checking and circular aliases do not mix.
cd ../ir
sed -i '' '
	/type Node = \*OldNode/d
	s/\*OldNode/Node/g
	s/^func (n Node)/func (n *OldNode)/
	s/OldNode/node/g
	s/type INode interface/type Node interface/
	s/var _ INode = (Node)(nil)/var _ Node = (*node)(nil)/
' *.go
gofmt -w *.go

sed -i '' '
	s/{Func{}, 136, 248}/{Func{}, 152, 280}/
	s/{Name{}, 32, 56}/{Name{}, 44, 80}/
	s/{Param{}, 24, 48}/{Param{}, 44, 88}/
	s/{node{}, 76, 128}/{node{}, 88, 152}/
' sizeof_test.go

cd ../ssa
sed -i '' '
	s/{LocalSlot{}, 28, 40}/{LocalSlot{}, 32, 48}/
' sizeof_test.go

cd ../gc
sed -i '' 's/\*ir.Node/ir.Node/' mkbuiltin.go

cd ../../../..
go install std cmd
cd cmd/compile
go test -u || go test -u

Change-Id: I196bbe3b648e4701662e4a2bada40bf155e2a553
Reviewed-on: https://go-review.googlesource.com/c/go/+/272935
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 17:30:43 +00:00
Russ Cox
84e2bd611f [dev.regabi] cmd/compile: introduce cmd/compile/internal/ir [generated]
If we want to break up package gc at all, we will need to move
the compiler IR it defines into a separate package that can be
imported by packages that gc itself imports. This CL does that.
It also removes the TINT8 etc aliases so that all code is clear
about which package things are coming from.

This CL is automatically generated by the script below.
See the comments in the script for details about the changes.

[git-generate]
cd src/cmd/compile/internal/gc

rf '
        # These names were never fully qualified
        # when the types package was added.
        # Do it now, to avoid confusion about where they live.
        inline -rm \
                Txxx \
                TINT8 \
                TUINT8 \
                TINT16 \
                TUINT16 \
                TINT32 \
                TUINT32 \
                TINT64 \
                TUINT64 \
                TINT \
                TUINT \
                TUINTPTR \
                TCOMPLEX64 \
                TCOMPLEX128 \
                TFLOAT32 \
                TFLOAT64 \
                TBOOL \
                TPTR \
                TFUNC \
                TSLICE \
                TARRAY \
                TSTRUCT \
                TCHAN \
                TMAP \
                TINTER \
                TFORW \
                TANY \
                TSTRING \
                TUNSAFEPTR \
                TIDEAL \
                TNIL \
                TBLANK \
                TFUNCARGS \
                TCHANARGS \
                NTYPE \
                BADWIDTH

        # esc.go and escape.go do not need to be split.
        # Append esc.go onto the end of escape.go.
        mv esc.go escape.go

        # Pull out the type format installation from func Main,
        # so it can be carried into package ir.
        mv Main:/Sconv.=/-0,/TypeLinkSym/-1 InstallTypeFormats

        # Names that need to be exported for use by code left in gc.
        mv Isconst IsConst
        mv asNode AsNode
        mv asNodes AsNodes
        mv asTypesNode AsTypesNode
        mv basicnames BasicTypeNames
        mv builtinpkg BuiltinPkg
        mv consttype ConstType
        mv dumplist DumpList
        mv fdumplist FDumpList
        mv fmtMode FmtMode
        mv goopnames OpNames
        mv inspect Inspect
        mv inspectList InspectList
        mv localpkg LocalPkg
        mv nblank BlankNode
        mv numImport NumImport
        mv opprec OpPrec
        mv origSym OrigSym
        mv stmtwithinit StmtWithInit
        mv dump DumpAny
        mv fdump FDumpAny
        mv nod Nod
        mv nodl NodAt
        mv newname NewName
        mv newnamel NewNameAt
        mv assertRepresents AssertValidTypeForConst
        mv represents ValidTypeForConst
        mv nodlit NewLiteral

        # Types and fields that need to be exported for use by gc.
        mv nowritebarrierrecCallSym SymAndPos
        mv SymAndPos.lineno SymAndPos.Pos
        mv SymAndPos.target SymAndPos.Sym

        mv Func.lsym Func.LSym
        mv Func.setWBPos Func.SetWBPos
        mv Func.numReturns Func.NumReturns
        mv Func.numDefers Func.NumDefers
        mv Func.nwbrCalls Func.NWBRCalls

        # initLSym is an algorithm left behind in gc,
        # not an operation on Func itself.
        mv Func.initLSym initLSym

        mv nodeQueue NodeQueue
        mv NodeQueue.empty NodeQueue.Empty
        mv NodeQueue.popLeft NodeQueue.PopLeft
        mv NodeQueue.pushRight NodeQueue.PushRight

        # Many methods on Node are actually algorithms that
        # would apply to any node implementation.
        # Those become plain functions.
        mv Node.funcname FuncName
        mv Node.isBlank IsBlank
        mv Node.isGoConst isGoConst
        mv Node.isNil IsNil
        mv Node.isParamHeapCopy isParamHeapCopy
        mv Node.isParamStackCopy isParamStackCopy
        mv Node.isSimpleName isSimpleName
        mv Node.mayBeShared MayBeShared
        mv Node.pkgFuncName PkgFuncName
        mv Node.backingArrayPtrLen backingArrayPtrLen
        mv Node.isterminating isTermNode
        mv Node.labeledControl labeledControl
        mv Nodes.isterminating isTermNodes
        mv Nodes.sigerr fmtSignature
        mv Node.MethodName methodExprName
        mv Node.MethodFunc methodExprFunc
        mv Node.IsMethod IsMethod

        # Every node will need to implement RawCopy;
        # Copy and SepCopy algorithms will use it.
        mv Node.rawcopy Node.RawCopy
        mv Node.copy Copy
        mv Node.sepcopy SepCopy

        # Extract Node.Format method body into func FmtNode,
        # but leave method wrapper behind.
        mv Node.Format:0,$ FmtNode

        # Formatting helpers that will apply to all node implementations.
        mv Node.Line Line
        mv Node.exprfmt exprFmt
        mv Node.jconv jconvFmt
        mv Node.modeString modeString
        mv Node.nconv nconvFmt
        mv Node.nodedump nodeDumpFmt
        mv Node.nodefmt nodeFmt
        mv Node.stmtfmt stmtFmt

	# Constant support needed for code moving to ir.
        mv okforconst OKForConst
        mv vconv FmtConst
        mv int64Val Int64Val
        mv float64Val Float64Val
        mv Node.ValueInterface ConstValue

        # Organize code into files.
        mv LocalPkg BuiltinPkg ir.go
        mv NumImport InstallTypeFormats Line fmt.go
        mv syntax.go Nod NodAt NewNameAt Class Pxxx PragmaFlag Nointerface SymAndPos \
                AsNode AsTypesNode BlankNode OrigSym \
                Node.SliceBounds Node.SetSliceBounds Op.IsSlice3 \
                IsConst Node.Int64Val Node.CanInt64 Node.Uint64Val Node.BoolVal Node.StringVal \
                Node.RawCopy SepCopy Copy \
                IsNil IsBlank IsMethod \
                Node.Typ Node.StorageClass node.go
        mv ConstType ConstValue Int64Val Float64Val AssertValidTypeForConst ValidTypeForConst NewLiteral idealType OKForConst val.go

        # Move files to new ir package.
        mv bitset.go class_string.go dump.go fmt.go \
                ir.go node.go op_string.go val.go \
                sizeof_test.go cmd/compile/internal/ir
'

: # fix mkbuiltin.go to generate the changes made to builtin.go during rf
sed -i '' '
        s/\[T/[types.T/g
        s/\*Node/*ir.Node/g
        /internal\/types/c \
                fmt.Fprintln(&b, `import (`) \
                fmt.Fprintln(&b, `      "cmd/compile/internal/ir"`) \
                fmt.Fprintln(&b, `      "cmd/compile/internal/types"`) \
                fmt.Fprintln(&b, `)`)
' mkbuiltin.go
gofmt -w mkbuiltin.go

: # update cmd/dist to add internal/ir
cd ../../../dist
sed -i '' '/compile.internal.gc/a\
        "cmd/compile/internal/ir",
' buildtool.go
gofmt -w buildtool.go

: # update cmd/compile TestFormats
cd ../..
go install std cmd
cd cmd/compile
go test -u || go test  # first one updates but fails; second passes

Change-Id: I5f7caf6b20629b51970279e81231a3574d5b51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/273008
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:53:33 +00:00
Russ Cox
26b66fd60b [dev.regabi] cmd/compile: introduce cmd/compile/internal/base [generated]
Move Flag, Debug, Ctxt, Exit, and error messages to
new package cmd/compile/internal/base.

These are the core functionality that everything in gc uses
and which otherwise prevent splitting any other code
out of gc into different packages.

A minor milestone: the compiler source code
no longer contains the string "yy".

[git-generate]
cd src/cmd/compile/internal/gc
rf '
        mv atExit AtExit
        mv Ctxt atExitFuncs AtExit Exit base.go

        mv lineno Pos
        mv linestr FmtPos
        mv flusherrors FlushErrors
        mv yyerror Errorf
        mv yyerrorl ErrorfAt
        mv yyerrorv ErrorfVers
        mv noder.yyerrorpos noder.errorAt
        mv Warnl WarnfAt
        mv errorexit ErrorExit

        mv base.go debug.go flag.go print.go cmd/compile/internal/base
'

: # update comments
sed -i '' 's/yyerrorl/ErrorfAt/g; s/yyerror/Errorf/g' *.go

: # bootstrap.go is not built by default so invisible to rf
sed -i '' 's/Fatalf/base.Fatalf/' bootstrap.go
goimports -w bootstrap.go

: # update cmd/dist to add internal/base
cd ../../../dist
sed -i '' '/internal.amd64/a\
	"cmd/compile/internal/base",
' buildtool.go
gofmt -w buildtool.go

Change-Id: I59903c7084222d6eaee38823fd222159ba24a31a
Reviewed-on: https://go-review.googlesource.com/c/go/+/272250
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:39:54 +00:00
Russ Cox
3c240f5d17 [dev.regabi] cmd/compile: clean up debug flag (-d) handling [generated]
The debug table is not as haphazard as flags, but there are still
a few mismatches between command-line names and variable names.
This CL moves them all into a consistent home (var Debug, like var Flag).

Code updated automatically using the rf command below.
A followup CL will make a few manual cleanups, leaving this CL
completely automated and easier to regenerate during merge
conflicts.

[git-generate]
cd src/cmd/compile/internal/gc
rf '
	add main.go var Debug struct{}
	mv Debug_append Debug.Append
	mv Debug_checkptr Debug.Checkptr
	mv Debug_closure Debug.Closure
	mv Debug_compilelater Debug.CompileLater
	mv disable_checknil Debug.DisableNil
	mv debug_dclstack Debug.DclStack
	mv Debug_gcprog Debug.GCProg
	mv Debug_libfuzzer Debug.Libfuzzer
	mv Debug_checknil Debug.Nil
	mv Debug_panic Debug.Panic
	mv Debug_slice Debug.Slice
	mv Debug_typeassert Debug.TypeAssert
	mv Debug_wb Debug.WB
	mv Debug_export Debug.Export
	mv Debug_pctab Debug.PCTab
	mv Debug_locationlist Debug.LocationLists
	mv Debug_typecheckinl Debug.TypecheckInl
	mv Debug_gendwarfinl Debug.DwarfInl
	mv Debug_softfloat Debug.SoftFloat
	mv Debug_defer Debug.Defer
	mv Debug_dumpptrs Debug.DumpPtrs

	mv flag.go:/parse.-d/-1,/unknown.debug/+2 parseDebug

	mv debugtab Debug parseDebug \
		debugHelpHeader debugHelpFooter \
		debug.go

	# Remove //go:generate line copied from main.go
	rm debug.go:/go:generate/-+
'

Change-Id: I625761ca5659be4052f7161a83baa00df75cca91
Reviewed-on: https://go-review.googlesource.com/c/go/+/272246
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-25 16:39:42 +00:00
Keith Randall
40ef1faabc cmd/compile: redo flag constant ops for arm
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.

This is only for arm, arm64 is in a subsequent CL.

We could extend to other architectures as well, athough it would
only be cleanup, no behavioral change.

Update #39505

Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-18 20:57:49 +00:00
Xiangdong Ji
e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub caculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf, 'old' is the non-optimized version, that is removing all concerned
rewriting rules.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
David Chase
40ebcfaa17 cmd/compile: enable nil check logging for other architectures.
Change-Id: If82ebd9cd6470863eb5de9e031e7905a66218857
Reviewed-on: https://go-review.googlesource.com/c/go/+/204159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-10 17:12:15 +00:00
smasher164
58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
Michael Munday
9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Ben Shi
3cfd003a8a cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:58 +00:00
Ben Shi
c683ab8128 cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.

The benchmark results of src/math/ show big improvements.
name                   old time/op  new time/op  delta
Acos-4                  181ns ± 0%   182ns ± 0%   +0.30%  (p=0.000 n=40+40)
Acosh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Asin-4                  163ns ± 0%   163ns ± 0%     ~     (all equal)
Asinh-4                 242ns ± 0%   242ns ± 0%     ~     (all equal)
Atan-4                  120ns ± 0%   121ns ± 0%   +0.83%  (p=0.000 n=40+40)
Atanh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Atan2-4                 173ns ± 0%   173ns ± 0%     ~     (all equal)
Cbrt-4                 1.06µs ± 0%  1.06µs ± 0%   +0.09%  (p=0.000 n=39+37)
Ceil-4                 72.9ns ± 0%  72.8ns ± 0%     ~     (p=0.237 n=40+40)
Copysign-4             13.2ns ± 0%  13.2ns ± 0%     ~     (all equal)
Cos-4                   193ns ± 0%   183ns ± 0%   -5.18%  (p=0.000 n=40+40)
Cosh-4                  254ns ± 0%   239ns ± 0%   -5.91%  (p=0.000 n=40+40)
Erf-4                   112ns ± 0%   112ns ± 0%     ~     (all equal)
Erfc-4                  117ns ± 0%   117ns ± 0%     ~     (all equal)
Erfinv-4                127ns ± 0%   127ns ± 1%     ~     (p=0.492 n=40+40)
Erfcinv-4               128ns ± 0%   128ns ± 0%     ~     (all equal)
Exp-4                   212ns ± 0%   206ns ± 0%   -3.05%  (p=0.000 n=40+40)
ExpGo-4                 216ns ± 0%   209ns ± 0%   -3.24%  (p=0.000 n=40+40)
Expm1-4                 142ns ± 0%   142ns ± 0%     ~     (all equal)
Exp2-4                  191ns ± 0%   184ns ± 0%   -3.45%  (p=0.000 n=40+40)
Exp2Go-4                194ns ± 0%   187ns ± 0%   -3.61%  (p=0.000 n=40+40)
Abs-4                  14.4ns ± 0%   6.3ns ± 0%  -56.39%  (p=0.000 n=38+39)
Dim-4                  12.6ns ± 0%  12.6ns ± 0%     ~     (all equal)
Floor-4                49.6ns ± 0%  49.6ns ± 0%     ~     (all equal)
Max-4                  27.6ns ± 0%  27.6ns ± 0%     ~     (all equal)
Min-4                  27.0ns ± 0%  27.0ns ± 0%     ~     (all equal)
Mod-4                   349ns ± 0%   305ns ± 1%  -12.55%  (p=0.000 n=33+40)
Frexp-4                54.0ns ± 0%  47.1ns ± 0%  -12.78%  (p=0.000 n=38+38)
Gamma-4                 242ns ± 0%   234ns ± 0%   -3.16%  (p=0.000 n=36+40)
Hypot-4                84.8ns ± 0%  67.8ns ± 0%  -20.05%  (p=0.000 n=31+35)
HypotGo-4              88.5ns ± 0%  71.6ns ± 0%  -19.12%  (p=0.000 n=40+38)
Ilogb-4                45.8ns ± 0%  38.9ns ± 0%  -15.12%  (p=0.000 n=40+32)
J0-4                    821ns ± 0%   802ns ± 0%   -2.33%  (p=0.000 n=33+40)
J1-4                    816ns ± 0%   807ns ± 0%   -1.05%  (p=0.000 n=40+29)
Jn-4                   1.67µs ± 0%  1.65µs ± 0%   -1.45%  (p=0.000 n=40+39)
Ldexp-4                61.5ns ± 0%  54.6ns ± 0%  -11.27%  (p=0.000 n=40+32)
Lgamma-4                188ns ± 0%   188ns ± 0%     ~     (all equal)
Log-4                   154ns ± 0%   147ns ± 0%   -4.78%  (p=0.000 n=40+40)
Logb-4                 50.9ns ± 0%  42.7ns ± 0%  -16.11%  (p=0.000 n=34+39)
Log1p-4                 160ns ± 0%   159ns ± 0%     ~     (p=0.828 n=40+40)
Log10-4                 173ns ± 0%   166ns ± 0%   -4.05%  (p=0.000 n=40+40)
Log2-4                 65.3ns ± 0%  58.4ns ± 0%  -10.57%  (p=0.000 n=37+37)
Modf-4                 36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter32-4          36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter64-4          32.7ns ± 0%  32.6ns ± 0%     ~     (p=0.375 n=40+40)
PowInt-4                300ns ± 0%   277ns ± 0%   -7.78%  (p=0.000 n=40+40)
PowFrac-4               676ns ± 0%   635ns ± 0%   -6.00%  (p=0.000 n=40+35)
Pow10Pos-4             17.6ns ± 0%  17.6ns ± 0%     ~     (all equal)
Pow10Neg-4             22.0ns ± 0%  22.0ns ± 0%     ~     (all equal)
Round-4                30.1ns ± 0%  30.1ns ± 0%     ~     (all equal)
RoundToEven-4          38.9ns ± 0%  38.9ns ± 0%     ~     (all equal)
Remainder-4             291ns ± 0%   263ns ± 0%   -9.62%  (p=0.000 n=40+40)
Signbit-4              11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Sin-4                   185ns ± 0%   185ns ± 0%     ~     (all equal)
Sincos-4                230ns ± 0%   230ns ± 0%     ~     (all equal)
Sinh-4                  253ns ± 0%   246ns ± 0%   -2.77%  (p=0.000 n=39+39)
SqrtIndirect-4         41.4ns ± 0%  41.4ns ± 0%     ~     (all equal)
SqrtLatency-4          13.8ns ± 0%  13.8ns ± 0%     ~     (all equal)
SqrtIndirectLatency-4  37.0ns ± 0%  37.0ns ± 0%     ~     (p=0.632 n=40+40)
SqrtGoLatency-4         911ns ± 0%   911ns ± 0%   +0.08%  (p=0.000 n=40+40)
SqrtPrime-4            13.2µs ± 0%  13.2µs ± 0%   +0.01%  (p=0.038 n=38+40)
Tan-4                   205ns ± 0%   205ns ± 0%     ~     (all equal)
Tanh-4                  264ns ± 0%   247ns ± 0%   -6.44%  (p=0.000 n=39+32)
Trunc-4                45.2ns ± 0%  45.2ns ± 0%     ~     (all equal)
Y0-4                    796ns ± 0%   792ns ± 0%   -0.55%  (p=0.000 n=35+40)
Y1-4                    804ns ± 0%   797ns ± 0%   -0.82%  (p=0.000 n=24+40)
Yn-4                   1.64µs ± 0%  1.62µs ± 0%   -1.27%  (p=0.000 n=40+39)
Float64bits-4          8.16ns ± 0%  8.16ns ± 0%   +0.04%  (p=0.000 n=35+40)
Float64frombits-4      10.7ns ± 0%  10.7ns ± 0%     ~     (all equal)
Float32bits-4          7.53ns ± 0%  7.53ns ± 0%     ~     (p=0.760 n=40+40)
Float32frombits-4      6.91ns ± 0%  6.91ns ± 0%   -0.04%  (p=0.002 n=32+38)
[Geo mean]              111ns        106ns        -3.98%

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Tobias Klauser
f125b32d19 cmd/compile/internal/arm: merge cases in ssaGenValue
Merge case statement for OpARMSLL, OpARMSRL and OpARMSRA into an
existing one using the same logic.

Change-Id: Ic4224668228902e5188fb0559b5f1949cfea1381
Reviewed-on: https://go-review.googlesource.com/c/go/+/171724
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-04-13 17:08:29 +00:00
Keith Randall
ad6c691542 cmd/compile: remove AUNDEF opcode
This opcode was only used to mark unreachable code for plive to use.
plive now uses the SSA representation, so it knows locations are
unreachable because they are ends of Exit blocks. It doesn't need
these opcodes any more.

These opcodes actually used space in the binary, 2 bytes per undef
on x86 and more for other archs.

Makes the amd64 go binary 0.2% smaller.

Change-Id: I64c84c35db7c7949617a3a5830f09c8e5fcd2620
Reviewed-on: https://go-review.googlesource.com/c/go/+/171058
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-07 01:15:28 +00:00
Keith Randall
2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
erifan01
fee84cc905 cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.

Benchmarks on arm v6:
name               old time/op  new time/op  delta
ReverseBytes-32    2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
ReverseBytes16-32  2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
ReverseBytes32-32  1.29ns ± 0%  1.29ns ± 0%   ~     (all equal)
ReverseBytes64-32  1.43ns ± 0%  1.43ns ± 0%   ~     (all equal)

Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-07 13:37:54 +00:00
Ben Shi
9f2411894b cmd/compile: optimize arm's bit operation
BFC (Bit Field Clear) was introduced in ARMv7, which can simplify
ANDconst and BICconst. And this CL implements that optimization.

1. The total size of pkg/android_arm decreases about 3KB, excluding
cmd/compile/.

2. There is no regression in the go1 benchmark result, and some
cases (FmtFprintfEmpty-4 and RegexpMatchMedium_32-4) even get
slight improvement.

name                     old time/op    new time/op    delta
BinaryTree17-4              25.3s ± 1%     25.2s ± 1%    ~     (p=0.072 n=30+29)
Fannkuch11-4                13.3s ± 0%     13.3s ± 0%  +0.13%  (p=0.000 n=30+26)
FmtFprintfEmpty-4           407ns ± 0%     394ns ± 0%  -3.19%  (p=0.000 n=26+28)
FmtFprintfString-4          664ns ± 0%     662ns ± 0%  -0.22%  (p=0.000 n=30+30)
FmtFprintfInt-4             712ns ± 0%     706ns ± 0%  -0.79%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         1.06µs ± 0%    1.05µs ± 0%  -0.38%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    1.16µs ± 0%    1.16µs ± 0%  -0.13%  (p=0.000 n=30+29)
FmtFprintfFloat-4          2.24µs ± 0%    2.23µs ± 0%  -0.51%  (p=0.000 n=29+21)
FmtManyArgs-4              4.09µs ± 0%    4.06µs ± 0%  -0.83%  (p=0.000 n=28+30)
GobDecode-4                55.0ms ± 5%    55.4ms ± 5%    ~     (p=0.307 n=30+30)
GobEncode-4                51.2ms ± 1%    51.9ms ± 1%  +1.23%  (p=0.000 n=29+30)
Gzip-4                      2.64s ± 0%     2.60s ± 0%  -1.35%  (p=0.000 n=30+29)
Gunzip-4                    309ms ± 0%     308ms ± 0%  -0.27%  (p=0.000 n=30+30)
HTTPClientServer-4         1.03ms ± 5%    1.02ms ± 4%    ~     (p=0.117 n=30+29)
JSONEncode-4                101ms ± 2%     101ms ± 2%    ~     (p=0.338 n=29+29)
JSONDecode-4                383ms ± 2%     382ms ± 2%    ~     (p=0.751 n=26+30)
Mandelbrot200-4            18.4ms ± 0%    18.4ms ± 0%  -0.10%  (p=0.000 n=29+29)
GoParse-4                  22.6ms ± 0%    22.5ms ± 0%  -0.39%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4       761ns ± 0%     750ns ± 0%  -1.47%  (p=0.000 n=26+29)
RegexpMatchEasy0_1K-4      4.33µs ± 0%    4.34µs ± 0%  +0.27%  (p=0.000 n=25+28)
RegexpMatchEasy1_32-4       809ns ± 0%     795ns ± 0%  -1.74%  (p=0.000 n=27+25)
RegexpMatchEasy1_1K-4      5.54µs ± 0%    5.53µs ± 0%  -0.18%  (p=0.000 n=29+29)
RegexpMatchMedium_32-4     1.11µs ± 0%    1.08µs ± 0%  -2.78%  (p=0.000 n=27+29)
RegexpMatchMedium_1K-4      255µs ± 0%     255µs ± 0%  -0.02%  (p=0.029 n=30+30)
RegexpMatchHard_32-4       14.7µs ± 0%    14.7µs ± 0%  -0.28%  (p=0.000 n=30+29)
RegexpMatchHard_1K-4        439µs ± 0%     439µs ± 0%    ~     (p=0.907 n=23+27)
Revcomp-4                  41.9ms ± 1%    41.9ms ± 1%    ~     (p=0.230 n=28+30)
Template-4                  522ms ± 1%     528ms ± 1%  +1.25%  (p=0.000 n=30+30)
TimeParse-4                3.34µs ± 0%    3.35µs ± 0%  +0.23%  (p=0.000 n=30+27)
TimeFormat-4               6.06µs ± 0%    6.13µs ± 0%  +1.08%  (p=0.000 n=29+29)
[Geo mean]                  384µs          382µs       -0.37%

name                     old speed      new speed      delta
GobDecode-4              14.0MB/s ± 5%  13.9MB/s ± 5%    ~     (p=0.308 n=30+30)
GobEncode-4              15.0MB/s ± 1%  14.8MB/s ± 1%  -1.22%  (p=0.000 n=29+30)
Gzip-4                   7.36MB/s ± 0%  7.46MB/s ± 0%  +1.35%  (p=0.000 n=30+30)
Gunzip-4                 62.8MB/s ± 0%  63.0MB/s ± 0%  +0.27%  (p=0.000 n=30+30)
JSONEncode-4             19.2MB/s ± 2%  19.2MB/s ± 2%    ~     (p=0.312 n=29+29)
JSONDecode-4             5.05MB/s ± 3%  5.08MB/s ± 2%    ~     (p=0.356 n=29+30)
GoParse-4                2.56MB/s ± 0%  2.57MB/s ± 0%  +0.39%  (p=0.000 n=23+27)
RegexpMatchEasy0_32-4    42.0MB/s ± 0%  42.6MB/s ± 0%  +1.50%  (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4     236MB/s ± 0%   236MB/s ± 0%  -0.27%  (p=0.000 n=25+28)
RegexpMatchEasy1_32-4    39.6MB/s ± 0%  40.2MB/s ± 0%  +1.73%  (p=0.000 n=27+27)
RegexpMatchEasy1_1K-4     185MB/s ± 0%   185MB/s ± 0%  +0.18%  (p=0.000 n=29+29)
RegexpMatchMedium_32-4    900kB/s ± 0%   920kB/s ± 0%  +2.22%  (p=0.000 n=29+29)
RegexpMatchMedium_1K-4   4.02MB/s ± 0%  4.02MB/s ± 0%  +0.07%  (p=0.004 n=30+27)
RegexpMatchHard_32-4     2.17MB/s ± 0%  2.18MB/s ± 0%  +0.46%  (p=0.000 n=30+26)
RegexpMatchHard_1K-4     2.33MB/s ± 0%  2.33MB/s ± 0%    ~     (all equal)
Revcomp-4                60.6MB/s ± 1%  60.7MB/s ± 1%    ~     (p=0.207 n=28+30)
Template-4               3.72MB/s ± 1%  3.67MB/s ± 1%  -1.23%  (p=0.000 n=30+30)
[Geo mean]               12.9MB/s       12.9MB/s       +0.29%

Change-Id: I07f497f8bb476c950dc555491d00c9066fb64a4e
Reviewed-on: https://go-review.googlesource.com/134232
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-11 14:37:51 +00:00
Wei Xiao
20102594a0 cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on following architectures:
  arm
  mips mipsle mips64 mips64le
  ppc64 ppc64le
  s390x

Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02 16:59:27 +00:00
Austin Clements
8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
David Chase
ab48574bab cmd/compile: use Block.Likely to put likely branch first
When a neither of a conditional block's successors follows,
the block must end with a conditional branch followed by a
an unconditional branch.  If the (conditional) branch is
"unlikely", invert it and swap successors to make it
likely instead.

This doesn't matter to most benchmarks on amd64, but in one
instance on amd64 it caused a 30% improvement, and it is
otherwise harmless.  The problematic loop is

		for i := 0; i < w; i++ {
			if pw[i] != 0 {
				return true
			}
		}

compiled under GOEXPERIMENT=preemptibleloops
This the very worst-case benchmark for that experiment.

Also in this CL is a commoning up of heavily-repeated
boilerplate, which made it much easier to see that the
changes were applied correctly.  In the future this should
allow un-exporting of SSAGenState.Branches once the
boilerplate-replacement is done everywhere.

Change-Id: I0e5ded6eeb3ab1e3e0138e12d54c7e056bd99335
Reviewed-on: https://go-review.googlesource.com/104977
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-11 17:35:34 +00:00
Austin Clements
1de1f316df runtime: buffered write barrier for arm
Updates #22460.

Change-Id: I5581df7ad553237db7df3701b117ad99e0593b78
Reviewed-on: https://go-review.googlesource.com/92698
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:17 +00:00
Ben Shi
78ea9a7129 cmd/compile: optimize MOVBS/MOVBU/MOVHS/MOVHU on ARMv6 and ARMv7
MOVBS/MOVBU/MOVHS/MOVHU can be optimized with a single instruction
on ARMv6 and ARMv7, instead of a pair of left/right shifts.

The benchmark tests show big improvement in special cases and a little
improvement in total.

1. A special case gets about 29% improvement.
name                     old time/op    new time/op    delta
TypePro-4                  3.81ms ± 1%    2.71ms ± 1%  -28.97%  (p=0.000 n=26+25)
The source code of this case can be found at
https://github.com/benshi001/ugo1/blob/master/typepromotion_test.go

2. There is a little improvement in the go1 benchmark, excluding the noise.
name                     old time/op    new time/op    delta
BinaryTree17-4              42.1s ± 3%     42.1s ± 2%    ~     (p=0.883 n=28+30)
Fannkuch11-4                24.3s ± 4%     24.7s ± 7%  +1.64%  (p=0.026 n=30+30)
FmtFprintfEmpty-4           833ns ± 2%     835ns ± 2%    ~     (p=0.371 n=26+28)
FmtFprintfString-4         1.36µs ± 3%    1.35µs ± 1%    ~     (p=0.202 n=26+23)
FmtFprintfInt-4            1.42µs ± 3%    1.43µs ± 1%  +0.66%  (p=0.000 n=26+27)
FmtFprintfIntInt-4         2.10µs ± 1%    2.10µs ± 2%    ~     (p=0.104 n=25+26)
FmtFprintfPrefixedInt-4    2.37µs ± 2%    2.33µs ± 1%  -1.75%  (p=0.000 n=25+28)
FmtFprintfFloat-4          4.50µs ± 0%    4.37µs ± 1%  -2.81%  (p=0.000 n=23+25)
FmtManyArgs-4              8.08µs ± 0%    8.13µs ± 3%    ~     (p=0.160 n=23+26)
GobDecode-4                 102ms ± 4%     103ms ± 4%  +1.08%  (p=0.001 n=28+26)
GobEncode-4                96.0ms ± 2%    95.2ms ± 3%  -0.81%  (p=0.000 n=24+25)
Gzip-4                      4.17s ± 3%     4.11s ± 2%  -1.45%  (p=0.000 n=25+25)
Gunzip-4                    597ms ± 2%     594ms ± 2%  -0.57%  (p=0.000 n=24+26)
HTTPClientServer-4          708µs ± 4%     708µs ± 4%    ~     (p=0.852 n=28+28)
JSONEncode-4                241ms ± 1%     245ms ± 3%  +1.62%  (p=0.000 n=27+28)
JSONDecode-4                906ms ± 3%     889ms ± 3%  -1.85%  (p=0.000 n=23+24)
Mandelbrot200-4            41.8ms ± 1%    41.8ms ± 1%    ~     (p=0.929 n=25+24)
GoParse-4                  47.1ms ± 2%    45.3ms ± 4%  -3.80%  (p=0.000 n=28+24)
RegexpMatchEasy0_32-4      1.27µs ± 2%    1.28µs ± 1%  +0.77%  (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4      8.08µs ± 9%    7.83µs ±10%  -3.10%  (p=0.012 n=26+26)
RegexpMatchEasy1_32-4      1.29µs ± 5%    1.29µs ± 2%    ~     (p=0.301 n=26+29)
RegexpMatchEasy1_1K-4      10.5µs ± 4%    10.3µs ± 5%  -1.95%  (p=0.003 n=26+26)
RegexpMatchMedium_32-4     1.94µs ± 1%    1.95µs ± 1%    ~     (p=0.251 n=24+27)
RegexpMatchMedium_1K-4      502µs ± 2%     502µs ± 2%    ~     (p=0.336 n=25+28)
RegexpMatchHard_32-4       26.7µs ± 1%    26.6µs ± 3%    ~     (p=0.454 n=27+26)
RegexpMatchHard_1K-4        801µs ± 3%     799µs ± 2%    ~     (p=0.097 n=24+26)
Revcomp-4                  73.5ms ± 5%    73.2ms ± 3%    ~     (p=0.240 n=26+26)
Template-4                  1.07s ± 2%     1.05s ± 1%  -2.39%  (p=0.000 n=26+24)
TimeParse-4                6.87µs ± 1%    6.85µs ± 1%    ~     (p=0.094 n=28+23)
TimeFormat-4               13.4µs ± 1%    13.4µs ± 1%    ~     (p=0.664 n=25+29)
[Geo mean]                  717µs          713µs       -0.54%

name                     old speed      new speed      delta
GobDecode-4              7.52MB/s ± 4%  7.44MB/s ± 4%  -1.10%  (p=0.001 n=28+26)
GobEncode-4              7.99MB/s ± 2%  8.06MB/s ± 3%  +0.81%  (p=0.000 n=24+25)
Gzip-4                   4.66MB/s ± 3%  4.72MB/s ± 2%  +1.43%  (p=0.000 n=25+25)
Gunzip-4                 32.5MB/s ± 2%  32.7MB/s ± 2%  +0.56%  (p=0.001 n=24+26)
JSONEncode-4             8.04MB/s ± 1%  7.92MB/s ± 3%  -1.59%  (p=0.000 n=27+28)
JSONDecode-4             2.14MB/s ± 3%  2.18MB/s ± 3%  +1.90%  (p=0.000 n=23+24)
GoParse-4                1.23MB/s ± 3%  1.28MB/s ± 4%  +4.23%  (p=0.000 n=30+24)
RegexpMatchEasy0_32-4    25.2MB/s ± 2%  25.0MB/s ± 1%  -0.76%  (p=0.000 n=26+28)
RegexpMatchEasy0_1K-4     127MB/s ± 8%   131MB/s ± 9%  +3.29%  (p=0.012 n=26+26)
RegexpMatchEasy1_32-4    24.8MB/s ± 5%  24.8MB/s ± 2%    ~     (p=0.339 n=26+29)
RegexpMatchEasy1_1K-4    97.9MB/s ± 4%  99.8MB/s ± 5%  +1.98%  (p=0.004 n=26+26)
RegexpMatchMedium_32-4    514kB/s ± 3%   515kB/s ± 3%    ~     (p=0.391 n=28+28)
RegexpMatchMedium_1K-4   2.04MB/s ± 2%  2.04MB/s ± 2%    ~     (p=0.517 n=25+28)
RegexpMatchHard_32-4     1.20MB/s ± 3%  1.20MB/s ± 3%    ~     (p=0.203 n=28+28)
RegexpMatchHard_1K-4     1.28MB/s ± 3%  1.28MB/s ± 2%    ~     (p=0.499 n=24+26)
Revcomp-4                34.6MB/s ± 4%  34.7MB/s ± 3%    ~     (p=0.245 n=26+26)
Template-4               1.81MB/s ± 2%  1.85MB/s ± 3%  +2.30%  (p=0.000 n=26+25)
[Geo mean]               6.82MB/s       6.88MB/s       +0.84%

fixes #20653

Change-Id: Ief0d6e726e517e51ae511325b21ee72598e759ff
Reviewed-on: https://go-review.googlesource.com/71992
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-26 12:34:02 +00:00
Ben Shi
1ec78d1dd1 cmd/compile: optimize ARM code with CMN/TST/TEQ
CMN/TST/TEQ were supported since ARMv4, which can be used to
simplify comparisons.

This patch implements the optimization and here are the benchmark
results.

1. A special test case got 18.21% improvement.
name                     old time/op    new time/op    delta
TSTTEQ-4                    806µs ± 1%     659µs ± 0%  -18.21%  (p=0.000 n=20+18)
(https://github.com/benshi001/ugo1/blob/master/tstteq_test.go)

2. There is no regression in the compilecmp benchmark.
name        old time/op       new time/op       delta
Template          2.31s ± 1%        2.30s ± 1%    ~     (p=0.661 n=10+9)
Unicode           1.32s ± 3%        1.32s ± 5%    ~     (p=0.280 n=10+10)
GoTypes           7.69s ± 1%        7.65s ± 0%  -0.52%  (p=0.027 n=10+8)
Compiler          36.5s ± 1%        36.4s ± 1%    ~     (p=0.546 n=9+9)
SSA               85.1s ± 2%        84.9s ± 1%    ~     (p=0.529 n=10+10)
Flate             1.43s ± 2%        1.43s ± 2%    ~     (p=0.661 n=10+9)
GoParser          1.81s ± 2%        1.81s ± 1%    ~     (p=0.796 n=10+10)
Reflect           5.10s ± 2%        5.09s ± 1%    ~     (p=0.853 n=10+10)
Tar               2.47s ± 1%        2.48s ± 1%    ~     (p=0.123 n=10+10)
XML               2.59s ± 1%        2.58s ± 1%    ~     (p=0.853 n=10+10)
[Geo mean]        4.78s             4.77s       -0.17%

name        old user-time/op  new user-time/op  delta
Template          2.72s ± 3%        2.73s ± 2%    ~     (p=0.928 n=10+10)
Unicode           1.58s ± 4%        1.60s ± 1%    ~     (p=0.087 n=10+9)
GoTypes           9.41s ± 2%        9.36s ± 1%    ~     (p=0.060 n=10+10)
Compiler          44.4s ± 2%        44.2s ± 2%    ~     (p=0.289 n=10+10)
SSA                110s ± 2%         110s ± 1%    ~     (p=0.739 n=10+10)
Flate             1.67s ± 2%        1.63s ± 3%    ~     (p=0.063 n=10+10)
GoParser          2.12s ± 1%        2.12s ± 2%    ~     (p=0.840 n=10+10)
Reflect           5.94s ± 1%        5.98s ± 1%    ~     (p=0.063 n=9+10)
Tar               3.01s ± 2%        3.02s ± 2%    ~     (p=0.584 n=10+10)
XML               3.04s ± 3%        3.02s ± 2%    ~     (p=0.696 n=10+10)
[Geo mean]        5.73s             5.72s       -0.20%

name        old text-bytes    new text-bytes    delta
HelloSize         579kB ± 0%        579kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.8kB ± 0%       72.8kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. There is little change in the go1 benchmark (excluding the noise).
name                     old time/op    new time/op     delta
BinaryTree17-4              40.3s ± 1%      40.6s ± 1%  +0.80%  (p=0.000 n=30+30)
Fannkuch11-4                24.2s ± 1%      24.1s ± 0%    ~     (p=0.093 n=30+30)
FmtFprintfEmpty-4           834ns ± 0%      826ns ± 0%  -0.93%  (p=0.000 n=29+24)
FmtFprintfString-4         1.39µs ± 1%     1.36µs ± 0%  -2.02%  (p=0.000 n=30+30)
FmtFprintfInt-4            1.43µs ± 1%     1.44µs ± 1%    ~     (p=0.155 n=30+29)
FmtFprintfIntInt-4         2.09µs ± 0%     2.11µs ± 0%  +1.16%  (p=0.000 n=28+30)
FmtFprintfPrefixedInt-4    2.33µs ± 1%     2.36µs ± 0%  +1.25%  (p=0.000 n=30+30)
FmtFprintfFloat-4          4.27µs ± 1%     4.32µs ± 1%  +1.27%  (p=0.000 n=30+30)
FmtManyArgs-4              8.18µs ± 0%     8.14µs ± 0%  -0.46%  (p=0.000 n=25+27)
GobDecode-4                 101ms ± 1%      101ms ± 1%    ~     (p=0.182 n=29+29)
GobEncode-4                89.6ms ± 1%     87.8ms ± 2%  -2.02%  (p=0.000 n=30+29)
Gzip-4                      4.07s ± 1%      4.08s ± 1%    ~     (p=0.173 n=30+27)
Gunzip-4                    602ms ± 1%      600ms ± 1%  -0.29%  (p=0.000 n=29+28)
HTTPClientServer-4          679µs ± 4%      683µs ± 3%    ~     (p=0.197 n=30+30)
JSONEncode-4                241ms ± 1%      239ms ± 1%  -0.84%  (p=0.000 n=30+30)
JSONDecode-4                903ms ± 1%      882ms ± 1%  -2.33%  (p=0.000 n=30+30)
Mandelbrot200-4            41.8ms ± 0%     41.8ms ± 0%    ~     (p=0.719 n=30+30)
GoParse-4                  45.5ms ± 1%     45.8ms ± 1%  +0.52%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4      1.27µs ± 1%     1.27µs ± 0%  -0.60%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4      7.77µs ± 6%     7.69µs ± 4%  -0.96%  (p=0.040 n=30+30)
RegexpMatchEasy1_32-4      1.29µs ± 1%     1.28µs ± 1%  -0.54%  (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4      10.3µs ± 6%     10.2µs ± 3%    ~     (p=0.453 n=30+27)
RegexpMatchMedium_32-4     1.98µs ± 1%     2.00µs ± 1%  +0.85%  (p=0.000 n=30+29)
RegexpMatchMedium_1K-4      503µs ± 0%      503µs ± 1%    ~     (p=0.752 n=30+30)
RegexpMatchHard_32-4       27.1µs ± 1%     26.5µs ± 0%  -1.96%  (p=0.000 n=30+24)
RegexpMatchHard_1K-4        809µs ± 1%      799µs ± 1%  -1.29%  (p=0.000 n=29+30)
Revcomp-4                  67.3ms ± 2%     67.2ms ± 1%    ~     (p=0.265 n=29+29)
Template-4                  1.08s ± 1%      1.07s ± 0%  -1.39%  (p=0.000 n=30+22)
TimeParse-4                6.93µs ± 1%     6.96µs ± 1%  +0.40%  (p=0.005 n=30+30)
TimeFormat-4               13.3µs ± 0%     13.3µs ± 1%    ~     (p=0.734 n=30+30)
[Geo mean]                  709µs           707µs       -0.32%

name                     old speed      new speed       delta
GobDecode-4              7.59MB/s ± 1%   7.57MB/s ± 1%    ~     (p=0.145 n=29+29)
GobEncode-4              8.56MB/s ± 1%   8.74MB/s ± 1%  +2.07%  (p=0.000 n=30+29)
Gzip-4                   4.76MB/s ± 1%   4.75MB/s ± 1%  -0.25%  (p=0.037 n=30+30)
Gunzip-4                 32.2MB/s ± 1%   32.3MB/s ± 1%  +0.29%  (p=0.000 n=29+28)
JSONEncode-4             8.04MB/s ± 1%   8.11MB/s ± 1%  +0.85%  (p=0.000 n=30+30)
JSONDecode-4             2.15MB/s ± 1%   2.20MB/s ± 1%  +2.29%  (p=0.000 n=30+30)
GoParse-4                1.27MB/s ± 1%   1.26MB/s ± 1%  -0.73%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4    25.1MB/s ± 1%   25.3MB/s ± 0%  +0.61%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4     131MB/s ± 6%    133MB/s ± 4%  +1.35%  (p=0.009 n=28+30)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%   25.0MB/s ± 1%  +0.54%  (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4    99.2MB/s ± 6%  100.2MB/s ± 3%    ~     (p=0.448 n=30+27)
RegexpMatchMedium_32-4    503kB/s ± 1%    500kB/s ± 0%  -0.66%  (p=0.002 n=30+24)
RegexpMatchMedium_1K-4   2.04MB/s ± 0%   2.04MB/s ± 1%    ~     (p=0.358 n=30+30)
RegexpMatchHard_32-4     1.18MB/s ± 1%   1.20MB/s ± 1%  +1.75%  (p=0.000 n=30+30)
RegexpMatchHard_1K-4     1.26MB/s ± 1%   1.28MB/s ± 1%  +1.42%  (p=0.000 n=30+30)
Revcomp-4                37.8MB/s ± 2%   37.8MB/s ± 1%    ~     (p=0.266 n=29+29)
Template-4               1.80MB/s ± 1%   1.82MB/s ± 1%  +1.46%  (p=0.000 n=30+30)
[Geo mean]               6.91MB/s        6.96MB/s       +0.70%

fixes #21583

Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214
Reviewed-on: https://go-review.googlesource.com/67490
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-11 14:03:00 +00:00
Cherry Zhang
6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
Ben Shi
9732485851 cmd/compile: optimized ARM code with BFX/BFXU
BFX&BFXU were introduced in ARMv6T2. A single BFX or BFXU is
more efficiently than a pair of left-shift/right-shift in bit
field extraction.

This patch implements this optimization. And the benchmark tests
show big improvement in special cases and little change in total.

1. There is big improvement in a special test case.
name                     old time/op    new time/op    delta
BFX-4                       665µs ± 1%     595µs ± 0%  -10.61%  (p=0.000 n=20+20)
(The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go)

2. The compilecmp benchmark shows no regression.
name        old time/op       new time/op       delta
Template          2.33s ± 2%        2.34s ± 2%    ~     (p=0.356 n=9+10)
Unicode           1.32s ± 2%        1.30s ± 2%    ~     (p=0.139 n=9+8)
GoTypes           7.77s ± 1%        7.76s ± 1%    ~     (p=0.780 n=10+9)
Compiler          37.3s ± 1%        37.1s ± 1%    ~     (p=0.211 n=10+9)
SSA               84.3s ± 2%        84.3s ± 2%    ~     (p=0.842 n=10+9)
Flate             1.45s ± 1%        1.45s ± 3%    ~     (p=0.853 n=10+10)
GoParser          1.83s ± 2%        1.83s ± 2%    ~     (p=0.739 n=10+10)
Reflect           5.08s ± 2%        5.09s ± 2%    ~     (p=0.720 n=9+10)
Tar               2.44s ± 1%        2.44s ± 2%    ~     (p=0.684 n=10+10)
XML               2.62s ± 2%        2.62s ± 2%    ~     (p=0.529 n=10+10)
[Geo mean]        4.80s             4.79s       -0.06%

name        old user-time/op  new user-time/op  delta
Template          2.76s ± 2%        2.75s ± 3%    ~     (p=0.893 n=10+10)
Unicode           1.63s ± 1%        1.60s ± 1%  -2.07%  (p=0.000 n=8+9)
GoTypes           9.54s ± 1%        9.52s ± 1%    ~     (p=0.215 n=10+10)
Compiler          46.0s ± 1%        46.0s ± 1%    ~     (p=0.853 n=10+10)
SSA                110s ± 1%         110s ± 1%    ~     (p=0.838 n=10+10)
Flate             1.69s ± 3%        1.69s ± 5%    ~     (p=0.957 n=10+10)
GoParser          2.15s ± 2%        2.15s ± 2%    ~     (p=0.749 n=10+10)
Reflect           6.03s ± 1%        5.99s ± 2%    ~     (p=0.060 n=9+10)
Tar               3.02s ± 2%        2.99s ± 2%    ~     (p=0.214 n=10+10)
XML               3.10s ± 2%        3.08s ± 2%    ~     (p=0.732 n=9+10)
[Geo mean]        5.82s             5.79s       -0.41%

name        old text-bytes    new text-bytes    delta
HelloSize         589kB ± 0%        589kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        76.9kB ± 0%       76.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark shows little change in total. (excluding noise)
name                     old time/op    new time/op    delta
BinaryTree17-4              41.5s ± 1%     41.6s ± 1%    ~     (p=0.373 n=30+26)
Fannkuch11-4                23.6s ± 1%     23.6s ± 1%  +0.28%  (p=0.003 n=29+30)
FmtFprintfEmpty-4           826ns ± 1%     827ns ± 1%    ~     (p=0.155 n=30+30)
FmtFprintfString-4         1.35µs ± 1%    1.35µs ± 1%    ~     (p=0.499 n=30+30)
FmtFprintfInt-4            1.43µs ± 1%    1.41µs ± 1%  -1.19%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         2.15µs ± 1%    2.11µs ± 1%  -1.78%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    2.21µs ± 1%    2.21µs ± 1%    ~     (p=0.881 n=30+30)
FmtFprintfFloat-4          4.41µs ± 1%    4.44µs ± 0%  +0.64%  (p=0.000 n=30+30)
FmtManyArgs-4              8.06µs ± 1%    8.06µs ± 0%    ~     (p=0.871 n=30+30)
GobDecode-4                 103ms ± 1%     104ms ± 2%  +0.54%  (p=0.013 n=28+29)
GobEncode-4                92.4ms ± 1%    92.6ms ± 1%    ~     (p=0.447 n=30+29)
Gzip-4                      4.17s ± 1%     4.06s ± 1%  -2.56%  (p=0.000 n=29+30)
Gunzip-4                    603ms ± 1%     602ms ± 1%    ~     (p=0.423 n=30+30)
HTTPClientServer-4          688µs ± 2%     674µs ± 3%  -2.09%  (p=0.000 n=29+30)
JSONEncode-4                237ms ± 1%     237ms ± 1%    ~     (p=0.061 n=29+30)
JSONDecode-4                907ms ± 1%     910ms ± 1%    ~     (p=0.061 n=30+30)
Mandelbrot200-4            41.7ms ± 0%    41.7ms ± 0%  +0.19%  (p=0.000 n=24+20)
GoParse-4                  45.7ms ± 2%    45.5ms ± 2%  -0.29%  (p=0.005 n=30+30)
RegexpMatchEasy0_32-4      1.27µs ± 0%    1.27µs ± 0%  +0.12%  (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4      7.77µs ± 4%    7.73µs ± 3%    ~     (p=0.169 n=30+30)
RegexpMatchEasy1_32-4      1.29µs ± 1%    1.29µs ± 1%    ~     (p=0.126 n=30+30)
RegexpMatchEasy1_1K-4      10.4µs ± 3%    10.3µs ± 2%  -1.32%  (p=0.004 n=30+29)
RegexpMatchMedium_32-4     2.06µs ± 0%    2.06µs ± 0%    ~     (p=0.071 n=30+30)
RegexpMatchMedium_1K-4      531µs ± 1%     530µs ± 0%    ~     (p=0.121 n=30+23)
RegexpMatchHard_32-4       28.7µs ± 1%    28.6µs ± 1%  -0.21%  (p=0.001 n=30+27)
RegexpMatchHard_1K-4        860µs ± 1%     857µs ± 1%    ~     (p=0.105 n=30+27)
Revcomp-4                  67.3ms ± 2%    67.3ms ± 2%    ~     (p=0.805 n=29+29)
Template-4                  1.08s ± 1%     1.08s ± 1%    ~     (p=0.260 n=30+30)
TimeParse-4                7.04µs ± 0%    7.04µs ± 0%    ~     (p=0.315 n=30+30)
TimeFormat-4               13.2µs ± 1%    13.2µs ± 1%    ~     (p=0.077 n=30+30)
[Geo mean]                  715µs          713µs       -0.30%

name                     old speed      new speed      delta
GobDecode-4              7.42MB/s ± 1%  7.38MB/s ± 2%  -0.54%  (p=0.011 n=28+29)
GobEncode-4              8.30MB/s ± 1%  8.29MB/s ± 1%    ~     (p=0.484 n=30+29)
Gzip-4                   4.65MB/s ± 2%  4.78MB/s ± 1%  +2.73%  (p=0.000 n=30+30)
Gunzip-4                 32.2MB/s ± 1%  32.2MB/s ± 1%    ~     (p=0.357 n=30+30)
JSONEncode-4             8.18MB/s ± 1%  8.19MB/s ± 1%    ~     (p=0.052 n=29+30)
JSONDecode-4             2.14MB/s ± 1%  2.13MB/s ± 1%    ~     (p=0.074 n=30+29)
GoParse-4                1.27MB/s ± 1%  1.27MB/s ± 2%    ~     (p=0.618 n=24+30)
RegexpMatchEasy0_32-4    25.2MB/s ± 0%  25.2MB/s ± 0%  -0.12%  (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4     132MB/s ± 5%   132MB/s ± 2%    ~     (p=0.171 n=30+30)
RegexpMatchEasy1_32-4    24.8MB/s ± 1%  24.9MB/s ± 1%    ~     (p=0.106 n=30+30)
RegexpMatchEasy1_1K-4    98.4MB/s ± 3%  99.6MB/s ± 4%  +1.19%  (p=0.011 n=30+30)
RegexpMatchMedium_32-4    483kB/s ± 1%   484kB/s ± 1%    ~     (p=0.426 n=30+30)
RegexpMatchMedium_1K-4   1.93MB/s ± 1%  1.93MB/s ± 0%    ~     (p=0.157 n=30+17)
RegexpMatchHard_32-4     1.12MB/s ± 1%  1.12MB/s ± 0%  +0.33%  (p=0.001 n=30+24)
RegexpMatchHard_1K-4     1.19MB/s ± 1%  1.19MB/s ± 1%    ~     (p=0.290 n=30+30)
Revcomp-4                37.8MB/s ± 2%  37.8MB/s ± 1%    ~     (p=0.815 n=29+29)
Template-4               1.80MB/s ± 1%  1.80MB/s ± 1%    ~     (p=0.586 n=30+30)
[Geo mean]               6.80MB/s       6.81MB/s       +0.25%

fixes #20966

Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5
Reviewed-on: https://go-review.googlesource.com/64950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-09-21 12:41:04 +00:00
Keith Randall
1787ced894 cmd/compile: remove Symbol wrappers from Aux fields
We used to have {Arg,Auto,Extern}Symbol structs with which we wrapped
a *gc.Node or *obj.LSym before storing them in the Aux field
of an ssa.Value.  This let the SSA part of the compiler distinguish
between autos and args, for example.  We no longer need the wrappers
as we can query the underlying objects directly.

There was also some sloppy usage, where VarDef had a *gc.Node
directly in its Aux field, whereas the use of that variable had
that *gc.Node wrapped in an AutoSymbol. Thus the Aux fields didn't
match (using ==) when they probably should.
This sloppy usage cleanup is the only thing in the CL that changes the
generated code - we can get rid of some more unused auto variables if
the matching happens reliably.

Removing this wrapper also lets us get rid of the varsyms cache
(which was used to prevent wrapping the same *gc.Node twice).

Change-Id: I0dedf8f82f84bfee413d310342b777316bd1d478
Reviewed-on: https://go-review.googlesource.com/64452
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-19 22:03:10 +00:00
Ben Shi
a07176b45a cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSD
The go compiler can generate better ARM code with those more
efficient FP instructions. And there is little improvement
in total but big improvement in special cases.

1. The size of pkg/linux_arm/math.a shrinks by 2.4%.

2. there is neither improvement nor regression in compilecmp benchmark.
name        old time/op       new time/op       delta
Template          2.32s ± 2%        2.32s ± 1%    ~     (p=1.000 n=9+10)
Unicode           1.32s ± 4%        1.32s ± 4%    ~     (p=0.912 n=10+10)
GoTypes           7.76s ± 1%        7.79s ± 1%    ~     (p=0.447 n=9+10)
Compiler          37.4s ± 2%        37.2s ± 2%    ~     (p=0.218 n=10+10)
SSA               84.8s ± 2%        85.0s ± 1%    ~     (p=0.604 n=10+9)
Flate             1.45s ± 2%        1.44s ± 2%    ~     (p=0.075 n=10+10)
GoParser          1.82s ± 1%        1.81s ± 1%    ~     (p=0.190 n=10+10)
Reflect           5.06s ± 1%        5.05s ± 1%    ~     (p=0.315 n=10+9)
Tar               2.37s ± 1%        2.37s ± 2%    ~     (p=0.912 n=10+10)
XML               2.56s ± 1%        2.58s ± 2%    ~     (p=0.089 n=10+10)
[Geo mean]        4.77s             4.77s       -0.08%

name        old user-time/op  new user-time/op  delta
Template          2.74s ± 2%        2.75s ± 2%    ~     (p=0.856 n=9+10)
Unicode           1.61s ± 4%        1.62s ± 3%    ~     (p=0.693 n=10+10)
GoTypes           9.55s ± 1%        9.49s ± 2%    ~     (p=0.056 n=9+10)
Compiler          45.9s ± 1%        45.8s ± 1%    ~     (p=0.345 n=9+10)
SSA                110s ± 1%         110s ± 1%    ~     (p=0.763 n=9+10)
Flate             1.68s ± 2%        1.68s ± 3%    ~     (p=0.616 n=10+10)
GoParser          2.14s ± 4%        2.14s ± 1%    ~     (p=0.825 n=10+9)
Reflect           5.95s ± 1%        5.97s ± 3%    ~     (p=0.951 n=9+10)
Tar               2.94s ± 3%        2.93s ± 2%    ~     (p=0.359 n=10+10)
XML               3.03s ± 3%        3.07s ± 6%    ~     (p=0.166 n=10+10)
[Geo mean]        5.76s             5.77s       +0.12%

name        old text-bytes    new text-bytes    delta
HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The performance of Mandelbrot200 improves 15%, though little
   improvement in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.7s ± 1%     41.7s ± 1%     ~     (p=0.264 n=29+23)
Fannkuch11-4                24.2s ± 0%     24.1s ± 1%   -0.13%  (p=0.050 n=30+30)
FmtFprintfEmpty-4           826ns ± 1%     824ns ± 1%   -0.24%  (p=0.038 n=25+30)
FmtFprintfString-4         1.38µs ± 1%    1.38µs ± 0%   -0.42%  (p=0.000 n=27+25)
FmtFprintfInt-4            1.46µs ± 1%    1.46µs ± 0%     ~     (p=0.060 n=30+23)
FmtFprintfIntInt-4         2.11µs ± 1%    2.08µs ± 0%   -1.04%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    2.23µs ± 1%    2.22µs ± 1%   -0.51%  (p=0.000 n=30+30)
FmtFprintfFloat-4          4.49µs ± 1%    4.48µs ± 1%   -0.22%  (p=0.004 n=26+30)
FmtManyArgs-4              8.06µs ± 1%    8.12µs ± 1%   +0.68%  (p=0.000 n=25+30)
GobDecode-4                 104ms ± 1%     104ms ± 2%     ~     (p=0.362 n=29+29)
GobEncode-4                92.9ms ± 1%    92.8ms ± 2%     ~     (p=0.786 n=30+30)
Gzip-4                      4.12s ± 1%     4.12s ± 1%     ~     (p=0.314 n=30+30)
Gunzip-4                    602ms ± 1%     603ms ± 1%     ~     (p=0.164 n=30+30)
HTTPClientServer-4          659µs ± 1%     655µs ± 2%   -0.64%  (p=0.006 n=25+28)
JSONEncode-4                234ms ± 1%     235ms ± 1%   +0.29%  (p=0.050 n=30+30)
JSONDecode-4                912ms ± 0%     911ms ± 0%     ~     (p=0.385 n=18+24)
Mandelbrot200-4            49.2ms ± 0%    41.7ms ± 0%  -15.35%  (p=0.000 n=25+27)
GoParse-4                  46.3ms ± 1%    46.3ms ± 2%     ~     (p=0.572 n=30+30)
RegexpMatchEasy0_32-4      1.29µs ± 1%    1.27µs ± 0%   -1.59%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4      7.62µs ± 4%    7.71µs ± 3%     ~     (p=0.074 n=30+30)
RegexpMatchEasy1_32-4      1.31µs ± 0%    1.30µs ± 1%   -0.71%  (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.3µs ± 5%     ~     (p=0.105 n=30+30)
RegexpMatchMedium_32-4     2.06µs ± 1%    2.06µs ± 1%     ~     (p=0.100 n=30+30)
RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%     ~     (p=0.254 n=29+30)
RegexpMatchHard_32-4       28.9µs ± 0%    28.9µs ± 0%     ~     (p=0.154 n=30+30)
RegexpMatchHard_1K-4        868µs ± 1%     867µs ± 0%     ~     (p=0.729 n=30+23)
Revcomp-4                  66.9ms ± 1%    67.2ms ± 2%     ~     (p=0.102 n=28+29)
Template-4                  1.07s ± 1%     1.06s ± 1%   -0.53%  (p=0.000 n=30+30)
TimeParse-4                7.07µs ± 1%    7.01µs ± 0%   -0.85%  (p=0.000 n=30+25)
TimeFormat-4               13.1µs ± 0%    13.2µs ± 1%   +0.77%  (p=0.000 n=27+27)
[Geo mean]                  721µs          716µs        -0.70%

name                     old speed      new speed      delta
GobDecode-4              7.38MB/s ± 1%  7.37MB/s ± 2%     ~     (p=0.399 n=29+29)
GobEncode-4              8.26MB/s ± 1%  8.27MB/s ± 2%     ~     (p=0.790 n=30+30)
Gzip-4                   4.71MB/s ± 1%  4.71MB/s ± 1%     ~     (p=0.885 n=30+30)
Gunzip-4                 32.2MB/s ± 1%  32.2MB/s ± 1%     ~     (p=0.190 n=30+30)
JSONEncode-4             8.28MB/s ± 1%  8.25MB/s ± 1%     ~     (p=0.053 n=30+30)
JSONDecode-4             2.13MB/s ± 0%  2.12MB/s ± 1%     ~     (p=0.072 n=18+30)
GoParse-4                1.25MB/s ± 1%  1.25MB/s ± 2%     ~     (p=0.863 n=30+30)
RegexpMatchEasy0_32-4    24.8MB/s ± 0%  25.2MB/s ± 1%   +1.61%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4     134MB/s ± 4%   133MB/s ± 3%     ~     (p=0.074 n=30+30)
RegexpMatchEasy1_32-4    24.5MB/s ± 0%  24.6MB/s ± 1%   +0.72%  (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  99.8MB/s ± 5%     ~     (p=0.105 n=30+30)
RegexpMatchMedium_32-4    483kB/s ± 1%   487kB/s ± 1%   +0.83%  (p=0.002 n=30+30)
RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%     ~     (p=0.058 n=30+30)
RegexpMatchHard_32-4     1.10MB/s ± 0%  1.11MB/s ± 0%     ~     (p=0.804 n=30+30)
RegexpMatchHard_1K-4     1.18MB/s ± 0%  1.18MB/s ± 0%     ~     (all equal)
Revcomp-4                38.0MB/s ± 1%  37.8MB/s ± 2%     ~     (p=0.098 n=28+29)
Template-4               1.82MB/s ± 1%  1.83MB/s ± 1%   +0.55%  (p=0.000 n=29+29)
[Geo mean]               6.79MB/s       6.79MB/s        +0.09%

Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b
Reviewed-on: https://go-review.googlesource.com/63770
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-15 22:30:34 +00:00
Ben Shi
2899c3e8cb cmd/compile: optimize ARM code with NMULF/NMULD
NMULF and NMULD are efficient FP instructions, and the go compiler can
use them to generate better code.

The benchmark tests of my patch did not show general change, but big
improvement in special cases.

1.A special test case improved 12.6%.
https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
name                     old time/op    new time/op    delta
FPMul-4                     398µs ± 1%     348µs ± 1%  -12.64%  (p=0.000 n=40+40)

2. the compilecmp test showed little change.
name        old time/op       new time/op       delta
Template          2.30s ± 1%        2.31s ± 1%    ~     (p=0.754 n=17+19)
Unicode           1.31s ± 3%        1.32s ± 5%    ~     (p=0.265 n=20+20)
GoTypes           7.73s ± 2%        7.73s ± 1%    ~     (p=0.925 n=20+20)
Compiler          37.0s ± 1%        37.3s ± 2%  +0.79%  (p=0.002 n=19+20)
SSA               83.8s ± 4%        83.5s ± 2%    ~     (p=0.964 n=20+17)
Flate             1.43s ± 2%        1.44s ± 1%    ~     (p=0.602 n=20+20)
GoParser          1.82s ± 2%        1.81s ± 2%    ~     (p=0.141 n=19+20)
Reflect           5.08s ± 2%        5.08s ± 3%    ~     (p=0.835 n=20+19)
Tar               2.36s ± 1%        2.35s ± 1%    ~     (p=0.195 n=18+17)
XML               2.57s ± 2%        2.56s ± 1%    ~     (p=0.283 n=20+17)
[Geo mean]        4.74s             4.75s       +0.05%

name        old user-time/op  new user-time/op  delta
Template          2.75s ± 2%        2.75s ± 0%    ~     (p=0.620 n=20+15)
Unicode           1.59s ± 4%        1.60s ± 4%    ~     (p=0.479 n=20+19)
GoTypes           9.48s ± 1%        9.47s ± 1%    ~     (p=0.743 n=20+20)
Compiler          45.7s ± 1%        45.7s ± 1%    ~     (p=0.482 n=19+20)
SSA                109s ± 1%         109s ± 2%    ~     (p=0.800 n=18+20)
Flate             1.67s ± 3%        1.67s ± 3%    ~     (p=0.598 n=19+18)
GoParser          2.15s ± 4%        2.13s ± 3%    ~     (p=0.153 n=20+20)
Reflect           5.95s ± 2%        5.95s ± 2%    ~     (p=0.961 n=19+20)
Tar               2.93s ± 2%        2.92s ± 3%    ~     (p=0.242 n=20+19)
XML               3.02s ± 3%        3.04s ± 3%    ~     (p=0.233 n=19+18)
[Geo mean]        5.74s             5.74s       -0.04%

name        old text-bytes    new text-bytes    delta
HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark showed little change in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.8s ± 1%     41.8s ± 1%    ~     (p=0.388 n=40+39)
Fannkuch11-4                24.1s ± 1%     24.1s ± 1%    ~     (p=0.077 n=40+40)
FmtFprintfEmpty-4           834ns ± 1%     831ns ± 1%  -0.31%  (p=0.002 n=40+37)
FmtFprintfString-4         1.34µs ± 1%    1.34µs ± 0%    ~     (p=0.387 n=40+40)
FmtFprintfInt-4            1.44µs ± 1%    1.44µs ± 1%    ~     (p=0.421 n=40+40)
FmtFprintfIntInt-4         2.09µs ± 0%    2.09µs ± 1%    ~     (p=0.589 n=40+39)
FmtFprintfPrefixedInt-4    2.32µs ± 1%    2.33µs ± 1%  +0.15%  (p=0.001 n=40+40)
FmtFprintfFloat-4          4.51µs ± 0%    4.44µs ± 1%  -1.50%  (p=0.000 n=40+40)
FmtManyArgs-4              7.94µs ± 0%    7.97µs ± 0%  +0.36%  (p=0.001 n=32+40)
GobDecode-4                 104ms ± 1%     102ms ± 2%  -1.27%  (p=0.000 n=39+37)
GobEncode-4                90.5ms ± 1%    90.9ms ± 2%  +0.40%  (p=0.006 n=37+40)
Gzip-4                      4.10s ± 2%     4.08s ± 1%  -0.30%  (p=0.004 n=40+40)
Gunzip-4                    603ms ± 0%     602ms ± 1%    ~     (p=0.303 n=37+40)
HTTPClientServer-4          672µs ± 3%     658µs ± 2%  -2.08%  (p=0.000 n=39+37)
JSONEncode-4                238ms ± 1%     239ms ± 0%  +0.26%  (p=0.001 n=40+25)
JSONDecode-4                884ms ± 1%     885ms ± 1%  +0.16%  (p=0.012 n=40+40)
Mandelbrot200-4            49.3ms ± 0%    49.3ms ± 0%    ~     (p=0.588 n=40+38)
GoParse-4                  46.3ms ± 1%    46.4ms ± 2%    ~     (p=0.487 n=40+40)
RegexpMatchEasy0_32-4      1.28µs ± 1%    1.28µs ± 0%  +0.12%  (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4      7.78µs ± 5%    7.78µs ± 4%    ~     (p=0.825 n=40+40)
RegexpMatchEasy1_32-4      1.29µs ± 1%    1.29µs ± 0%    ~     (p=0.659 n=40+40)
RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.4µs ± 2%    ~     (p=0.266 n=40+40)
RegexpMatchMedium_32-4     2.05µs ± 1%    2.05µs ± 0%  -0.18%  (p=0.002 n=40+28)
RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%    ~     (p=0.397 n=37+40)
RegexpMatchHard_32-4       28.9µs ± 1%    28.9µs ± 1%  -0.22%  (p=0.002 n=40+40)
RegexpMatchHard_1K-4        868µs ± 1%     870µs ± 1%  +0.21%  (p=0.015 n=40+40)
Revcomp-4                  67.3ms ± 1%    67.2ms ± 2%    ~     (p=0.262 n=38+39)
Template-4                  1.07s ± 1%     1.07s ± 1%    ~     (p=0.276 n=40+40)
TimeParse-4                7.16µs ± 1%    7.16µs ± 1%    ~     (p=0.610 n=39+40)
TimeFormat-4               13.3µs ± 1%    13.3µs ± 1%    ~     (p=0.617 n=38+40)
[Geo mean]                  720µs          719µs       -0.13%

name                     old speed      new speed      delta
GobDecode-4              7.39MB/s ± 1%  7.49MB/s ± 2%  +1.25%  (p=0.000 n=39+38)
GobEncode-4              8.48MB/s ± 1%  8.45MB/s ± 2%  -0.40%  (p=0.005 n=37+40)
Gzip-4                   4.74MB/s ± 2%  4.75MB/s ± 1%  +0.30%  (p=0.018 n=40+40)
Gunzip-4                 32.2MB/s ± 0%  32.2MB/s ± 1%    ~     (p=0.272 n=36+40)
JSONEncode-4             8.15MB/s ± 1%  8.13MB/s ± 0%  -0.26%  (p=0.003 n=40+25)
JSONDecode-4             2.19MB/s ± 1%  2.19MB/s ± 1%    ~     (p=0.676 n=40+40)
GoParse-4                1.25MB/s ± 2%  1.25MB/s ± 2%    ~     (p=0.823 n=40+40)
RegexpMatchEasy0_32-4    25.1MB/s ± 1%  25.1MB/s ± 0%  -0.12%  (p=0.006 n=40+40)
RegexpMatchEasy0_1K-4     132MB/s ± 5%   132MB/s ± 5%    ~     (p=0.821 n=40+40)
RegexpMatchEasy1_32-4    24.7MB/s ± 1%  24.7MB/s ± 0%    ~     (p=0.630 n=40+40)
RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  98.8MB/s ± 2%    ~     (p=0.268 n=40+40)
RegexpMatchMedium_32-4    487kB/s ± 2%   490kB/s ± 0%  +0.51%  (p=0.001 n=40+40)
RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%    ~     (p=0.208 n=39+40)
RegexpMatchHard_32-4     1.11MB/s ± 1%  1.11MB/s ± 0%  +0.36%  (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.18MB/s ± 1%  1.18MB/s ± 1%    ~     (p=0.207 n=40+37)
Revcomp-4                37.8MB/s ± 1%  37.8MB/s ± 2%    ~     (p=0.276 n=38+39)
Template-4               1.82MB/s ± 1%  1.81MB/s ± 1%    ~     (p=0.122 n=38+40)
[Geo mean]               6.81MB/s       6.81MB/s       +0.06%

fixes #19843

Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
Reviewed-on: https://go-review.googlesource.com/61150
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 13:10:33 +00:00
Ben Shi
64607dbd26 cmd/compile: optimize ARM with MULS
MULS was introduced in ARMv7 and corresponding to MULA. This patch
duplicated all MULA related SSA rules with MULS.

Here was the contrast test result against the original go compiler.
There was no improvement in total, but big improvement in special cases.

1. A specific test case accelerated 18.62%.
(https://github.com/benshi001/ugo1/blob/master/mulsub_test.go)
name                     old time/op    new time/op    delta
MulSub-4                    270µs ± 0%     219µs ± 0%  -18.62%  (p=0.000 n=35+40)

2. Total size of all .a files in pkg/ shrank by 0.002%.

3. The compilecmp benchmark showed no decline.
name        old time/op       new time/op       delta
Template          2.37s ± 3%        2.36s ± 1%    ~     (p=0.233 n=19+18)
Unicode           1.32s ± 2%        1.34s ± 5%  +1.32%  (p=0.011 n=20+18)
GoTypes           7.88s ± 1%        7.87s ± 1%    ~     (p=0.758 n=20+20)
Compiler          37.5s ± 1%        37.6s ± 1%    ~     (p=0.194 n=20+19)
SSA               83.7s ± 2%        83.5s ± 2%    ~     (p=0.569 n=20+19)
Flate             1.46s ± 3%        1.45s ± 1%    ~     (p=0.619 n=20+17)
GoParser          1.87s ± 2%        1.85s ± 1%  -0.58%  (p=0.048 n=20+18)
Reflect           5.10s ± 2%        5.11s ± 2%    ~     (p=0.365 n=19+20)
Tar               1.78s ± 2%        1.78s ± 2%    ~     (p=0.531 n=19+20)
XML               2.62s ± 1%        2.61s ± 2%    ~     (p=0.057 n=17+19)
[Geo mean]        4.68s             4.67s       -0.07%

name        old user-time/op  new user-time/op  delta
Template          2.80s ± 1%        2.79s ± 2%    ~     (p=0.686 n=17+20)
Unicode           1.61s ± 4%        1.63s ± 6%    ~     (p=0.222 n=20+20)
GoTypes           9.59s ± 1%        9.60s ± 1%    ~     (p=0.482 n=17+20)
Compiler          46.1s ± 1%        46.2s ± 1%    ~     (p=0.373 n=20+18)
SSA                108s ± 1%         108s ± 2%    ~     (p=0.784 n=20+20)
Flate             1.68s ± 3%        1.69s ± 3%    ~     (p=0.335 n=20+19)
GoParser          2.20s ± 4%        2.19s ± 2%    ~     (p=0.844 n=20+18)
Reflect           5.97s ± 3%        6.01s ± 2%    ~     (p=0.184 n=20+20)
Tar               2.11s ± 2%        2.11s ± 4%    ~     (p=0.961 n=19+20)
XML               3.07s ± 1%        3.07s ± 3%    ~     (p=0.786 n=16+19)
[Geo mean]        5.61s             5.62s       +0.19%

name        old text-bytes    new text-bytes    delta
HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

4. The go1 benchmark showed no decline in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.7s ± 1%     41.7s ± 1%    ~     (p=0.966 n=40+40)
Fannkuch11-4                23.6s ± 0%     23.6s ± 1%  -0.23%  (p=0.000 n=40+40)
FmtFprintfEmpty-4           844ns ± 1%     834ns ± 1%  -1.23%  (p=0.000 n=40+40)
FmtFprintfString-4         1.39µs ± 1%    1.40µs ± 1%  +0.71%  (p=0.000 n=40+40)
FmtFprintfInt-4            1.44µs ± 1%    1.45µs ± 1%  +0.70%  (p=0.000 n=40+40)
FmtFprintfIntInt-4         2.10µs ± 1%    2.10µs ± 1%  +0.30%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    2.49µs ± 0%    2.50µs ± 1%  +0.66%  (p=0.000 n=32+40)
FmtFprintfFloat-4          4.42µs ± 1%    4.46µs ± 2%  +0.94%  (p=0.000 n=40+40)
FmtManyArgs-4              8.31µs ± 1%    8.22µs ± 1%  -1.09%  (p=0.000 n=40+40)
GobDecode-4                 105ms ± 1%     102ms ± 1%  -2.30%  (p=0.000 n=39+39)
GobEncode-4                90.2ms ± 1%    88.7ms ± 1%  -1.66%  (p=0.000 n=40+39)
Gzip-4                      4.17s ± 1%     4.16s ± 1%    ~     (p=0.785 n=40+40)
Gunzip-4                    608ms ± 1%     608ms ± 1%    ~     (p=0.481 n=40+40)
HTTPClientServer-4          697µs ± 2%     684µs ± 3%  -1.89%  (p=0.000 n=37+40)
JSONEncode-4                255ms ± 1%     256ms ± 1%  +0.35%  (p=0.000 n=40+40)
JSONDecode-4                920ms ± 1%     926ms ± 1%  +0.64%  (p=0.000 n=40+39)
Mandelbrot200-4            49.3ms ± 1%    49.3ms ± 0%  +0.07%  (p=0.005 n=40+40)
GoParse-4                  46.8ms ± 2%    46.7ms ± 1%    ~     (p=1.000 n=40+40)
RegexpMatchEasy0_32-4      1.27µs ± 0%    1.27µs ± 1%    ~     (p=0.057 n=40+40)
RegexpMatchEasy0_1K-4      7.97µs ± 7%    7.92µs ± 5%    ~     (p=0.094 n=40+40)
RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%    ~     (p=0.406 n=40+40)
RegexpMatchEasy1_1K-4      10.5µs ± 4%    10.5µs ± 3%    ~     (p=0.855 n=40+40)
RegexpMatchMedium_32-4     2.04µs ± 0%    2.04µs ± 1%  -0.22%  (p=0.000 n=39+40)
RegexpMatchMedium_1K-4      541µs ± 0%     540µs ± 1%  -0.25%  (p=0.000 n=40+38)
RegexpMatchHard_32-4       29.3µs ± 1%    29.3µs ± 0%    ~     (p=0.149 n=40+40)
RegexpMatchHard_1K-4        878µs ± 1%     880µs ± 0%  +0.14%  (p=0.005 n=36+35)
Revcomp-4                  81.8ms ± 2%    81.4ms ± 2%  -0.43%  (p=0.015 n=38+39)
Template-4                  1.05s ± 1%     1.05s ± 1%    ~     (p=0.302 n=40+35)
TimeParse-4                7.18µs ± 1%    7.26µs ± 1%  +1.05%  (p=0.000 n=40+36)
TimeFormat-4               13.1µs ± 1%    13.1µs ± 1%    ~     (p=0.698 n=37+40)
[Geo mean]                  733µs          732µs       -0.16%

name                     old speed      new speed      delta
GobDecode-4              7.34MB/s ± 1%  7.51MB/s ± 1%  +2.36%  (p=0.000 n=39+39)
GobEncode-4              8.51MB/s ± 1%  8.65MB/s ± 1%  +1.69%  (p=0.000 n=40+39)
Gzip-4                   4.66MB/s ± 1%  4.66MB/s ± 1%    ~     (p=0.783 n=40+40)
Gunzip-4                 31.9MB/s ± 1%  31.9MB/s ± 1%    ~     (p=0.466 n=40+40)
JSONEncode-4             7.61MB/s ± 1%  7.58MB/s ± 1%  -0.35%  (p=0.001 n=40+40)
JSONDecode-4             2.11MB/s ± 1%  2.10MB/s ± 1%  -0.52%  (p=0.000 n=38+39)
GoParse-4                1.24MB/s ± 2%  1.24MB/s ± 1%    ~     (p=0.556 n=40+39)
RegexpMatchEasy0_32-4    25.1MB/s ± 0%  25.1MB/s ± 1%    ~     (p=0.064 n=40+40)
RegexpMatchEasy0_1K-4     129MB/s ± 8%   129MB/s ± 5%    ~     (p=0.094 n=40+40)
RegexpMatchEasy1_32-4    25.0MB/s ± 1%  25.1MB/s ± 1%    ~     (p=0.331 n=40+40)
RegexpMatchEasy1_1K-4    97.7MB/s ± 4%  97.8MB/s ± 3%    ~     (p=0.851 n=40+40)
RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 0%  1.90MB/s ± 1%  +0.12%  (p=0.031 n=40+40)
RegexpMatchHard_32-4     1.09MB/s ± 1%  1.09MB/s ± 1%    ~     (p=0.597 n=40+40)
RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.16MB/s ± 1%    ~     (p=0.565 n=40+35)
Revcomp-4                31.1MB/s ± 2%  31.2MB/s ± 2%  +0.44%  (p=0.018 n=38+39)
Template-4               1.85MB/s ± 1%  1.85MB/s ± 1%    ~     (p=0.873 n=40+40)
[Geo mean]               6.66MB/s       6.67MB/s       +0.26%

Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7
Reviewed-on: https://go-review.googlesource.com/58950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-08-30 13:45:08 +00:00
Ben Shi
a2f22a6803 cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in current
ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
generate further optimized ARM code.

My patch implements this optimization. Here are some contrast test
results against the original go compiler.

1. The total size of all .a files in pkg/ shrinks by 0.03%.

2. The compilecmp benchmark shows a little decline.
name        old time/op       new time/op       delta
Template          2.35s ± 1%        2.37s ± 3%  +0.94%  (p=0.006 n=19+19)
Unicode           1.33s ± 3%        1.33s ± 2%    ~     (p=0.158 n=20+18)
GoTypes           7.86s ± 2%        7.84s ± 1%    ~     (p=0.284 n=19+18)
Compiler          37.5s ± 1%        37.7s ± 2%    ~     (p=0.101 n=20+19)
SSA               83.4s ± 2%        83.6s ± 2%    ~     (p=0.231 n=20+20)
Flate             1.46s ± 2%        1.45s ± 1%    ~     (p=0.097 n=20+17)
GoParser          1.86s ± 2%        1.86s ± 4%    ~     (p=0.738 n=20+20)
Reflect           5.10s ± 1%        5.11s ± 1%    ~     (p=0.290 n=20+18)
Tar               1.78s ± 2%        1.77s ± 2%    ~     (p=0.166 n=19+20)
XML               2.61s ± 2%        2.61s ± 2%    ~     (p=0.665 n=19+19)
[Geo mean]        4.67s             4.68s       +0.16%

name        old user-time/op  new user-time/op  delta
Template          2.79s ± 3%        2.80s ± 2%    ~     (p=0.662 n=20+20)
Unicode           1.62s ± 3%        1.64s ± 4%    ~     (p=0.252 n=20+20)
GoTypes           9.58s ± 2%        9.62s ± 2%    ~     (p=0.250 n=20+20)
Compiler          46.2s ± 1%        46.2s ± 1%    ~     (p=0.602 n=20+19)
SSA                108s ± 1%         108s ± 2%    ~     (p=0.242 n=18+20)
Flate             1.69s ± 3%        1.69s ± 4%    ~     (p=0.470 n=20+20)
GoParser          2.16s ± 3%        2.20s ± 4%  +1.70%  (p=0.005 n=19+20)
Reflect           6.02s ± 2%        6.02s ± 2%    ~     (p=0.700 n=20+17)
Tar               2.11s ± 2%        2.11s ± 3%    ~     (p=0.480 n=18+20)
XML               3.07s ± 2%        3.11s ± 4%  +1.50%  (p=0.043 n=20+20)
[Geo mean]        5.61s             5.64s       +0.55%

name        old text-bytes    new text-bytes    delta
HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark shows improvement totally, and even more than 10%
improvement in the test case Revcomp. 
name                     old time/op    new time/op    delta
BinaryTree17-4              42.0s ± 1%     41.5s ± 1%   -1.32%  (p=0.000 n=39+40)
Fannkuch11-4                24.1s ± 1%     23.6s ± 0%   -2.38%  (p=0.000 n=40+40)
FmtFprintfEmpty-4           843ns ± 0%     839ns ± 1%   -0.46%  (p=0.000 n=33+40)
FmtFprintfString-4         1.44µs ± 1%    1.37µs ± 1%   -5.48%  (p=0.000 n=40+35)
FmtFprintfInt-4            1.44µs ± 1%    1.41µs ± 2%   -1.50%  (p=0.000 n=40+40)
FmtFprintfIntInt-4         2.07µs ± 1%    2.06µs ± 0%   -0.78%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    2.50µs ± 1%    2.33µs ± 1%   -6.85%  (p=0.000 n=40+40)
FmtFprintfFloat-4          4.36µs ± 1%    4.34µs ± 0%   -0.39%  (p=0.017 n=40+40)
FmtManyArgs-4              8.11µs ± 0%    8.00µs ± 0%   -1.37%  (p=0.000 n=40+40)
GobDecode-4                 105ms ± 2%     103ms ± 2%   -2.17%  (p=0.000 n=39+39)
GobEncode-4                90.1ms ± 2%    88.6ms ± 1%   -1.67%  (p=0.000 n=40+39)
Gzip-4                      4.18s ± 1%     4.09s ± 1%   -2.03%  (p=0.000 n=40+40)
Gunzip-4                    608ms ± 1%     603ms ± 1%   -0.86%  (p=0.000 n=40+34)
HTTPClientServer-4          674µs ± 3%     661µs ± 2%   -1.82%  (p=0.000 n=40+39)
JSONEncode-4                256ms ± 1%     243ms ± 0%   -5.11%  (p=0.000 n=39+31)
JSONDecode-4                915ms ± 1%     904ms ± 1%   -1.18%  (p=0.000 n=40+36)
Mandelbrot200-4            49.2ms ± 0%    49.3ms ± 0%     ~     (p=0.254 n=34+40)
GoParse-4                  46.9ms ± 2%    46.9ms ± 1%     ~     (p=0.737 n=40+39)
RegexpMatchEasy0_32-4      1.28µs ± 1%    1.27µs ± 1%   -0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4      7.86µs ± 4%    7.67µs ± 4%   -2.46%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%   -0.54%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4      10.4µs ± 2%    10.3µs ± 2%   -0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4     2.05µs ± 0%    2.04µs ± 0%   -0.34%  (p=0.000 n=40+33)
RegexpMatchMedium_1K-4      541µs ± 1%     535µs ± 1%   -1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4       29.3µs ± 1%    29.1µs ± 1%   -0.51%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4        881µs ± 1%     871µs ± 1%   -1.15%  (p=0.000 n=40+40)
Revcomp-4                  81.7ms ± 2%    67.5ms ± 2%  -17.37%  (p=0.000 n=39+39)
Template-4                  1.05s ± 1%     1.08s ± 2%   +3.67%  (p=0.000 n=40+40)
TimeParse-4                7.24µs ± 1%    7.09µs ± 1%   -2.13%  (p=0.000 n=40+40)
TimeFormat-4               13.2µs ± 1%    13.1µs ± 0%   -0.31%  (p=0.007 n=40+31)
[Geo mean]                  733µs          718µs        -2.03%

name                     old speed      new speed      delta
GobDecode-4              7.28MB/s ± 2%  7.44MB/s ± 2%   +2.23%  (p=0.000 n=39+39)
GobEncode-4              8.52MB/s ± 2%  8.67MB/s ± 1%   +1.70%  (p=0.000 n=40+39)
Gzip-4                   4.65MB/s ± 1%  4.74MB/s ± 1%   +1.94%  (p=0.000 n=37+40)
Gunzip-4                 31.9MB/s ± 1%  32.2MB/s ± 1%   +0.90%  (p=0.000 n=40+36)
JSONEncode-4             7.57MB/s ± 1%  7.98MB/s ± 0%   +5.41%  (p=0.000 n=40+31)
JSONDecode-4             2.12MB/s ± 1%  2.15MB/s ± 1%   +1.23%  (p=0.000 n=40+40)
GoParse-4                1.23MB/s ± 1%  1.23MB/s ± 1%     ~     (p=0.769 n=39+40)
RegexpMatchEasy0_32-4    25.0MB/s ± 1%  25.2MB/s ± 1%   +0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4     130MB/s ± 5%   134MB/s ± 4%   +2.53%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%  25.1MB/s ± 1%   +0.55%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4    98.5MB/s ± 2%  99.4MB/s ± 2%   +0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%     ~     (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 1%  1.91MB/s ± 1%   +1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4     1.10MB/s ± 1%  1.10MB/s ± 0%   +0.41%  (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.17MB/s ± 1%   +1.21%  (p=0.000 n=40+40)
Revcomp-4                31.1MB/s ± 2%  37.6MB/s ± 2%  +21.03%  (p=0.000 n=39+39)
Template-4               1.86MB/s ± 1%  1.79MB/s ± 1%   -3.51%  (p=0.000 n=40+38)
[Geo mean]               6.66MB/s       6.80MB/s        +2.13%

fixes #21492

Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-28 16:10:27 +00:00
Cherry Zhang
f20944de78 cmd/compile: set/unset base register for better assembly print
For address of an auto or arg, on all non-x86 architectures
the assembler backend encodes the actual SP offset in the
instruction but leaves the offset in Prog unchanged. When the
assembly is printed in compile -S, it shows an offset
relative to pseudo FP/SP with an actual hardware SP base
register (e.g. R13 on ARM). This is confusing. Unset the
base register if it is indeed SP, so the assembly output is
consistent. If the base register isn't SP, it should be an
error and the error output contains the actual base register.

For address loading instructions, the base register isn't set
in the compiler on non-x86 architectures. Set it. Normally it
is SP and will be unset in the change mentioned above for
printing. If it is not, it will be an error and the error
output contains the actual base register.

No change in generated binary, only printed assembly. Passes
"go build -a -toolexec 'toolstash -cmp' std cmd" on all
architectures.

Fixes #21064.

Change-Id: Ifafe8d5f9b437efbe824b63b3cbc2f5f6cdc1fd5
Reviewed-on: https://go-review.googlesource.com/49432
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-08-02 12:24:02 +00:00