Commit graph

27 commits

Author SHA1 Message Date
Josh Bleecher Snyder
8fa14ea8b4 cmd/internal/gc: unembed Name field
This is an automated follow-up to CL 10120.
It was generated with a combination of eg and gofmt -r.

No functional changes. Passes toolstash -cmp.

Change-Id: I0dc6d146372012b4cce9cc4064066daa6694eee6
Reviewed-on: https://go-review.googlesource.com/10144
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-15 18:17:51 +00:00
Josh Bleecher Snyder
13485be939 cmd/internal/gc: convert Val.U to interface{}
This CL was generated by updating Val in go.go
and then running:

sed -i "" 's/\.U\.[SBXFC]val = /.U = /' *.go
sed -i "" 's/\.U\.Sval/.U.\(string\)/g' *.go *.y
sed -i "" 's/\.U\.Bval/.U.\(bool\)/g' *.go *.y
sed -i "" 's/\.U\.Xval/.U.\(\*Mpint\)/g' *.go *.y
sed -i "" 's/\.U\.Fval/.U.\(\*Mpflt\)/g' *.go *.y
sed -i "" 's/\.U\.Cval/.U.\(\*Mpcplx\)/g' *.go *.y

No functional changes. Passes toolstash -cmp.

This reduces the size of gc.Node from 424 to 392 bytes.
This in turn reduces the permanent (pprof -inuse_space)
memory usage while compiling the test/rotate?.go tests:

test	old(MB)	new(MB)	change
rotate0	379.49	364.78	-3.87%
rotate1	373.42	359.07	-3.84%
rotate2	381.17	366.24	-3.91%
rotate3	374.30	359.95	-3.83%

CL 8445 was similar to this; gri asked that Val's implementation
be hidden first. CLs 8912, 9263, and 9267 have at least
isolated the changes to the cmd/internal/gc package.

Updates #9933.

Change-Id: I83ddfe003d48e0a73c92e819edd3b5e620023084
Reviewed-on: https://go-review.googlesource.com/10059
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-15 15:12:42 +00:00
Russ Cox
d447279927 cmd/internal/gc: optimize slice + write barrier
The code generated for a slice x[i:j] or x[i:j:k] computes the entire
new slice (base, len, cap) and then uses it as the evaluation of the
slice expression.

If the slice is part of an update x = x[i:j] or x = x[i:j:k], there are
opportunities to avoid computing some of these fields.

For x = x[0:i], we know that only the len is changing;
base can be ignored completely, and cap can be left unmodified.

For x = x[0:i:j], we know that only len and cap are changing;
base can be ignored completely.

For x = x[i:i], we know that the resulting cap is zero, and we don't
adjust the base during a slice producing a zero-cap result,
so again base can be ignored completely.

No write to base, no write barrier.

The old slice code was trying to work at a Go syntax level, mainly
because that was how you wrote code just once instead of once
per architecture. Now the compiler is factored a bit better and we
can implement slice during code generation but still have one copy
of the code. So the new code is working at that lower level.
(It must, to update only parts of the result.)

This CL by itself:
name                   old mean              new mean              delta
BinaryTree17            5.81s × (0.98,1.03)   5.71s × (0.96,1.05)     ~    (p=0.101)
Fannkuch11              4.35s × (1.00,1.00)   4.39s × (1.00,1.00)   +0.79% (p=0.000)
FmtFprintfEmpty        86.0ns × (0.94,1.11)  82.6ns × (0.98,1.04)   -3.86% (p=0.048)
FmtFprintfString        276ns × (0.98,1.04)   273ns × (0.98,1.02)     ~    (p=0.235)
FmtFprintfInt           274ns × (0.98,1.06)   270ns × (0.99,1.01)     ~    (p=0.119)
FmtFprintfIntInt        506ns × (0.99,1.01)   475ns × (0.99,1.01)   -6.02% (p=0.000)
FmtFprintfPrefixedInt   391ns × (0.99,1.01)   393ns × (1.00,1.01)     ~    (p=0.139)
FmtFprintfFloat         566ns × (0.99,1.01)   574ns × (1.00,1.01)   +1.33% (p=0.001)
FmtManyArgs            1.91µs × (0.99,1.01)  1.87µs × (0.99,1.02)   -1.83% (p=0.000)
GobDecode              15.3ms × (0.99,1.02)  15.0ms × (0.98,1.05)   -1.84% (p=0.042)
GobEncode              11.5ms × (0.97,1.03)  11.4ms × (0.99,1.03)     ~    (p=0.152)
Gzip                    645ms × (0.99,1.01)   647ms × (0.99,1.01)     ~    (p=0.265)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)   +0.90% (p=0.000)
HTTPClientServer       90.5µs × (0.97,1.04)  88.5µs × (0.99,1.03)   -2.27% (p=0.014)
JSONEncode             32.0ms × (0.98,1.03)  29.6ms × (0.98,1.01)   -7.51% (p=0.000)
JSONDecode              114ms × (0.99,1.01)   104ms × (1.00,1.01)   -8.60% (p=0.000)
Mandelbrot200          6.04ms × (1.00,1.01)  6.02ms × (1.00,1.00)     ~    (p=0.057)
GoParse                6.47ms × (0.97,1.05)  6.37ms × (0.97,1.04)     ~    (p=0.105)
RegexpMatchEasy0_32     171ns × (0.93,1.07)   152ns × (0.99,1.01)  -11.09% (p=0.000)
RegexpMatchEasy0_1K     550ns × (0.98,1.01)   530ns × (1.00,1.00)   -3.78% (p=0.000)
RegexpMatchEasy1_32     135ns × (0.99,1.02)   134ns × (0.99,1.01)   -1.33% (p=0.002)
RegexpMatchEasy1_1K     879ns × (1.00,1.01)   865ns × (1.00,1.00)   -1.58% (p=0.000)
RegexpMatchMedium_32    243ns × (1.00,1.00)   233ns × (1.00,1.00)   -4.30% (p=0.000)
RegexpMatchMedium_1K   70.3µs × (1.00,1.00)  69.5µs × (1.00,1.00)   -1.13% (p=0.000)
RegexpMatchHard_32     3.82µs × (1.00,1.01)  3.74µs × (1.00,1.00)   -1.95% (p=0.000)
RegexpMatchHard_1K      117µs × (1.00,1.00)   115µs × (1.00,1.00)   -1.69% (p=0.000)
Revcomp                 917ms × (0.97,1.04)   920ms × (0.97,1.04)     ~    (p=0.786)
Template                114ms × (0.99,1.01)   117ms × (0.99,1.01)   +2.58% (p=0.000)
TimeParse               622ns × (0.99,1.01)   615ns × (0.99,1.00)   -1.06% (p=0.000)
TimeFormat              665ns × (0.99,1.01)   654ns × (0.99,1.00)   -1.70% (p=0.000)

This CL and previous CL (append) combined:
name                   old mean              new mean              delta
BinaryTree17            5.68s × (0.97,1.04)   5.71s × (0.96,1.05)     ~    (p=0.638)
Fannkuch11              4.41s × (0.98,1.03)   4.39s × (1.00,1.00)     ~    (p=0.474)
FmtFprintfEmpty        92.7ns × (0.91,1.16)  82.6ns × (0.98,1.04)  -10.89% (p=0.004)
FmtFprintfString        281ns × (0.96,1.08)   273ns × (0.98,1.02)     ~    (p=0.078)
FmtFprintfInt           288ns × (0.97,1.06)   270ns × (0.99,1.01)   -6.37% (p=0.000)
FmtFprintfIntInt        493ns × (0.97,1.04)   475ns × (0.99,1.01)   -3.53% (p=0.002)
FmtFprintfPrefixedInt   423ns × (0.97,1.04)   393ns × (1.00,1.01)   -7.07% (p=0.000)
FmtFprintfFloat         598ns × (0.99,1.01)   574ns × (1.00,1.01)   -4.02% (p=0.000)
FmtManyArgs            1.89µs × (0.98,1.05)  1.87µs × (0.99,1.02)     ~    (p=0.305)
GobDecode              14.8ms × (0.98,1.03)  15.0ms × (0.98,1.05)     ~    (p=0.237)
GobEncode              12.3ms × (0.98,1.01)  11.4ms × (0.99,1.03)   -6.95% (p=0.000)
Gzip                    656ms × (0.99,1.05)   647ms × (0.99,1.01)     ~    (p=0.101)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)   +0.58% (p=0.001)
HTTPClientServer       91.2µs × (0.97,1.04)  88.5µs × (0.99,1.03)   -3.02% (p=0.003)
JSONEncode             32.6ms × (0.97,1.08)  29.6ms × (0.98,1.01)   -9.10% (p=0.000)
JSONDecode              114ms × (0.97,1.05)   104ms × (1.00,1.01)   -8.74% (p=0.000)
Mandelbrot200          6.11ms × (0.98,1.04)  6.02ms × (1.00,1.00)     ~    (p=0.090)
GoParse                6.66ms × (0.97,1.04)  6.37ms × (0.97,1.04)   -4.41% (p=0.000)
RegexpMatchEasy0_32     159ns × (0.99,1.00)   152ns × (0.99,1.01)   -4.69% (p=0.000)
RegexpMatchEasy0_1K     538ns × (1.00,1.01)   530ns × (1.00,1.00)   -1.57% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   134ns × (0.99,1.01)   -2.91% (p=0.000)
RegexpMatchEasy1_1K     869ns × (0.99,1.01)   865ns × (1.00,1.00)   -0.51% (p=0.012)
RegexpMatchMedium_32    252ns × (0.99,1.01)   233ns × (1.00,1.00)   -7.85% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  69.5µs × (1.00,1.00)   -4.43% (p=0.000)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.74µs × (1.00,1.00)   -2.74% (p=0.000)
RegexpMatchHard_1K      118µs × (1.00,1.00)   115µs × (1.00,1.00)   -2.24% (p=0.000)
Revcomp                 920ms × (0.97,1.07)   920ms × (0.97,1.04)     ~    (p=0.998)
Template                129ms × (0.98,1.03)   117ms × (0.99,1.01)   -9.79% (p=0.000)
TimeParse               619ns × (0.99,1.01)   615ns × (0.99,1.00)   -0.57% (p=0.011)
TimeFormat              661ns × (0.98,1.04)   654ns × (0.99,1.00)     ~    (p=0.223)

Change-Id: If054d81ab2c71d8d62cf54b5b1fac2af66b387fc
Reviewed-on: https://go-review.googlesource.com/9813
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-13 19:20:39 +00:00
Russ Cox
5ed4bb6db1 cmd/5g: fix build
The line in cgen.go was lost during the ginscmp CL.
The ggen.go change is not strictly necessary, but
it makes the 5g -S output for x[0] match what it said
before the ginscmp CL.

Change-Id: I5890a9ec1ac69a38509416eda5aea13b8b12b94a
Reviewed-on: https://go-review.googlesource.com/9929
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-12 19:55:50 +00:00
Russ Cox
8552047a32 cmd/internal/gc: optimize append + write barrier
The code generated for x = append(x, v) is roughly:

	t := x
	if len(t)+1 > cap(t) {
		t = grow(t)
	}
	t[len(t)] = v
	len(t)++
	x = t

We used to generate this code as Go pseudocode during walk.
Generate it instead as actual instructions during gen.

Doing so lets us apply a few optimizations. The most important
is that when, as in the above example, the source slice and the
destination slice are the same, the code can instead do:

	t := x
	if len(t)+1 > cap(t) {
		t = grow(t)
		x = {base(t), len(t)+1, cap(t)}
	} else {
		len(x)++
	}
	t[len(t)] = v

That is, in the fast path that does not reallocate the array,
only the updated length needs to be written back to x,
not the array pointer and not the capacity. This is more like
what you'd write by hand in C. It's faster in general, since
the fast path elides two of the three stores, but it's especially
faster when the form of x is such that the base pointer write
would turn into a write barrier. No write, no barrier.

name                   old mean              new mean              delta
BinaryTree17            5.68s × (0.97,1.04)   5.81s × (0.98,1.03)   +2.35% (p=0.023)
Fannkuch11              4.41s × (0.98,1.03)   4.35s × (1.00,1.00)     ~    (p=0.090)
FmtFprintfEmpty        92.7ns × (0.91,1.16)  86.0ns × (0.94,1.11)   -7.31% (p=0.038)
FmtFprintfString        281ns × (0.96,1.08)   276ns × (0.98,1.04)     ~    (p=0.219)
FmtFprintfInt           288ns × (0.97,1.06)   274ns × (0.98,1.06)   -4.94% (p=0.002)
FmtFprintfIntInt        493ns × (0.97,1.04)   506ns × (0.99,1.01)   +2.65% (p=0.009)
FmtFprintfPrefixedInt   423ns × (0.97,1.04)   391ns × (0.99,1.01)   -7.52% (p=0.000)
FmtFprintfFloat         598ns × (0.99,1.01)   566ns × (0.99,1.01)   -5.27% (p=0.000)
FmtManyArgs            1.89µs × (0.98,1.05)  1.91µs × (0.99,1.01)     ~    (p=0.231)
GobDecode              14.8ms × (0.98,1.03)  15.3ms × (0.99,1.02)   +3.01% (p=0.000)
GobEncode              12.3ms × (0.98,1.01)  11.5ms × (0.97,1.03)   -5.93% (p=0.000)
Gzip                    656ms × (0.99,1.05)   645ms × (0.99,1.01)     ~    (p=0.055)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)   -0.32% (p=0.034)
HTTPClientServer       91.2µs × (0.97,1.04)  90.5µs × (0.97,1.04)     ~    (p=0.468)
JSONEncode             32.6ms × (0.97,1.08)  32.0ms × (0.98,1.03)     ~    (p=0.190)
JSONDecode              114ms × (0.97,1.05)   114ms × (0.99,1.01)     ~    (p=0.887)
Mandelbrot200          6.11ms × (0.98,1.04)  6.04ms × (1.00,1.01)     ~    (p=0.167)
GoParse                6.66ms × (0.97,1.04)  6.47ms × (0.97,1.05)   -2.81% (p=0.014)
RegexpMatchEasy0_32     159ns × (0.99,1.00)   171ns × (0.93,1.07)   +7.19% (p=0.002)
RegexpMatchEasy0_1K     538ns × (1.00,1.01)   550ns × (0.98,1.01)   +2.30% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   135ns × (0.99,1.02)   -1.60% (p=0.000)
RegexpMatchEasy1_1K     869ns × (0.99,1.01)   879ns × (1.00,1.01)   +1.08% (p=0.000)
RegexpMatchMedium_32    252ns × (0.99,1.01)   243ns × (1.00,1.00)   -3.71% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  70.3µs × (1.00,1.00)   -3.34% (p=0.000)
RegexpMatchHard_32     3.85µs × (1.00,1.00)  3.82µs × (1.00,1.01)   -0.81% (p=0.000)
RegexpMatchHard_1K      118µs × (1.00,1.00)   117µs × (1.00,1.00)   -0.56% (p=0.000)
Revcomp                 920ms × (0.97,1.07)   917ms × (0.97,1.04)     ~    (p=0.808)
Template                129ms × (0.98,1.03)   114ms × (0.99,1.01)  -12.06% (p=0.000)
TimeParse               619ns × (0.99,1.01)   622ns × (0.99,1.01)     ~    (p=0.062)
TimeFormat              661ns × (0.98,1.04)   665ns × (0.99,1.01)     ~    (p=0.524)

See next CL for combination with a similar optimization for slice.
The benchmarks that are slower in this CL are still faster overall
with the combination of the two.

Change-Id: I2a7421658091b2488c64741b4db15ab6c3b4cb7e
Reviewed-on: https://go-review.googlesource.com/9812
Reviewed-by: David Chase <drchase@google.com>
2015-05-12 17:55:09 +00:00
Russ Cox
f8d14fc3a0 cmd/internal/gc: add backend ginscmp function to emit a comparison
This lets us abstract away which arguments can be constants and so on
and lets the back ends reverse the order of arguments if that helps.

Change-Id: I283ec1d694f2dd84eba22e5eb4aad78a2d2d9eb0
Reviewed-on: https://go-review.googlesource.com/9810
Reviewed-by: David Chase <drchase@google.com>
2015-05-12 17:54:57 +00:00
Russ Cox
fc595b78d2 cmd/internal/gc: mark panicindex calls as not returning
Most of the calls to panicindex are already
marked as not returning, but these two were missed
at some point.

Performance changes below.

name                   old mean              new mean              delta
BinaryTree17            5.70s × (0.98,1.04)   5.68s × (0.97,1.04)    ~    (p=0.681)
Fannkuch11              4.32s × (1.00,1.00)   4.41s × (0.98,1.03)  +1.98% (p=0.018)
FmtFprintfEmpty        92.6ns × (0.91,1.11)  92.7ns × (0.91,1.16)    ~    (p=0.969)
FmtFprintfString        280ns × (0.97,1.05)   281ns × (0.96,1.08)    ~    (p=0.860)
FmtFprintfInt           284ns × (0.99,1.02)   288ns × (0.97,1.06)    ~    (p=0.207)
FmtFprintfIntInt        488ns × (0.98,1.01)   493ns × (0.97,1.04)    ~    (p=0.271)
FmtFprintfPrefixedInt   418ns × (0.98,1.04)   423ns × (0.97,1.04)    ~    (p=0.311)
FmtFprintfFloat         597ns × (1.00,1.00)   598ns × (0.99,1.01)    ~    (p=0.789)
FmtManyArgs            1.87µs × (0.99,1.01)  1.89µs × (0.98,1.05)    ~    (p=0.158)
GobDecode              14.6ms × (0.99,1.01)  14.8ms × (0.98,1.03)  +1.51% (p=0.015)
GobEncode              12.3ms × (0.98,1.03)  12.3ms × (0.98,1.01)    ~    (p=0.474)
Gzip                    647ms × (1.00,1.01)   656ms × (0.99,1.05)    ~    (p=0.104)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)    ~    (p=0.110)
HTTPClientServer       89.6µs × (0.99,1.03)  91.2µs × (0.97,1.04)    ~    (p=0.061)
JSONEncode             31.7ms × (0.99,1.01)  32.6ms × (0.97,1.08)  +2.87% (p=0.038)
JSONDecode              111ms × (1.00,1.01)   114ms × (0.97,1.05)  +2.47% (p=0.040)
Mandelbrot200          6.01ms × (1.00,1.00)  6.11ms × (0.98,1.04)    ~    (p=0.073)
GoParse                6.54ms × (0.99,1.02)  6.66ms × (0.97,1.04)    ~    (p=0.064)
RegexpMatchEasy0_32     159ns × (0.99,1.02)   159ns × (0.99,1.00)    ~    (p=0.693)
RegexpMatchEasy0_1K     540ns × (0.99,1.03)   538ns × (1.00,1.01)    ~    (p=0.360)
RegexpMatchEasy1_32     137ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.511)
RegexpMatchEasy1_1K     867ns × (1.00,1.01)   869ns × (0.99,1.01)    ~    (p=0.193)
RegexpMatchMedium_32    252ns × (1.00,1.00)   252ns × (0.99,1.01)    ~    (p=0.076)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.963)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.85µs × (1.00,1.00)    ~    (p=0.371)
RegexpMatchHard_1K      117µs × (1.00,1.01)   118µs × (1.00,1.00)    ~    (p=0.898)
Revcomp                 909ms × (0.98,1.03)   920ms × (0.97,1.07)    ~    (p=0.368)
Template                128ms × (0.99,1.01)   129ms × (0.98,1.03)  +1.41% (p=0.042)
TimeParse               619ns × (0.98,1.01)   619ns × (0.99,1.01)    ~    (p=0.730)
TimeFormat              651ns × (1.00,1.01)   661ns × (0.98,1.04)    ~    (p=0.097)

Change-Id: I0ec5baff41f5d282307137ce0d927e6301e4fa10
Reviewed-on: https://go-review.googlesource.com/9811
Reviewed-by: David Chase <drchase@google.com>
2015-05-11 15:22:56 +00:00
Keith Randall
135389d0ff cmd/internal/gc: Use shifts for powers-of-two indexing
Fixes #10638

Change-Id: I7bbaad7e5a599aa94d1d158e903596231c7e9897
Reviewed-on: https://go-review.googlesource.com/9535
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-05-05 02:17:55 +00:00
Shenghou Ma
931328b8b8 cmd/internal/gc: fix build on big endian systems
The siz argument to both runtime.newproc and runtime.deferproc is
int32, not uintptr. This problem won't manifest on little-endian
systems because that stack slot is uintptr sized anyway. However,
on big-endian systems, it will make a difference.

Change-Id: I2351d1ec81839abe25375cff95e327b80764c2b5
Reviewed-on: https://go-review.googlesource.com/9647
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-02 19:32:15 +00:00
Shenghou Ma
2b7505e28a cmd/internal/gc: fix write barrier fast path on RISC architectures
They have to read the boolean into a register first and then do
the comparison.

Fixes #10598.

Change-Id: I2b808837a8c6393e1e0778296b6592aaab2b04bf
Signed-off-by: Shenghou Ma <minux@golang.org>
Reviewed-on: https://go-review.googlesource.com/9453
Reviewed-by: Dave Cheney <dave@cheney.net>
2015-04-29 00:29:44 +00:00
Shenghou Ma
e7dd28891e cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack.

Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0
Reviewed-on: https://go-review.googlesource.com/8639
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-29 00:28:01 +00:00
Russ Cox
2a73559023 cmd/internal/gc: emit typedmemmove write barrier from sgen
Emitting it here instead of rewriting the tree earlier sets us up
to generate an inline check, like we do for single pointers.
But even without the inline check, generating at this level lets
us generate significantly more efficient code, probably due to
having fewer temporaries and less complex high-level code
for the compiler to churn through.

Revcomp is worse, almost certainly due to register pressure.

name                                       old                     new          delta
BenchmarkBinaryTree17              18.0s × (0.99,1.01)     18.0s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.43s × (1.00,1.00)     4.36s × (1.00,1.00)  -1.44%
BenchmarkFmtFprintfEmpty           114ns × (0.95,1.05)      86ns × (0.97,1.06)  -24.12%
BenchmarkFmtFprintfString          468ns × (0.99,1.01)     420ns × (0.99,1.02)  -10.16%
BenchmarkFmtFprintfInt             433ns × (1.00,1.01)     386ns × (0.99,1.02)  -10.74%
BenchmarkFmtFprintfIntInt          748ns × (0.99,1.01)     647ns × (0.99,1.01)  -13.56%
BenchmarkFmtFprintfPrefixedInt     547ns × (0.99,1.01)     499ns × (0.99,1.02)  -8.78%
BenchmarkFmtFprintfFloat           756ns × (1.00,1.01)     689ns × (1.00,1.00)  -8.86%
BenchmarkFmtManyArgs              2.79µs × (1.00,1.01)    2.53µs × (1.00,1.00)  -9.30%
BenchmarkGobDecode                39.6ms × (0.99,1.00)    39.2ms × (0.98,1.01)  -1.07%
BenchmarkGobEncode                37.6ms × (1.00,1.01)    37.5ms × (0.99,1.01)  ~
BenchmarkGzip                      663ms × (0.99,1.02)     660ms × (0.98,1.01)  ~
BenchmarkGunzip                    142ms × (1.00,1.00)     143ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     133µs × (0.99,1.02)  ~
BenchmarkJSONEncode               56.2ms × (0.99,1.01)    54.0ms × (0.98,1.01)  -3.97%
BenchmarkJSONDecode                138ms × (1.00,1.00)     134ms × (0.99,1.02)  -2.70%
BenchmarkMandelbrot200            6.03ms × (1.00,1.01)    6.00ms × (1.00,1.01)  ~
BenchmarkGoParse                  9.82ms × (0.93,1.10)   10.35ms × (0.88,1.11)  ~
BenchmarkRegexpMatchEasy0_32       207ns × (1.00,1.00)     163ns × (0.99,1.01)  -21.26%
BenchmarkRegexpMatchEasy0_1K       581ns × (1.00,1.01)     566ns × (0.99,1.00)  -2.50%
BenchmarkRegexpMatchEasy1_32       185ns × (0.99,1.01)     138ns × (1.00,1.01)  -25.41%
BenchmarkRegexpMatchEasy1_1K       975ns × (1.00,1.01)     892ns × (1.00,1.00)  -8.51%
BenchmarkRegexpMatchMedium_32      328ns × (0.99,1.00)     252ns × (1.00,1.00)  -23.17%
BenchmarkRegexpMatchMedium_1K     88.6µs × (1.00,1.01)    73.0µs × (1.00,1.01)  -17.66%
BenchmarkRegexpMatchHard_32       4.69µs × (0.95,1.03)    3.85µs × (1.00,1.01)  -17.91%
BenchmarkRegexpMatchHard_1K        133µs × (1.00,1.01)     117µs × (1.00,1.00)  -12.34%
BenchmarkRevcomp                   902ms × (0.99,1.05)    1001ms × (0.94,1.01)  +11.04%
BenchmarkTemplate                  174ms × (0.99,1.01)     160ms × (0.99,1.01)  -7.70%
BenchmarkTimeParse                 639ns × (1.00,1.00)     622ns × (1.00,1.00)  -2.66%
BenchmarkTimeFormat                736ns × (1.00,1.01)     736ns × (1.00,1.02)  ~

Change-Id: Ib3bbeb379f5f4819e6f5dcf69bc88a2b7ed41460
Reviewed-on: https://go-review.googlesource.com/9225
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-04-28 01:38:18 +00:00
Russ Cox
653d56075d cmd/internal/gc: inline writeBarrierEnabled check before calling writebarrierptr
I believe the benchmarks that get slower are under register pressure,
and not making the call unconditionally makes the pressure worse,
and the register allocator doesn't do a great job. But part of the point
of this sequence is to get the write barriers out of the way so I can work
on the register allocator, so that's okay.

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (1.00,1.01)     18.0s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.43s × (1.00,1.00)     4.43s × (1.00,1.00)  ~
BenchmarkFmtFprintfEmpty           110ns × (1.00,1.06)     114ns × (0.95,1.05)  ~
BenchmarkFmtFprintfString          487ns × (0.99,1.00)     468ns × (0.99,1.01)  -4.00%
BenchmarkFmtFprintfInt             450ns × (0.99,1.00)     433ns × (1.00,1.01)  -3.88%
BenchmarkFmtFprintfIntInt          762ns × (1.00,1.00)     748ns × (0.99,1.01)  -1.84%
BenchmarkFmtFprintfPrefixedInt     584ns × (0.99,1.01)     547ns × (0.99,1.01)  -6.26%
BenchmarkFmtFprintfFloat           738ns × (1.00,1.00)     756ns × (1.00,1.01)  +2.37%
BenchmarkFmtManyArgs              2.80µs × (1.00,1.01)    2.79µs × (1.00,1.01)  ~
BenchmarkGobDecode                39.0ms × (0.99,1.00)    39.6ms × (0.99,1.00)  +1.54%
BenchmarkGobEncode                37.8ms × (0.98,1.01)    37.6ms × (1.00,1.01)  ~
BenchmarkGzip                      661ms × (0.99,1.01)     663ms × (0.99,1.02)  ~
BenchmarkGunzip                    142ms × (1.00,1.00)     142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer          132µs × (0.99,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               56.3ms × (0.99,1.01)    56.2ms × (0.99,1.01)  ~
BenchmarkJSONDecode                138ms × (0.99,1.01)     138ms × (1.00,1.00)  ~
BenchmarkMandelbrot200            6.01ms × (1.00,1.00)    6.03ms × (1.00,1.01)  +0.23%
BenchmarkGoParse                  10.2ms × (0.87,1.05)     9.8ms × (0.93,1.10)  ~
BenchmarkRegexpMatchEasy0_32       208ns × (1.00,1.00)     207ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K       588ns × (1.00,1.00)     581ns × (1.00,1.01)  -1.27%
BenchmarkRegexpMatchEasy1_32       182ns × (0.99,1.01)     185ns × (0.99,1.01)  +1.65%
BenchmarkRegexpMatchEasy1_1K       986ns × (1.00,1.01)     975ns × (1.00,1.01)  -1.17%
BenchmarkRegexpMatchMedium_32      323ns × (1.00,1.01)     328ns × (0.99,1.00)  +1.55%
BenchmarkRegexpMatchMedium_1K     89.9µs × (1.00,1.00)    88.6µs × (1.00,1.01)  -1.38%
BenchmarkRegexpMatchHard_32       4.72µs × (0.95,1.01)    4.69µs × (0.95,1.03)  ~
BenchmarkRegexpMatchHard_1K        133µs × (1.00,1.01)     133µs × (1.00,1.01)  ~
BenchmarkRevcomp                   900ms × (1.00,1.05)     902ms × (0.99,1.05)  ~
BenchmarkTemplate                  168ms × (0.99,1.01)     174ms × (0.99,1.01)  +3.30%
BenchmarkTimeParse                 637ns × (1.00,1.00)     639ns × (1.00,1.00)  +0.31%
BenchmarkTimeFormat                738ns × (1.00,1.00)     736ns × (1.00,1.01)  ~

Change-Id: I03ce152852edec404538f6c20eb650fac82e2aa2
Reviewed-on: https://go-review.googlesource.com/9224
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:38:06 +00:00
Russ Cox
0f0bc0f016 cmd/internal/gc: use MOV R0, R1 instead of LEA 0(R0), R1 in Agen
Minor code generation optimization I've been meaning to do
for a while and noticed while working on the emitted write
barrier code. Using MOV lets the compiler and maybe the
processor do copy propagation.

name                                       old                     new          delta
BenchmarkBinaryTree17              17.9s × (0.99,1.01)     18.0s × (0.99,1.01)  ~
BenchmarkFannkuch11                4.42s × (1.00,1.00)     4.36s × (1.00,1.00)  -1.39%
BenchmarkFmtFprintfEmpty           118ns × (0.96,1.02)     120ns × (0.99,1.06)  ~
BenchmarkFmtFprintfString          486ns × (0.99,1.01)     480ns × (0.99,1.01)  -1.34%
BenchmarkFmtFprintfInt             457ns × (0.99,1.01)     451ns × (0.99,1.01)  -1.31%
BenchmarkFmtFprintfIntInt          768ns × (1.00,1.01)     766ns × (0.99,1.01)  ~
BenchmarkFmtFprintfPrefixedInt     584ns × (0.99,1.03)     569ns × (0.99,1.01)  -2.57%
BenchmarkFmtFprintfFloat           739ns × (0.99,1.00)     728ns × (1.00,1.01)  -1.49%
BenchmarkFmtManyArgs              2.77µs × (1.00,1.00)    2.81µs × (1.00,1.01)  +1.53%
BenchmarkGobDecode                39.3ms × (0.99,1.01)    39.4ms × (0.99,1.01)  ~
BenchmarkGobEncode                39.4ms × (0.99,1.00)    39.4ms × (0.99,1.00)  ~
BenchmarkGzip                      661ms × (0.99,1.01)     660ms × (1.00,1.01)  ~
BenchmarkGunzip                    142ms × (1.00,1.00)     143ms × (1.00,1.00)  +0.20%
BenchmarkHTTPClientServer          133µs × (0.98,1.01)     132µs × (0.99,1.01)  ~
BenchmarkJSONEncode               56.5ms × (0.99,1.01)    57.1ms × (0.99,1.01)  +0.94%
BenchmarkJSONDecode                143ms × (1.00,1.00)     138ms × (1.00,1.01)  -3.22%
BenchmarkMandelbrot200            6.01ms × (1.00,1.00)    6.02ms × (1.00,1.00)  ~
BenchmarkGoParse                  9.63ms × (0.94,1.07)    9.79ms × (0.92,1.07)  ~
BenchmarkRegexpMatchEasy0_32       210ns × (1.00,1.00)     210ns × (1.00,1.01)  ~
BenchmarkRegexpMatchEasy0_1K       596ns × (0.99,1.01)     593ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.01)     182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (0.99,1.01)    1.01µs × (1.00,1.01)  ~
BenchmarkRegexpMatchMedium_32      327ns × (1.00,1.01)     331ns × (1.00,1.00)  +1.22%
BenchmarkRegexpMatchMedium_1K     93.0µs × (1.00,1.02)    92.6µs × (1.00,1.01)  ~
BenchmarkRegexpMatchHard_32       4.76µs × (0.95,1.01)    4.58µs × (0.99,1.05)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.01)     136µs × (1.00,1.01)  ~
BenchmarkRevcomp                   892ms × (1.00,1.01)     900ms × (0.99,1.06)  ~
BenchmarkTemplate                  175ms × (0.99,1.00)     171ms × (1.00,1.01)  -2.36%
BenchmarkTimeParse                 638ns × (1.00,1.00)     637ns × (1.00,1.00)  ~
BenchmarkTimeFormat                772ns × (1.00,1.00)     742ns × (1.00,1.00)  -3.95%

Change-Id: I6504e310cb9cf48a73d539c478b4dbcacde208b2
Reviewed-on: https://go-review.googlesource.com/9308
Reviewed-by: Austin Clements <austin@google.com>
2015-04-28 01:37:33 +00:00
Russ Cox
0ad4f8b1f7 cmd/internal/gc: emit write barriers at lower level
This is primarily preparation for inlining, not an optimization by itself,
but it still helps some.

name                                       old                     new          delta
BenchmarkBinaryTree17              18.2s × (0.99,1.01)     17.9s × (0.99,1.01)  -1.57%
BenchmarkFannkuch11                4.44s × (1.00,1.00)     4.42s × (1.00,1.00)  -0.40%
BenchmarkFmtFprintfEmpty           119ns × (0.95,1.02)     118ns × (0.96,1.02)  ~
BenchmarkFmtFprintfString          501ns × (0.99,1.02)     486ns × (0.99,1.01)  -2.89%
BenchmarkFmtFprintfInt             474ns × (0.99,1.00)     457ns × (0.99,1.01)  -3.59%
BenchmarkFmtFprintfIntInt          792ns × (1.00,1.00)     768ns × (1.00,1.01)  -3.03%
BenchmarkFmtFprintfPrefixedInt     574ns × (1.00,1.01)     584ns × (0.99,1.03)  +1.83%
BenchmarkFmtFprintfFloat           749ns × (1.00,1.00)     739ns × (0.99,1.00)  -1.34%
BenchmarkFmtManyArgs              2.94µs × (1.00,1.01)    2.77µs × (1.00,1.00)  -5.76%
BenchmarkGobDecode                39.5ms × (0.99,1.01)    39.3ms × (0.99,1.01)  ~
BenchmarkGobEncode                39.4ms × (1.00,1.01)    39.4ms × (0.99,1.00)  ~
BenchmarkGzip                      658ms × (1.00,1.01)     661ms × (0.99,1.01)  ~
BenchmarkGunzip                    142ms × (1.00,1.00)     142ms × (1.00,1.00)  +0.22%
BenchmarkHTTPClientServer          134µs × (0.99,1.01)     133µs × (0.98,1.01)  ~
BenchmarkJSONEncode               57.1ms × (0.99,1.01)    56.5ms × (0.99,1.01)  ~
BenchmarkJSONDecode                141ms × (1.00,1.00)     143ms × (1.00,1.00)  +1.09%
BenchmarkMandelbrot200            6.01ms × (1.00,1.00)    6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                  10.1ms × (0.91,1.09)     9.6ms × (0.94,1.07)  ~
BenchmarkRegexpMatchEasy0_32       207ns × (1.00,1.01)     210ns × (1.00,1.00)  +1.45%
BenchmarkRegexpMatchEasy0_1K       592ns × (0.99,1.00)     596ns × (0.99,1.01)  +0.68%
BenchmarkRegexpMatchEasy1_32       184ns × (0.99,1.01)     184ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K      1.01µs × (1.00,1.00)    1.01µs × (0.99,1.01)  ~
BenchmarkRegexpMatchMedium_32      327ns × (0.99,1.00)     327ns × (1.00,1.01)  ~
BenchmarkRegexpMatchMedium_1K     92.5µs × (1.00,1.00)    93.0µs × (1.00,1.02)  +0.48%
BenchmarkRegexpMatchHard_32       4.79µs × (0.95,1.00)    4.76µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K        136µs × (1.00,1.00)     136µs × (1.00,1.01)  ~
BenchmarkRevcomp                   900ms × (0.99,1.01)     892ms × (1.00,1.01)  ~
BenchmarkTemplate                  170ms × (0.99,1.01)     175ms × (0.99,1.00)  +2.95%
BenchmarkTimeParse                 645ns × (1.00,1.00)     638ns × (1.00,1.00)  -1.16%
BenchmarkTimeFormat                740ns × (1.00,1.00)     772ns × (1.00,1.00)  +4.39%

Change-Id: I0be905e32791e0cb70ff01f169c4b309a971d981
Reviewed-on: https://go-review.googlesource.com/9159
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-28 01:37:05 +00:00
Josh Bleecher Snyder
13cb62c764 cmd/internal/gc, cmd/6g: generate boolean values without jumps
Use SETcc instructions instead of Jcc to generate boolean values.
This generates shorter, jump-free code, which may in turn enable other
peephole optimizations.

For example, given

func f(i, j int) bool {
	return i == j
}

Before

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f(SB), $0-24
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:4)	MOVQ	"".i+8(FP), BX
	0x0005 00005 (x.go:4)	MOVQ	"".j+16(FP), BP
	0x000a 00010 (x.go:4)	CMPQ	BX, BP
	0x000d 00013 (x.go:4)	JEQ	21
	0x000f 00015 (x.go:4)	MOVB	$0, "".~r2+24(FP)
	0x0014 00020 (x.go:4)	RET
	0x0015 00021 (x.go:4)	MOVB	$1, "".~r2+24(FP)
	0x001a 00026 (x.go:4)	JMP	20

After

"".f t=1 size=32 value=0 args=0x18 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f(SB), $0-24
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·b4c25e9b09fd0cf9bb429dcefe91c353(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:4)	MOVQ	"".i+8(FP), BX
	0x0005 00005 (x.go:4)	MOVQ	"".j+16(FP), BP
	0x000a 00010 (x.go:4)	CMPQ	BX, BP
	0x000d 00013 (x.go:4)	SETEQ	"".~r2+24(FP)
	0x0012 00018 (x.go:4)	RET

regexp benchmarks, best of 12 runs:

benchmark                                 old ns/op      new ns/op      delta
BenchmarkNotOnePassShortB                 782            733            -6.27%
BenchmarkLiteral                          180            171            -5.00%
BenchmarkNotLiteral                       2855           2721           -4.69%
BenchmarkMatchHard_32                     2672           2557           -4.30%
BenchmarkMatchHard_1K                     80182          76732          -4.30%
BenchmarkMatchEasy1_32M                   76440180       73304748       -4.10%
BenchmarkMatchEasy1_32K                   68798          66350          -3.56%
BenchmarkAnchoredLongMatch                482            465            -3.53%
BenchmarkMatchEasy1_1M                    2373042        2292692        -3.39%
BenchmarkReplaceAll                       2776           2690           -3.10%
BenchmarkNotOnePassShortA                 1397           1360           -2.65%
BenchmarkMatchClass_InRange               3842           3742           -2.60%
BenchmarkMatchEasy0_32                    125            122            -2.40%
BenchmarkMatchEasy0_32K                   11414          11164          -2.19%
BenchmarkMatchEasy0_1K                    668            654            -2.10%
BenchmarkAnchoredShortMatch               260            255            -1.92%
BenchmarkAnchoredLiteralShortNonMatch     164            161            -1.83%
BenchmarkOnePassShortB                    623            612            -1.77%
BenchmarkOnePassShortA                    801            788            -1.62%
BenchmarkMatchClass                       4094           4033           -1.49%
BenchmarkMatchEasy0_32M                   14078800       13890704       -1.34%
BenchmarkMatchHard_32K                    4095844        4045820        -1.22%
BenchmarkMatchEasy1_1K                    1663           1643           -1.20%
BenchmarkMatchHard_1M                     131261708      129708215      -1.18%
BenchmarkMatchHard_32M                    4210112412     4169292003     -0.97%
BenchmarkMatchMedium_32K                  2460752        2438611        -0.90%
BenchmarkMatchEasy0_1M                    422914         419672         -0.77%
BenchmarkMatchMedium_1M                   78581121       78040160       -0.69%
BenchmarkMatchMedium_32M                  2515287278     2498464906     -0.67%
BenchmarkMatchMedium_32                   1754           1746           -0.46%
BenchmarkMatchMedium_1K                   52105          52106          +0.00%
BenchmarkAnchoredLiteralLongNonMatch      185            185            +0.00%
BenchmarkMatchEasy1_32                    107            107            +0.00%
BenchmarkOnePassLongNotPrefix             505            505            +0.00%
BenchmarkOnePassLongPrefix                147            147            +0.00%

The godoc binary is ~0.12% smaller after this CL.

Updates #5729.

toolstash -cmp passes for all architectures other than amd64 and amd64p32.

Other architectures can be done in follow-up CLs.

Change-Id: I0e167e259274b722958567fc0af83a17ca002da7
Reviewed-on: https://go-review.googlesource.com/2284
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 20:12:18 +00:00
Russ Cox
17228f44b2 cmd/internal/gc: make use of new String methods in prints
$ sam -d cmd/internal/gc/*.{go,y} cmd/?g/*.go
X ,s/, (gc\.)?[BFHNST]conv\(([^()]+), 0\)/, \2/g
X/'/w
q
$

Change-Id: Ic28a4807d237b8ae53ceca1e4e7fdb43580ab560
Reviewed-on: https://go-review.googlesource.com/9032
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-17 19:29:25 +00:00
Josh Bleecher Snyder
4a7e5bca3c cmd/internal/gc: clean up bgen
This cleanup is in anticipation of implementing
jump-free booleans (CL 2284) and zero-aware
comparisons (issue 10381).

No functional changes. Passes toolstash -cmp.

Change-Id: I50f394c60fa2927e177d7fc85b75085060a9e912
Reviewed-on: https://go-review.googlesource.com/8738
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 03:25:21 +00:00
Josh Bleecher Snyder
5ed90cbbb0 cmd/internal/gc, cmd/gc: move Reg from Val to Node
Val is used to hold constant values.
Reg was the odd duck out.

Generated using eg.

No functional changes. Passes toolstash -cmp.

Change-Id: Ic1de769a1f92bb02e09a4428d998b716f307e2f6
Reviewed-on: https://go-review.googlesource.com/8912
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-15 19:02:10 +00:00
Josh Bleecher Snyder
e7fe9f56ea cmd/internal/gc: convert Bval to bool
No functional changes. Passes toolstash -cmp.

Change-Id: I4fba0c248645c3910ee3f7fc99dacafb676c5dc2
Reviewed-on: https://go-review.googlesource.com/8911
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-14 23:55:51 +00:00
Russ Cox
92c826b1b2 cmd/internal/gc: inline runtime.getg
This more closely restores what the old C runtime did.
(In C, g was an 'extern register' with the same effective
implementation as in this CL.)

On a late 2012 MacBookPro10,2, best of 5 old vs best of 5 new:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              4981312777     4463426605     -10.40%
BenchmarkFannkuch11                3046495712     3006819428     -1.30%
BenchmarkFmtFprintfEmpty           89.3           79.8           -10.64%
BenchmarkFmtFprintfString          284            262            -7.75%
BenchmarkFmtFprintfInt             282            262            -7.09%
BenchmarkFmtFprintfIntInt          480            448            -6.67%
BenchmarkFmtFprintfPrefixedInt     382            358            -6.28%
BenchmarkFmtFprintfFloat           529            486            -8.13%
BenchmarkFmtManyArgs               1849           1773           -4.11%
BenchmarkGobDecode                 12835963       11794385       -8.11%
BenchmarkGobEncode                 10527170       10288422       -2.27%
BenchmarkGzip                      436109569      438422516      +0.53%
BenchmarkGunzip                    110121663      109843648      -0.25%
BenchmarkHTTPClientServer          81930          85446          +4.29%
BenchmarkJSONEncode                24638574       24280603       -1.45%
BenchmarkJSONDecode                93022423       85753546       -7.81%
BenchmarkMandelbrot200             4703899        4735407        +0.67%
BenchmarkGoParse                   5319853        5086843        -4.38%
BenchmarkRegexpMatchEasy0_32       151            151            +0.00%
BenchmarkRegexpMatchEasy0_1K       452            453            +0.22%
BenchmarkRegexpMatchEasy1_32       131            132            +0.76%
BenchmarkRegexpMatchEasy1_1K       761            722            -5.12%
BenchmarkRegexpMatchMedium_32      228            224            -1.75%
BenchmarkRegexpMatchMedium_1K      63751          64296          +0.85%
BenchmarkRegexpMatchHard_32        3188           3238           +1.57%
BenchmarkRegexpMatchHard_1K        95396          96756          +1.43%
BenchmarkRevcomp                   661587262      687107364      +3.86%
BenchmarkTemplate                  108312598      104008540      -3.97%
BenchmarkTimeParse                 453            459            +1.32%
BenchmarkTimeFormat                475            441            -7.16%

The garbage benchmark from the benchmarks subrepo gets 2.6% faster as well.

Change-Id: I320aeda332db81012688b26ffab23f6581c59cfa
Reviewed-on: https://go-review.googlesource.com/8460
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-04-07 14:26:47 +00:00
Josh Bleecher Snyder
75883bae28 cmd/internal/gc: convert yet more Node fields to bools
Convert Embedded, Method, and Colas to bools.

I believe that this is the last of the Node fields
that can be trivially converted to bools.

No functional changes. Passes toolstash -cmp.

Change-Id: I81962ee47866596341fc60d24d6959c20cd7fc1c
Reviewed-on: https://go-review.googlesource.com/8440
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-07 03:10:30 +00:00
Russ Cox
92dba0d278 cmd/internal/gc: use hardware instruction for math.Sqrt (amd64/arm)
I first prototyped this change in Sept 2011, and I discarded it
because it made no difference in the obvious benchmark loop.
It still makes no difference in the obvious benchmark loop,
but in a less obvious one, doing some extra computation
around the calls to Sqrt, not making the call does have a
significant effect.

benchmark                 old ns/op     new ns/op     delta
BenchmarkSqrt             4.56          4.57          +0.22%
BenchmarkSqrtIndirect     4.56          4.56          +0.00%
BenchmarkSqrtGo           69.4          69.4          +0.00%
BenchmarkSqrtPrime        4417          3647          -17.43%

This is a warmup for using hardware expansions for some
calls to 1-line assembly routines in the runtime (for example getg).

Change-Id: Ie66be23f8c09d0f7dc4ddd7ca8a93cfce28f55a4
Reviewed-on: https://go-review.googlesource.com/8356
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-04-03 16:13:36 +00:00
Josh Bleecher Snyder
3ed9e4ca3c cmd/internal/gc: unembed Node.Func
This is a follow-up to CL 7360.

It was generated with eg and gofmt -r.

The only manual changes are the unembedding in syntax.go
and backporting changes from y.go to go.y.

Passes toolstash -cmp.

Change-Id: I3d6d06ecb659809a4bc8592395d5b9a18967218e
Reviewed-on: https://go-review.googlesource.com/8053
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-01 18:21:22 +00:00
Josh Bleecher Snyder
b09925b31d cmd/5g etc: merge simple case expressions onto fewer lines
The c2go translation left a lot of case expressions on separate lines.
Merge expressions onto single lines subject to these constraints:

* Max 4 clauses, all literals or names
* Don't move expressions with comments

The change was created by running http://play.golang.org/p/yHajs72h-g:

$ mergecase cmd/internal/{ld,gc,obj}/*.go cmd/internal/obj/*/*.go

Passes toolstash -cmp.

Change-Id: Iba41b390d302e5486e5dc6ba7599a92270676556
Reviewed-on: https://go-review.googlesource.com/7593
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2015-04-01 17:27:22 +00:00
Russ Cox
4224d81fae cmd/internal/gc: inline x := y.(*T) and x, ok := y.(*T)
These can be implemented with just a compare and a move instruction.
Do so, avoiding the overhead of a call into the runtime.

These assertions are a significant cost in Go code that uses interface{}
as a safe alternative to C's void* (or unsafe.Pointer), such as the
current version of the Go compiler.

*T here includes pointer to T but also any Go type represented as
a single pointer (chan, func, map). It does not include [1]*T or struct{*int}.
That requires more work in other parts of the compiler; there is a TODO.

Change-Id: I7ff681c20d2c3eb6ad11dd7b3a37b1f3dda23965
Reviewed-on: https://go-review.googlesource.com/7862
Reviewed-by: Rob Pike <r@golang.org>
2015-03-20 20:05:37 +00:00
Russ Cox
b115c35ee3 cmd/internal/gc: move cgen, regalloc, et al to portable code
This CL moves the bulk of the code that has been copy-and-pasted
since the initial 386 port back into a shared place, cutting 5 copies to 1.

The motivation here is not cleanup per se but instead to reduce the
cost of introducing changes in shared concepts like regalloc or general
expression evaluation. For example, a change after this one will
implement x.(*T) without a call into the runtime. This CL makes that
followup work 5x easier.

The single copy still has more special cases for architecture details
than I'd like, but having them called out explicitly like this at least
opens the door to generalizing the conditions and smoothing out
the distinctions in the future.

This is a LARGE CL. I started by trying to pull in one function at a time
in a sequence of CLs and it became clear that everything was so
interrelated that it had to be moved as a whole. Apologies for the size.

It is not clear how many more releases this code will matter for;
eventually it will be replaced by Keith's SSA work. But as noted above,
the deduplication was necessary to reduce the cost of working on
the current code while we have it.

Passes tests on amd64, 386, arm, and ppc64le.
Can build arm64 binaries but not tested there.
Being able to build binaries means it is probably very close.

Change-Id: I735977f04c0614f80215fb12966dfe9bbd1f5861
Reviewed-on: https://go-review.googlesource.com/7853
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-03-20 20:03:52 +00:00