cmd/compile: speed up compiling with -S

Compiling with -S was not implemented with performance in mind.
It allocates profligately. Compiling with -S is ~58% slower,
allocates ~47% more memory, and does ~183% more allocations.

compilecmp now uses -S to do finer-grained comparisons between
compiler versions, so I now care about its performance.

This change picks some of the lowest hanging fruit,
mostly by modifying printing routines to print directly to a writer,
rather than constructing a string first.

I have confirmed that compiling std+cmd with "-gcflags=all=-S -p=1"
and CGO_ENABLED=0 yields identical results before/after this change.
(-p=1 makes package compilation order deterministic. CGO_ENABLED=0
prevents cgo temp workdirs from showing up in filenames.)

Using the -S flag, the compiler performance impact is:

name        old time/op       new time/op       delta
Template          344ms ± 2%        301ms ± 2%  -12.45%  (p=0.000 n=22+24)
Unicode           136ms ± 3%        121ms ± 3%  -11.40%  (p=0.000 n=24+25)
GoTypes           1.24s ± 5%        1.09s ± 3%  -12.58%  (p=0.000 n=25+25)
Compiler          5.66s ± 4%        5.06s ± 2%  -10.56%  (p=0.000 n=25+20)
SSA               19.9s ± 3%        17.2s ± 4%  -13.64%  (p=0.000 n=25+25)
Flate             212ms ± 2%        188ms ± 2%  -11.33%  (p=0.000 n=25+24)
GoParser          278ms ± 3%        242ms ± 1%  -12.84%  (p=0.000 n=23+24)
Reflect           743ms ± 3%        657ms ± 5%  -11.56%  (p=0.000 n=24+25)
Tar               295ms ± 2%        263ms ± 2%  -10.78%  (p=0.000 n=25+25)
XML               409ms ± 2%        360ms ± 3%  -12.03%  (p=0.000 n=24+25)
[Geo mean]        714ms             629ms       -11.92%

name        old user-time/op  new user-time/op  delta
Template          430ms ± 5%        388ms ± 3%   -9.76%  (p=0.000 n=21+24)
Unicode           202ms ±12%        171ms ± 5%  -15.21%  (p=0.000 n=25+23)
GoTypes           1.58s ± 3%        1.42s ± 3%   -9.58%  (p=0.000 n=24+24)
Compiler          7.42s ± 3%        6.68s ± 8%   -9.93%  (p=0.000 n=25+25)
SSA               26.9s ± 3%        22.9s ± 3%  -14.85%  (p=0.000 n=25+25)
Flate             260ms ± 6%        234ms ± 3%   -9.69%  (p=0.000 n=23+25)
GoParser          354ms ± 1%        296ms ± 3%  -16.46%  (p=0.000 n=23+25)
Reflect           953ms ± 2%        865ms ± 4%   -9.14%  (p=0.000 n=24+24)
Tar               380ms ± 2%        348ms ± 2%   -8.28%  (p=0.000 n=25+22)
XML               530ms ± 3%        451ms ± 3%  -15.01%  (p=0.000 n=24+23)
[Geo mean]        929ms             819ms       -11.84%

name        old alloc/op      new alloc/op      delta
Template         54.1MB ± 0%       44.3MB ± 0%  -18.24%  (p=0.000 n=24+24)
Unicode          33.5MB ± 0%       30.6MB ± 0%   -8.57%  (p=0.000 n=25+25)
GoTypes           189MB ± 0%        152MB ± 0%  -19.55%  (p=0.000 n=25+23)
Compiler          875MB ± 0%        703MB ± 0%  -19.70%  (p=0.000 n=25+25)
SSA              3.19GB ± 0%       2.51GB ± 0%  -21.50%  (p=0.000 n=25+25)
Flate            32.9MB ± 0%       27.3MB ± 0%  -17.04%  (p=0.000 n=25+25)
GoParser         43.9MB ± 0%       35.1MB ± 0%  -20.19%  (p=0.000 n=25+25)
Reflect           117MB ± 0%         96MB ± 0%  -18.22%  (p=0.000 n=24+23)
Tar              48.6MB ± 0%       40.6MB ± 0%  -16.39%  (p=0.000 n=25+24)
XML              65.7MB ± 0%       53.9MB ± 0%  -17.93%  (p=0.000 n=25+23)
[Geo mean]        118MB              97MB       -17.80%

name        old allocs/op     new allocs/op     delta
Template          1.07M ± 0%        0.60M ± 0%  -43.90%  (p=0.000 n=25+24)
Unicode            539k ± 0%         398k ± 0%  -26.20%  (p=0.000 n=23+25)
GoTypes           3.97M ± 0%        2.19M ± 0%  -44.90%  (p=0.000 n=25+24)
Compiler          17.6M ± 0%         9.5M ± 0%  -46.39%  (p=0.000 n=22+23)
SSA               66.1M ± 0%        34.1M ± 0%  -48.41%  (p=0.000 n=25+22)
Flate              629k ± 0%         365k ± 0%  -41.95%  (p=0.000 n=25+25)
GoParser           929k ± 0%         500k ± 0%  -46.11%  (p=0.000 n=25+25)
Reflect           2.49M ± 0%        1.47M ± 0%  -41.00%  (p=0.000 n=24+25)
Tar                919k ± 0%         534k ± 0%  -41.94%  (p=0.000 n=25+24)
XML               1.28M ± 0%        0.71M ± 0%  -44.72%  (p=0.000 n=25+24)
[Geo mean]        2.32M             1.33M       -42.82%

This change also speeds up cmd/objdump a modest amount, ~4%.

Change-Id: I7c7aa2b365688bc44b3ef6e1d03bcf934699cabc
Reviewed-on: https://go-review.googlesource.com/c/go/+/216857
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This commit is contained in:
Josh Bleecher Snyder 2020-01-26 09:59:25 -08:00
parent 04040ec9f9
commit 1dcf34f2ec
4 changed files with 134 additions and 94 deletions

View file

@ -13,6 +13,7 @@ import (
"cmd/internal/objabi"
"cmd/internal/sys"
"fmt"
"io"
"log"
"path/filepath"
"sort"
@ -262,13 +263,13 @@ func (ctxt *Link) writeSymDebugNamed(s *LSym, name string) {
fmt.Fprintf(ctxt.Bso, "\n")
if s.Type == objabi.STEXT {
for p := s.Func.Text; p != nil; p = p.Link {
var s string
fmt.Fprintf(ctxt.Bso, "\t%#04x ", uint(int(p.Pc)))
if ctxt.Debugasm > 1 {
s = p.String()
io.WriteString(ctxt.Bso, p.String())
} else {
s = p.InnermostString()
p.InnermostString(ctxt.Bso)
}
fmt.Fprintf(ctxt.Bso, "\t%#04x %s\n", uint(int(p.Pc)), s)
fmt.Fprintln(ctxt.Bso)
}
}
for i := 0; i < len(s.P); i += 16 {
@ -283,11 +284,11 @@ func (ctxt *Link) writeSymDebugNamed(s *LSym, name string) {
fmt.Fprintf(ctxt.Bso, " ")
for j = i; j < i+16 && j < len(s.P); j++ {
c := int(s.P[j])
b := byte('.')
if ' ' <= c && c <= 0x7e {
fmt.Fprintf(ctxt.Bso, "%c", c)
} else {
fmt.Fprintf(ctxt.Bso, ".")
b = byte(c)
}
ctxt.Bso.WriteByte(b)
}
fmt.Fprintf(ctxt.Bso, "\n")