Commit graph

192 commits

Author SHA1 Message Date
Than McIntosh
4435fcfd6c compiler,linker: support for DWARF inlined instances
Compiler and linker changes to support DWARF inlined instances,
see https://go.googlesource.com/proposal/+/HEAD/design/22080-dwarf-inlining.md
for design details.

This functionality is gated via the cmd/compile option -gendwarfinl=N,
where N={0,1,2}, where a value of 0 disables dwarf inline generation,
a value of 1 turns on dwarf generation without tracking of formal/local
vars from inlined routines, and a value of 2 enables inlines with
variable tracking.

Updates #22080

Change-Id: I69309b3b815d9fed04aebddc0b8d33d0dbbfad6e
Reviewed-on: https://go-review.googlesource.com/75550
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-11-30 14:39:19 +00:00
isharipo
1e83f883c5 cmd/internal/obj: make it possible to have all AVX1/2 insts
Current AllowedOpCodes is 1024, which is not enough for modern x86.
Changed limit to 2048 (though AVX512 will exceed this).

Additional Z-cases and ytab tables are added to make it possible
to handle missing AVX1 and AVX2 instructions.

This CL is required by x86avxgen to work properly:
https://go-review.googlesource.com/c/arch/+/66972

Change-Id: I290214bbda554d2cba53349f50dcd34014fe4cee
Reviewed-on: https://go-review.googlesource.com/70650
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2017-11-02 16:12:17 +00:00
Wei Xiao
531e6c06c4 cmd/asm: refine Go assembly for ARM64
Some ARM64-specific instructions (such as SIMD instructions) are not supported.
This patch adds support for the following:
1. Extended register, e.g.:
     ADD	Rm.<ext>[<<amount], Rn, Rd
     <ext> can have the following values:
       UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX
2. Arrangement for SIMD instructions, e.g.:
     VADDP	Vm.<T>, Vn.<T>, Vd.<T>
     <T> can have the following values:
       B8, B16, H4, H8, S2, S4 and D2
3. Width specifier and element index for SIMD instructions, e.g.:
     VMOV	Vn.<T>[index], Rd // MOV(to general register)
     <T> can have the following values:
       S and D
4. Register List, e.g.:
     VLD1	(Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>]
5. Register offset variant, e.g.:
     VLD1.P	(Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register
6. Go assembly for ARM64 reference manual
     new added instructions are required to have according explanation items in
     the manual and items for existed instructions will be added incrementally

For more information about the refinement background, please refer to the
discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU)

This patch only adds syntax and doesn't break any assembly that already exists.

Change-Id: I34e90b7faae032820593a0e417022c354a882008
Reviewed-on: https://go-review.googlesource.com/41654
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-13 13:41:19 +00:00
Keith Randall
e130dcf051 cmd/compile: abort earlier if stack frame too large
If the stack frame is too large, abort immediately.
We used to generate code first, then abort.
In issue 22200, generating code raised a panic
so we got an ICE instead of an error message.

Change the max frame size to 1GB (from 2GB).
Stack frames between 1.1GB and 2GB didn't used to work anyway,
the pcln table generation would have failed and generated an ICE.

Fixes #22200

Change-Id: I1d918ab27ba6ebf5c87ec65d1bccf973f8c8541e
Reviewed-on: https://go-review.googlesource.com/69810
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-10-11 18:24:13 +00:00
isharipo
8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
Heschi Kreinick
4c54a047c6 [dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.

After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.

There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:

  type myint int
  var a int
  b := myint(int)
and
  b := (*uint64)(unsafe.Pointer(a))

don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.

Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.

A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.

TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.

go build -toolexec 'toolstash -cmp' -a std succeeds.

With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after:  06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean  /tmp/220352263 /tmp/621364410
completed   15 of   15, estimated time remaining 0s (eta 3:52PM)
name        old time/op       new time/op       delta
Template          199ms ± 3%        198ms ± 2%     ~     (p=0.400 n=15+14)
Unicode          96.6ms ± 5%       96.4ms ± 5%     ~     (p=0.838 n=15+15)
GoTypes           653ms ± 2%        647ms ± 2%     ~     (p=0.102 n=15+14)
Flate             133ms ± 6%        129ms ± 3%   -2.62%  (p=0.041 n=15+15)
GoParser          164ms ± 5%        159ms ± 3%   -3.05%  (p=0.000 n=15+15)
Reflect           428ms ± 4%        422ms ± 3%     ~     (p=0.156 n=15+13)
Tar               123ms ±10%        124ms ± 8%     ~     (p=0.461 n=15+15)
XML               228ms ± 3%        224ms ± 3%   -1.57%  (p=0.045 n=15+15)
[Geo mean]        206ms             377ms       +82.86%

name        old user-time/op  new user-time/op  delta
Template          292ms ±10%        301ms ±12%     ~     (p=0.189 n=15+15)
Unicode           166ms ±37%        158ms ±14%     ~     (p=0.418 n=15+14)
GoTypes           962ms ± 6%        963ms ± 7%     ~     (p=0.976 n=15+15)
Flate             207ms ±19%        200ms ±14%     ~     (p=0.345 n=14+15)
GoParser          246ms ±22%        240ms ±15%     ~     (p=0.587 n=15+15)
Reflect           611ms ±13%        587ms ±14%     ~     (p=0.085 n=15+13)
Tar               211ms ±12%        217ms ±14%     ~     (p=0.355 n=14+15)
XML               335ms ±15%        320ms ±18%     ~     (p=0.169 n=15+15)
[Geo mean]        317ms             583ms       +83.72%

name        old alloc/op      new alloc/op      delta
Template         40.2MB ± 0%       40.2MB ± 0%   -0.15%  (p=0.000 n=14+15)
Unicode          29.2MB ± 0%       29.3MB ± 0%     ~     (p=0.624 n=15+15)
GoTypes           114MB ± 0%        114MB ± 0%   -0.15%  (p=0.000 n=15+14)
Flate            25.7MB ± 0%       25.6MB ± 0%   -0.18%  (p=0.000 n=13+15)
GoParser         32.2MB ± 0%       32.2MB ± 0%   -0.14%  (p=0.003 n=15+15)
Reflect          77.8MB ± 0%       77.9MB ± 0%     ~     (p=0.061 n=15+15)
Tar              27.1MB ± 0%       27.0MB ± 0%   -0.11%  (p=0.029 n=15+15)
XML              42.7MB ± 0%       42.5MB ± 0%   -0.29%  (p=0.000 n=15+15)
[Geo mean]       42.1MB            75.0MB       +78.05%

name        old allocs/op     new allocs/op     delta
Template           402k ± 1%         398k ± 0%   -0.91%  (p=0.000 n=15+15)
Unicode            344k ± 1%         344k ± 0%     ~     (p=0.715 n=15+14)
GoTypes           1.18M ± 0%        1.17M ± 0%   -0.91%  (p=0.000 n=15+14)
Flate              243k ± 0%         240k ± 1%   -1.05%  (p=0.000 n=13+15)
GoParser           327k ± 1%         324k ± 1%   -0.96%  (p=0.000 n=15+15)
Reflect            984k ± 1%         982k ± 0%     ~     (p=0.050 n=15+15)
Tar                261k ± 1%         259k ± 1%   -0.77%  (p=0.000 n=15+15)
XML                411k ± 0%         404k ± 1%   -1.55%  (p=0.000 n=15+15)
[Geo mean]         439k              755k       +72.01%

name        old text-bytes    new text-bytes    delta
HelloSize         694kB ± 0%        694kB ± 0%   -0.00%  (p=0.000 n=15+15)

name        old data-bytes    new data-bytes    delta
HelloSize        5.55kB ± 0%       5.55kB ± 0%     ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         133kB ± 0%        133kB ± 0%     ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.04MB ± 0%       1.04MB ± 0%     ~     (all equal)

Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-27 20:19:44 +00:00
Alessandro Arzilli
2ad41a3090 cmd/compile: output DWARF lexical blocks for local variables
Change compiler and linker to emit DWARF lexical blocks in .debug_info
section when compiling with -N -l.

Version of debug_info is updated from DWARF v2 to DWARF v3 since
version 2 does not allow lexical blocks with discontinuous PC ranges.

Remaining open problems:
- scope information is removed from inlined functions
- variables records do not have DW_AT_start_scope attributes so a
variable will shadow other variables with the same name as soon as its
containing scope begins, even before its declaration.

Updates #6913.
Updates #12899.

Change-Id: Idc6808788512ea20e7e45bcf782453acb416fb49
Reviewed-on: https://go-review.googlesource.com/40095
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-05-18 23:10:50 +00:00
Josh Bleecher Snyder
756b9ce3a5 cmd/compile: add initial backend concurrency support
This CL adds initial support for concurrent backend compilation.

BACKGROUND

The compiler currently consists (very roughly) of the following phases:

1. Initialization.
2. Lexing and parsing into the cmd/compile/internal/syntax AST.
3. Translation into the cmd/compile/internal/gc AST.
4. Some gc AST passes: typechecking, escape analysis, inlining,
   closure handling, expression evaluation ordering (order.go),
   and some lowering and optimization (walk.go).
5. Translation into the cmd/compile/internal/ssa SSA form.
6. Optimization and lowering of SSA form.
7. Translation from SSA form to assembler instructions.
8. Translation from assembler instructions to machine code.
9. Writing lots of output: machine code, DWARF symbols,
   type and reflection info, export data.

Phase 2 was already concurrent as of Go 1.8.

Phase 3 is planned for eventual removal;
we hope to go straight from syntax AST to SSA.

Phases 5–8 are per-function; this CL adds support for
processing multiple functions concurrently.
The slowest phases in the compiler are 5 and 6,
so this offers the opportunity for some good speed-ups.

Unfortunately, it's not quite that straightforward.
In the current compiler, the latter parts of phase 4
(order, walk) are done function-at-a-time as needed.
Making order and walk concurrency-safe proved hard,
and they're not particularly slow, so there wasn't much reward.
To enable phases 5–8 to be done concurrently,
when concurrent backend compilation is requested,
we complete phase 4 for all functions
before starting later phases for any functions.

Also, in reality, we automatically generate new
functions in phase 9, such as method wrappers
and equality and has routines.
Those new functions then go through phases 4–8.
This CL disables concurrent backend compilation
after the first, big, user-provided batch of
functions has been compiled.
This is done to keep things simple,
and because the autogenerated functions
tend to be small, few, simple, and fast to compile.

USAGE

Concurrent backend compilation still defaults to off.
To set the number of functions that may be backend-compiled
concurrently, use the compiler flag -c.
In future work, cmd/go will automatically set -c.

Furthermore, this CL has been intentionally written
so that the c=1 path has no backend concurrency whatsoever,
not even spawning any goroutines.
This helps ensure that, should problems arise
late in the development cycle,
we can simply have cmd/go set c=1 always,
and revert to the original compiler behavior.

MUTEXES

Most of the work required to make concurrent backend
compilation safe has occurred over the past month.
This CL adds a handful of mutexes to get the rest of the way there;
they are the mutexes that I didn't see a clean way to avoid.
Some of them may still be eliminable in future work.

In no particular order:

* gc.funcsymsmu. The global funcsyms slice is populated
  lazily when we need function symbols for closures.
  This occurs during gc AST to SSA translation.
  The function funcsym also does a package lookup,
  which is a source of races on types.Pkg.Syms;
  funcsymsmu also covers that package lookup.
  This mutex is low priority: it adds a single global,
  it is in an infrequently used code path, and it is low contention.
  Since funcsyms may now be added in any order,
  we must sort them to preserve reproducible builds.

* gc.largeStackFramesMu. We don't discover until after SSA compilation
  that a function's stack frame is gigantic.
  Recording that error happens basically never,
  but it does happen concurrently.
  Fix with a low priority mutex and sorting.

* obj.Link.hashmu. ctxt.hash stores the mapping from
  types.Syms (compiler symbols) to obj.LSyms (linker symbols).
  It is accessed fairly heavily through all the phases.
  This is the only heavily contended mutex.

* gc.signatlistmu. The global signatlist map is
  populated with types through several of the concurrent phases,
  including notably via ngotype during DWARF generation.
  It is low priority for removal.

* gc.typepkgmu. Looking up symbols in the types package
  happens a fair amount during backend compilation
  and DWARF generation, particularly via ngotype.
  This mutex helps us to avoid a broader mutex on types.Pkg.Syms.
  It has low-to-moderate contention.

* types.internedStringsmu. gc AST to SSA conversion and
  some SSA work introduce new autotmps.
  Those autotmps have their names interned to reduce allocations.
  That interning requires protecting types.internedStrings.
  The autotmp names are heavily re-used, and the mutex
  overhead and contention here are low, so it is probably
  a worthwhile performance optimization to keep this mutex.

TESTING

I have been testing this code locally by running
'go install -race cmd/compile'
and then doing
'go build -a -gcflags=-c=128 std cmd'
for all architectures and a variety of compiler flags.
This obviously needs to be made part of the builders,
but it is too expensive to make part of all.bash.
I have filed #19962 for this.

REPRODUCIBLE BUILDS

This version of the compiler generates reproducible builds.
Testing reproducible builds also needs automation, however,
and is also too expensive for all.bash.
This is #19961.

Also of note is that some of the compiler flags used by 'toolstash -cmp'
are currently incompatible with concurrent backend compilation.
They still work fine with c=1.
Time will tell whether this is a problem.

NEXT STEPS

* Continue to find and fix races and bugs,
  using a combination of code inspection, fuzzing,
  and hopefully some community experimentation.
  I do not know of any outstanding races,
  but there probably are some.
* Improve testing.
* Improve performance, for many values of c.
* Integrate with cmd/go and fine tune.
* Support concurrent compilation with the -race flag.
  It is a sad irony that it does not yet work.
* Minor code cleanup that has been deferred during
  the last month due to uncertainty about the
  ultimate shape of this CL.

PERFORMANCE

Here's the buried lede, at last. :)

All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop.

First, going from tip to this CL with c=1 has almost no impact.

name        old time/op       new time/op       delta
Template          195ms ± 3%        194ms ± 5%    ~     (p=0.370 n=30+29)
Unicode          86.6ms ± 3%       87.0ms ± 7%    ~     (p=0.958 n=29+30)
GoTypes           548ms ± 3%        555ms ± 4%  +1.35%  (p=0.001 n=30+28)
Compiler          2.51s ± 2%        2.54s ± 2%  +1.17%  (p=0.000 n=28+30)
SSA               5.16s ± 3%        5.16s ± 2%    ~     (p=0.910 n=30+29)
Flate             124ms ± 5%        124ms ± 4%    ~     (p=0.947 n=30+30)
GoParser          146ms ± 3%        146ms ± 3%    ~     (p=0.150 n=29+28)
Reflect           354ms ± 3%        352ms ± 4%    ~     (p=0.096 n=29+29)
Tar               107ms ± 5%        106ms ± 3%    ~     (p=0.370 n=30+29)
XML               200ms ± 4%        201ms ± 4%    ~     (p=0.313 n=29+28)
[Geo mean]        332ms             333ms       +0.10%

name        old user-time/op  new user-time/op  delta
Template          227ms ± 5%        225ms ± 5%    ~     (p=0.457 n=28+27)
Unicode           109ms ± 4%        109ms ± 5%    ~     (p=0.758 n=29+29)
GoTypes           713ms ± 4%        721ms ± 5%    ~     (p=0.051 n=30+29)
Compiler          3.36s ± 2%        3.38s ± 3%    ~     (p=0.146 n=30+30)
SSA               7.46s ± 3%        7.47s ± 3%    ~     (p=0.804 n=30+29)
Flate             146ms ± 7%        147ms ± 3%    ~     (p=0.833 n=29+27)
GoParser          179ms ± 5%        179ms ± 5%    ~     (p=0.866 n=30+30)
Reflect           431ms ± 4%        429ms ± 4%    ~     (p=0.593 n=29+30)
Tar               124ms ± 5%        123ms ± 5%    ~     (p=0.140 n=29+29)
XML               243ms ± 4%        242ms ± 7%    ~     (p=0.404 n=29+29)
[Geo mean]        415ms             415ms       +0.02%

name        old obj-bytes     new obj-bytes     delta
Template           382k ± 0%         382k ± 0%    ~     (all equal)
Unicode            203k ± 0%         203k ± 0%    ~     (all equal)
GoTypes           1.18M ± 0%        1.18M ± 0%    ~     (all equal)
Compiler          3.98M ± 0%        3.98M ± 0%    ~     (all equal)
SSA               8.28M ± 0%        8.28M ± 0%    ~     (all equal)
Flate              230k ± 0%         230k ± 0%    ~     (all equal)
GoParser           287k ± 0%         287k ± 0%    ~     (all equal)
Reflect           1.00M ± 0%        1.00M ± 0%    ~     (all equal)
Tar                190k ± 0%         190k ± 0%    ~     (all equal)
XML                416k ± 0%         416k ± 0%    ~     (all equal)
[Geo mean]         660k              660k       +0.00%

Comparing this CL to itself, from c=1 to c=2
improves real times 20-30%, costs 5-10% more CPU time,
and adds about 2% alloc.
The allocation increase comes from allocating more ssa.Caches.

name       old time/op       new time/op       delta
Template         202ms ± 3%        149ms ± 3%  -26.15%  (p=0.000 n=49+49)
Unicode         87.4ms ± 4%       84.2ms ± 3%   -3.68%  (p=0.000 n=48+48)
GoTypes          560ms ± 2%        398ms ± 2%  -28.96%  (p=0.000 n=49+49)
Compiler         2.46s ± 3%        1.76s ± 2%  -28.61%  (p=0.000 n=48+46)
SSA              6.17s ± 2%        4.04s ± 1%  -34.52%  (p=0.000 n=49+49)
Flate            126ms ± 3%         92ms ± 2%  -26.81%  (p=0.000 n=49+48)
GoParser         148ms ± 4%        107ms ± 2%  -27.78%  (p=0.000 n=49+48)
Reflect          361ms ± 3%        281ms ± 3%  -22.10%  (p=0.000 n=49+49)
Tar              109ms ± 4%         86ms ± 3%  -20.81%  (p=0.000 n=49+47)
XML              204ms ± 3%        144ms ± 2%  -29.53%  (p=0.000 n=48+45)

name       old user-time/op  new user-time/op  delta
Template         246ms ± 9%        246ms ± 4%     ~     (p=0.401 n=50+48)
Unicode          109ms ± 4%        111ms ± 4%   +1.47%  (p=0.000 n=44+50)
GoTypes          728ms ± 3%        765ms ± 3%   +5.04%  (p=0.000 n=46+50)
Compiler         3.33s ± 3%        3.41s ± 2%   +2.31%  (p=0.000 n=49+48)
SSA              8.52s ± 2%        9.11s ± 2%   +6.93%  (p=0.000 n=49+47)
Flate            149ms ± 4%        161ms ± 3%   +8.13%  (p=0.000 n=50+47)
GoParser         181ms ± 5%        192ms ± 2%   +6.40%  (p=0.000 n=49+46)
Reflect          452ms ± 9%        474ms ± 2%   +4.99%  (p=0.000 n=50+48)
Tar              126ms ± 6%        136ms ± 4%   +7.95%  (p=0.000 n=50+49)
XML              247ms ± 5%        264ms ± 3%   +6.94%  (p=0.000 n=48+50)

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       39.3MB ± 0%   +1.48%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.2MB ± 0%   +1.19%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        114MB ± 0%   +0.69%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        447MB ± 0%   +0.95%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.26GB ± 0%   +0.89%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       25.9MB ± 1%   +2.35%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       32.2MB ± 0%   +1.59%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       78.9MB ± 0%   +0.91%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.0MB ± 0%   +1.80%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       43.4MB ± 0%   +2.35%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          379k ± 0%         378k ± 0%     ~     (p=0.421 n=5+5)
Unicode           322k ± 0%         321k ± 0%     ~     (p=0.222 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.548 n=5+5)
Compiler         4.12M ± 0%        4.11M ± 0%   -0.14%  (p=0.032 n=5+5)
SSA              9.72M ± 0%        9.72M ± 0%     ~     (p=0.421 n=5+5)
Flate             234k ± 1%         234k ± 0%     ~     (p=0.421 n=5+5)
GoParser          316k ± 1%         315k ± 0%     ~     (p=0.222 n=5+5)
Reflect           980k ± 0%         979k ± 0%     ~     (p=0.095 n=5+5)
Tar               249k ± 1%         249k ± 1%     ~     (p=0.841 n=5+5)
XML               392k ± 0%         391k ± 0%     ~     (p=0.095 n=5+5)

From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%:

name       old time/op       new time/op       delta
Template         203ms ± 3%        131ms ± 5%  -35.45%  (p=0.000 n=50+50)
Unicode         87.2ms ± 4%       84.1ms ± 2%   -3.61%  (p=0.000 n=48+47)
GoTypes          560ms ± 4%        310ms ± 2%  -44.65%  (p=0.000 n=50+49)
Compiler         2.47s ± 3%        1.41s ± 2%  -43.10%  (p=0.000 n=50+46)
SSA              6.17s ± 2%        3.20s ± 2%  -48.06%  (p=0.000 n=49+49)
Flate            126ms ± 4%         74ms ± 2%  -41.06%  (p=0.000 n=49+48)
GoParser         148ms ± 4%         89ms ± 3%  -39.97%  (p=0.000 n=49+50)
Reflect          360ms ± 3%        242ms ± 3%  -32.81%  (p=0.000 n=49+49)
Tar              108ms ± 4%         73ms ± 4%  -32.48%  (p=0.000 n=50+49)
XML              203ms ± 3%        119ms ± 3%  -41.56%  (p=0.000 n=49+48)

name       old user-time/op  new user-time/op  delta
Template         246ms ± 9%        287ms ± 9%  +16.98%  (p=0.000 n=50+50)
Unicode          109ms ± 4%        118ms ± 5%   +7.56%  (p=0.000 n=46+50)
GoTypes          735ms ± 4%        806ms ± 2%   +9.62%  (p=0.000 n=50+50)
Compiler         3.34s ± 4%        3.56s ± 2%   +6.78%  (p=0.000 n=49+49)
SSA              8.54s ± 3%       10.04s ± 3%  +17.55%  (p=0.000 n=50+50)
Flate            149ms ± 6%        176ms ± 3%  +17.82%  (p=0.000 n=50+48)
GoParser         181ms ± 5%        213ms ± 3%  +17.47%  (p=0.000 n=50+50)
Reflect          453ms ± 6%        499ms ± 2%  +10.11%  (p=0.000 n=50+48)
Tar              126ms ± 5%        149ms ±11%  +18.76%  (p=0.000 n=50+50)
XML              246ms ± 5%        287ms ± 4%  +16.53%  (p=0.000 n=49+50)

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       40.4MB ± 0%   +4.21%  (p=0.008 n=5+5)
Unicode         29.8MB ± 0%       30.9MB ± 0%   +3.68%  (p=0.008 n=5+5)
GoTypes          113MB ± 0%        116MB ± 0%   +2.71%  (p=0.008 n=5+5)
Compiler         443MB ± 0%        455MB ± 0%   +2.75%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.27GB ± 0%   +1.84%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       26.9MB ± 1%   +6.31%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       33.2MB ± 0%   +4.61%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       80.2MB ± 0%   +2.53%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       27.9MB ± 0%   +5.19%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       44.6MB ± 0%   +5.20%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          380k ± 0%         379k ± 0%   -0.39%  (p=0.032 n=5+5)
Unicode           321k ± 0%         321k ± 0%     ~     (p=0.841 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%     ~     (p=0.421 n=5+5)
Compiler         4.12M ± 0%        4.14M ± 0%   +0.52%  (p=0.008 n=5+5)
SSA              9.72M ± 0%        9.76M ± 0%   +0.37%  (p=0.008 n=5+5)
Flate             234k ± 1%         234k ± 1%     ~     (p=0.690 n=5+5)
GoParser          316k ± 0%         317k ± 1%     ~     (p=0.841 n=5+5)
Reflect           981k ± 0%         981k ± 0%     ~     (p=1.000 n=5+5)
Tar               250k ± 0%         249k ± 1%     ~     (p=0.151 n=5+5)
XML               393k ± 0%         392k ± 0%     ~     (p=0.056 n=5+5)

Going beyond c=4 on my machine tends to increase CPU time and allocs
without impacting real time.

The CPU time numbers matter, because when there are many concurrent
compilation processes, that will impact the overall throughput.

The numbers above are in many ways the best case scenario;
we can take full advantage of all cores.
Fortunately, the most common compilation scenario is incremental
re-compilation of a single package during a build/test cycle.

Updates #15756

Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea
Reviewed-on: https://go-review.googlesource.com/40693
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-27 00:59:07 +00:00
Josh Bleecher Snyder
405a280d01 cmd/internal/obj: eliminate LSym.Version
There were only two versions, 0 and 1,
and the only user of version 1 was the assembler,
to indicate that a symbol was static.

Rename LSym.Version to Static,
and add it to LSym.Attributes.
Simplify call-sites.

Passes toolstash-check.

Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f
Reviewed-on: https://go-review.googlesource.com/41201
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 21:56:23 +00:00
Josh Bleecher Snyder
6e97c71cb7 cmd/internal/obj: split Link.hash into version 0 and 1
Though LSym.Version is an int, it can only have the value 0 or 1.
Using that, split Link.hash into two maps, one for version 0
(which is far more common) and one for version 1.
This lets use just the name for lookups,
which is both faster and more compact.
This matters because Link.hash map lookups are frequent,
and will be contended once the backend is concurrent.

name        old time/op       new time/op       delta
Template          194ms ± 3%        192ms ± 5%  -1.46%  (p=0.000 n=47+49)
Unicode          84.5ms ± 3%       83.8ms ± 3%  -0.81%  (p=0.011 n=50+49)
GoTypes           543ms ± 2%        545ms ± 4%    ~     (p=0.566 n=46+49)
Compiler          2.48s ± 2%        2.48s ± 3%    ~     (p=0.706 n=47+50)
SSA               5.94s ± 3%        5.98s ± 2%  +0.55%  (p=0.040 n=49+50)
Flate             119ms ± 6%        119ms ± 4%    ~     (p=0.681 n=48+47)
GoParser          145ms ± 4%        145ms ± 3%    ~     (p=0.662 n=47+49)
Reflect           348ms ± 3%        344ms ± 3%  -1.17%  (p=0.000 n=47+47)
Tar               105ms ± 4%        104ms ± 3%    ~     (p=0.155 n=50+47)
XML               197ms ± 2%        197ms ± 3%    ~     (p=0.666 n=49+49)
[Geo mean]        332ms             331ms       -0.37%

name        old user-time/op  new user-time/op  delta
Template          230ms ±10%        226ms ±10%  -1.85%  (p=0.041 n=50+50)
Unicode           104ms ± 6%        103ms ± 5%    ~     (p=0.076 n=49+49)
GoTypes           707ms ± 4%        705ms ± 5%    ~     (p=0.521 n=50+50)
Compiler          3.30s ± 3%        3.33s ± 4%  +0.76%  (p=0.003 n=50+49)
SSA               8.17s ± 4%        8.23s ± 3%  +0.66%  (p=0.030 n=50+49)
Flate             139ms ± 6%        138ms ± 8%    ~     (p=0.184 n=49+48)
GoParser          174ms ± 5%        172ms ± 6%    ~     (p=0.107 n=48+49)
Reflect           431ms ± 8%        420ms ± 5%  -2.57%  (p=0.000 n=50+46)
Tar               119ms ± 6%        118ms ± 7%  -0.95%  (p=0.033 n=50+49)
XML               236ms ± 4%        236ms ± 4%    ~     (p=0.935 n=50+48)
[Geo mean]        410ms             407ms       -0.67%

name        old alloc/op      new alloc/op      delta
Template         38.7MB ± 0%       38.6MB ± 0%  -0.29%  (p=0.008 n=5+5)
Unicode          29.8MB ± 0%       29.7MB ± 0%  -0.24%  (p=0.008 n=5+5)
GoTypes           113MB ± 0%        113MB ± 0%  -0.29%  (p=0.008 n=5+5)
Compiler          462MB ± 0%        462MB ± 0%  -0.12%  (p=0.008 n=5+5)
SSA              1.27GB ± 0%       1.27GB ± 0%  -0.05%  (p=0.008 n=5+5)
Flate            25.2MB ± 0%       25.1MB ± 0%  -0.37%  (p=0.008 n=5+5)
GoParser         31.7MB ± 0%       31.6MB ± 0%    ~     (p=0.056 n=5+5)
Reflect          77.5MB ± 0%       77.2MB ± 0%  -0.38%  (p=0.008 n=5+5)
Tar              26.4MB ± 0%       26.3MB ± 0%    ~     (p=0.151 n=5+5)
XML              41.9MB ± 0%       41.9MB ± 0%  -0.20%  (p=0.032 n=5+5)
[Geo mean]       74.5MB            74.3MB       -0.23%

name        old allocs/op     new allocs/op     delta
Template           378k ± 1%         377k ± 1%    ~     (p=0.690 n=5+5)
Unicode            321k ± 0%         322k ± 0%    ~     (p=0.595 n=5+5)
GoTypes           1.14M ± 0%        1.14M ± 0%    ~     (p=0.310 n=5+5)
Compiler          4.25M ± 0%        4.25M ± 0%    ~     (p=0.151 n=5+5)
SSA               9.84M ± 0%        9.84M ± 0%    ~     (p=0.841 n=5+5)
Flate              232k ± 1%         232k ± 0%    ~     (p=0.690 n=5+5)
GoParser           315k ± 1%         315k ± 1%    ~     (p=0.841 n=5+5)
Reflect            970k ± 0%         970k ± 0%    ~     (p=0.841 n=5+5)
Tar                248k ± 0%         248k ± 1%    ~     (p=0.841 n=5+5)
XML                389k ± 0%         389k ± 0%    ~     (p=1.000 n=5+5)
[Geo mean]         724k              724k       +0.01%

Updates #15756

Change-Id: I2646332e89f0444ca9d5a41d7172537d904ed636
Reviewed-on: https://go-review.googlesource.com/41050
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 13:07:00 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Matthew Dempsky
1747078695 cmd/internal/obj: un-embed FuncInfo field in LSym
Automated refactoring using github.com/mdempsky/unbed (to rewrite
s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo
field to just Func).

Passes toolstash-check -all.

Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a
Reviewed-on: https://go-review.googlesource.com/40670
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18 17:29:50 +00:00
Josh Bleecher Snyder
9d4a84677b cmd/internal/obj: unexport Link.Hash
A prior CL eliminated the last reference to Ctxt.Hash
from the compiler.

Change-Id: Ic97ff84ed1a14e0c93fb0e8ec0b2617c3397c0e8
Reviewed-on: https://go-review.googlesource.com/40699
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-18 02:14:02 +00:00
Josh Bleecher Snyder
da15fe6870 cmd/internal/obj: rework gclocals handling
The compiler handled gcargs and gclocals LSyms unusually.
It generated placeholder symbols (makefuncdatasym),
filled them in, and then renamed them for content-addressability.
This is an important binary size optimization;
the same locals information occurs over and over.

This CL continues to treat these LSyms unusually,
but in a slightly more explicit way,
and importantly for concurrent compilation,
in a way that does not require concurrent
modification of Ctxt.Hash.

Instead of creating gcargs and gclocals in the usual way,
by creating a types.Sym and then an obj.LSym,
we add them directly to obj.FuncInfo,
initialize them in obj.InitTextSym,
and deduplicate and add them to ctxt.Data at the end.
Then the backend's job is simply to fill them in
and rename them appropriately.

Updates #15756

name       old alloc/op      new alloc/op      delta
Template        38.8MB ± 0%       38.7MB ± 0%  -0.22%  (p=0.016 n=5+5)
Unicode         29.8MB ± 0%       29.8MB ± 0%    ~     (p=0.690 n=5+5)
GoTypes          113MB ± 0%        113MB ± 0%  -0.24%  (p=0.008 n=5+5)
SSA             1.25GB ± 0%       1.24GB ± 0%  -0.39%  (p=0.008 n=5+5)
Flate           25.3MB ± 0%       25.2MB ± 0%  -0.43%  (p=0.008 n=5+5)
GoParser        31.7MB ± 0%       31.7MB ± 0%  -0.22%  (p=0.008 n=5+5)
Reflect         78.2MB ± 0%       77.6MB ± 0%  -0.80%  (p=0.008 n=5+5)
Tar             26.6MB ± 0%       26.3MB ± 0%  -0.85%  (p=0.008 n=5+5)
XML             42.4MB ± 0%       41.9MB ± 0%  -1.04%  (p=0.008 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          378k ± 0%         377k ± 1%    ~     (p=0.151 n=5+5)
Unicode           321k ± 1%         321k ± 0%    ~     (p=0.841 n=5+5)
GoTypes          1.14M ± 0%        1.14M ± 0%  -0.47%  (p=0.016 n=5+5)
SSA              9.71M ± 0%        9.67M ± 0%  -0.33%  (p=0.008 n=5+5)
Flate             233k ± 1%         232k ± 1%    ~     (p=0.151 n=5+5)
GoParser          316k ± 0%         315k ± 0%  -0.49%  (p=0.016 n=5+5)
Reflect           979k ± 0%         972k ± 0%  -0.75%  (p=0.008 n=5+5)
Tar               250k ± 0%         247k ± 1%  -0.92%  (p=0.008 n=5+5)
XML               392k ± 1%         389k ± 0%  -0.67%  (p=0.008 n=5+5)

Change-Id: Idc36186ca9d2f8214b5f7720bbc27b6bb22fdc48
Reviewed-on: https://go-review.googlesource.com/40697
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-18 02:13:32 +00:00
Josh Bleecher Snyder
7b5f94e76c cmd/internal/obj: cache dwarfSym
Follow-up to review feedback from
mdempsky on CL 40507.

Reduces mutex contention by about 1%.

Change-Id: I540ea6772925f4a59e58f55a3458eff15880c328
Reviewed-on: https://go-review.googlesource.com/40575
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-13 15:18:54 +00:00
Josh Bleecher Snyder
ce3ee7cdae cmd/internal/obj: stop storing Text flags in From3
Prior to this CL, flags such as NOSPLIT
on ATEXT Progs were stored in From3.Offset.
Some but not all of those flags were also
duplicated into From.Sym.Attribute.

This CL migrates all of those flags into
From.Sym.Attribute and stops creating a From3.

A side-effect of this is that printing an
ATEXT Prog can no longer simply dump From3.Offset.
That's kind of good, since the raw flag value
wasn't very informative anyway, but it did
necessitate a bunch of updates to the cmd/asm tests.

The reason I'm doing this work now is that
avoiding storing flags in both From.Sym and From3.Offset
simplifies some other changes to fix the data
race first described in CL 40254.

This CL almost passes toolstash-check -all.
The only changes are in cases where the assembler
has decided that a function's flags may be altered,
e.g. to make a function with no calls in it NOSPLIT.
Prior to this CL, that information was not printed.

Sample before:

"".Ctz64 t=1 size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Sample after:

"".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), NOSPLIT, $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Observe the additional "nosplit" in the first line
and the additional "NOSPLIT" in the second line.

Updates #15756

Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13
Reviewed-on: https://go-review.googlesource.com/40495
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-12 21:53:39 +00:00
Josh Bleecher Snyder
49f4b5a4f5 cmd/internal/obj: remove Link.Plan9privates
Move it to the x86 package, matching our handling
of deferreturn in x86 and arm.
While we're here, improve the concurrency safety
of both Plan9privates and deferreturn
by eagerly initializing them in instinit.

Updates #15756

Change-Id: If3b1995c1e4ec816a5443a18f8d715631967a8b1
Reviewed-on: https://go-review.googlesource.com/40408
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-12 15:56:14 +00:00
Josh Bleecher Snyder
f30de83d79 cmd/internal/obj: remove Link.Version
It is zeroed pointlessly and never read.

Change-Id: I65390501a878f545122ec558cb621b91e394a538
Reviewed-on: https://go-review.googlesource.com/40406
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 15:54:39 +00:00
Josh Bleecher Snyder
30ddffadd5 cmd/internal/obj: remove Link.Debugdivmod
It is only used once and never written to.
Switch to a local constant instead.

Change-Id: Icdd84e47b81f0de44ad9ed56ab5f4f91df22e6b6
Reviewed-on: https://go-review.googlesource.com/40405
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 14:38:15 +00:00
Josh Bleecher Snyder
eacfa59220 cmd/internal/obj: remove dead Link fields
These are unused after CLs 39922, 40252, 40370, 40371, and 40372.

Change-Id: I76f9276c581067a8cb555de761550d960f6e39b8
Reviewed-on: https://go-review.googlesource.com/40404
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-12 14:38:06 +00:00
Josh Bleecher Snyder
1e69245418 cmd/internal/obj/s390x: make assembler almost concurrency-safe
CL 39922 made the arm assembler concurrency-safe.
This CL does the same, but for s390x.
The approach is similar: introduce ctxtz to hold
function-local state and thread it through
the assembler as necessary.

One race remains after this CL, similar to CL 40252.

That race is conceptually unrelated to this refactoring,
and will be addressed in a separate CL.

Passes toolstash-check -all.

Updates #15756

Change-Id: Iabf17aa242b70c0b078c2e85dae3d93a5e512372
Reviewed-on: https://go-review.googlesource.com/40371
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-04-11 15:02:28 +00:00
Josh Bleecher Snyder
b5931020b7 cmd/internal/obj/arm64: make assembler almost concurrency-safe
CL 39922 made the arm assembler concurrency-safe.
This CL does the same, but for arm64.
The approach is similar: introduce ctxt7 to hold
function-local state and thread it through
the assembler as necessary.

One race remains after this CL, deep in aclass,
in the check that a Prog does not take the address
of a TLS variable.

That race is conceptually unrelated to this refactoring,
and will be addressed in a separate CL.

Passes toolstash-check -all.

Updates #15756

Change-Id: Icab1ef70008468f9a5b8bf728a77c4520bbcb67d
Reviewed-on: https://go-review.googlesource.com/40252
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-11 14:34:00 +00:00
Josh Bleecher Snyder
dcf643f1fd cmd/internal/obj/arm: make assembler concurrency-safe
Move global state from obj.Link
to a new function-local state struct arm.ctxt5.

This ends up being cleaner than threading
all the state through as parameters; there's a lot of it.
While we're here, move newprog from a parameter to ctxt5.

We reserve the variable name c for ctxt5,
so a few local variables named c have been renamed.

Instead of lazily initializing deferreturn
and Sym_div and friends, initialize them up front.

Passes toolstash-check -all.

Updates #15756

Change-Id: Ifb4e4b9879e4e1f25e6168d8b7b2a25a3390dc11
Reviewed-on: https://go-review.googlesource.com/39922
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-10 16:18:50 +00:00
Josh Bleecher Snyder
3cf1ce40bd Revert "cmd/compile: output DWARF lexical blocks for local variables"
This reverts commit c8b889cc48.

Reason for revert: broke noopt build, compiler performance regression, new Curfn uses

Let's fix those and then try this again.

Change-Id: Icc3cad1365d04cac8fd09da9dbb0bbf55c13ef44
Reviewed-on: https://go-review.googlesource.com/39991
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-07 19:52:26 +00:00
Alessandro Arzilli
c8b889cc48 cmd/compile: output DWARF lexical blocks for local variables
Change compiler and linker to emit DWARF lexical blocks in debug_info.
Version of debug_info is updated from DWARF v.2 to DWARF v.3 since version 2
does not allow lexical blocks with discontinuous ranges.

Second attempt at https://go-review.googlesource.com/#/c/29591/

Remaining open problems:
- scope information is removed from inlined functions
- variables in debug_info do not have DW_AT_start_scope attributes so a
variable will shadow other variables with the same name as soon as its
containing scope begins, before its declaration.

Updates golang/go#12899, golang/go#6913

Change-Id: I0e260a45b564d14a87b88974eb16c5387cb410a5
Reviewed-on: https://go-review.googlesource.com/36879
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-07 18:15:06 +00:00
Josh Bleecher Snyder
63c1aff60b cmd/internal/obj: eagerly initialize assemblers
CL 38662 changed the x86 assembler to be eagerly
initialized, for a concurrent backend.

This CL puts in place a proper mechanism for doing so,
and switches all architectures to use it.

Passes toolstash-check -all.

Updates #15756

Change-Id: Id2aa527d3a8259c95797d63a2f0d1123e3ca2a1c
Reviewed-on: https://go-review.googlesource.com/39917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-07 16:57:03 +00:00
Josh Bleecher Snyder
5c359d8083 cmd/compile: add Prog cache to Progs
The existing bulk/cached Prog allocator, Ctxt.NewProg, is not concurrency-safe.
This CL moves Prog allocation to its clients, the compiler and the assembler.

The assembler is so fast and generates so few Progs that it does not need
optimization of Prog allocation. I could not generate measureable changes.
And even if I could, the assembly is a miniscule portion of build times.

The compiler already has a natural place to manage Prog allocation;
this CL migrates the Prog cache there.
It will be made concurrency-safe in a later CL by
partitioning the Prog cache into chunks and assigning each chunk
to a different goroutine to manage.

This CL does cause a performance degradation when the compiler
is invoked with the -S flag (to dump assembly).
However, such usage is rare and almost always done manually.
The one instance I know of in a test is TestAssembly
in cmd/compile/internal/gc, and I did not detect
a measurable performance impact there.

Passes toolstash-check -all.
Minor compiler performance impact.

Updates #15756

Performance impact from just this CL:

name        old time/op     new time/op     delta
Template        213ms ± 4%      213ms ± 4%    ~     (p=0.571 n=49+49)
Unicode        89.1ms ± 3%     89.4ms ± 3%    ~     (p=0.388 n=47+48)
GoTypes         581ms ± 2%      584ms ± 3%  +0.56%  (p=0.019 n=47+48)
SSA             6.48s ± 2%      6.53s ± 2%  +0.84%  (p=0.000 n=47+49)
Flate           128ms ± 4%      128ms ± 4%    ~     (p=0.832 n=49+49)
GoParser        152ms ± 3%      152ms ± 3%    ~     (p=0.815 n=48+47)
Reflect         371ms ± 4%      371ms ± 3%    ~     (p=0.617 n=50+47)
Tar             112ms ± 4%      112ms ± 3%    ~     (p=0.724 n=49+49)
XML             208ms ± 3%      208ms ± 4%    ~     (p=0.678 n=49+50)
[Geo mean]      284ms           285ms       +0.18%

name        old user-ns/op  new user-ns/op  delta
Template         251M ± 7%       252M ±11%    ~     (p=0.704 n=49+50)
Unicode          107M ± 7%       108M ± 5%  +1.25%  (p=0.036 n=50+49)
GoTypes          738M ± 3%       740M ± 3%    ~     (p=0.305 n=49+48)
SSA             8.83G ± 2%      8.86G ± 4%    ~     (p=0.098 n=47+50)
Flate            146M ± 6%       147M ± 3%    ~     (p=0.584 n=48+41)
GoParser         178M ± 6%       179M ± 5%  +0.93%  (p=0.036 n=49+48)
Reflect          441M ± 4%       446M ± 7%    ~     (p=0.218 n=44+49)
Tar              126M ± 5%       126M ± 5%    ~     (p=0.766 n=48+49)
XML              245M ± 5%       244M ± 4%    ~     (p=0.359 n=50+50)
[Geo mean]       341M            342M       +0.51%

Performance impact from this CL combined with its parent:

name        old time/op     new time/op     delta
Template        213ms ± 3%      214ms ± 4%    ~     (p=0.685 n=47+50)
Unicode        89.8ms ± 6%     90.5ms ± 6%    ~     (p=0.055 n=50+50)
GoTypes         584ms ± 3%      585ms ± 2%    ~     (p=0.710 n=49+47)
SSA             6.50s ± 2%      6.53s ± 2%  +0.39%  (p=0.011 n=46+50)
Flate           128ms ± 3%      128ms ± 4%    ~     (p=0.855 n=47+49)
GoParser        152ms ± 3%      152ms ± 3%    ~     (p=0.666 n=49+49)
Reflect         371ms ± 3%      372ms ± 3%    ~     (p=0.298 n=48+48)
Tar             112ms ± 5%      113ms ± 3%    ~     (p=0.107 n=49+49)
XML             208ms ± 3%      208ms ± 2%    ~     (p=0.881 n=50+49)
[Geo mean]      285ms           285ms       +0.26%

name        old user-ns/op  new user-ns/op  delta
Template         254M ± 9%       252M ± 8%    ~     (p=0.290 n=49+50)
Unicode          106M ± 6%       108M ± 7%  +1.44%  (p=0.034 n=50+50)
GoTypes          741M ± 4%       743M ± 4%    ~     (p=0.992 n=50+49)
SSA             8.86G ± 2%      8.83G ± 3%    ~     (p=0.158 n=47+49)
Flate            147M ± 4%       148M ± 5%    ~     (p=0.832 n=50+49)
GoParser         179M ± 5%       178M ± 5%    ~     (p=0.370 n=48+50)
Reflect          441M ± 6%       445M ± 7%    ~     (p=0.246 n=45+47)
Tar              126M ± 6%       126M ± 6%    ~     (p=0.815 n=49+50)
XML              244M ± 3%       245M ± 4%    ~     (p=0.190 n=50+50)
[Geo mean]       342M            342M       +0.17%

Change-Id: I020f1c079d495fbe2e15ccb51e1ea2cc1b5a1855
Reviewed-on: https://go-review.googlesource.com/39634
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-06 04:53:50 +00:00
Josh Bleecher Snyder
5b59b32c97 cmd/compile: teach assemblers to accept a Prog allocator
The existing bulk Prog allocator is not concurrency-safe.
To allow for concurrency-safe bulk allocation of Progs,
I want to move Prog allocation and caching upstream,
to the clients of cmd/internal/obj.

This is a preliminary enabling refactoring.
After this CL, instead of calling Ctxt.NewProg
throughout the assemblers, we thread through
a newprog function that returns a new Prog.

That function is set up to be Ctxt.NewProg,
so there are no real changes in this CL;
this CL only establishes the plumbing.

Passes toolstash-check -all.
Negligible compiler performance impact.

Updates #15756

name        old time/op     new time/op     delta
Template        213ms ± 3%      214ms ± 4%    ~     (p=0.574 n=49+47)
Unicode        90.1ms ± 5%     89.9ms ± 4%    ~     (p=0.417 n=50+49)
GoTypes         585ms ± 4%      584ms ± 3%    ~     (p=0.466 n=49+49)
SSA             6.50s ± 3%      6.52s ± 2%    ~     (p=0.251 n=49+49)
Flate           128ms ± 4%      128ms ± 4%    ~     (p=0.673 n=49+50)
GoParser        152ms ± 3%      152ms ± 3%    ~     (p=0.810 n=48+49)
Reflect         372ms ± 4%      372ms ± 5%    ~     (p=0.778 n=49+50)
Tar             113ms ± 5%      111ms ± 4%  -0.98%  (p=0.016 n=50+49)
XML             208ms ± 3%      208ms ± 2%    ~     (p=0.483 n=47+49)
[Geo mean]      285ms           285ms       -0.17%

name        old user-ns/op  new user-ns/op  delta
Template         253M ± 8%       254M ± 9%    ~     (p=0.899 n=50+50)
Unicode          106M ± 9%       106M ±11%    ~     (p=0.642 n=50+50)
GoTypes          736M ± 4%       740M ± 4%    ~     (p=0.121 n=50+49)
SSA             8.82G ± 3%      8.88G ± 2%  +0.65%  (p=0.006 n=49+48)
Flate            147M ± 4%       147M ± 5%    ~     (p=0.844 n=47+48)
GoParser         179M ± 4%       178M ± 6%    ~     (p=0.785 n=50+50)
Reflect          443M ± 6%       441M ± 5%    ~     (p=0.850 n=48+47)
Tar              126M ± 5%       126M ± 5%    ~     (p=0.734 n=50+50)
XML              244M ± 5%       244M ± 5%    ~     (p=0.594 n=49+50)
[Geo mean]       341M            341M       +0.11%

Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68
Reviewed-on: https://go-review.googlesource.com/39633
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-06 02:07:21 +00:00
Josh Bleecher Snyder
26308fb481 cmd/internal/obj: use string instead of LSym in Pcln
In a concurrent backend, Ctxt.Lookup will need some
form of concurrency protection, which will make it
more expensive.

This CL changes the pcln table builder to track
filenames as strings rather than LSyms.
Those strings are then converted into LSyms
at the last moment, for writing the object file.

This CL removes over 85% of the calls to Ctxt.Lookup
in a run of make.bash.

Passes toolstash-check.

Updates #15756

Change-Id: I3c53deff6f16f2643169f3bdfcc7aca2ca58b0a4
Reviewed-on: https://go-review.googlesource.com/39291
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-03 15:19:47 +00:00
Alex Brainman
361af94d5d cmd/internal/obj, cmd/link: remove Hwindowsgui everywhere
Hwindowsgui has the same meaning as Hwindows - build PE
executable. So use Hwindows everywhere.

Change-Id: I2cae5777f17c7bc3a043dfcd014c1620cc35fc20
Reviewed-on: https://go-review.googlesource.com/38761
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-30 22:51:43 +00:00
Josh Bleecher Snyder
7e817859b3 cmd/internal/obj: eliminate Curp
Remove the global obj.Link.Curp.

In asmz.go, replace the only use by passing it as an argument.
In asm0.go and asm9.go, it was written but never read.
In asm5.go and asm7.go, thread it through as an argument.

Passes toolstash-check -all.

Updates #15756

Change-Id: I1a0faa89e768820f35d73a8b37ec8088d78d15f7
Reviewed-on: https://go-review.googlesource.com/38715
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-27 18:51:42 +00:00
Josh Bleecher Snyder
cf2b32e719 cmd/internal/obj: eliminate Prog.Mode
Follow-up to CL 38446.

Passes toolstash-check -all.

Change-Id: I04cadc058cbaa5f396136502c574e5a395a33276
Reviewed-on: https://go-review.googlesource.com/38669
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-26 19:48:48 +00:00
Josh Bleecher Snyder
4f122e82fe cmd/internal/obj: move fields from obj.Link to x86.AsmBuf
These fields are used to encode a single instruction.
Add them to AsmBuf, which is also per-instruction,
and which is not global.

Updates #15756

Change-Id: I0e5ea22ffa641b07291e27de6e2ff23b6dc534bd
Reviewed-on: https://go-review.googlesource.com/38668
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-26 19:48:00 +00:00
Josh Bleecher Snyder
2ebe1bdd89 cmd/internal/obj: make x86's asmbuf a local variable
The x86 assembler requires a buffer to build
variable-length instructions.
It used to be an obj.Link field.
That doesn't play nicely with concurrent assembly.
Move the AsmBuf type to the x86 package,
where it belongs anyway,
and make it a local variable.

Passes toolstash-check -all.
No compiler performance impact.

Updates #15756

Change-Id: I8014e52145380bfd378ee374a0c971ee5bada917
Reviewed-on: https://go-review.googlesource.com/38663
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-26 04:12:26 +00:00
Josh Bleecher Snyder
6652572b75 cmd/compile: thread Curfn through to debuginfo
Updates #15756

Change-Id: I860dd45cae9d851c7844654621bbc99efe7c7f03
Reviewed-on: https://go-review.googlesource.com/38591
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-24 16:22:58 +00:00
Josh Bleecher Snyder
cfb3c8df62 cmd/internal/obj: eliminate Link.Asmode
Asmode is always set to p.Mode,
which is always set based on the arch family.
Instead, use the arch family directly.

Passes toolstash-check -all.

Change-Id: Id982472dcc8eeb6dd22cac5ad2f116b54a44caee
Reviewed-on: https://go-review.googlesource.com/38451
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-22 17:33:27 +00:00
Josh Bleecher Snyder
a470e5d4b8 cmd/internal/obj: eliminate Ctxt.Mode
Replace Ctxt.Mode with a method, Ctxt.RegWidth,
which is calculated directly off the arch info.

I believe that Prog.Mode can also be removed; future CL.

This is a step towards obj.Link immutability.

Passes toolstash-check -all.

Updates #15756

Change-Id: Ifd7f8f6ed0a2fdc032d1dd306fcd695a14aa5bc5
Reviewed-on: https://go-review.googlesource.com/38446
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-22 17:27:43 +00:00
Matthew Dempsky
7bb5b2d33a cmd/internal/obj: remove unneeded Addr.Node and Prog.Opt fields
Change-Id: I218b241c32a5948b66ad0d95ecc368648cf4ddf5
Reviewed-on: https://go-review.googlesource.com/38130
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-20 23:49:29 +00:00
Matthew Dempsky
cce4c319d6 cmd/internal/obj: remove unneeded AVARFOO ops
Change-Id: I10e36046ebce8a8741ef019cfe266b9ac9fa322d
Reviewed-on: https://go-review.googlesource.com/38088
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-20 23:40:09 +00:00
Josh Bleecher Snyder
42a915c933 cmd/internal/obj: convert Debug* Link fields into bools
Change-Id: I9ac274dbfe887675a7820d2f8f87b5887b1c9b0e
Reviewed-on: https://go-review.googlesource.com/38383
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-20 22:08:41 +00:00
Matthew Dempsky
68177d9ec0 cmd/internal/obj: move dwarf.Var generation into compiler
Passes toolstash -cmp.

Change-Id: I4bd60f7ebba5457e7b3ece688fee2351bfeeb59a
Reviewed-on: https://go-review.googlesource.com/37874
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2017-03-07 17:32:54 +00:00
Matthew Dempsky
9cb2ee0ff2 cmd/internal/obj: move STEXT-only LSym fields into new FuncInfo struct
Shrinks LSym somewhat for non-STEXT LSyms, which are much more common.

While here, switch to tracking Automs in a slice instead of a linked
list. (Previously, this would have made LSyms larger.)

Passes toolstash-check.

Change-Id: I082e50e1d1f1b544c9e06b6e412a186be6a4a2b5
Reviewed-on: https://go-review.googlesource.com/37872
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-06 22:17:23 +00:00
Matthew Dempsky
7a98bdf1c2 cmd/internal/obj: remove AUSEFIELD pseudo-op
Instead, cmd/compile can directly emit R_USEFIELD relocations.

Manually verified rsc.io/tmp/fieldtrack still passes.

Change-Id: Ib1fb5ab902ff0ad17ef6a862a9a5692caf7f87d1
Reviewed-on: https://go-review.googlesource.com/37871
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-06 22:16:13 +00:00
Eitan Adler
789c5255a4 all: remove the the duplicate words
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-06 04:39:12 +00:00
David Lazar
0824ae6dc1 cmd/compile: add flag for debugging PC-value tables
For example, `-d pctab=pctoinline` prints the PC-inline table and
inlining tree for every function.

Change-Id: Ia6b9ce4d83eed0b494318d40ffe06481ec5d58ab
Reviewed-on: https://go-review.googlesource.com/37235
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:38 +00:00
David Lazar
301149b9e4 cmd/internal/obj: avoid duplicate file name symbols
The meaning of Version=1 was overloaded: it was reserved for file name
symbols (to avoid conflicts with non-file name symbols), but was also
used to mean "give me a fresh version number for this symbol."

With the new inlining tree, the same file name symbol can appear in
multiple entries, but each one would become a distinct symbol with its
own version number.

Now, we avoid duplicating symbols by using Version=0 for file name
symbols and we avoid conflicts with other symbols by prefixing the
symbol name with "gofile..".

Change-Id: I8d0374053b8cdb6a9ca7fb71871b69b4dd369a9c
Reviewed-on: https://go-review.googlesource.com/37234
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-03 21:29:36 +00:00
David Lazar
699175a11a cmd/compile,link: generate PC-value tables with inlining information
In order to generate accurate tracebacks, the runtime needs to know the
inlined call stack for a given PC. This creates two tables per function
for this purpose. The first table is the inlining tree (stored in the
function's funcdata), which has a node containing the file, line, and
function name for every inlined call. The second table is a PC-value
table that maps each PC to a node in the inlining tree (or -1 if the PC
is not the result of inlining).

To give the appearance that inlining hasn't happened, the runtime also
needs the original source position information of inlined AST nodes.
Previously the compiler plastered over the line numbers of inlined AST
nodes with the line number of the call. This meant that the PC-line
table mapped each PC to line number of the outermost call in its inlined
call stack, with no way to access the innermost line number.

Now the compiler retains line numbers of inlined AST nodes and writes
the innermost source position information to the PC-line and PC-file
tables. Some tools and tests expect to see outermost line numbers, so we
provide the OutermostLine function for displaying line info.

To keep track of the inlined call stack for an AST node, we extend the
src.PosBase type with an index into a global inlining tree. Every time
the compiler inlines a call, it creates a node in the global inlining
tree for the call, and writes its index to the PosBase of every inlined
AST node. The parent of this node is the inlining tree index of the
call. -1 signifies no parent.

For each function, the compiler creates a local inlining tree and a
PC-value table mapping each PC to an index in the local tree.  These are
written to an object file, which is read by the linker.  The linker
re-encodes these tables compactly by deduplicating function names and
file names.

This change increases the size of binaries by 4-5%. For example, this is
how the go1 benchmark binary is impacted by this change:

section             old bytes   new bytes   delta
.text               3.49M ± 0%  3.49M ± 0%   +0.06%
.rodata             1.12M ± 0%  1.21M ± 0%   +8.21%
.gopclntab          1.50M ± 0%  1.68M ± 0%  +11.89%
.debug_line          338k ± 0%   435k ± 0%  +28.78%
Total               9.21M ± 0%  9.58M ± 0%   +4.01%

Updates #19348.

Change-Id: Ic4f180c3b516018138236b0c35e0218270d957d3
Reviewed-on: https://go-review.googlesource.com/37231
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2017-03-03 21:29:30 +00:00
Heschi Kreinick
ac7761e1a4 cmd/compile, cmd/asm: remove Link.Plists
Link.Plists never contained more than one Plist, and sometimes none.
Passing around the Plist being worked on is straightforward and makes
the data flow easier to follow.

Change-Id: I79cb30cb2bd3d319fdbb1dfa5d35b27fcb748e5c
Reviewed-on: https://go-review.googlesource.com/37169
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-01 00:29:23 +00:00
Matthew Dempsky
02de5ed748 cmd/internal/obj: add AddrName type and cleanup AddrType values
Passes toolstash -cmp.

Change-Id: Ida3eda9bd9d79a34c1c3f18cb41aea9392698076
Reviewed-on: https://go-review.googlesource.com/36950
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-13 21:56:17 +00:00
Matthew Dempsky
7bad00366b cmd/internal/obj: remove ATYPE
In cmd/compile, we can directly construct obj.Auto to represent local
variables and attach them to the function's obj.LSym.

In preparation for being able to emit more precise DWARF info based on
other compiler available information (e.g., lexical scoping).

Change-Id: I9c4225ec59306bec42552838493022e0e9d70228
Reviewed-on: https://go-review.googlesource.com/36420
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-07 22:38:18 +00:00