runtime: replace GC programs with simpler encoding, faster decoder

Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.
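
For reference, the arithmetic behind those sizes ([1<<28]*byte occupies
2^28 pointer-sized words on a 64-bit system):

	const (
		words    = 1 << 28       // pointer-sized words in [1<<28]*byte
		mask4bit = words * 4 / 8 // 134217728 bytes = 128 MB (Go 1.4)
		mask1bit = words * 1 / 8 // 33554432 bytes = 32 MB (Go 1.5)
	)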

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called "GC programs", but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen to cost
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
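
To make the format concrete, here is a minimal sketch of a decoder for
it. This is illustrative only, not the runtime's code: the real decoder
(runGCProg in runtime/mbitmap.go) writes bits straight into the heap
bitmap, while this sketch expands to one byte per bit for readability.
The opcode layout matches what the new reflect.ArrayOf encoder in the
diff below emits, minus the 4-byte length header it prefixes: 0x00 ends
the program; a byte 0x01..0x7f emits that many literal bits, packed into
the following bytes; a byte with the high bit set is a repeat, with n
taken from its low 7 bits (or from a following varint when they are
zero) and c from a trailing varint.

	// decode expands a GC program into pointer bits, one byte per bit.
	func decode(prog []byte) []byte {
		var bits []byte
		for {
			op := prog[0]
			prog = prog[1:]
			if op == 0 {
				return bits // end of program
			}
			if op&0x80 == 0 {
				// Emit the following n = op literal bits,
				// packed into the next (n+7)/8 bytes.
				n := uintptr(op)
				data := prog[:(n+7)/8]
				prog = prog[(n+7)/8:]
				for i := uintptr(0); i < n; i++ {
					bits = append(bits, data[i/8]>>(i%8)&1)
				}
				continue
			}
			// Repeat the last n bits c more times.
			n := uintptr(op &^ 0x80)
			if n == 0 {
				n, prog = readVarint(prog)
			}
			var c uintptr
			c, prog = readVarint(prog)
			pattern := bits[uintptr(len(bits))-n:]
			for ; c > 0; c-- {
				bits = append(bits, pattern...)
			}
		}
	}

	// readVarint decodes a little-endian base-128 varint,
	// the inverse of appendVarint in the diff below.
	func readVarint(p []byte) (uintptr, []byte) {
		var v uintptr
		for shift := uint(0); ; shift += 7 {
			b := p[0]
			p = p[1:]
			v |= uintptr(b&0x7f) << shift
			if b&0x80 == 0 {
				return v, p
			}
		}
	}

Under this encoding the instruction stream for [1<<28]*byte is just
eight bytes: emit a single 1 bit, then repeat it (1<<28)-1 more times,
instead of a 32 MB bitmap.

	decode([]byte{0x01, 0x01, 0x81, 0xff, 0xff, 0xff, 0x7f, 0x00})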

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
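
Sketched in code, the malloc-time dispatch these two cases leave behind
is a single flag test. This is a hypothetical sketch, not the runtime's
code (the real logic is heapBitsSetType in runtime/mbitmap.go); the
type and helpers below are illustrative stand-ins, though kindGCProg is
the real kind bit this CL introduces:

	const kindGCProg = 1 << 6 // gcdata is a GC program, not a 1-bit mask

	type typeInfo struct { // stand-in for the runtime type descriptor
		kind    uint8
		gcdata  *byte   // pointer mask or GC program
		ptrdata uintptr // bytes covered, through the last pointer word
	}

	func setHeapBits(typ *typeInfo) {
		if typ.kind&kindGCProg == 0 {
			// Case (1): gcdata holds a 1-bit pointer mask for a type
			// of at most 128 words; copy it into the heap bitmap.
			copyPtrmask(typ.gcdata, typ.ptrdata)
		} else {
			// Case (3): gcdata holds a GC program; decode it directly
			// into the heap bitmap, with no unrolled intermediate mask.
			runGCProg(typ.gcdata)
		}
	}

	func copyPtrmask(mask *byte, ptrdata uintptr) {} // illustrative stub
	func runGCProg(prog *byte)                    {} // illustrative stub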

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, and SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>

diff --git a/src/reflect/type.go b/src/reflect/type.go
--- a/src/reflect/type.go
+++ b/src/reflect/type.go
@@ -247,17 +247,17 @@ const (
 type rtype struct {
 	size       uintptr
 	ptrdata    uintptr
-	hash       uint32   // hash of type; avoids computation in hash tables
-	_          uint8    // unused/padding
-	align      uint8    // alignment of variable with this type
-	fieldAlign uint8    // alignment of struct field with this type
-	kind       uint8    // enumeration for C
-	alg        *typeAlg // algorithm table
-	gc         [2]unsafe.Pointer // garbage collection data
-	string     *string  // string form; unnecessary but undeniably useful
-	*uncommonType       // (relatively) uncommon fields
-	ptrToThis  *rtype   // type for pointer to this type, if used in binary or has methods
-	zero       unsafe.Pointer // pointer to zero value
+	hash       uint32   // hash of type; avoids computation in hash tables
+	_          uint8    // unused/padding
+	align      uint8    // alignment of variable with this type
+	fieldAlign uint8    // alignment of struct field with this type
+	kind       uint8    // enumeration for C
+	alg        *typeAlg // algorithm table
+	gcdata     *byte    // garbage collection data
+	string     *string  // string form; unnecessary but undeniably useful
+	*uncommonType       // (relatively) uncommon fields
+	ptrToThis  *rtype   // type for pointer to this type, if used in binary or has methods
+	zero       unsafe.Pointer // pointer to zero value
 }
 
 // a copy of runtime.typeAlg
@@ -1670,111 +1670,14 @@ func isReflexive(t *rtype) bool {
 	}
 }
 
-// gcProg is a helper type for generatation of GC pointer info.
-type gcProg struct {
-	gc       []byte
-	size     uintptr // size of type in bytes
-	hasPtr   bool
-	lastZero uintptr // largest offset of a zero-byte field
-}
-
-func (gc *gcProg) append(v byte) {
-	gc.align(unsafe.Sizeof(uintptr(0)))
-	gc.appendWord(v)
-}
-
-// Appends t's type info to the current program.
-func (gc *gcProg) appendProg(t *rtype) {
-	gc.align(uintptr(t.align))
-	if !t.pointers() {
-		gc.size += t.size
-		if t.size == 0 {
-			gc.lastZero = gc.size
-		}
-		return
-	}
-	switch t.Kind() {
-	default:
-		panic("reflect: non-pointer type marked as having pointers")
-	case Ptr, UnsafePointer, Chan, Func, Map:
-		gc.appendWord(1)
-	case Slice:
-		gc.appendWord(1)
-		gc.appendWord(0)
-		gc.appendWord(0)
-	case String:
-		gc.appendWord(1)
-		gc.appendWord(0)
-	case Array:
-		c := t.Len()
-		e := t.Elem().common()
-		for i := 0; i < c; i++ {
-			gc.appendProg(e)
-		}
-	case Interface:
-		gc.appendWord(1)
-		gc.appendWord(1)
-	case Struct:
-		oldsize := gc.size
-		c := t.NumField()
-		for i := 0; i < c; i++ {
-			gc.appendProg(t.Field(i).Type.common())
-		}
-		if gc.size > oldsize+t.size {
-			panic("reflect: struct components are larger than the struct itself")
-		}
-		gc.size = oldsize + t.size
-	}
-}
-
-func (gc *gcProg) appendWord(v byte) {
-	ptrsize := unsafe.Sizeof(uintptr(0))
-	if gc.size%ptrsize != 0 {
-		panic("reflect: unaligned GC program")
-	}
-	nptr := gc.size / ptrsize
-	for uintptr(len(gc.gc)) <= nptr/8 {
-		gc.gc = append(gc.gc, 0)
-	}
-	gc.gc[nptr/8] |= v << (nptr % 8)
-	gc.size += ptrsize
-	if v == 1 {
-		gc.hasPtr = true
-	}
-}
-
-func (gc *gcProg) finalize() (unsafe.Pointer, bool) {
-	if gc.size == 0 {
-		return nil, false
-	}
-	if gc.lastZero == gc.size {
-		gc.size++
-	}
-	ptrsize := unsafe.Sizeof(uintptr(0))
-	gc.align(ptrsize)
-	nptr := gc.size / ptrsize
-	for uintptr(len(gc.gc)) <= nptr/8 {
-		gc.gc = append(gc.gc, 0)
-	}
-	return unsafe.Pointer(&gc.gc[0]), gc.hasPtr
-}
-
-func extractGCWord(gc []byte, i uintptr) byte {
-	return gc[i/8] >> (i % 8) & 1
-}
-
-func (gc *gcProg) align(a uintptr) {
-	gc.size = align(gc.size, a)
-}
-
 // Make sure these routines stay in sync with ../../runtime/hashmap.go!
 // These types exist only for GC, so we only fill out GC relevant info.
 // Currently, that's just size and the GC program. We also fill in string
 // for possible debugging use.
 const (
-	bucketSize = 8
-	maxKeySize = 128
-	maxValSize = 128
+	bucketSize uintptr = 8
+	maxKeySize uintptr = 128
+	maxValSize uintptr = 128
 )
 
 func bucketOf(ktyp, etyp *rtype) *rtype {
@@ -1791,33 +1694,70 @@ func bucketOf(ktyp, etyp *rtype) *rtype {
 	if etyp.size > maxValSize {
 		etyp = PtrTo(etyp).(*rtype)
 	}
-	ptrsize := unsafe.Sizeof(uintptr(0))
-	var gc gcProg
-	// topbits
-	for i := 0; i < int(bucketSize*unsafe.Sizeof(uint8(0))/ptrsize); i++ {
-		gc.append(0)
+
+	// Prepare GC data if any.
+	// A bucket is at most bucketSize*(1+maxKeySize+maxValSize)+2*ptrSize bytes,
+	// or 2072 bytes, or 259 pointer-size words, or 33 bytes of pointer bitmap.
+	// Normally the enforced limit on pointer maps is 16 bytes,
+	// but larger ones are acceptable, 33 bytes isn't too too big,
+	// and it's easier to generate a pointer bitmap than a GC program.
+	// Note that since the key and value are known to be <= 128 bytes,
+	// they're guaranteed to have bitmaps instead of GC programs.
+	var gcdata *byte
+	var ptrdata uintptr
+	if kind != kindNoPointers {
+		nptr := (bucketSize*(1+ktyp.size+etyp.size) + ptrSize) / ptrSize
+		mask := make([]byte, (nptr+7)/8)
+		base := bucketSize / ptrSize
+		if ktyp.kind&kindNoPointers == 0 {
+			if ktyp.kind&kindGCProg != 0 {
+				panic("reflect: unexpected GC program in MapOf")
+			}
+			kmask := (*[16]byte)(unsafe.Pointer(ktyp.gcdata))
+			for i := uintptr(0); i < ktyp.size/ptrSize; i++ {
+				if (kmask[i/8]>>(i%8))&1 != 0 {
+					for j := uintptr(0); j < bucketSize; j++ {
+						word := base + j*ktyp.size/ptrSize + i
+						mask[word/8] |= 1 << (word % 8)
+					}
+				}
+			}
+		}
+		base += bucketSize * ktyp.size / ptrSize
+		if etyp.kind&kindNoPointers == 0 {
+			if etyp.kind&kindGCProg != 0 {
+				panic("reflect: unexpected GC program in MapOf")
+			}
+			emask := (*[16]byte)(unsafe.Pointer(etyp.gcdata))
+			for i := uintptr(0); i < etyp.size/ptrSize; i++ {
+				if (emask[i/8]>>(i%8))&1 != 0 {
+					for j := uintptr(0); j < bucketSize; j++ {
+						word := base + j*etyp.size/ptrSize + i
+						mask[word/8] |= 1 << (word % 8)
+					}
+				}
+			}
+		}
+		base += bucketSize * etyp.size / ptrSize
+		word := base
+		mask[word/8] |= 1 << (word % 8)
+		gcdata = &mask[0]
+		ptrdata = (word + 1) * ptrSize
 	}
-	// keys
-	for i := 0; i < bucketSize; i++ {
-		gc.appendProg(ktyp)
-	}
-	// values
-	for i := 0; i < bucketSize; i++ {
-		gc.appendProg(etyp)
-	}
-	// overflow
-	gc.append(1)
-	ptrdata := gc.size
+
+	size := bucketSize*(1+ktyp.size+etyp.size) + ptrSize
 	if runtime.GOARCH == "amd64p32" {
-		gc.append(0)
+		size += ptrSize
 	}
 
 	b := new(rtype)
-	b.size = gc.size
+	b.size = size
 	b.ptrdata = ptrdata
 	b.kind = kind
-	b.gc[0], _ = gc.finalize()
+	b.gcdata = gcdata
 	s := "bucket(" + *ktyp.string + "," + *etyp.string + ")"
 	b.string = &s
 	return b
@@ -1911,19 +1851,83 @@ func ArrayOf(count int, elem Type) Type {
 	array.len = uintptr(count)
 	array.slice = slice.(*rtype)
 
-	var gc gcProg
-	// TODO(sbinet): count could be possibly very large.
-	// use insArray directives from ../runtime/mbitmap.go.
-	for i := 0; i < count; i++ {
-		gc.appendProg(typ)
-	}
-	var hasPtr bool
-	array.gc[0], hasPtr = gc.finalize()
-	if !hasPtr {
+	array.kind &^= kindNoPointers
+	switch {
+	case typ.kind&kindNoPointers != 0 || array.size == 0:
+		// No pointers.
 		array.kind |= kindNoPointers
-	} else {
-		array.kind &^= kindNoPointers
+		array.gcdata = nil
+		array.ptrdata = 0
+
+	case count == 1:
+		// In memory, 1-element array looks just like the element.
+		array.kind |= typ.kind & kindGCProg
+		array.gcdata = typ.gcdata
+		array.ptrdata = typ.ptrdata
+
+	case typ.kind&kindGCProg == 0 && array.size <= 16*8*ptrSize:
+		// Element is small with pointer mask; array is still small.
+		// Create direct pointer mask by turning each 1 bit in elem
+		// into count 1 bits in larger mask.
+		mask := make([]byte, (array.ptrdata/ptrSize+7)/8)
+		elemMask := (*[1 << 30]byte)(unsafe.Pointer(typ.gcdata))[:]
+		elemWords := typ.size / ptrSize
+		for j := uintptr(0); j < typ.ptrdata/ptrSize; j++ {
+			if (elemMask[j/8]>>(j%8))&1 != 0 {
+				for i := uintptr(0); i < array.len; i++ {
+					k := i*elemWords + j
+					mask[k/8] |= 1 << (k % 8)
+				}
+			}
+		}
+		array.gcdata = &mask[0]
+
+	default:
+		// Create program that emits one element
+		// and then repeats to make the array.
+		prog := []byte{0, 0, 0, 0} // will be length of prog
+		elemGC := (*[1 << 30]byte)(unsafe.Pointer(typ.gcdata))[:]
+		elemPtrs := typ.ptrdata / ptrSize
+		if typ.kind&kindGCProg == 0 {
+			// Element is small with pointer mask; use as literal bits.
+			mask := elemGC
+			// Emit 120-bit chunks of full bytes (max is 127 but we avoid using partial bytes).
+			var n uintptr
+			for n = elemPtrs; n > 120; n -= 120 {
+				prog = append(prog, 120)
+				prog = append(prog, mask[:15]...)
+				mask = mask[15:]
+			}
+			prog = append(prog, byte(n))
+			prog = append(prog, mask[:(n+7)/8]...)
+		} else {
+			// Element has GC program; emit one element.
+			elemProg := elemGC[4 : 4+*(*uint32)(unsafe.Pointer(&elemGC[0]))-1]
+			prog = append(prog, elemProg...)
+		}
+		// Pad from ptrdata to size.
+		elemWords := typ.size / ptrSize
+		if elemPtrs < elemWords {
+			// Emit literal 0 bit, then repeat as needed.
+			prog = append(prog, 0x01, 0x00)
+			if elemPtrs+1 < elemWords {
+				prog = append(prog, 0x81)
+				prog = appendVarint(prog, elemWords-elemPtrs-1)
+			}
+		}
+		// Repeat count-1 times.
+		if elemWords < 0x80 {
+			prog = append(prog, byte(elemWords|0x80))
+		} else {
+			prog = append(prog, 0x80)
+			prog = appendVarint(prog, elemWords)
+		}
+		prog = appendVarint(prog, uintptr(count)-1)
+		prog = append(prog, 0)
+		*(*uint32)(unsafe.Pointer(&prog[0])) = uint32(len(prog) - 4)
+		array.kind |= kindGCProg
+		array.gcdata = &prog[0]
+		array.ptrdata = array.size // overestimate but ok; must match program
 	}
 
 	etyp := typ.common()
@@ -1967,6 +1971,14 @@ func ArrayOf(count int, elem Type) Type {
 	return cachePut(ckey, &array.rtype)
 }
 
+func appendVarint(x []byte, v uintptr) []byte {
+	for ; v >= 0x80; v >>= 7 {
+		x = append(x, byte(v|0x80))
+	}
+	x = append(x, byte(v))
+	return x
+}
+
 // toType converts from a *rtype to a Type that can be returned
 // to the client of package reflect. In gc, the only concern is that
 // a nil *rtype must be replaced by a nil Type, but in gccgo this
@@ -2003,7 +2015,7 @@ var layoutCache struct {
 // The returned type exists only for GC, so we only fill out GC relevant info.
 // Currently, that's just size and the GC program. We also fill in
 // the name for possible debugging use.
-func funcLayout(t *rtype, rcvr *rtype) (frametype *rtype, argSize, retOffset uintptr, stack *bitVector, framePool *sync.Pool) {
+func funcLayout(t *rtype, rcvr *rtype) (frametype *rtype, argSize, retOffset uintptr, stk *bitVector, framePool *sync.Pool) {
 	if t.Kind() != Func {
 		panic("reflect: funcLayout of non-func type")
 	}
@@ -2026,53 +2038,47 @@ func funcLayout(t *rtype, rcvr *rtype) (frametype *rtype, argSize, retOffset uin
 	tt := (*funcType)(unsafe.Pointer(t))
 
 	// compute gc program & stack bitmap for arguments
-	stack = new(bitVector)
-	var gc gcProg
+	ptrmap := new(bitVector)
 	var offset uintptr
 	if rcvr != nil {
 		// Reflect uses the "interface" calling convention for
 		// methods, where receivers take one word of argument
 		// space no matter how big they actually are.
-		if ifaceIndir(rcvr) {
-			// we pass a pointer to the receiver.
-			gc.append(1)
-			stack.append2(1)
-		} else if rcvr.pointers() {
-			// rcvr is a one-word pointer object. Its gc program
-			// is just what we need here.
-			gc.append(1)
-			stack.append2(1)
-		} else {
-			gc.append(0)
-			stack.append2(0)
+		if ifaceIndir(rcvr) || rcvr.pointers() {
+			ptrmap.append(1)
 		}
+		offset += ptrSize
 	}
 	for _, arg := range tt.in {
-		gc.appendProg(arg)
-		addTypeBits(stack, &offset, arg)
+		offset += -offset & uintptr(arg.align-1)
+		addTypeBits(ptrmap, offset, arg)
+		offset += arg.size
 	}
-	argSize = gc.size
+	argN := ptrmap.n
+	argSize = offset
 	if runtime.GOARCH == "amd64p32" {
-		gc.align(8)
+		offset += -offset & (8 - 1)
 	}
-	gc.align(ptrSize)
-	retOffset = gc.size
+	offset += -offset & (ptrSize - 1)
+	retOffset = offset
 	for _, res := range tt.out {
-		gc.appendProg(res)
-		// stack map does not need result bits
+		offset += -offset & uintptr(res.align-1)
+		addTypeBits(ptrmap, offset, res)
+		offset += res.size
 	}
-	gc.align(ptrSize)
+	offset += -offset & (ptrSize - 1)
 
 	// build dummy rtype holding gc program
 	x := new(rtype)
-	x.size = gc.size
-	x.ptrdata = gc.size // over-approximation
-	var hasPtr bool
-	x.gc[0], hasPtr = gc.finalize()
-	if !hasPtr {
+	x.size = offset
+	x.ptrdata = uintptr(ptrmap.n) * ptrSize
+	if ptrmap.n > 0 {
+		x.gcdata = &ptrmap.data[0]
+	} else {
 		x.kind |= kindNoPointers
 	}
+	ptrmap.n = argN
 	var s string
 	if rcvr != nil {
 		s = "methodargs(" + *rcvr.string + ")(" + *t.string + ")"
@@ -2092,11 +2098,11 @@ func funcLayout(t *rtype, rcvr *rtype) (frametype *rtype, argSize, retOffset uin
 		t:         x,
 		argSize:   argSize,
 		retOffset: retOffset,
-		stack:     stack,
+		stack:     ptrmap,
 		framePool: framePool,
 	}
 	layoutCache.Unlock()
-	return x, argSize, retOffset, stack, framePool
+	return x, argSize, retOffset, ptrmap, framePool
 }
 // ifaceIndir reports whether t is stored indirectly in an interface value.
 
@@ -2110,56 +2116,49 @@ type bitVector struct {
 	data []byte
 }
 
-// append a bit pair to the bitmap.
-func (bv *bitVector) append2(bits uint8) {
-	// assume bv.n is a multiple of 2, since append2 is the only operation.
+// append a bit to the bitmap.
+func (bv *bitVector) append(bit uint8) {
 	if bv.n%8 == 0 {
 		bv.data = append(bv.data, 0)
 	}
-	bv.data[bv.n/8] |= bits << (bv.n % 8)
-	bv.n += 2
+	bv.data[bv.n/8] |= bit << (bv.n % 8)
+	bv.n++
 }
 
-func addTypeBits(bv *bitVector, offset *uintptr, t *rtype) {
-	*offset = align(*offset, uintptr(t.align))
-	if !t.pointers() {
-		*offset += t.size
+func addTypeBits(bv *bitVector, offset uintptr, t *rtype) {
+	if t.kind&kindNoPointers != 0 {
 		return
 	}
 
 	switch Kind(t.kind & kindMask) {
 	case Chan, Func, Map, Ptr, Slice, String, UnsafePointer:
 		// 1 pointer at start of representation
-		for bv.n < 2*uint32(*offset/uintptr(ptrSize)) {
-			bv.append2(0)
+		for bv.n < uint32(offset/uintptr(ptrSize)) {
+			bv.append(0)
 		}
-		bv.append2(1)
+		bv.append(1)
 
 	case Interface:
 		// 2 pointers
-		for bv.n < 2*uint32(*offset/uintptr(ptrSize)) {
-			bv.append2(0)
+		for bv.n < uint32(offset/uintptr(ptrSize)) {
+			bv.append(0)
 		}
-		bv.append2(1)
-		bv.append2(1)
+		bv.append(1)
+		bv.append(1)
 
 	case Array:
 		// repeat inner type
 		tt := (*arrayType)(unsafe.Pointer(t))
 		for i := 0; i < int(tt.len); i++ {
-			addTypeBits(bv, offset, tt.elem)
+			addTypeBits(bv, offset+uintptr(i)*tt.elem.size, tt.elem)
 		}
 
 	case Struct:
 		// apply fields
 		tt := (*structType)(unsafe.Pointer(t))
-		start := *offset
 		for i := range tt.fields {
 			f := &tt.fields[i]
-			off := start + f.offset
-			addTypeBits(bv, &off, f.typ)
+			addTypeBits(bv, offset+f.offset, f.typ)
 		}
 	}
-
-	*offset += t.size
 }