The encoding/json package uses NumMethod()==0 as a fast check for
interface satisfaction. In the case when a type has no methods at
all, we don't need to grab the RWMutex.
Improves JSON decoding benchmark on linux/amd64:
name old time/op new time/op delta
CodeDecoder-8 44.2ms ± 2% 40.6ms ± 1% -8.11% (p=0.000 n=10+10)
name old speed new speed delta
CodeDecoder-8 43.9MB/s ± 2% 47.8MB/s ± 1% +8.82% (p=0.000 n=10+10)
For #16117
Change-Id: Id717e7fcd2f41b7d51d50c26ac167af45bae3747
Reviewed-on: https://go-review.googlesource.com/24433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This was removed in CL 19695 but it slows down reflect.New, which ends
up on the hot path of things like JSON decoding.
There is no immediate cost in binary size, but it will make it harder to
further shrink run time type information in Go 1.8.
Before
BenchmarkNew-40 30000000 36.3 ns/op
After
BenchmarkNew-40 50000000 29.5 ns/op
Fixes#16161
Updates #16117
Change-Id: If7cb7f3e745d44678f3f5cf3a5338c59847529d2
Reviewed-on: https://go-review.googlesource.com/24400
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This adds 8 bytes of binary size to every type that has methods. It is
the smallest change I could come up with for 1.7.
Fixes#16037
Change-Id: Ibe15c3165854a21768596967757864b880dbfeed
Reviewed-on: https://go-review.googlesource.com/24070
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This CL documents that StructOf currently does not generate wrapper
methods for embedded fields.
Updates #15924
Change-Id: I932011b1491d68767709559f515f699c04ce70d4
Reviewed-on: https://go-review.googlesource.com/23681
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Also remove some of the now unnecessary corner case handling and
tests I've been adding recently for unexported method data.
For #15673
Change-Id: Ie0c7b03f2370bbe8508cdc5be765028f08000bd7
Reviewed-on: https://go-review.googlesource.com/23410
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The initial implementation of reflect.StructOf in
https://golang.org/cl/9251 had a limitation that field names had to be
ASCII, which was later lifted by https://golang.org/cl/21777. Remove
the out-of-date documentation disallowing UTF-8 field names.
Updates: #5748
Updates: #15064
Change-Id: I2c5bfea46bfd682449c6e847fc972a1a131f51b7
Reviewed-on: https://go-review.googlesource.com/23170
Reviewed-by: David Crawshaw <crawshaw@golang.org>
By picking up a spurious tFlagExtraStar, the method type was printing
as unc instead of func.
Updates #15673
Change-Id: I0c2c189b99bdd4caeb393693be7520b8e3f342bf
Reviewed-on: https://go-review.googlesource.com/23103
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The new type was inheriting the tflagExtraStar from its prototype.
Fixes#15467
Change-Id: Ic22c2a55cee7580cb59228d52b97e1c0a1e60220
Reviewed-on: https://go-review.googlesource.com/22501
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Fixes#15468
Change-Id: I8723171f87774a98d5e80e7832ebb96dd1fbea74
Reviewed-on: https://go-review.googlesource.com/22524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Introduce and start using nameOff for two encoded names. This pair
of changes is best done together because the linker's method decoder
expects the method layouts to match.
Precursor to converting all existing name and *string fields to
nameOff.
linux/amd64:
cmd/go: -45KB (0.5%)
jujud: -389KB (0.6%)
linux/amd64 PIE:
cmd/go: -170KB (1.4%)
jujud: -1.5MB (1.8%)
For #6853.
Change-Id: Ia044423f010fb987ce070b94c46a16fc78666ff6
Reviewed-on: https://go-review.googlesource.com/21396
Reviewed-by: Ian Lance Taylor <iant@golang.org>
cmd and runtime were handled separately, and I'm intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
By replacing the *string used to represent pkgPath with a
reflect.name everywhere, the embedded *string for package paths
inside the reflect.name can be replaced by an offset, nameOff.
This reduces the number of pointers in the type information.
This also moves all reflect.name types into the same section, making
it possible to use nameOff more widely in later CLs.
No significant binary size change for normal binaries, but:
linux/amd64 PIE:
cmd/go: -440KB (3.7%)
jujud: -2.6MB (3.2%)
For #6853.
Change-Id: I3890b132a784a1090b1b72b32febfe0bea77eaee
Reviewed-on: https://go-review.googlesource.com/21395
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.
In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.
With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.
To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.
A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).
For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
darwin/amd64:
cmd/go: -57KB (0.6%)
jujud: -557KB (0.8%)
linux/amd64 PIE:
cmd/go: -361KB (3.0%)
jujud: -3.5MB (4.2%)
For #6853.
Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is the first in a series of CLs to replace the use of pointers
in binary read-only data with offsets.
In standard Go binaries these CLs have a small effect, shrinking
8-byte pointers to 4-bytes. In position-independent code, it also
saves the dynamic relocation for the pointer. This has a significant
effect on the binary size when building as PIE, c-archive, or
c-shared.
darwin/amd64:
cmd/go: -12KB (0.1%)
jujud: -82KB (0.1%)
linux/amd64 PIE:
cmd/go: -86KB (0.7%)
jujud: -569KB (0.7%)
For #6853.
Change-Id: Iad5625bbeba58dabfd4d334dbee3fcbfe04b2dcf
Reviewed-on: https://go-review.googlesource.com/21284
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change exposes a facility to create new struct types from a slice of
reflect.StructFields.
- reflect: first stab at implementing StructOf
- reflect: tests for StructOf
StructOf creates new struct types in the form of structTypeWithMethods
to accomodate the GC (especially the uncommonType.methods slice field.)
Creating struct types with embedded interfaces with unexported methods
is not supported yet and will panic.
Creating struct types with non-ASCII field names or types is not yet
supported (see #15064.)
Binaries' sizes for linux_amd64:
old=tip (0104a31)
old bytes new bytes delta
bin/go 9911336 9915456 +0.04%
reflect 781704 830048 +6.18%
Updates #5748.
Updates #15064.
Change-Id: I3b8fd4fadd6ce3b1b922e284f0ae72a3a8e3ce44
Reviewed-on: https://go-review.googlesource.com/9251
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
The Lookup method provides a way to extract a tag value, while
determining whether the tag key exists in the struct field's tag.
Fixes#14883
Change-Id: I7460cb68f0ca1aaa025935050b9e182efcb64db3
Reviewed-on: https://go-review.googlesource.com/20864
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Create a byte encoding designed for static Go names.
It is intended to be a compact representation of a name
and optional tag data that can be turned into a Go string
without allocating, and describes whether or not it is
exported without unicode table.
The encoding is described in reflect/type.go:
// The first byte is a bit field containing:
//
// 1<<0 the name is exported
// 1<<1 tag data follows the name
// 1<<2 pkgPath *string follow the name and tag
//
// The next two bytes are the data length:
//
// l := uint16(data[1])<<8 | uint16(data[2])
//
// Bytes [3:3+l] are the string data.
//
// If tag data follows then bytes 3+l and 3+l+1 are the tag length,
// with the data following.
//
// If the import path follows, then ptrSize bytes at the end of
// the data form a *string. The import path is only set for concrete
// methods that are defined in a different package than their type.
Shrinks binary sizes:
cmd/go: 164KB (1.6%)
jujud: 1.0MB (1.5%)
For #6853.
Change-Id: I46b6591015b17936a443c9efb5009de8dfe8b609
Reviewed-on: https://go-review.googlesource.com/20968
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Remove reflect type information for unexported methods that do not
satisfy any interface in the program.
Ideally the unexported method would not appear in the method list at
all, but that is tricky because the slice is built by the compiler.
Reduces binary size:
cmd/go: 81KB (0.8%)
jujud: 258KB (0.4%)
For #6853.
Change-Id: I25ef8df6907e9ac03b18689d584ea46e7d773043
Reviewed-on: https://go-review.googlesource.com/21033
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The type information for a method includes two variants: a func
without the receiver, and a func with the receiver as the first
parameter. The former is used as part of the dynamic interface
checks, but the latter is only returned as a type in the
reflect.Method struct.
Instead of computing it at compile time, construct it at run time
with reflect.FuncOf.
Using cl/20701 as a baseline,
cmd/go: -480KB, (4.4%)
jujud: -5.6MB, (7.8%)
For #6853.
Change-Id: I1b8c73f3ab894735f53d00cb9c0b506d84d54e92
Reviewed-on: https://go-review.googlesource.com/20709
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Alternative to golang.org/cl/19852. This memory layout doesn't have
an easy type representation, but it is noticeably smaller than the
current funcType, and saves significant extra space.
Some notes on the layout are in reflect/type.go:
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
// struct {
// funcType
// uncommonType
// [2]*rtype // [0] is in, [1] is out
// uncommonTypeSliceContents
// }
There are three arbitrary limits introduced by this CL:
1. No more than 65535 function input parameters.
2. No more than 32767 function output parameters.
3. reflect.FuncOf is limited to 128 parameters.
I don't think these are limits in practice, but are worth noting.
Reduces godoc binary size by 2.4%, 330KB.
For #6853.
Change-Id: I225c0a0516ebdbe92d41dfdf43f716da42dfe347
Reviewed-on: https://go-review.googlesource.com/19916
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Instead of a pointer on every rtype, use a bit flag to indicate that
the contents of uncommonType directly follows the rtype value when it
is needed.
This requires a bit of juggling in the compiler's rtype encoder. The
backing arrays for fields in the rtype are presently encoded directly
after the slice header. This packing requires separating the encoding
of the uncommonType slice headers from their backing arrays.
Reduces binary size of godoc by ~180KB (1.5%).
No measurable change in all.bash time.
For #6853.
Change-Id: I60205948ceb5c0abba76fdf619652da9c465a597
Reviewed-on: https://go-review.googlesource.com/19790
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
go test github.com/onsi/gomega/gbytes now passes at tip, and tests
added to the reflect package.
Fixes#14645
Change-Id: I16216c1a86211a1103d913237fe6bca5000cf885
Reviewed-on: https://go-review.googlesource.com/20221
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Simplifies some code as ptrToThis was unreliable under dynamic
linking. Now the same type lookup is used regardless of execution
mode.
A synthetic relocation, R_USETYPE, is introduced to make sure the
linker includes *T on use of T, if *T is carrying methods.
Changes the heap dump format. Anything reading the format needs to
look at the last bool of a type of an interface value to determine
if the type should be the pointer-to type.
Reduces binary size of cmd/go by 0.2%.
For #6853.
Change-Id: I79fcb19a97402bdb0193f3c7f6d94ddf061ee7b2
Reviewed-on: https://go-review.googlesource.com/19695
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Also eliminates per-maptype hiter and hmap types, since they're not
really needed anyway. Update packages reflect and runtime
accordingly.
Reduces golang.org/x/tools/cmd/godoc's text segment by ~170kB:
text data bss dec hex filename
13085702 140640 151520 13377862 cc2146 godoc.before
12915382 140640 151520 13207542 c987f6 godoc.after
Updates #6853.
Change-Id: I948b2bc1f22d477c1756204996b4e3e1fb568d81
Reviewed-on: https://go-review.googlesource.com/16610
Reviewed-by: Keith Randall <khr@golang.org>
This CL changes reflect to allow access to exported fields and
methods in unexported embedded structs for gccgo and after gc
has been adjusted to disallow access to embedded unexported structs.
Adresses #12367, #7363, #11007, and #7247.
Change-Id: If80536eab35abcd25300d8ddc2d27d5c42d7e78e
Reviewed-on: https://go-review.googlesource.com/14010
Reviewed-by: Russ Cox <rsc@golang.org>
Keep track of which types of keys need an update and which don't.
Strings need an update because the new key might pin a smaller backing store.
Floats need an update because it might be +0/-0.
Interfaces need an update because they may contain strings or floats.
Fixes#11088
Change-Id: I9ade53c1dfb3c1a2870d68d07201bc8128e9f217
Reviewed-on: https://go-review.googlesource.com/10843
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
No longer used after previous hashmap change.
Change-Id: I558470f872281e84a78406132df4e391d077b833
Reviewed-on: https://go-review.googlesource.com/13785
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Previously t.zero always pointed to runtime.zerovalue. Change the hashmap code
to always return a runtime pointer directly, and change that pointer to point
to a larger buffer if one is needed.
(It might be better to only copy from the pointer returned by the mapaccess
functions when the value type is small enough and have the compiler insert
explicit zeroing for larger value types, but I tried and failed to do this).
This removes all uses of the zero field of the type data; the field itself can
be removed in a separate change.
Fixes#11491
Change-Id: I5b81752ff4067d74a5a281c41e88f151bae0171e
Reviewed-on: https://go-review.googlesource.com/13784
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On most systems, a pointer is the worst case alignment, so adding
a pointer field at the end of a struct guarantees there will be no
padding added after that field (to satisfy overall struct alignment
due to some more-aligned field also present).
In the runtime, the map implementation needs a quick way to
get to the overflow pointer, which is last in the bucket struct,
so it uses size - sizeof(pointer) as the offset.
NaCl/amd64p32 is the exception, as always.
The worst case alignment is 64 bits but pointers are 32 bits.
There's a long history that is not worth going into, but when
we moved the overflow pointer to the end of the struct,
we didn't get the padding computation right.
The compiler computed the regular struct size and then
on amd64p32 added another 32-bit field.
And the runtime assumed it could step back two 32-bit fields
(one 64-bit register size) to get to the overflow pointer.
But in fact if the struct needed 64-bit alignment, the computation
of the regular struct size would have added a 32-bit pad already,
and then the code unconditionally added a second 32-bit pad.
This placed the overflow pointer three words from the end, not two.
The last two were padding, and since the runtime was consistent
about using the second-to-last word as the overflow pointer,
no harm done in the sense of overwriting useful memory.
But writing the overflow pointer to a non-pointer word of memory
means that the GC can't see the overflow blocks, so it will
collect them prematurely. Then bad things happen.
Correct all this in a few steps:
1. Add an explicit check at the end of the bucket layout in the
compiler that the overflow field is last in the struct, never
followed by padding.
2. When padding is needed on nacl (not always, just when needed),
insert it before the overflow pointer, to preserve the "last in the struct"
property.
3. Let the compiler have the final word on the width of the struct,
by inserting an explicit padding field instead of overwriting the
results of the width computation it does.
4. For the same reason (tell the truth to the compiler), set the type
of the overflow field when we're trying to pretend its not a pointer
(in this case the runtime maintains a list of the overflow blocks
elsewhere).
5. Make the runtime use "last in the struct" as its location algorithm.
This fixes TestTraceStress on nacl/amd64p32.
The 'bad map state' and 'invalid free list' failures no longer occur.
Fixes#11838.
Change-Id: If918887f8f252d988db0a35159944d2b36512f92
Reviewed-on: https://go-review.googlesource.com/12971
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The one in misc/makerelease/makerelease.go is particularly bad and
probably warrants rotating our keys.
I didn't update old weekly notes, and reverted some changes involving
test code for now, since we're late in the Go 1.5 freeze. Otherwise,
the rest are all auto-generated changes, and all manually reviewed.
Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d
Reviewed-on: https://go-review.googlesource.com/12048
Reviewed-by: Rob Pike <r@golang.org>
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).
To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)
I have been unable to reproduce this problem in a simple test program.
Could help #11035.
Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.
For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.
This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.
Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:
(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.
(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.
(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.
(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)
Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:
(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)
(2) There is no case 2 anymore.
(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.
Benchmarks for heapBitsSetType, before this CL vs this CL:
name old mean new mean delta
SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000)
SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179)
SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001)
SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001)
SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000)
SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000)
SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000)
SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001)
SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000)
SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070)
SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000)
SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015)
SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069)
SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767)
SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000)
SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001)
SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000)
SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000)
SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000)
SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000)
SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000)
SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001)
SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000)
SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000)
SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000)
SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000)
The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.
Benchmarks for heapBitsSetType, Go 1.4 vs this CL:
name old mean new mean delta
SetTypePtr 6.89ns × (1.00,1.00) 5.17ns × (1.00,1.00) -25.02% (p=0.000)
SetTypePtr8 25.8ns × (0.97,1.05) 21.5ns × (1.00,1.00) -16.70% (p=0.000)
SetTypePtr16 39.8ns × (0.97,1.02) 24.7ns × (0.99,1.01) -37.81% (p=0.000)
SetTypePtr32 68.8ns × (0.98,1.01) 32.2ns × (1.00,1.01) -53.18% (p=0.000)
SetTypePtr64 130ns × (1.00,1.00) 47ns × (1.00,1.00) -63.67% (p=0.000)
SetTypePtr126 241ns × (0.99,1.01) 79ns × (1.00,1.01) -67.25% (p=0.000)
SetTypePtr128 2.07µs × (1.00,1.00) 0.08µs × (1.00,1.00) -96.27% (p=0.000)
SetTypePtrSlice 1.05µs × (0.99,1.01) 0.72µs × (0.99,1.02) -31.70% (p=0.000)
SetTypeNode1 16.0ns × (0.99,1.01) 20.8ns × (0.99,1.03) +29.91% (p=0.000)
SetTypeNode1Slice 184ns × (0.99,1.01) 112ns × (0.99,1.01) -39.26% (p=0.000)
SetTypeNode8 29.5ns × (0.97,1.02) 24.6ns × (1.00,1.00) -16.50% (p=0.000)
SetTypeNode8Slice 624ns × (0.98,1.02) 285ns × (1.00,1.00) -54.31% (p=0.000)
SetTypeNode64 135ns × (0.96,1.08) 52ns × (0.99,1.02) -61.32% (p=0.000)
SetTypeNode64Slice 3.83µs × (1.00,1.00) 1.14µs × (0.99,1.01) -70.16% (p=0.000)
SetTypeNode64Dead 134ns × (0.99,1.01) 32ns × (1.00,1.01) -75.74% (p=0.000)
SetTypeNode64DeadSlice 3.83µs × (0.99,1.00) 1.40µs × (1.00,1.01) -63.42% (p=0.000)
SetTypeNode124 240ns × (0.99,1.01) 79ns × (1.00,1.01) -67.05% (p=0.000)
SetTypeNode124Slice 7.27µs × (1.00,1.00) 2.04µs × (1.00,1.00) -71.95% (p=0.000)
SetTypeNode126 2.06µs × (0.99,1.01) 0.08µs × (0.99,1.01) -96.23% (p=0.000)
SetTypeNode126Slice 64.4µs × (1.00,1.00) 2.0µs × (1.00,1.00) -96.85% (p=0.000)
SetTypeNode128 2.09µs × (1.00,1.01) 0.12µs × (1.00,1.00) -94.15% (p=0.000)
SetTypeNode128Slice 65.4µs × (1.00,1.00) 2.4µs × (0.99,1.03) -96.39% (p=0.000)
SetTypeNode130 2.11µs × (1.00,1.00) 0.12µs × (1.00,1.00) -94.18% (p=0.000)
SetTypeNode130Slice 66.3µs × (1.00,1.00) 2.4µs × (0.97,1.08) -96.34% (p=0.000)
SetTypeNode1024 16.0µs × (1.00,1.01) 0.5µs × (1.00,1.00) -96.65% (p=0.000)
SetTypeNode1024Slice 512µs × (1.00,1.00) 18µs × (0.98,1.04) -96.45% (p=0.000)
SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.
SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).
Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.
The Go 1 benchmarks are basically unaffected compared to just before this CL.
Go 1 benchmarks, before this CL vs this CL:
name old mean new mean delta
BinaryTree17 5.87s × (0.97,1.04) 5.91s × (0.96,1.04) ~ (p=0.306)
Fannkuch11 4.38s × (1.00,1.00) 4.37s × (1.00,1.01) -0.22% (p=0.006)
FmtFprintfEmpty 90.7ns × (0.97,1.10) 89.3ns × (0.96,1.09) ~ (p=0.280)
FmtFprintfString 282ns × (0.98,1.04) 287ns × (0.98,1.07) +1.72% (p=0.039)
FmtFprintfInt 269ns × (0.99,1.03) 282ns × (0.97,1.04) +4.87% (p=0.000)
FmtFprintfIntInt 478ns × (0.99,1.02) 481ns × (0.99,1.02) +0.61% (p=0.048)
FmtFprintfPrefixedInt 399ns × (0.98,1.03) 400ns × (0.98,1.05) ~ (p=0.533)
FmtFprintfFloat 563ns × (0.99,1.01) 570ns × (1.00,1.01) +1.37% (p=0.000)
FmtManyArgs 1.89µs × (0.99,1.01) 1.92µs × (0.99,1.02) +1.88% (p=0.000)
GobDecode 15.2ms × (0.99,1.01) 15.2ms × (0.98,1.05) ~ (p=0.609)
GobEncode 11.6ms × (0.98,1.03) 11.9ms × (0.98,1.04) +2.17% (p=0.000)
Gzip 648ms × (0.99,1.01) 648ms × (1.00,1.01) ~ (p=0.835)
Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.169)
HTTPClientServer 90.5µs × (0.98,1.03) 91.5µs × (0.98,1.04) +1.04% (p=0.045)
JSONEncode 31.5ms × (0.98,1.03) 31.4ms × (0.98,1.03) ~ (p=0.549)
JSONDecode 111ms × (0.99,1.01) 107ms × (0.99,1.01) -3.21% (p=0.000)
Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.878)
GoParse 6.54ms × (0.99,1.02) 6.61ms × (0.99,1.03) +1.08% (p=0.004)
RegexpMatchEasy0_32 160ns × (1.00,1.01) 161ns × (1.00,1.00) +0.40% (p=0.000)
RegexpMatchEasy0_1K 560ns × (0.99,1.01) 559ns × (0.99,1.01) ~ (p=0.088)
RegexpMatchEasy1_32 138ns × (0.99,1.01) 138ns × (1.00,1.00) ~ (p=0.380)
RegexpMatchEasy1_1K 877ns × (1.00,1.00) 878ns × (1.00,1.00) ~ (p=0.157)
RegexpMatchMedium_32 251ns × (0.99,1.00) 251ns × (1.00,1.01) +0.28% (p=0.021)
RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.539)
RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.378)
RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.067)
Revcomp 904ms × (0.99,1.02) 904ms × (0.99,1.01) ~ (p=0.943)
Template 125ms × (0.99,1.02) 127ms × (0.99,1.01) +1.79% (p=0.000)
TimeParse 627ns × (0.99,1.01) 622ns × (0.99,1.01) -0.88% (p=0.000)
TimeFormat 655ns × (0.99,1.02) 655ns × (0.99,1.02) ~ (p=0.976)
For the record, Go 1 benchmarks, Go 1.4 vs this CL:
name old mean new mean delta
BinaryTree17 4.61s × (0.97,1.05) 5.91s × (0.98,1.03) +28.35% (p=0.000)
Fannkuch11 4.40s × (0.99,1.03) 4.41s × (0.99,1.01) ~ (p=0.212)
FmtFprintfEmpty 102ns × (0.99,1.01) 84ns × (0.99,1.02) -18.38% (p=0.000)
FmtFprintfString 302ns × (0.98,1.01) 303ns × (0.99,1.02) ~ (p=0.203)
FmtFprintfInt 313ns × (0.97,1.05) 270ns × (0.99,1.01) -13.69% (p=0.000)
FmtFprintfIntInt 524ns × (0.98,1.02) 477ns × (0.99,1.00) -8.87% (p=0.000)
FmtFprintfPrefixedInt 424ns × (0.98,1.02) 386ns × (0.99,1.01) -8.96% (p=0.000)
FmtFprintfFloat 652ns × (0.98,1.02) 594ns × (0.97,1.05) -8.97% (p=0.000)
FmtManyArgs 2.13µs × (0.99,1.02) 1.94µs × (0.99,1.01) -8.92% (p=0.000)
GobDecode 17.1ms × (0.99,1.02) 14.9ms × (0.98,1.03) -13.07% (p=0.000)
GobEncode 13.5ms × (0.98,1.03) 11.5ms × (0.98,1.03) -15.25% (p=0.000)
Gzip 656ms × (0.99,1.02) 647ms × (0.99,1.01) -1.29% (p=0.000)
Gunzip 143ms × (0.99,1.02) 144ms × (0.99,1.01) ~ (p=0.204)
HTTPClientServer 88.2µs × (0.98,1.02) 90.8µs × (0.98,1.01) +2.93% (p=0.000)
JSONEncode 32.2ms × (0.98,1.02) 30.9ms × (0.97,1.04) -4.06% (p=0.001)
JSONDecode 121ms × (0.98,1.02) 110ms × (0.98,1.05) -8.95% (p=0.000)
Mandelbrot200 6.06ms × (0.99,1.01) 6.11ms × (0.98,1.04) ~ (p=0.184)
GoParse 6.76ms × (0.97,1.04) 6.58ms × (0.98,1.05) -2.63% (p=0.003)
RegexpMatchEasy0_32 195ns × (1.00,1.01) 155ns × (0.99,1.01) -20.43% (p=0.000)
RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000)
RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000)
RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000)
RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000)
RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000)
RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000)
RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000)
Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083)
Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000)
TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000)
TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000)
This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).
Fixes#9625.
Fixes#10524.
Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Preallocating them in reflect means that
(1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data
(2) if you say it again, that's *another* GB of data.
The only use of t.zero in the runtime is for map elements.
Delay the allocation until the creation of a map with that element type,
and share the zeros.
The one downside of the shared zero is that it's not garbage collected,
but it's also never written, so the OS should be able to handle it fairly
efficiently.
Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614
Reviewed-on: https://go-review.googlesource.com/10111
Reviewed-by: Keith Randall <khr@golang.org>
The type information in reflect.Type and the GC programs is now
1 bit per word, down from 2 bits.
The in-memory unrolled type bitmap representation are now
1 bit per word, down from 4 bits.
The conversion from the unrolled (now 1-bit) bitmap to the
heap bitmap (still 4-bit) is not optimized. A followup CL will
work on that, after the heap bitmap has been converted to 2-bit.
The typeDead optimization, in which a special value denotes
that there are no more pointers anywhere in the object, is lost
in this CL. A followup CL will bring it back in the final form of
heapBitsSetType.
Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
Reviewed-on: https://go-review.googlesource.com/9702
Reviewed-by: Austin Clements <austin@google.com>
I forgot there is already a ptrSize constant.
Rename field to avoid some confusion.
Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff
Reviewed-on: https://go-review.googlesource.com/9700
Reviewed-by: Austin Clements <austin@google.com>