// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Package reflect implements run-time reflection, allowing a program to
// manipulate objects with arbitrary types. The typical use is to take a value
// with static type interface{} and extract its dynamic type information by
// calling TypeOf, which returns a Type.
//
// A call to ValueOf returns a Value representing the run-time data.
// Zero takes a Type and returns a Value representing a zero value
// for that type.
//
// See "The Laws of Reflection" for an introduction to reflection in Go:
// https://golang.org/doc/articles/laws_of_reflection.html
package reflect

import (
"runtime"
"strconv"
"sync"
"unicode"
"unicode/utf8"
"unsafe"
)
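
// An illustrative sketch, not part of the original file: the package
// comment's typical use in code form. TypeOf extracts the dynamic type
// of a value with static type interface{}, and ValueOf captures its
// run-time data. The helper name is hypothetical.
func exampleTypeOf(i interface{}) (Type, Value) {
	t := TypeOf(i)  // dynamic type information, as a Type
	v := ValueOf(i) // run-time data, as a Value
	return t, v
}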
// Type is the representation of a Go type.
//
// Not all methods apply to all kinds of types. Restrictions,
// if any, are noted in the documentation for each method.
// Use the Kind method to find out the kind of type before
// calling kind-specific methods. Calling a method
// inappropriate to the kind of type causes a run-time panic.
//
// Type values are comparable, such as with the == operator,
// so they can be used as map keys.
// Two Type values are equal if they represent identical types.
type Type interface {
// Methods applicable to all types.
// Align returns the alignment in bytes of a value of
// this type when allocated in memory.
Align() int
// FieldAlign returns the alignment in bytes of a value of
// this type when used as a field in a struct.
FieldAlign() int
// Method returns the i'th method in the type's method set.
// It panics if i is not in the range [0, NumMethod()).
//
// For a non-interface type T or *T, the returned Method's Type and Func
// fields describe a function whose first argument is the receiver.
//
// For an interface type, the returned Method's Type field gives the
// method signature, without a receiver, and the Func field is nil.
//
// Only exported methods are accessible and they are sorted in
// lexicographic order.
Method(int) Method
// MethodByName returns the method with that name in the type's
// method set and a boolean indicating if the method was found.
//
// For a non-interface type T or *T, the returned Method's Type and Func
// fields describe a function whose first argument is the receiver.
//
// For an interface type, the returned Method's Type field gives the
// method signature, without a receiver, and the Func field is nil.
MethodByName(string) (Method, bool)
// NumMethod returns the number of exported methods in the type's method set.
NumMethod() int
// Name returns the type's name within its package for a defined type.
// For other (non-defined) types it returns the empty string.
Name() string
// PkgPath returns a defined type's package path, that is, the import path
// that uniquely identifies the package, such as "encoding/base64".
// If the type was predeclared (string, error) or not defined (*T, struct{},
// []int, or A where A is an alias for a non-defined type), the package path
// will be the empty string.
PkgPath() string
// Size returns the number of bytes needed to store
// a value of the given type; it is analogous to unsafe.Sizeof.
Size() uintptr
// String returns a string representation of the type.
// The string representation may use shortened package names
// (e.g., base64 instead of "encoding/base64") and is not
// guaranteed to be unique among types. To test for type identity,
// compare the Types directly.
String() string
// Kind returns the specific kind of this type.
Kind() Kind
// Implements reports whether the type implements the interface type u.
Implements(u Type) bool
// AssignableTo reports whether a value of the type is assignable to type u.
AssignableTo(u Type) bool
// ConvertibleTo reports whether a value of the type is convertible to type u.
ConvertibleTo(u Type) bool
// Comparable reports whether values of this type are comparable.
Comparable() bool
// Methods applicable only to some types, depending on Kind.
// The methods allowed for each kind are:
//
// Int*, Uint*, Float*, Complex*: Bits
// Array: Elem, Len
// Chan: ChanDir, Elem
// Func: In, NumIn, Out, NumOut, IsVariadic
// Map: Key, Elem
// Ptr: Elem
// Slice: Elem
// Struct: Field, FieldByIndex, FieldByName, FieldByNameFunc, NumField
// Bits returns the size of the type in bits.
// It panics if the type's Kind is not one of the
// sized or unsized Int, Uint, Float, or Complex kinds.
Bits() int
// ChanDir returns a channel type's direction.
// It panics if the type's Kind is not Chan.
ChanDir() ChanDir
// IsVariadic reports whether a function type's final input parameter
// is a "..." parameter. If so, t.In(t.NumIn() - 1) returns the parameter's
// implicit actual type []T.
//
// For concreteness, if t represents func(x int, y ...float64), then
//
// t.NumIn() == 2
// t.In(0) is the reflect.Type for "int"
// t.In(1) is the reflect.Type for "[]float64"
// t.IsVariadic() == true
//
// IsVariadic panics if the type's Kind is not Func.
IsVariadic() bool
// Elem returns a type's element type.
// It panics if the type's Kind is not Array, Chan, Map, Ptr, or Slice.
Elem() Type
// Field returns a struct type's i'th field.
// It panics if the type's Kind is not Struct.
// It panics if i is not in the range [0, NumField()).
Field(i int) StructField
// FieldByIndex returns the nested field corresponding
// to the index sequence. It is equivalent to calling Field
// successively for each index i.
// It panics if the type's Kind is not Struct.
FieldByIndex(index []int) StructField
// FieldByName returns the struct field with the given name
// and a boolean indicating if the field was found.
FieldByName(name string) (StructField, bool)
// FieldByNameFunc returns the struct field with a name
// that satisfies the match function and a boolean indicating if
// the field was found.
//
// FieldByNameFunc considers the fields in the struct itself
// and then the fields in any embedded structs, in breadth first order,
// stopping at the shallowest nesting depth containing one or more
// fields satisfying the match function. If multiple fields at that depth
// satisfy the match function, they cancel each other
// and FieldByNameFunc returns no match.
// This behavior mirrors Go's handling of name lookup in
// structs containing embedded fields.
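//
// For example (the type names here are illustrative): for
//
//	type A struct{ F int }
//	type B struct{ F int }
//	type S struct{ A; B }
//
// a match function that accepts "F" finds one field in A and one in B
// at the same depth, so the two cancel and no match is returned for S.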
FieldByNameFunc(match func(string) bool) (StructField, bool)
// In returns the type of a function type's i'th input parameter.
// It panics if the type's Kind is not Func.
// It panics if i is not in the range [0, NumIn()).
In(i int) Type
// Key returns a map type's key type.
// It panics if the type's Kind is not Map.
Key() Type
// Len returns an array type's length.
// It panics if the type's Kind is not Array.
Len() int
// NumField returns a struct type's field count.
// It panics if the type's Kind is not Struct.
NumField() int
// NumIn returns a function type's input parameter count.
// It panics if the type's Kind is not Func.
NumIn() int
// NumOut returns a function type's output parameter count.
// It panics if the type's Kind is not Func.
NumOut() int
// Out returns the type of a function type's i'th output parameter.
// It panics if the type's Kind is not Func.
// It panics if i is not in the range [0, NumOut()).
Out(i int) Type
common() *rtype
uncommon() *uncommonType
}
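
// An illustrative sketch, not part of the original file: the Kind
// check that the interface documentation above prescribes before a
// kind-specific call. Elem applies only to Array, Chan, Map, Ptr, and
// Slice types and panics for every other kind. The helper name is
// hypothetical.
func elemOf(t Type) (Type, bool) {
	switch t.Kind() {
	case Array, Chan, Map, Ptr, Slice:
		return t.Elem(), true // Elem is valid for these kinds
	default:
		return nil, false // calling t.Elem() here would panic
	}
}
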
// BUG(rsc): FieldByName and related functions consider struct field names to be equal
// if the names are equal, even if they are unexported names originating
// in different packages. The practical effect of this is that the result of
// t.FieldByName("x") is not well defined if the struct type t contains
// multiple fields named x (embedded from different packages).
// FieldByName may return one of the fields named x or may report that there are none.
// See https://golang.org/issue/4876 for more details.

/*
 * These data structures are known to the compiler (../../cmd/internal/gc/reflect.go)
 * and to ../runtime/type.go; a few are conveyed to debuggers as well.
 */
// A Kind represents the specific kind of type that a Type represents.
// The zero Kind is not a valid kind.
type Kind uint
const (
Invalid Kind = iota
Bool
Int
Int8
Int16
Int32
Int64
Uint
Uint8
Uint16
Uint32
Uint64
Uintptr
Float32
Float64
Complex64
Complex128
Array
Chan
Func
Interface
Map
Ptr
Slice
String
Struct
UnsafePointer
)
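// As a usage sketch from a client package (hedged; assumes
// import "reflect"), Kind classifies a type independently of its
// name:
//
//	type MyInt int
//	k := reflect.TypeOf(MyInt(0)).Kind()
//	// k == reflect.Int; there is no distinct kind for the named type.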
// tflag is used by an rtype to signal what extra type information is
// available in the memory directly following the rtype value.
//
// tflag values must be kept in sync with copies in:
// cmd/compile/internal/gc/reflect.go
// cmd/link/internal/ld/decodesym.go
// runtime/type.go
type tflag uint8
const (
// tflagUncommon means that there is a pointer, *uncommonType,
// just beyond the outer type structure.
//
// For example, if t.Kind() == Struct and t.tflag&tflagUncommon != 0,
// then t has uncommonType data and it can be accessed as:
//
// type tUncommon struct {
// structType
// u uncommonType
// }
// u := &(*tUncommon)(unsafe.Pointer(t)).u
tflagUncommon tflag = 1 << 0
// tflagExtraStar means the name in the str field has an
// extraneous '*' prefix. This is because for most types T in
// a program, the type *T also exists and reusing the str data
// saves binary size.
tflagExtraStar tflag = 1 << 1
// tflagNamed means the type has a name.
tflagNamed tflag = 1 << 2
)
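// As an illustrative sketch (mirroring tests done elsewhere in this
// file), the flags are queried with bit masks:
//
//	named := t.tflag&tflagNamed != 0
//	hasUncommon := t.tflag&tflagUncommon != 0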
// rtype is the common implementation of most values.
// It is embedded in other struct types.
//
// rtype must be kept in sync with ../runtime/type.go:/^type._type.
type rtype struct {
size uintptr
ptrdata uintptr // number of bytes in the type that can contain pointers
hash uint32 // hash of type; avoids computation in hash tables
tflag tflag // extra type information flags
align uint8 // alignment of variable with this type
fieldAlign uint8 // alignment of struct field with this type
kind uint8 // enumeration for C
alg *typeAlg // algorithm table
gcdata *byte // garbage collection data
str nameOff // string form
ptrToThis typeOff // type for pointer to this type, may be zero
}
// a copy of runtime.typeAlg
type typeAlg struct {
// function for hashing objects of this type
// (ptr to object, seed) -> hash
hash func(unsafe.Pointer, uintptr) uintptr
// function for comparing objects of this type
// (ptr to object A, ptr to object B) -> ==?
equal func(unsafe.Pointer, unsafe.Pointer) bool
}
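// A hedged sketch of how the runtime dispatches through this table
// (the seed and operand pointers are illustrative):
//
//	h := t.alg.hash(unsafe.Pointer(&v), seed)
//	eq := t.alg.equal(unsafe.Pointer(&a), unsafe.Pointer(&b))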
// Method on non-interface type
type method struct {
name nameOff // name of method
mtyp typeOff // method type (without receiver)
ifn textOff // fn used in interface call (one-word receiver)
tfn textOff // fn used for normal method call
}
// uncommonType is present only for defined types or types with methods
// (if T is a defined type, the uncommonTypes for T and *T have methods).
// Using a pointer to this struct reduces the overall size required
// to describe a non-defined type with no methods.
type uncommonType struct {
pkgPath nameOff // import path; empty for built-in types like int, string
mcount uint16 // number of methods
xcount uint16 // number of exported methods
moff uint32 // offset from this uncommontype to [mcount]method
_ uint32 // unused
}
// ChanDir represents a channel type's direction.
type ChanDir int
const (
RecvDir ChanDir = 1 << iota // <-chan
SendDir // chan<-
BothDir = RecvDir | SendDir // chan
)
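// For example, the two direction bits compose:
//
//	BothDir == RecvDir|SendDir // == 3
//
// so a chan<- int has direction SendDir and a <-chan int has RecvDir.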
// arrayType represents a fixed array type.
type arrayType struct {
rtype
elem *rtype // array element type
slice *rtype // slice type
len uintptr
}
// chanType represents a channel type.
type chanType struct {
rtype
elem *rtype // channel element type
dir uintptr // channel direction (ChanDir)
}
// funcType represents a function type.
//
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
// struct {
// funcType
// uncommonType
// [2]*rtype // [0] is in, [1] is out
// }
type funcType struct {
rtype
inCount uint16
outCount uint16 // top bit is set if last input parameter is ...
}
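// A hedged sketch of recovering the in-parameters from that trailing
// array (simplified: it ignores the optional uncommonType between the
// header and the array):
//
//	uadd := unsafe.Sizeof(funcType{})
//	in := (*[1 << 20]*rtype)(add(unsafe.Pointer(t), uadd, "t.inCount > 0"))[:t.inCount:t.inCount]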
// imethod represents a method on an interface type
type imethod struct {
name nameOff // name of method
typ typeOff // .(*FuncType) underneath
}
// interfaceType represents an interface type.
type interfaceType struct {
rtype
pkgPath name // import path
methods []imethod // sorted by hash
}
// mapType represents a map type.
type mapType struct {
rtype
key *rtype // map key type
elem *rtype // map element (value) type
bucket *rtype // internal bucket structure
keysize uint8 // size of key slot
valuesize uint8 // size of value slot
bucketsize uint16 // size of bucket
flags uint32
}
// ptrType represents a pointer type.
type ptrType struct {
rtype
elem *rtype // pointer element (pointed at) type
}
// sliceType represents a slice type.
type sliceType struct {
rtype
elem *rtype // slice element type
}
// Struct field
type structField struct {
name name // name is always non-empty
typ *rtype // type of field
offsetEmbed uintptr // byte offset of field<<1 | isEmbedded
}
func (f *structField) offset() uintptr {
return f.offsetEmbed >> 1
}
func (f *structField) embedded() bool {
return f.offsetEmbed&1 != 0
}
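// For example, an embedded field at byte offset 8 is packed as
//
//	offsetEmbed == 8<<1 | 1 // == 17
//
// so offset() reports 8 and embedded() reports true.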
// structType represents a struct type.
type structType struct {
rtype
pkgPath name
fields []structField // sorted by offset
}
// name is an encoded type name with optional extra data.
//
// The first byte is a bit field containing:
//
// 1<<0 the name is exported
// 1<<1 tag data follows the name
// 1<<2 pkgPath nameOff follows the name and tag
//
// The next two bytes are the data length:
//
// l := uint16(data[1])<<8 | uint16(data[2])
//
// Bytes [3:3+l] are the string data.
//
// If tag data follows then bytes 3+l and 3+l+1 are the tag length,
// with the data following.
//
// If the import path follows, then 4 bytes at the end of
// the data form a nameOff. The import path is only set for concrete
// methods that are defined in a different package than their type.
//
// If a name starts with "*", then the exported bit represents
// whether the pointed to type is exported.
type name struct {
bytes *byte
}
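// For example, the exported name "Foo" with tag `json:"foo"` and no
// pkgPath is encoded, following the layout above, as:
//
//	[]byte{
//		1<<0 | 1<<1, // exported; tag data follows
//		0, 3, // name length
//		'F', 'o', 'o',
//		0, 10, // tag length
//		'j', 's', 'o', 'n', ':', '"', 'f', 'o', 'o', '"',
//	}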
func (n name) data(off int, whySafe string) *byte {
return (*byte)(add(unsafe.Pointer(n.bytes), uintptr(off), whySafe))
}
func (n name) isExported() bool {
return (*n.bytes)&(1<<0) != 0
}
func (n name) nameLen() int {
return int(uint16(*n.data(1, "name len field"))<<8 | uint16(*n.data(2, "name len field")))
}
func (n name) tagLen() int {
if *n.data(0, "name flag field")&(1<<1) == 0 {
return 0
}
off := 3 + n.nameLen()
return int(uint16(*n.data(off, "name taglen field"))<<8 | uint16(*n.data(off+1, "name taglen field")))
}
func (n name) name() (s string) {
if n.bytes == nil {
return
}
b := (*[4]byte)(unsafe.Pointer(n.bytes))
hdr := (*stringHeader)(unsafe.Pointer(&s))
hdr.Data = unsafe.Pointer(&b[3])
hdr.Len = int(b[1])<<8 | int(b[2])
return s
}
func (n name) tag() (s string) {
tl := n.tagLen()
if tl == 0 {
return ""
}
nl := n.nameLen()
hdr := (*stringHeader)(unsafe.Pointer(&s))
hdr.Data = unsafe.Pointer(n.data(3+nl+2, "non-empty string"))
hdr.Len = tl
return s
}
func (n name) pkgPath() string {
if n.bytes == nil || *n.data(0, "name flag field")&(1<<2) == 0 {
return ""
}
off := 3 + n.nameLen()
if tl := n.tagLen(); tl > 0 {
off += 2 + tl
}
var nameOff int32
// Note that this field may not be aligned in memory,
// so we cannot use a direct int32 assignment here.
copy((*[4]byte)(unsafe.Pointer(&nameOff))[:], (*[4]byte)(unsafe.Pointer(n.data(off, "name offset field")))[:])
pkgPathName := name{(*byte)(resolveTypeOff(unsafe.Pointer(n.bytes), nameOff))}
return pkgPathName.name()
}
func newName(n, tag string, exported bool) name {
if len(n) > 1<<16-1 {
panic("reflect.nameFrom: name too long: " + n)
}
if len(tag) > 1<<16-1 {
panic("reflect.nameFrom: tag too long: " + tag)
}
var bits byte
l := 1 + 2 + len(n)
if exported {
bits |= 1 << 0
}
if len(tag) > 0 {
l += 2 + len(tag)
bits |= 1 << 1
}
b := make([]byte, l)
b[0] = bits
b[1] = uint8(len(n) >> 8)
b[2] = uint8(len(n))
copy(b[3:], n)
if len(tag) > 0 {
tb := b[3+len(n):]
tb[0] = uint8(len(tag) >> 8)
tb[1] = uint8(len(tag))
copy(tb[2:], tag)
}
return name{bytes: &b[0]}
}
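// A small round-trip sketch tying newName to the accessors above:
//
//	n := newName("Foo", `json:"foo"`, true)
//	n.name()       // "Foo"
//	n.tag()        // `json:"foo"`
//	n.isExported() // true
//	n.pkgPath()    // "" (newName never sets the pkgPath bit)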
/*
* The compiler knows the exact layout of all the data structures above.
* The compiler does not know about the data structures and methods below.
*/
// Method represents a single method.
type Method struct {
// Name is the method name.
// PkgPath is the package path that qualifies a lower case (unexported)
// method name. It is empty for upper case (exported) method names.
// The combination of PkgPath and Name uniquely identifies a method
// in a method set.
// See https://golang.org/ref/spec#Uniqueness_of_identifiers
Name string
PkgPath string
Type Type // method type
Func Value // func with receiver as first argument
Index int // index for Type.Method
}
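// As a client-side sketch (assumes import "reflect"; T and its method
// are hypothetical):
//
//	type T struct{}
//	func (T) Hello() string { return "hi" }
//
//	m, _ := reflect.TypeOf(T{}).MethodByName("Hello")
//	out := m.Func.Call([]reflect.Value{reflect.ValueOf(T{})})
//	// m.Name == "Hello", m.PkgPath == "", out[0].String() == "hi"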
const (
kindDirectIface = 1 << 5
kindGCProg = 1 << 6 // Type.gc points to GC program
kindMask = (1 << 5) - 1
)
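// For example, the Kind of an rtype is recovered by masking the flag
// bits out of the kind byte:
//
//	k := Kind(t.kind & kindMask)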
// String returns the name of k.
func (k Kind) String() string {
if int(k) < len(kindNames) {
return kindNames[k]
}
return "kind" + strconv.Itoa(int(k))
}
var kindNames = []string{
Invalid: "invalid",
Bool: "bool",
Int: "int",
Int8: "int8",
Int16: "int16",
Int32: "int32",
Int64: "int64",
Uint: "uint",
Uint8: "uint8",
Uint16: "uint16",
Uint32: "uint32",
Uint64: "uint64",
Uintptr: "uintptr",
Float32: "float32",
Float64: "float64",
Complex64: "complex64",
Complex128: "complex128",
Array: "array",
Chan: "chan",
Func: "func",
Interface: "interface",
Map: "map",
Ptr: "ptr",
Slice: "slice",
String: "string",
Struct: "struct",
UnsafePointer: "unsafe.Pointer",
}
func (t *uncommonType) methods() []method {
if t.mcount == 0 {
return nil
}
return (*[1 << 16]method)(add(unsafe.Pointer(t), uintptr(t.moff), "t.mcount > 0"))[:t.mcount:t.mcount]
}
func (t *uncommonType) exportedMethods() []method {
if t.xcount == 0 {
return nil
}
return (*[1 << 16]method)(add(unsafe.Pointer(t), uintptr(t.moff), "t.xcount > 0"))[:t.xcount:t.xcount]
}
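// Both accessors rely on the layout emitted by the compiler: the
// method array sits t.moff bytes past the uncommonType, with the
// xcount exported methods sorted first, roughly
//
//	struct {
//		uncommonType
//		// ... possible padding ...
//		[mcount]method // [:xcount] are the exported methods
//	}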
// resolveNameOff resolves a name offset from a base pointer.
// The (*rtype).nameOff method is a convenience wrapper for this function.
// Implemented in the runtime package.
func resolveNameOff(ptrInModule unsafe.Pointer, off int32) unsafe.Pointer
// resolveTypeOff resolves an *rtype offset from a base type.
// The (*rtype).typeOff method is a convenience wrapper for this function.
// Implemented in the runtime package.
func resolveTypeOff(rtype unsafe.Pointer, off int32) unsafe.Pointer
// resolveTextOff resolves a function pointer offset from a base type.
// The (*rtype).textOff method is a convenience wrapper for this function.
// Implemented in the runtime package.
func resolveTextOff(rtype unsafe.Pointer, off int32) unsafe.Pointer
// addReflectOff adds a pointer to the reflection lookup map in the runtime.
// It returns a new ID that can be used as a typeOff or textOff, and will
// be resolved correctly. Implemented in the runtime package.
func addReflectOff(ptr unsafe.Pointer) int32
// resolveReflectName adds a name to the reflection lookup map in the runtime.
// It returns a new nameOff that can be used to refer to the pointer.
func resolveReflectName(n name) nameOff {
return nameOff(addReflectOff(unsafe.Pointer(n.bytes)))
}
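// A hedged sketch of its use when building types at run time (as the
// type-construction code elsewhere in this package does):
//
//	off := resolveReflectName(newName("T", "", true))
//	// off is a nameOff that can be stored in a type's name fields.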
// resolveReflectType adds a *rtype to the reflection lookup map in the runtime.
// It returns a new typeOff that can be used to refer to the pointer.
func resolveReflectType(t *rtype) typeOff {
return typeOff(addReflectOff(unsafe.Pointer(t)))
}
// resolveReflectText adds a function pointer to the reflection lookup map in
// the runtime. It returns a new textOff that can be used to refer to the
// pointer.
func resolveReflectText(ptr unsafe.Pointer) textOff {
return textOff(addReflectOff(ptr))
}
type nameOff int32 // offset to a name
type typeOff int32 // offset to an *rtype
type textOff int32 // offset from top of text section
func (t *rtype) nameOff(off nameOff) name {
return name{(*byte)(resolveNameOff(unsafe.Pointer(t), int32(off)))}
}
func (t *rtype) typeOff(off typeOff) *rtype {
return (*rtype)(resolveTypeOff(unsafe.Pointer(t), int32(off)))
}
func (t *rtype) textOff(off textOff) unsafe.Pointer {
return resolveTextOff(unsafe.Pointer(t), int32(off))
}
func (t *rtype) uncommon() *uncommonType {
if t.tflag&tflagUncommon == 0 {
return nil
}
switch t.Kind() {
case Struct:
return &(*structTypeUncommon)(unsafe.Pointer(t)).u
case Ptr:
type u struct {
ptrType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Func:
type u struct {
funcType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Slice:
type u struct {
sliceType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Array:
type u struct {
arrayType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Chan:
type u struct {
chanType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Map:
type u struct {
mapType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
case Interface:
type u struct {
interfaceType
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
default:
type u struct {
rtype
u uncommonType
}
return &(*u)(unsafe.Pointer(t)).u
}
}
func (t *rtype) String() string {
s := t.nameOff(t.str).name()
if t.tflag&tflagExtraStar != 0 {
return s[1:]
}
return s
}
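// For example, with shared str data spelling "*int" and
// tflagExtraStar set on the type int:
//
//	t.String() // "int"; the leading '*' is trimmed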
func (t *rtype) Size() uintptr { return t.size }
func (t *rtype) Bits() int {
if t == nil {
panic("reflect: Bits of nil Type")
}
k := t.Kind()
if k < Int || k > Complex128 {
panic("reflect: Bits of non-arithmetic Type " + t.String())
}
return int(t.size) * 8
}

func (t *rtype) Align() int { return int(t.align) }

func (t *rtype) FieldAlign() int { return int(t.fieldAlign) }

func (t *rtype) Kind() Kind { return Kind(t.kind & kindMask) }

func (t *rtype) pointers() bool { return t.ptrdata != 0 }

func (t *rtype) common() *rtype { return t }

func (t *rtype) exportedMethods() []method {
ut := t.uncommon()
if ut == nil {
return nil
}
return ut.exportedMethods()
}
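
// NumMethod counts methods: for interface types it delegates to the
// interfaceType and counts every declared method, while for all other types
// only exported methods are counted. Illustrative example (not original source):
//
//	reflect.TypeOf((*fmt.Stringer)(nil)).Elem().NumMethod() // 1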
func (t *rtype) NumMethod() int {
if t.Kind() == Interface {
tt := (*interfaceType)(unsafe.Pointer(t))
return tt.NumMethod()
}
return len(t.exportedMethods())
}
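
// Method builds the i'th entry of the type's exported method set; methods are
// sorted by name, and for non-interface types the synthesized Func takes the
// receiver as its first argument. Illustrative sketch (not original source):
//
//	t := reflect.TypeOf(time.Duration(0))
//	for i := 0; i < t.NumMethod(); i++ {
//		_ = t.Method(i).Name // "Hours", "Minutes", ... in sorted order
//	}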
func (t *rtype) Method(i int) (m Method) {
if t.Kind() == Interface {
tt := (*interfaceType)(unsafe.Pointer(t))
return tt.Method(i)
}
methods := t.exportedMethods()
if i < 0 || i >= len(methods) {
panic("reflect: Method index out of range")
}
p := methods[i]
pname := t.nameOff(p.name)
m.Name = pname.name()
fl := flag(Func)
mtyp := t.typeOff(p.mtyp)
ft := (*funcType)(unsafe.Pointer(mtyp))
in := make([]Type, 0, 1+len(ft.in()))
in = append(in, t)
for _, arg := range ft.in() {
in = append(in, arg)
}
out := make([]Type, 0, len(ft.out()))
for _, ret := range ft.out() {
out = append(out, ret)
}
mt := FuncOf(in, out, ft.IsVariadic())
m.Type = mt
tfn := t.textOff(p.tfn)
fn := unsafe.Pointer(&tfn)
m.Func = Value{mt.(*rtype), fn, fl}
m.Index = i
return m
}
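
// MethodByName is a linear scan over the exported method list (note the TODO
// below about binary search). Example usage, assuming the time package
// (illustrative only, not part of the original file):
//
//	m, ok := reflect.TypeOf(time.Duration(1)).MethodByName("String")
//	if ok {
//		out := m.Func.Call([]reflect.Value{reflect.ValueOf(time.Duration(1))})
//		_ = out[0].String() // "1ns"
//	}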
func (t *rtype) MethodByName(name string) (m Method, ok bool) {
if t.Kind() == Interface {
tt := (*interfaceType)(unsafe.Pointer(t))
return tt.MethodByName(name)
}
ut := t.uncommon()
if ut == nil {
return Method{}, false
}
// TODO(mdempsky): Binary search.
for i, p := range ut.exportedMethods() {
if t.nameOff(p.name).name() == name {
return t.Method(i), true
}
}
return Method{}, false
}
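
// PkgPath reports the defining package's import path, which is empty for
// predeclared and unnamed types. Example (illustrative only):
//
//	reflect.TypeOf(time.Second).PkgPath() // "time"
//	reflect.TypeOf([]int(nil)).PkgPath()  // "" (unnamed type)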
func (t *rtype) PkgPath() string {
if t.tflag&tflagNamed == 0 {
return ""
}
ut := t.uncommon()
if ut == nil {
return ""
}
return t.nameOff(ut.pkgPath).name()
}

func (t *rtype) hasName() bool {
return t.tflag&tflagNamed != 0
}
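
// Name recovers the bare type name by trimming the package qualifier from
// String(); unnamed types yield "". Example (illustrative only):
//
//	reflect.TypeOf(time.Second).Name() // "Duration"
//	reflect.TypeOf(struct{}{}).Name()  // ""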
func (t *rtype) Name() string {
if !t.hasName() {
return ""
}
s := t.String()
i := len(s) - 1
for i >= 0 && s[i] != '.' {
i--
}
return s[i+1:]
}
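
// ChanDir panics for non-channel types. Example (illustrative only):
//
//	reflect.TypeOf(make(chan int)).ChanDir()   // reflect.BothDir
//	reflect.TypeOf(make(chan<- int)).ChanDir() // reflect.SendDir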
func (t *rtype) ChanDir() ChanDir {
if t.Kind() != Chan {
panic("reflect: ChanDir of non-chan type")
}
tt := (*chanType)(unsafe.Pointer(t))
return ChanDir(tt.dir)
}
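
// IsVariadic reads the flag stored in the top bit of the funcType's outCount.
// Example (illustrative only):
//
//	reflect.TypeOf(fmt.Printf).IsVariadic()   // true
//	reflect.TypeOf(func(int) {}).IsVariadic() // false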
func (t *rtype) IsVariadic() bool {
if t.Kind() != Func {
panic("reflect: IsVariadic of non-func type")
}
tt := (*funcType)(unsafe.Pointer(t))
return tt.outCount&(1<<15) != 0
}
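
// Elem applies to exactly the five element-carrying kinds matched below:
// Array, Chan, Map, Ptr, and Slice. Example (illustrative only):
//
//	reflect.TypeOf(&struct{}{}).Elem().Kind()          // reflect.Struct
//	reflect.TypeOf(map[string]bool(nil)).Elem().Kind() // reflect.Bool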
func (t *rtype) Elem() Type {
switch t.Kind() {
case Array:
tt := (*arrayType)(unsafe.Pointer(t))
return toType(tt.elem)
case Chan:
tt := (*chanType)(unsafe.Pointer(t))
return toType(tt.elem)
case Map:
tt := (*mapType)(unsafe.Pointer(t))
return toType(tt.elem)
case Ptr:
tt := (*ptrType)(unsafe.Pointer(t))
return toType(tt.elem)
case Slice:
tt := (*sliceType)(unsafe.Pointer(t))
return toType(tt.elem)
}
panic("reflect: Elem of invalid type")
}
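
// Field exposes struct fields in declaration order, including their tags.
// Example (illustrative only; the user type is hypothetical):
//
//	type user struct {
//		Name string `json:"name"`
//	}
//	f := reflect.TypeOf(user{}).Field(0)
//	_ = f.Name            // "Name"
//	_ = f.Tag.Get("json") // "name"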
func (t *rtype) Field(i int) StructField {
if t.Kind() != Struct {
panic("reflect: Field of non-struct type")
}
tt := (*structType)(unsafe.Pointer(t))
return tt.Field(i)
}
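
// FieldByIndex walks an index sequence through nested (possibly embedded)
// structs. Example (illustrative only; inner and outer are hypothetical):
//
//	type inner struct{ N int }
//	type outer struct{ In inner }
//	f := reflect.TypeOf(outer{}).FieldByIndex([]int{0, 0})
//	_ = f.Name // "N"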
func (t *rtype) FieldByIndex(index []int) StructField {
if t.Kind() != Struct {
panic("reflect: FieldByIndex of non-struct type")
}
tt := (*structType)(unsafe.Pointer(t))
return tt.FieldByIndex(index)
}
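
// FieldByName also searches embedded fields, reporting ok == false when the
// name is absent or ambiguous. Example (illustrative only):
//
//	type point struct{ X, Y int }
//	f, ok := reflect.TypeOf(point{}).FieldByName("Y")
//	_, _ = f.Index, ok // []int{1}, true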
func (t *rtype) FieldByName(name string) (StructField, bool) {
if t.Kind() != Struct {
panic("reflect: FieldByName of non-struct type")
}
tt := (*structType)(unsafe.Pointer(t))
return tt.FieldByName(name)
}
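
// FieldByNameFunc generalizes FieldByName to an arbitrary predicate, for
// example case-insensitive lookup. Example (illustrative only):
//
//	type point struct{ X, Y int }
//	match := func(name string) bool { return strings.EqualFold(name, "x") }
//	f, ok := reflect.TypeOf(point{}).FieldByNameFunc(match)
//	_, _ = f, ok // the X field, true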
func (t *rtype) FieldByNameFunc(match func(string) bool) (StructField, bool) {
if t.Kind() != Struct {
panic("reflect: FieldByNameFunc of non-struct type")
}
tt := (*structType)(unsafe.Pointer(t))
return tt.FieldByNameFunc(match)
}
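
// In returns the type of a function's i'th parameter. Example (illustrative
// only):
//
//	t := reflect.TypeOf(strings.Repeat) // func(string, int) string
//	t.In(0).Kind()                      // reflect.String
//	t.In(1).Kind()                      // reflect.Int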
func (t *rtype) In(i int) Type {
if t.Kind() != Func {
panic("reflect: In of non-func type")
}
tt := (*funcType)(unsafe.Pointer(t))
return toType(tt.in()[i])
}
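
// Key is meaningful only for maps. Example (illustrative only):
//
//	reflect.TypeOf(map[string]int(nil)).Key().Kind() // reflect.String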
func (t *rtype) Key() Type {
if t.Kind() != Map {
panic("reflect: Key of non-map type")
}
tt := (*mapType)(unsafe.Pointer(t))
return toType(tt.key)
}
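
// Len reports an array's length; slices panic here, and their length is read
// with Value.Len instead. Example (illustrative only):
//
//	reflect.TypeOf([4]byte{}).Len() // 4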
func (t *rtype) Len() int {
if t.Kind() != Array {
panic("reflect: Len of non-array type")
}
tt := (*arrayType)(unsafe.Pointer(t))
return int(tt.len)
}
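
// NumField counts a struct's declared fields, exported or not. Example
// (illustrative only):
//
//	reflect.TypeOf(struct{ A, B, c int }{}).NumField() // 3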
func (t *rtype) NumField() int {
if t.Kind() != Struct {
panic("reflect: NumField of non-struct type")
}
tt := (*structType)(unsafe.Pointer(t))
return len(tt.fields)
}
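
// NumIn counts declared parameters; a trailing variadic parameter counts as
// one. Example (illustrative only):
//
//	reflect.TypeOf(fmt.Println).NumIn() // 1 (a ...interface{})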
func (t *rtype) NumIn() int {
if t.Kind() != Func {
panic("reflect: NumIn of non-func type")
}
tt := (*funcType)(unsafe.Pointer(t))
return int(tt.inCount)
}
func (t *rtype) NumOut() int {
if t.Kind() != Func {
panic("reflect: NumOut of non-func type")
}
tt := (*funcType)(unsafe.Pointer(t))
return len(tt.out())
}
func (t *rtype) Out(i int) Type {
if t.Kind() != Func {
panic("reflect: Out of non-func type")
}
tt := (*funcType)(unsafe.Pointer(t))
return toType(tt.out()[i])
}
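// A minimal sketch of inspecting a function signature with NumIn, NumOut,
// and Out, from outside this package:
//
//	t := reflect.TypeOf(strconv.Atoi) // func(string) (int, error)
//	fmt.Println(t.NumIn(), t.NumOut(), t.Out(1)) // 1 2 error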
func (t *funcType) in() []*rtype {
uadd := unsafe.Sizeof(*t)
if t.tflag&tflagUncommon != 0 {
uadd += unsafe.Sizeof(uncommonType{})
}
if t.inCount == 0 {
return nil
}
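// Treat the memory after the funcType header as a (never actually
// allocated) maximally sized array of *rtype, then slice it down to
// the declared number of inputs.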
return (*[1 << 20]*rtype)(add(unsafe.Pointer(t), uadd, "t.inCount > 0"))[:t.inCount]
}
func (t *funcType) out() []*rtype {
uadd := unsafe.Sizeof(*t)
if t.tflag&tflagUncommon != 0 {
uadd += unsafe.Sizeof(uncommonType{})
}
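// The top bit of outCount records whether the function is variadic;
// mask it off to recover the actual number of results.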
outCount := t.outCount & (1<<15 - 1)
if outCount == 0 {
return nil
}
return (*[1 << 20]*rtype)(add(unsafe.Pointer(t), uadd, "outCount > 0"))[t.inCount : t.inCount+outCount]
}
// add returns p+x.
//
// The whySafe string is ignored, so that the function still inlines
// as efficiently as p+x, but all call sites should use the string to
// record why the addition is safe, which is to say why the addition
// does not cause x to advance to the very end of p's allocation
// and therefore point incorrectly at the next block in memory.
func add(p unsafe.Pointer, x uintptr, whySafe string) unsafe.Pointer {
return unsafe.Pointer(uintptr(p) + x)
}
func (d ChanDir) String() string {
switch d {
case SendDir:
return "chan<-"
case RecvDir:
return "<-chan"
case BothDir:
return "chan"
}
return "ChanDir" + strconv.Itoa(int(d))
}
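// A minimal usage sketch, from outside this package:
//
//	t := reflect.TypeOf(make(chan<- int))
//	fmt.Println(t.ChanDir()) // chan<-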
// Method returns the i'th method in the type's method set.
func (t *interfaceType) Method(i int) (m Method) {
if i < 0 || i >= len(t.methods) {
return
}
p := &t.methods[i]
pname := t.nameOff(p.name)
m.Name = pname.name()
if !pname.isExported() {
m.PkgPath = pname.pkgPath()
if m.PkgPath == "" {
m.PkgPath = t.pkgPath.name()
}
}
m.Type = toType(t.typeOff(p.typ))
m.Index = i
return
}
// NumMethod returns the number of interface methods in the type's method set.
func (t *interfaceType) NumMethod() int { return len(t.methods) }
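// A minimal sketch of walking an interface type's method set with
// NumMethod and Method, from outside this package:
//
//	t := reflect.TypeOf((*io.ReadWriter)(nil)).Elem()
//	for i := 0; i < t.NumMethod(); i++ {
//		fmt.Println(t.Method(i).Name) // Read, then Write
//	}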
// MethodByName returns the method with the given name in the type's method set.
func (t *interfaceType) MethodByName(name string) (m Method, ok bool) {
if t == nil {
return
}
var p *imethod
for i := range t.methods {
p = &t.methods[i]
if t.nameOff(p.name).name() == name {
return t.Method(i), true
}
}
return
}
// A StructField describes a single field in a struct.
type StructField struct {
// Name is the field name.
Name string
// PkgPath is the package path that qualifies a lower case (unexported)
// field name. It is empty for upper case (exported) field names.
// See https://golang.org/ref/spec#Uniqueness_of_identifiers
PkgPath string
Type Type // field type
Tag StructTag // field tag string
Offset uintptr // offset within struct, in bytes
Index []int // index sequence for Type.FieldByIndex
Anonymous bool // is an embedded field
}
// A StructTag is the tag string in a struct field.
//
// By convention, tag strings are a concatenation of
// optionally space-separated key:"value" pairs.
// Each key is a non-empty string consisting of non-control
// characters other than space (U+0020 ' '), quote (U+0022 '"'),
// and colon (U+003A ':'). Each value is quoted using U+0022 '"'
// characters and Go string literal syntax.
type StructTag string
// Get returns the value associated with key in the tag string.
// If there is no such key in the tag, Get returns the empty string.
// If the tag does not have the conventional format, the value
// returned by Get is unspecified. To determine whether a tag is
// explicitly set to the empty string, use Lookup.
func (tag StructTag) Get(key string) string {
v, _ := tag.Lookup(key)
return v
}
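// A minimal usage sketch of Get, from outside this package:
//
//	tag := reflect.StructTag(`json:"name,omitempty" xml:"name"`)
//	fmt.Println(tag.Get("json")) // name,omitempty
//	fmt.Println(tag.Get("yaml")) // (empty string)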
// Lookup returns the value associated with key in the tag string.
// If the key is present in the tag the value (which may be empty)
// is returned. Otherwise the returned value will be the empty string.
// The ok return value reports whether the value was explicitly set in
// the tag string. If the tag does not have the conventional format,
// the value returned by Lookup is unspecified.
func (tag StructTag) Lookup(key string) (value string, ok bool) {
// When modifying this code, also update the validateStructTag code
// in cmd/vet/structtag.go.
for tag != "" {
// Skip leading space.
i := 0
for i < len(tag) && tag[i] == ' ' {
i++
}
tag = tag[i:]
if tag == "" {
break
}
// Scan to colon. A space, a quote or a control character is a syntax error.
// Strictly speaking, control chars include the range [0x7f, 0x9f], not just
// [0x00, 0x1f], but in practice, we ignore the multi-byte control characters
// as it is simpler to inspect the tag's bytes than the tag's runes.
i = 0
for i < len(tag) && tag[i] > ' ' && tag[i] != ':' && tag[i] != '"' && tag[i] != 0x7f {
i++
}
if i == 0 || i+1 >= len(tag) || tag[i] != ':' || tag[i+1] != '"' {
break
}
name := string(tag[:i])
tag = tag[i+1:]
// Scan quoted string to find value.
i = 1
for i < len(tag) && tag[i] != '"' {
if tag[i] == '\\' {
i++
}
i++
}
if i >= len(tag) {
break
}
qvalue := string(tag[:i+1])
tag = tag[i+1:]
if key == name {
value, err := strconv.Unquote(qvalue)
if err != nil {
break
}
return value, true
}
}
return "", false
}
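// A minimal sketch showing how Lookup distinguishes a key explicitly set
// to the empty string from a missing key, from outside this package:
//
//	_, ok := reflect.StructTag(`json:""`).Lookup("json")
//	fmt.Println(ok) // true: key present, value explicitly empty
//	_, ok = reflect.StructTag(``).Lookup("json")
//	fmt.Println(ok) // false: key absent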
// Field returns the i'th struct field.
func (t *structType) Field(i int) (f StructField) {
if i < 0 || i >= len(t.fields) {
panic("reflect: Field index out of bounds")
}
p := &t.fields[i]
f.Type = toType(p.typ)
f.Name = p.name.name()
f.Anonymous = p.embedded()
if !p.name.isExported() {
f.PkgPath = t.pkgPath.name()
}
if tag := p.name.tag(); tag != "" {
f.Tag = StructTag(tag)
}
f.Offset = p.offset()
// NOTE(rsc): This is the only allocation in the interface
// presented by a reflect.Type. It would be nice to avoid,
// at least in the common cases, but we need to make sure
// that misbehaving clients of reflect cannot affect other
// uses of reflect. One possibility is CL 5371098, but we
// postponed that ugliness until there is a demonstrated
// need for the performance. This is issue 2320.
f.Index = []int{i}
return
}
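// A minimal usage sketch of Field, from outside this package:
//
//	type User struct {
//		Name string `json:"name"`
//	}
//	f := reflect.TypeOf(User{}).Field(0)
//	fmt.Println(f.Name, f.Tag.Get("json")) // Name name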
// TODO(gri): Should there be an error/bool indicator if the index
// is wrong for FieldByIndex?
// FieldByIndex returns the nested field corresponding to index.
func (t *structType) FieldByIndex(index []int) (f StructField) {
f.Type = toType(&t.rtype)
for i, x := range index {
if i > 0 {
ft := f.Type
if ft.Kind() == Ptr && ft.Elem().Kind() == Struct {
ft = ft.Elem()
}
f.Type = ft
}
f = f.Type.Field(x)
}
return
}
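// A minimal sketch of addressing a nested field through an embedded
// struct with FieldByIndex, from outside this package:
//
//	type Inner struct{ X int }
//	type Outer struct{ Inner }
//	f := reflect.TypeOf(Outer{}).FieldByIndex([]int{0, 0})
//	fmt.Println(f.Name) // X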
// A fieldScan represents an item on the fieldByNameFunc scan work list.
type fieldScan struct {
typ *structType
index []int
}
// FieldByNameFunc returns the struct field with a name that satisfies the
// match function and a boolean to indicate if the field was found.
func (t *structType) FieldByNameFunc(match func(string) bool) (result StructField, ok bool) {
// This uses the same condition that the Go language does: there must be a unique instance
// of the match at a given depth level. If there are multiple instances of a match at the
// same depth, they annihilate each other and inhibit any possible match at a lower level.
// The algorithm is breadth first search, one depth level at a time.
// The current and next slices are work queues:
// current lists the fields to visit on this depth level,
// and next lists the fields on the next lower level.
current := []fieldScan{}
next := []fieldScan{{typ: t}}
// nextCount records the number of times an embedded type has been
// encountered and considered for queueing in the 'next' slice.
// We only queue the first one, but we increment the count on each.
// If a struct type T can be reached more than once at a given depth level,
// then it annihilates itself and need not be considered at all when we
// process that next depth level.
var nextCount map[*structType]int
// visited records the structs that have been considered already.
// Embedded pointer fields can create cycles in the graph of
// reachable embedded types; visited avoids following those cycles.
// It also avoids duplicated effort: if we didn't find the field in an
// embedded type T at level 2, we won't find it in one at level 4 either.
visited := map[*structType]bool{}
for len(next) > 0 {
current, next = next, current[:0]
count := nextCount
nextCount = nil
// Process all the fields at this depth, now listed in 'current'.
// The loop queues embedded fields found in 'next', for processing during the next
// iteration. The multiplicity of the 'current' field counts is recorded
// in 'count'; the multiplicity of the 'next' field counts is recorded in 'nextCount'.
for _, scan := range current {
t := scan.typ
if visited[t] {
// We've looked through this type before, at a higher level.
// That higher level would shadow the lower level we're now at,
// so this one can't be useful to us. Ignore it.
continue
}
visited[t] = true
for i := range t.fields {
f := &t.fields[i]
// Find name and (for embedded field) type for field f.
fname := f.name.name()
var ntyp *rtype
if f.embedded() {
// Embedded field of type T or *T.
ntyp = f.typ
if ntyp.Kind() == Ptr {
ntyp = ntyp.Elem().common()
}
}
// Does it match?
if match(fname) {
// Potential match
if count[t] > 1 || ok {
// Name appeared multiple times at this level: annihilate.
return StructField{}, false
}
result = t.Field(i)
result.Index = nil
result.Index = append(result.Index, scan.index...)
result.Index = append(result.Index, i)
ok = true
continue
}
// Queue embedded struct fields for processing with next level,
// but only if we haven't seen a match yet at this level and only
// if the embedded types haven't already been queued.
if ok || ntyp == nil || ntyp.Kind() != Struct {
continue
}
styp := (*structType)(unsafe.Pointer(ntyp))
if nextCount[styp] > 0 {
nextCount[styp] = 2 // exact multiple doesn't matter
continue
}
if nextCount == nil {
nextCount = map[*structType]int{}
}
nextCount[styp] = 1
if count[t] > 1 {
nextCount[styp] = 2 // exact multiple doesn't matter
}
var index []int
index = append(index, scan.index...)
index = append(index, i)
next = append(next, fieldScan{styp, index})
}
}
if ok {
break
}
}
return
}
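// A minimal usage sketch with a case-insensitive match function, from
// outside this package:
//
//	type T struct{ ID int }
//	f, ok := reflect.TypeOf(T{}).FieldByNameFunc(func(s string) bool {
//		return strings.EqualFold(s, "id")
//	})
//	fmt.Println(f.Name, ok) // ID true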
// FieldByName returns the struct field with the given name
// and a boolean to indicate if the field was found.
func (t *structType) FieldByName(name string) (f StructField, present bool) {
// Quick check for top-level name, or struct without embedded fields.
hasEmbeds := false
if name != "" {
for i := range t.fields {
tf := &t.fields[i]
if tf.name.name() == name {
return t.Field(i), true
}
if tf.embedded() {
hasEmbeds = true
}
}
}
if !hasEmbeds {
return
}
return t.FieldByNameFunc(func(s string) bool { return s == name })
}
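// A minimal sketch showing FieldByName reaching through an embedded
// field, from outside this package:
//
//	type A struct{ X int }
//	type B struct{ A }
//	f, ok := reflect.TypeOf(B{}).FieldByName("X")
//	fmt.Println(f.Index, ok) // [0 0] true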
// TypeOf returns the reflection Type that represents the dynamic type of i.
// If i is a nil interface value, TypeOf returns nil.
func TypeOf(i interface{}) Type {
eface := *(*emptyInterface)(unsafe.Pointer(&i))
return toType(eface.typ)
}
// ptrMap is the cache for PtrTo.
var ptrMap sync.Map // map[*rtype]*ptrType
// PtrTo returns the pointer type with element t.
// For example, if t represents type Foo, PtrTo(t) represents *Foo.
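// In code:
//
//	pt := reflect.PtrTo(reflect.TypeOf(0))
//	fmt.Println(pt) // prints "*int"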
func PtrTo(t Type) Type {
return t.(*rtype).ptrTo()
}
func (t *rtype) ptrTo() *rtype {
if t.ptrToThis != 0 {
return t.typeOff(t.ptrToThis)
}
// Check the cache.
if pi, ok := ptrMap.Load(t); ok {
return &pi.(*ptrType).rtype
}
// Look in known types.
s := "*" + t.String()
for _, tt := range typesByString(s) {
p := (*ptrType)(unsafe.Pointer(tt))
if p.elem != t {
continue
}
pi, _ := ptrMap.LoadOrStore(t, p)
return &pi.(*ptrType).rtype
}
// Create a new ptrType starting with the description
// of an *unsafe.Pointer.
var iptr interface{} = (*unsafe.Pointer)(nil)
prototype := *(**ptrType)(unsafe.Pointer(&iptr))
pp := *prototype
pp.str = resolveReflectName(newName(s, "", false))
pp.ptrToThis = 0
// For the type structures linked into the binary, the
// compiler provides a good hash of the string.
// Create a good hash for the new string by using
// the FNV-1 hash's mixing function to combine the
// old hash and the new "*".
pp.hash = fnv1(t.hash, '*')
pp.elem = t
pi, _ := ptrMap.LoadOrStore(t, &pp)
return &pi.(*ptrType).rtype
}
// fnv1 incorporates the list of bytes into the hash x using the FNV-1 hash function.
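// For example, fnv1(x, 'c', 'd') computes ((x*16777619)^'c')*16777619 ^ 'd'.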
func fnv1(x uint32, list ...byte) uint32 {
for _, b := range list {
x = x*16777619 ^ uint32(b)
}
return x
}
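// A short sketch of Implements in use (assumes imports of "bytes" and "io"):
//
//	w := reflect.TypeOf((*io.Writer)(nil)).Elem()
//	ok := reflect.TypeOf(&bytes.Buffer{}).Implements(w) // ok == true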
func (t *rtype) Implements(u Type) bool {
if u == nil {
panic("reflect: nil type passed to Type.Implements")
}
if u.Kind() != Interface {
panic("reflect: non-interface type passed to Type.Implements")
}
return implements(u.(*rtype), t)
}
func (t *rtype) AssignableTo(u Type) bool {
if u == nil {
panic("reflect: nil type passed to Type.AssignableTo")
}
uu := u.(*rtype)
return directlyAssignable(uu, t) || implements(uu, t)
}
func (t *rtype) ConvertibleTo(u Type) bool {
if u == nil {
panic("reflect: nil type passed to Type.ConvertibleTo")
}
uu := u.(*rtype)
return convertOp(uu, t) != nil
}
func (t *rtype) Comparable() bool {
return t.alg != nil && t.alg.equal != nil
}
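// For example, TypeOf(0).Comparable() reports true (ints compare with ==),
// while TypeOf([]int(nil)).Comparable() reports false: slice types carry
// no equality algorithm.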
// implements reports whether the type V implements the interface type T.
func implements(T, V *rtype) bool {
if T.Kind() != Interface {
return false
}
t := (*interfaceType)(unsafe.Pointer(T))
if len(t.methods) == 0 {
return true
}
// The same algorithm applies in both cases, but the
// method tables for an interface type and a concrete type
// are different, so the code is duplicated.
// In both cases the algorithm is a linear scan over the two
// lists - T's methods and V's methods - simultaneously.
// Since method tables are stored in a unique sorted order
// (alphabetical, with no duplicate method names), the scan
// through V's methods must hit a match for each of T's
// methods along the way, or else V does not implement T.
// This lets us run the scan in overall linear time instead of
// the quadratic time a naive search would require.
// See also ../runtime/iface.go.
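// For example, if T's methods are {M1, M3} and V's methods are
// {M1, M2, M3}, the scan matches M1 and M3 in turn and reports true;
// if T listed a method absent from V, V's list would be exhausted
// first and the scan would report false.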
if V.Kind() == Interface {
v := (*interfaceType)(unsafe.Pointer(V))
i := 0
for j := 0; j < len(v.methods); j++ {
tm := &t.methods[i]
tmName := t.nameOff(tm.name)
vm := &v.methods[j]
vmName := V.nameOff(vm.name)
if vmName.name() == tmName.name() && V.typeOff(vm.typ) == t.typeOff(tm.typ) {
if !tmName.isExported() {
tmPkgPath := tmName.pkgPath()
if tmPkgPath == "" {
tmPkgPath = t.pkgPath.name()
}
vmPkgPath := vmName.pkgPath()
if vmPkgPath == "" {
vmPkgPath = v.pkgPath.name()
}
if tmPkgPath != vmPkgPath {
continue
}
}
if i++; i >= len(t.methods) {
return true
}
}
}
return false
}
v := V.uncommon()
if v == nil {
return false
}
i := 0
vmethods := v.methods()
for j := 0; j < int(v.mcount); j++ {
tm := &t.methods[i]
tmName := t.nameOff(tm.name)
vm := vmethods[j]
vmName := V.nameOff(vm.name)
if vmName.name() == tmName.name() && V.typeOff(vm.mtyp) == t.typeOff(tm.typ) {
if !tmName.isExported() {
tmPkgPath := tmName.pkgPath()
if tmPkgPath == "" {
tmPkgPath = t.pkgPath.name()
}
vmPkgPath := vmName.pkgPath()
if vmPkgPath == "" {
vmPkgPath = V.nameOff(v.pkgPath).name()
}
if tmPkgPath != vmPkgPath {
continue
}
}
if i++; i >= len(t.methods) {
return true
}
}
}
return false
}
// directlyAssignable reports whether a value x of type V can be directly
// assigned (using memmove) to a value of type T.
// https://golang.org/doc/go_spec.html#Assignability
// Ignoring the interface rules (implemented elsewhere)
// and the ideal constant rules (no ideal constants at run time).
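// For example, given "type MySlice []int", a MySlice value is directly
// assignable to []int, because []int is not a defined type; but given
// "type MyInt int", a MyInt value is not directly assignable to int,
// because both types are defined.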
func directlyAssignable(T, V *rtype) bool {
// x's type V is identical to T?
if T == V {
return true
}
// Otherwise at least one of T and V must not be defined
// and they must have the same kind.
if T.hasName() && V.hasName() || T.Kind() != V.Kind() {
return false
}
// x's type T and V must have identical underlying types.
return haveIdenticalUnderlyingType(T, V, true)
}
func haveIdenticalType(T, V Type, cmpTags bool) bool {
if cmpTags {
return T == V
}
if T.Name() != V.Name() || T.Kind() != V.Kind() {
return false
}
return haveIdenticalUnderlyingType(T.common(), V.common(), false)
}
func haveIdenticalUnderlyingType(T, V *rtype, cmpTags bool) bool {
if T == V {
return true
}
kind := T.Kind()
if kind != V.Kind() {
return false
}
// Non-composite types of equal kind have the same underlying type
// (the predefined instance of the type).
if Bool <= kind && kind <= Complex128 || kind == String || kind == UnsafePointer {
return true
}
// Composite types.
switch kind {
case Array:
return T.Len() == V.Len() && haveIdenticalType(T.Elem(), V.Elem(), cmpTags)
case Chan:
// Special case:
// x is a bidirectional channel value, T is a channel type,
// and x's type V and T have identical element types.
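// For example, a "chan int" value can be assigned to a variable
// of type "<-chan int".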
if V.ChanDir() == BothDir && haveIdenticalType(T.Elem(), V.Elem(), cmpTags) {
return true
}
// Otherwise continue test for identical underlying type.
return V.ChanDir() == T.ChanDir() && haveIdenticalType(T.Elem(), V.Elem(), cmpTags)
case Func:
t := (*funcType)(unsafe.Pointer(T))
v := (*funcType)(unsafe.Pointer(V))
if t.outCount != v.outCount || t.inCount != v.inCount {
return false
}
for i := 0; i < t.NumIn(); i++ {
if !haveIdenticalType(t.In(i), v.In(i), cmpTags) {
return false
}
}
for i := 0; i < t.NumOut(); i++ {
if !haveIdenticalType(t.Out(i), v.Out(i), cmpTags) {
return false
}
}
return true
case Interface:
t := (*interfaceType)(unsafe.Pointer(T))
v := (*interfaceType)(unsafe.Pointer(V))
if len(t.methods) == 0 && len(v.methods) == 0 {
return true
}
// Might have the same methods but still
// need a run time conversion.
return false
case Map:
return haveIdenticalType(T.Key(), V.Key(), cmpTags) && haveIdenticalType(T.Elem(), V.Elem(), cmpTags)
case Ptr, Slice:
return haveIdenticalType(T.Elem(), V.Elem(), cmpTags)
case Struct:
t := (*structType)(unsafe.Pointer(T))
v := (*structType)(unsafe.Pointer(V))
if len(t.fields) != len(v.fields) {
return false
}
if t.pkgPath.name() != v.pkgPath.name() {
return false
}
for i := range t.fields {
tf := &t.fields[i]
vf := &v.fields[i]
if tf.name.name() != vf.name.name() {
return false
}
if !haveIdenticalType(tf.typ, vf.typ, cmpTags) {
return false
}
if cmpTags && tf.name.tag() != vf.name.tag() {
return false
}
if tf.offsetEmbed != vf.offsetEmbed {
return false
}
}
return true
}
return false
}
// typelinks is implemented in package runtime.
// It returns a slice of the sections in each module,
// and a slice of *rtype offsets in each module.
//
// The types in each module are sorted by string. That is, the first
// two linked types of the first module are:
//
// d0 := sections[0]
// t1 := (*rtype)(add(d0, offset[0][0]))
// t2 := (*rtype)(add(d0, offset[0][1]))
//
// and
//
// t1.String() < t2.String()
//
// Note that strings are not unique identifiers for types:
// there can be more than one with a given string.
// Only types we might want to look up are included:
// pointers, channels, maps, slices, and arrays.
func typelinks() (sections []unsafe.Pointer, offset [][]int32)
func rtypeOff(section unsafe.Pointer, off int32) *rtype {
return (*rtype)(add(section, uintptr(off), "sizeof(rtype) > 0"))
}
// typesByString returns the subslice of typelinks() whose elements have
// the given string representation.
// It may be empty (no known types with that string) or may have
// multiple elements (multiple types with that string).
func typesByString(s string) []*rtype {
sections, offset := typelinks()
var ret []*rtype
for offsI, offs := range offset {
section := sections[offsI]
// We are looking for the first index i where the string becomes >= s.
// This is a copy of sort.Search, with f(h) replaced by (*typ[h].String() >= s).
i, j := 0, len(offs)
for i < j {
h := i + (j-i)/2 // avoid overflow when computing h
// i ≤ h < j
if !(rtypeOff(section, offs[h]).String() >= s) {
i = h + 1 // preserves f(i-1) == false
} else {
j = h // preserves f(j) == true
}
}
// i == j, f(i-1) == false, and f(j) (= f(i)) == true => answer is i.
// Having found the first, linear scan forward to find the last.
// We could do a second binary search, but the caller is going
// to do a linear scan anyway.
for j := i; j < len(offs); j++ {
typ := rtypeOff(section, offs[j])
if typ.String() != s {
break
}
ret = append(ret, typ)
}
}
return ret
}
// The lookupCache caches ArrayOf, ChanOf, MapOf and SliceOf lookups.
var lookupCache sync.Map // map[cacheKey]*rtype
// A cacheKey is the key for use in the lookupCache.
// Four values describe any of the types we are looking for:
// type kind, one or two subtypes, and an extra integer.
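// For example, ChanOf(RecvDir, t) caches its result under
// cacheKey{Chan, t.(*rtype), nil, uintptr(RecvDir)}.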
type cacheKey struct {
kind Kind
t1 *rtype
t2 *rtype
extra uintptr
}
// The funcLookupCache caches FuncOf lookups.
// FuncOf does not share the common lookupCache since cacheKey is not
// sufficient to represent functions unambiguously.
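// (A function type may have any number of parameter and result types,
// while a cacheKey records only a kind, two subtypes, and one integer.)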
var funcLookupCache struct {
sync.Mutex // Guards stores (but not loads) on m.
// m is a map[uint32][]*rtype keyed by the hash calculated in FuncOf.
// Elements of m are append-only and thus safe for concurrent reading.
m sync.Map
}
// ChanOf returns the channel type with the given direction and element type.
// For example, if t represents int, ChanOf(RecvDir, t) represents <-chan int.
//
// The gc runtime imposes a limit of 64 kB on channel element types.
// If t's size is equal to or exceeds this limit, ChanOf panics.
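// In code:
//
//	ch := reflect.ChanOf(reflect.RecvDir, reflect.TypeOf(0))
//	fmt.Println(ch) // prints "<-chan int"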
func ChanOf(dir ChanDir, t Type) Type {
typ := t.(*rtype)
// Look in cache.
ckey := cacheKey{Chan, typ, nil, uintptr(dir)}
if ch, ok := lookupCache.Load(ckey); ok {
return ch.(*rtype)
}
// This restriction is imposed by the gc compiler and the runtime.
if typ.size >= 1<<16 {
panic("reflect.ChanOf: element size too large")
}
// Look in known types.
// TODO: Precedence when constructing string.
var s string
switch dir {
default:
panic("reflect.ChanOf: invalid dir")
case SendDir:
s = "chan<- " + typ.String()
case RecvDir:
s = "<-chan " + typ.String()
case BothDir:
s = "chan " + typ.String()
}
for _, tt := range typesByString(s) {
ch := (*chanType)(unsafe.Pointer(tt))
if ch.elem == typ && ch.dir == uintptr(dir) {
ti, _ := lookupCache.LoadOrStore(ckey, tt)
return ti.(Type)
}
}
// Make a channel type.
var ichan interface{} = (chan unsafe.Pointer)(nil)
prototype := *(**chanType)(unsafe.Pointer(&ichan))
ch := *prototype
ch.tflag = 0
ch.dir = uintptr(dir)
ch.str = resolveReflectName(newName(s, "", false))
ch.hash = fnv1(typ.hash, 'c', byte(dir))
ch.elem = typ
ti, _ := lookupCache.LoadOrStore(ckey, &ch.rtype)
return ti.(Type)
}
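// exampleChanOfUsage is an illustrative sketch added for exposition; the
// function name is hypothetical and not part of the original file. It shows
// the typical call pattern for ChanOf using the package's own API.
func exampleChanOfUsage() Type {
	// ChanOf(BothDir, TypeOf(0)) represents the type chan int.
	// RecvDir and SendDir would yield <-chan int and chan<- int instead.
	return ChanOf(BothDir, TypeOf(0))
}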
func ismapkey(*rtype) bool // implemented in runtime
// MapOf returns the map type with the given key and element types.
// For example, if k represents int and e represents string,
// MapOf(k, e) represents map[int]string.
//
// If the key type is not a valid map key type (that is, if it does
// not implement Go's == operator), MapOf panics.
func MapOf(key, elem Type) Type {
ktyp := key.(*rtype)
etyp := elem.(*rtype)
if !ismapkey(ktyp) {
panic("reflect.MapOf: invalid key type " + ktyp.String())
}
// Look in cache.
ckey := cacheKey{Map, ktyp, etyp, 0}
if mt, ok := lookupCache.Load(ckey); ok {
return mt.(Type)
}
// Look in known types.
s := "map[" + ktyp.String() + "]" + etyp.String()
for _, tt := range typesByString(s) {
mt := (*mapType)(unsafe.Pointer(tt))
if mt.key == ktyp && mt.elem == etyp {
ti, _ := lookupCache.LoadOrStore(ckey, tt)
return ti.(Type)
}
}
// Make a map type.
// Note: flag values must match those used in the TMAP case
// in ../cmd/compile/internal/gc/reflect.go:dtypesym.
var imap interface{} = (map[unsafe.Pointer]unsafe.Pointer)(nil)
mt := **(**mapType)(unsafe.Pointer(&imap))
mt.str = resolveReflectName(newName(s, "", false))
mt.tflag = 0
mt.hash = fnv1(etyp.hash, 'm', byte(ktyp.hash>>24), byte(ktyp.hash>>16), byte(ktyp.hash>>8), byte(ktyp.hash))
mt.key = ktyp
mt.elem = etyp
mt.bucket = bucketOf(ktyp, etyp)
mt.flags = 0
if ktyp.size > maxKeySize {
mt.keysize = uint8(ptrSize)
mt.flags |= 1 // indirect key
} else {
mt.keysize = uint8(ktyp.size)
}
if etyp.size > maxValSize {
mt.valuesize = uint8(ptrSize)
mt.flags |= 2 // indirect value
} else {
mt.valuesize = uint8(etyp.size)
}
mt.bucketsize = uint16(mt.bucket.size)
if isReflexive(ktyp) {
mt.flags |= 4
}
if needKeyUpdate(ktyp) {
mt.flags |= 8
}
if hashMightPanic(ktyp) {
mt.flags |= 16
}
mt.ptrToThis = 0
ti, _ := lookupCache.LoadOrStore(ckey, &mt.rtype)
return ti.(Type)
}
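// exampleMapOfUsage is an illustrative sketch (hypothetical name, not part
// of the original file) showing both the normal call pattern and the panic
// described in the doc comment above.
func exampleMapOfUsage() Type {
	m := MapOf(TypeOf(""), TypeOf(0)) // represents map[string]int
	// MapOf(TypeOf([]byte(nil)), TypeOf(0)) would panic: []byte does not
	// support ==, so it is not a valid map key type.
	return m
}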
// TODO(crawshaw): as these funcTypeFixedN structs have no methods,
// they could be defined at runtime using the StructOf function.
type funcTypeFixed4 struct {
funcType
args [4]*rtype
}
type funcTypeFixed8 struct {
funcType
args [8]*rtype
}
type funcTypeFixed16 struct {
funcType
args [16]*rtype
}
type funcTypeFixed32 struct {
funcType
args [32]*rtype
}
type funcTypeFixed64 struct {
funcType
args [64]*rtype
}
type funcTypeFixed128 struct {
funcType
args [128]*rtype
}
// FuncOf returns the function type with the given argument and result types.
// For example, if k represents int and e represents string,
// FuncOf([]Type{k}, []Type{e}, false) represents func(int) string.
//
// The variadic argument controls whether the function is variadic. FuncOf
// panics if variadic is true and in[len(in)-1] does not represent a slice.
func FuncOf(in, out []Type, variadic bool) Type {
if variadic && (len(in) == 0 || in[len(in)-1].Kind() != Slice) {
panic("reflect.FuncOf: last arg of variadic func must be slice")
}
// Make a func type.
var ifunc interface{} = (func())(nil)
prototype := *(**funcType)(unsafe.Pointer(&ifunc))
n := len(in) + len(out)
var ft *funcType
var args []*rtype
switch {
case n <= 4:
fixed := new(funcTypeFixed4)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
case n <= 8:
fixed := new(funcTypeFixed8)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
case n <= 16:
fixed := new(funcTypeFixed16)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
case n <= 32:
fixed := new(funcTypeFixed32)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
case n <= 64:
fixed := new(funcTypeFixed64)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
case n <= 128:
fixed := new(funcTypeFixed128)
args = fixed.args[:0:len(fixed.args)]
ft = &fixed.funcType
default:
panic("reflect.FuncOf: too many arguments")
}
*ft = *prototype
// Build a hash and minimally populate ft.
var hash uint32
for _, in := range in {
t := in.(*rtype)
args = append(args, t)
hash = fnv1(hash, byte(t.hash>>24), byte(t.hash>>16), byte(t.hash>>8), byte(t.hash))
}
if variadic {
hash = fnv1(hash, 'v')
}
hash = fnv1(hash, '.')
for _, out := range out {
t := out.(*rtype)
args = append(args, t)
hash = fnv1(hash, byte(t.hash>>24), byte(t.hash>>16), byte(t.hash>>8), byte(t.hash))
}
if len(args) > 50 {
panic("reflect.FuncOf does not support more than 50 arguments")
}
ft.tflag = 0
ft.hash = hash
ft.inCount = uint16(len(in))
ft.outCount = uint16(len(out))
if variadic {
ft.outCount |= 1 << 15
}
// Look in cache.
if ts, ok := funcLookupCache.m.Load(hash); ok {
for _, t := range ts.([]*rtype) {
if haveIdenticalUnderlyingType(&ft.rtype, t, true) {
return t
}
}
}
// Not in cache, lock and retry.
funcLookupCache.Lock()
defer funcLookupCache.Unlock()
if ts, ok := funcLookupCache.m.Load(hash); ok {
for _, t := range ts.([]*rtype) {
if haveIdenticalUnderlyingType(&ft.rtype, t, true) {
return t
}
}
}
addToCache := func(tt *rtype) Type {
var rts []*rtype
if rti, ok := funcLookupCache.m.Load(hash); ok {
rts = rti.([]*rtype)
}
funcLookupCache.m.Store(hash, append(rts, tt))
return tt
}
// Look in known types for the same string representation.
str := funcStr(ft)
for _, tt := range typesByString(str) {
if haveIdenticalUnderlyingType(&ft.rtype, tt, true) {
return addToCache(tt)
}
}
// Populate the remaining fields of ft and store in cache.
ft.str = resolveReflectName(newName(str, "", false))
ft.ptrToThis = 0
return addToCache(&ft.rtype)
}
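// exampleFuncOfUsage is an illustrative sketch (hypothetical name, not part
// of the original file). It builds func(int) string at run time and pairs it
// with MakeFunc to produce a callable Value; strconv is already imported by
// this file.
func exampleFuncOfUsage() Value {
	ft := FuncOf([]Type{TypeOf(0)}, []Type{TypeOf("")}, false)
	return MakeFunc(ft, func(args []Value) []Value {
		// Format the int argument as its decimal string.
		return []Value{ValueOf(strconv.Itoa(int(args[0].Int())))}
	})
}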
// funcStr builds a string representation of a funcType.
func funcStr(ft *funcType) string {
repr := make([]byte, 0, 64)
repr = append(repr, "func("...)
for i, t := range ft.in() {
if i > 0 {
repr = append(repr, ", "...)
}
if ft.IsVariadic() && i == int(ft.inCount)-1 {
repr = append(repr, "..."...)
repr = append(repr, (*sliceType)(unsafe.Pointer(t)).elem.String()...)
} else {
repr = append(repr, t.String()...)
}
}
repr = append(repr, ')')
out := ft.out()
if len(out) == 1 {
repr = append(repr, ' ')
} else if len(out) > 1 {
repr = append(repr, " ("...)
}
for i, t := range out {
if i > 0 {
repr = append(repr, ", "...)
}
repr = append(repr, t.String()...)
}
if len(out) > 1 {
repr = append(repr, ')')
}
return string(repr)
}
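// exampleFuncString is an illustrative sketch (hypothetical name, not part
// of the original file) of the representation funcStr builds: a variadic
// signature with two results comes out through the public API as
// "func(int, ...string) (bool, error)".
func exampleFuncString() string {
	ft := FuncOf(
		[]Type{TypeOf(0), TypeOf([]string(nil))},
		[]Type{TypeOf(false), TypeOf((*error)(nil)).Elem()},
		true,
	)
	return ft.String() // "func(int, ...string) (bool, error)"
}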
// isReflexive reports whether the == operation on the type is reflexive.
// That is, x == x for all values x of type t.
func isReflexive(t *rtype) bool {
switch t.Kind() {
case Bool, Int, Int8, Int16, Int32, Int64, Uint, Uint8, Uint16, Uint32, Uint64, Uintptr, Chan, Ptr, String, UnsafePointer:
return true
case Float32, Float64, Complex64, Complex128, Interface:
return false
case Array:
tt := (*arrayType)(unsafe.Pointer(t))
return isReflexive(tt.elem)
case Struct:
tt := (*structType)(unsafe.Pointer(t))
for _, f := range tt.fields {
if !isReflexive(f.typ) {
return false
}
}
return true
default:
// Func, Map, Slice, Invalid
panic("isReflexive called on non-key type " + t.String())
}
}
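// nanKeysGrowMap is an illustrative sketch (hypothetical name, not part of
// the original file) of why floating-point kinds are not reflexive: NaN is
// not equal to itself, so a NaN key never matches an existing entry and
// repeated inserts grow the map.
func nanKeysGrowMap() int {
	zero := 0.0
	nan := zero / zero // IEEE NaN; nan != nan
	m := make(map[float64]bool)
	m[nan] = true
	m[nan] = true // does not overwrite: the first key is never found again
	return len(m) // 2
}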
// needKeyUpdate reports whether map overwrites require the key to be copied.
func needKeyUpdate(t *rtype) bool {
switch t.Kind() {
case Bool, Int, Int8, Int16, Int32, Int64, Uint, Uint8, Uint16, Uint32, Uint64, Uintptr, Chan, Ptr, UnsafePointer:
return false
case Float32, Float64, Complex64, Complex128, Interface, String:
// Float keys can be updated from +0 to -0.
// String keys can be updated to use a smaller backing store.
// Interfaces might have floats or strings in them.
return true
case Array:
tt := (*arrayType)(unsafe.Pointer(t))
return needKeyUpdate(tt.elem)
case Struct:
tt := (*structType)(unsafe.Pointer(t))
for _, f := range tt.fields {
if needKeyUpdate(f.typ) {
return true
}
}
return false
default:
// Func, Map, Slice, Invalid
panic("needKeyUpdate called on non-key type " + t.String())
}
}
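// negZeroKeyUpdate is an illustrative sketch (hypothetical name, not part
// of the original file) of the float case above: +0.0 == -0.0, so assigning
// under -0.0 finds the existing entry, and the runtime overwrites the stored
// key bits as well as the value.
func negZeroKeyUpdate() float64 {
	zero := 0.0
	m := map[float64]string{zero: "plus"}
	m[-zero] = "minus" // same entry; the stored key becomes -0.0
	for k := range m {
		return 1 / k // -Inf, showing the key was rewritten
	}
	return 0
}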
// hashMightPanic reports whether the hash of a map key of type t might panic.
func hashMightPanic(t *rtype) bool {
switch t.Kind() {
case Interface:
return true
case Array:
tt := (*arrayType)(unsafe.Pointer(t))
return hashMightPanic(tt.elem)
case Struct:
tt := (*structType)(unsafe.Pointer(t))
for _, f := range tt.fields {
if hashMightPanic(f.typ) {
return true
}
}
return false
default:
return false
}
}
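// interfaceKeyMayPanic is an illustrative sketch (hypothetical name, not
// part of the original file): hashing an interface key panics when its
// dynamic type is not comparable, which is why Interface returns true above.
func interfaceKeyMayPanic() {
	m := make(map[interface{}]bool)
	m["ok"] = true // string dynamic type: hashable
	// The next line panics at run time:
	// "runtime error: hash of unhashable type []int".
	m[[]int{1}] = true
}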
// Make sure these routines stay in sync with ../../runtime/map.go!
// These types exist only for GC, so we only fill out GC relevant info.
// Currently, that's just size and the GC program. We also fill in string
// for possible debugging use.
const (
bucketSize uintptr = 8
maxKeySize uintptr = 128
maxValSize uintptr = 128
)
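// Editor's sketch (not in the original source): unpacking the arithmetic
// behind these limits, assuming ptrSize == 8. The largest bucket that
// bucketOf below can produce is
//
//	bucketSize*(1+maxKeySize+maxValSize) + 2*ptrSize
//	    = 8*(1+128+128) + 16
//	    = 2072 bytes
//	    = 259 pointer-size words,
//
// so its pointer bitmap needs at most (259+7)/8 = 33 bytes. Those are the
// figures quoted in the "Prepare GC data" comment inside bucketOf.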
func bucketOf(ktyp, etyp *rtype) *rtype {
if ktyp.size > maxKeySize {
ktyp = PtrTo(ktyp).(*rtype)
}
if etyp.size > maxValSize {
etyp = PtrTo(etyp).(*rtype)
}
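	// Editor's sketch (hypothetical types, not in the original source):
	// for map[[256]byte]bool, the 256-byte key exceeds maxKeySize, so the
	// bucket stores a *[256]byte and the runtime keeps the key itself in a
	// separate allocation, while the 1-byte bool value stays inline.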
// Prepare GC data if any.
// A bucket is at most bucketSize*(1+maxKeySize+maxValSize)+2*ptrSize bytes,
// or 2072 bytes, or 259 pointer-size words, or 33 bytes of pointer bitmap.
// Note that since the key and value are known to be <= 128 bytes,
// they're guaranteed to have bitmaps instead of GC programs.
var gcdata *byte
var ptrdata uintptr
var overflowPad uintptr
// On NaCl, pad if needed to make overflow end at the proper struct alignment.
// On other systems, align > ptrSize is not possible.
if runtime.GOARCH == "amd64p32" && (ktyp.align > ptrSize || etyp.align > ptrSize) {
overflowPad = ptrSize
}
size := bucketSize*(1+ktyp.size+etyp.size) + overflowPad + ptrSize
if size&uintptr(ktyp.align-1) != 0 || size&uintptr(etyp.align-1) != 0 {
panic("reflect: bad size computation in MapOf")
}
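	// Editor's worked example (assuming ptrSize == 8 and no overflowPad):
	// for map[int64]int64, size = 8*(1+8+8) + 0 + 8 = 144 bytes: eight
	// tophash bytes, eight 8-byte keys, eight 8-byte values, and the
	// trailing overflow pointer. 144 is a multiple of both alignments (8),
	// so the check above passes.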
if ktyp.ptrdata != 0 || etyp.ptrdata != 0 {
nptr := (bucketSize*(1+ktyp.size+etyp.size) + ptrSize) / ptrSize
mask := make([]byte, (nptr+7)/8)
base := bucketSize / ptrSize
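	// Editor's note: nptr counts every pointer-size word in the bucket
	// except overflowPad (padding words can never hold pointers), and mask
	// reserves one bit per word. base is the word index where the keys
	// begin, i.e. the bucketSize bytes of tophash expressed in words.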
if ktyp.ptrdata != 0 {
runtime: replace GC programs with simpler encoding, faster decoder Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) 
Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. 
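// Map bucket keys are always small enough to be described by a plain
// 1-bit pointer mask (oversized keys are stored indirectly as pointers),
// so a GC program here would indicate a layout bug.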
if ktyp.kind&kindGCProg != 0 {
panic("reflect: unexpected GC program in MapOf")
}
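// ktyp.gcdata is the key type's pointer bitmap: one bit per
// pointer-sized word over the first ktyp.ptrdata bytes of the key.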
kmask := (*[16]byte)(unsafe.Pointer(ktyp.gcdata))
for i := uintptr(0); i < ktyp.ptrdata/ptrSize; i++ {
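// Bit i set means word i of a key holds a pointer; replicate that
// bit into the matching word of each of the bucketSize key slots.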
if (kmask[i/8]>>(i%8))&1 != 0 {
for j := uintptr(0); j < bucketSize; j++ {
word := base + j*ktyp.size/ptrSize + i
mask[word/8] |= 1 << (word % 8)
}
}
}
}
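// base counts pointer-sized words from the start of the bucket;
// step past the array of bucketSize keys to reach the elems.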
base += bucketSize * ktyp.size / ptrSize
if etyp.ptrdata != 0 {
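// Mirror the key handling for the elem type: its pointer bitmap is
// copied into each of the bucketSize elem slots.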
if etyp.kind&kindGCProg != 0 {
panic("reflect: unexpected GC program in MapOf")
}
emask := (*[16]byte)(unsafe.Pointer(etyp.gcdata))
for i := uintptr(0); i < etyp.ptrdata/ptrSize; i++ {
if (emask[i/8]>>(i%8))&1 != 0 {
for j := uintptr(0); j < bucketSize; j++ {
word := base + j*etyp.size/ptrSize + i
mask[word/8] |= 1 << (word % 8)
}
}
}
}
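// Step past the array of bucketSize elems.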
base += bucketSize * etyp.size / ptrSize
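// overflowPad is nonzero only on nacl/amd64p32, where extra padding
// keeps the bucket's trailing overflow pointer properly aligned;
// account for it before the overflow pointer's word.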
base += overflowPad / ptrSize
2015-05-08 01:43:18 -04:00
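The 1-bit encoding described in the commit message is exactly what the code below builds: bit i of the mask is set when word i of the object holds a pointer. A minimal standalone sketch of that encoding (illustrative only; setPtrBit and wordIsPtr are our names, not runtime API):

// setPtrBit marks word `word` of an object as containing a pointer,
// mirroring the mask[word/8] |= 1 << (word % 8) update below.
func setPtrBit(mask []byte, word uintptr) {
	mask[word/8] |= 1 << (word % 8)
}

// wordIsPtr reports whether word `word` was marked as a pointer.
func wordIsPtr(mask []byte, word uintptr) bool {
	return mask[word/8]&(1<<(word%8)) != 0
}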
word := base
mask[word/8] |= 1 << (word % 8)
gcdata = &mask[0]
ptrdata = (word + 1) * ptrSize
cmd/compile, runtime: fix placement of map bucket overflow pointer on nacl

On most systems, a pointer is the worst case alignment, so adding a
pointer field at the end of a struct guarantees there will be no
padding added after that field (to satisfy overall struct alignment
due to some more-aligned field also present).

In the runtime, the map implementation needs a quick way to get to the
overflow pointer, which is last in the bucket struct, so it uses
size - sizeof(pointer) as the offset.

NaCl/amd64p32 is the exception, as always. The worst case alignment is
64 bits but pointers are 32 bits. There's a long history that is not
worth going into, but when we moved the overflow pointer to the end of
the struct, we didn't get the padding computation right. The compiler
computed the regular struct size and then on amd64p32 added another
32-bit field. And the runtime assumed it could step back two 32-bit
fields (one 64-bit register size) to get to the overflow pointer. But
in fact if the struct needed 64-bit alignment, the computation of the
regular struct size would have added a 32-bit pad already, and then the
code unconditionally added a second 32-bit pad. This placed the
overflow pointer three words from the end, not two. The last two were
padding, and since the runtime was consistent about using the
second-to-last word as the overflow pointer, no harm done in the sense
of overwriting useful memory. But writing the overflow pointer to a
non-pointer word of memory means that the GC can't see the overflow
blocks, so it will collect them prematurely. Then bad things happen.

Correct all this in a few steps:

1. Add an explicit check at the end of the bucket layout in the
   compiler that the overflow field is last in the struct, never
   followed by padding.

2. When padding is needed on nacl (not always, just when needed),
   insert it before the overflow pointer, to preserve the "last in the
   struct" property.

3. Let the compiler have the final word on the width of the struct, by
   inserting an explicit padding field instead of overwriting the
   results of the width computation it does.

4. For the same reason (tell the truth to the compiler), set the type
   of the overflow field when we're trying to pretend it's not a
   pointer (in this case the runtime maintains a list of the overflow
   blocks elsewhere).

5. Make the runtime use "last in the struct" as its location algorithm.

This fixes TestTraceStress on nacl/amd64p32. The 'bad map state' and
'invalid free list' failures no longer occur.

Fixes #11838.

Change-Id: If918887f8f252d988db0a35159944d2b36512f92
Reviewed-on: https://go-review.googlesource.com/12971
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-30 22:05:51 -04:00
// overflow word must be last
if ptrdata != size {
panic("reflect: bad layout computation in MapOf")
}
}
b := &rtype{
align: ptrSize,
size: size,
kind: uint8(Struct),
ptrdata: ptrdata,
gcdata: gcdata,
}
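	// overflowPad is nonzero only when the bucket needs stricter
	// alignment than a pointer (nacl/amd64p32, per the commit message
	// above: pointers are 32 bits but worst-case field alignment is
	// 64 bits). The pad sits before the overflow pointer, and the
	// bucket itself must then be 64-bit aligned.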
if overflowPad > 0 {
b.align = 8
}
s := "bucket(" + ktyp.String() + "," + etyp.String() + ")"
b.str = resolveReflectName(newName(s, "", false))
return b
}
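The panic above enforces the invariant from the commit message: the overflow pointer is the last word of the bucket, never followed by padding. That is what lets the runtime locate it with plain size arithmetic. A sketch of that lookup under the same invariant (bucketOverflowOffset is a hypothetical helper, not the runtime's actual code):

// bucketOverflowOffset returns the offset of a bucket's overflow
// pointer. Valid only because the overflow word is guaranteed to be
// last in the struct, never followed by padding.
func bucketOverflowOffset(bucketSize, ptrSize uintptr) uintptr {
	return bucketSize - ptrSize
}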
// SliceOf returns the slice type with element type t.
// For example, if t represents int, SliceOf(t) represents []int.
func SliceOf(t Type) Type {
typ := t.(*rtype)
// Look in cache.
ckey := cacheKey{Slice, typ, nil, 0}
reflect: use sync.Map instead of RWMutex for type caches

This provides a significant speedup when using reflection-heavy code
on many CPU cores, such as when marshaling or unmarshaling protocol
buffers.

updates #17973
updates #18177

name                       old time/op    new time/op    delta
Call                       239ns ±10%     245ns ± 7%     ~        (p=0.562 n=10+9)
Call-6                     201ns ±38%     48ns ±29%      -76.39%  (p=0.000 n=10+9)
Call-48                    133ns ± 8%     12ns ± 2%      -90.92%  (p=0.000 n=10+8)
CallArgCopy/size=128       169ns ±12%     197ns ± 2%     +16.35%  (p=0.000 n=10+7)
CallArgCopy/size=128-6     142ns ± 9%     34ns ± 7%      -76.10%  (p=0.000 n=10+9)
CallArgCopy/size=128-48    125ns ± 3%     9ns ± 7%       -93.01%  (p=0.000 n=8+8)
CallArgCopy/size=256       177ns ± 8%     197ns ± 5%     +11.24%  (p=0.000 n=10+9)
CallArgCopy/size=256-6     148ns ±11%     35ns ± 6%      -76.23%  (p=0.000 n=10+9)
CallArgCopy/size=256-48    127ns ± 4%     9ns ± 9%       -92.66%  (p=0.000 n=10+9)
CallArgCopy/size=1024      196ns ± 6%     228ns ± 7%     +16.09%  (p=0.000 n=10+9)
CallArgCopy/size=1024-6    143ns ± 6%     42ns ± 5%      -70.39%  (p=0.000 n=8+8)
CallArgCopy/size=1024-48   130ns ± 7%     10ns ± 1%      -91.99%  (p=0.000 n=10+8)
CallArgCopy/size=4096      330ns ± 9%     351ns ± 5%     +6.20%   (p=0.004 n=10+9)
CallArgCopy/size=4096-6    173ns ±14%     62ns ± 6%      -63.83%  (p=0.000 n=10+8)
CallArgCopy/size=4096-48   141ns ± 6%     15ns ± 6%      -89.59%  (p=0.000 n=10+8)
CallArgCopy/size=65536     7.71µs ±10%    7.74µs ±10%    ~        (p=0.859 n=10+9)
CallArgCopy/size=65536-6   1.33µs ± 4%    1.34µs ± 6%    ~        (p=0.720 n=10+9)
CallArgCopy/size=65536-48  347ns ± 2%     344ns ± 2%     ~        (p=0.202 n=10+9)
PtrTo                      30.2ns ±10%    41.3ns ±11%    +36.97%  (p=0.000 n=10+9)
PtrTo-6                    126ns ± 6%     7ns ±10%       -94.47%  (p=0.000 n=9+9)
PtrTo-48                   86.9ns ± 9%    1.7ns ± 9%     -98.08%  (p=0.000 n=10+9)
FieldByName1               86.6ns ± 5%    87.3ns ± 7%    ~        (p=0.737 n=10+9)
FieldByName1-6             19.8ns ±10%    18.7ns ±10%    ~        (p=0.073 n=9+9)
FieldByName1-48            7.54ns ± 4%    7.74ns ± 5%    +2.55%   (p=0.023 n=9+9)
FieldByName2               1.63µs ± 8%    1.70µs ± 4%    +4.13%   (p=0.020 n=9+9)
FieldByName2-6             481ns ± 6%     490ns ±10%     ~        (p=0.474 n=9+9)
FieldByName2-48            723ns ± 3%     736ns ± 2%     +1.76%   (p=0.045 n=8+8)
FieldByName3               10.5µs ± 7%    10.8µs ± 7%    ~        (p=0.234 n=8+8)
FieldByName3-6             2.78µs ± 3%    2.94µs ±10%    +5.87%   (p=0.031 n=9+9)
FieldByName3-48            3.72µs ± 2%    3.91µs ± 5%    +4.91%   (p=0.003 n=9+9)
InterfaceBig               10.8ns ± 5%    10.7ns ± 5%    ~        (p=0.849 n=9+9)
InterfaceBig-6             9.62ns ±81%    1.79ns ± 4%    -81.38%  (p=0.003 n=9+9)
InterfaceBig-48            0.48ns ±34%    0.50ns ± 7%    ~        (p=0.071 n=8+9)
InterfaceSmall             10.7ns ± 5%    10.9ns ± 4%    ~        (p=0.243 n=9+9)
InterfaceSmall-6           1.85ns ± 5%    1.79ns ± 1%    -2.97%   (p=0.006 n=7+8)
InterfaceSmall-48          0.49ns ±20%    0.48ns ± 5%    ~        (p=0.740 n=7+9)
New                        28.2ns ±20%    26.6ns ± 3%    ~        (p=0.617 n=9+9)
New-6                      4.69ns ± 4%    4.44ns ± 3%    -5.33%   (p=0.001 n=9+9)
New-48                     1.10ns ± 9%    1.08ns ± 6%    ~        (p=0.285 n=9+8)

name     old alloc/op   new alloc/op   delta
Call     0.00B          0.00B          ~  (all equal)
Call-6   0.00B          0.00B          ~  (all equal)
Call-48  0.00B          0.00B          ~  (all equal)

name     old allocs/op  new allocs/op  delta
Call     0.00           0.00           ~  (all equal)
Call-6   0.00           0.00           ~  (all equal)
Call-48  0.00           0.00           ~  (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128       757MB/s ±11%   649MB/s ± 1%   -14.33%   (p=0.000 n=10+7)
CallArgCopy/size=128-6     901MB/s ± 9%   3781MB/s ± 7%  +319.69%  (p=0.000 n=10+9)
CallArgCopy/size=128-48    1.02GB/s ± 2%  14.63GB/s ± 6% +1337.98% (p=0.000 n=8+8)
CallArgCopy/size=256       1.45GB/s ± 9%  1.30GB/s ± 5%  -10.17%   (p=0.000 n=10+9)
CallArgCopy/size=256-6     1.73GB/s ±11%  7.28GB/s ± 7%  +320.76%  (p=0.000 n=10+9)
CallArgCopy/size=256-48    2.00GB/s ± 4%  27.46GB/s ± 9% +1270.85% (p=0.000 n=10+9)
CallArgCopy/size=1024      5.21GB/s ± 6%  4.49GB/s ± 8%  -13.74%   (p=0.000 n=10+9)
CallArgCopy/size=1024-6    7.18GB/s ± 7%  24.17GB/s ± 5% +236.64%  (p=0.000 n=9+8)
CallArgCopy/size=1024-48   7.87GB/s ± 7%  98.43GB/s ± 1% +1150.99% (p=0.000 n=10+8)
CallArgCopy/size=4096      12.3GB/s ± 6%  11.7GB/s ± 5%  -5.00%    (p=0.008 n=9+9)
CallArgCopy/size=4096-6    23.8GB/s ±16%  65.6GB/s ± 5%  +175.02%  (p=0.000 n=10+8)
CallArgCopy/size=4096-48   29.0GB/s ± 7%  279.6GB/s ± 6% +862.87%  (p=0.000 n=10+8)
CallArgCopy/size=65536     8.52GB/s ±11%  8.49GB/s ± 9%  ~         (p=0.842 n=10+9)
CallArgCopy/size=65536-6   49.3GB/s ± 4%  49.0GB/s ± 6%  ~         (p=0.720 n=10+9)
CallArgCopy/size=65536-48  189GB/s ± 2%   190GB/s ± 2%   ~         (p=0.211 n=10+9)

https://perf.golang.org/search?q=upload:20170426.3

Change-Id: Iff68f18ef69defb7f30962e21736ac7685a48a27
Reviewed-on: https://go-review.googlesource.com/41871
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-16 18:11:07 -05:00
if slice, ok := lookupCache.Load(ckey); ok {
return slice.(Type)
}
// Look in known types.
s := "[]" + typ.String()
for _, tt := range typesByString(s) {
slice := (*sliceType)(unsafe.Pointer(tt))
if slice.elem == typ {
ti, _ := lookupCache.LoadOrStore(ckey, tt)
return ti.(Type)
}
}
// Make a slice type.
var islice interface{} = ([]unsafe.Pointer)(nil)
prototype := *(**sliceType)(unsafe.Pointer(&islice))
slice := *prototype
slice.tflag = 0
slice.str = resolveReflectName(newName(s, "", false))
slice.hash = fnv1(typ.hash, '[')
slice.elem = typ
slice.ptrToThis = 0
ti, _ := lookupCache.LoadOrStore(ckey, &slice.rtype)
return ti.(Type)
}
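SliceOf is public API, so its contract can be demonstrated directly. Note how the LoadOrStore calls above guarantee a single canonical type: even callers racing to build []int concurrently all end up with the same Type value. A short usage example:

package main

import (
	"fmt"
	"reflect"
)

func main() {
	ti := reflect.SliceOf(reflect.TypeOf(int(0)))
	fmt.Println(ti)                               // []int
	fmt.Println(ti == reflect.TypeOf([]int(nil))) // true: one canonical Type per element type
}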
// The structLookupCache caches StructOf lookups.
// StructOf does not share the common lookupCache since we need to pin
cmd/compile, etc: store method tables as offsets

This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.

In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.

With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime. We
identify a type in the general case by the pair of the original *rtype
that references it and its typeOff value. We determine the module from
the original pointer, and then use the typeOff from there to compute
the final *rtype.

To ensure there is only one *rtype representing each type, the runtime
initializes a typemap for each module, using any identical type from
an earlier module when resolving that offset. This means that types
computed from an offset match the type mapped by the pointer dynamic
relocations.

A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).

For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.

darwin/amd64:
	cmd/go: -57KB (0.6%)
	jujud:  -557KB (0.8%)

linux/amd64 PIE:
	cmd/go: -361KB (3.0%)
	jujud:  -3.5MB (4.2%)

For #6853.

Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 10:32:27 -04:00
// the memory associated with *structTypeFixedN.
var structLookupCache struct {
sync.Mutex // Guards stores (but not loads) on m.
// m is a map[uint32][]Type keyed by the hash calculated in StructOf.
// Elements in m are append-only and thus safe for concurrent reading.
m sync.Map
}
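The two comments above imply a specific access pattern: reads go straight to the sync.Map with no lock, while writers take the mutex, re-check, and then publish an appended (never mutated in place) slice. A sketch of that pattern (structCacheLookup, match, and build are our names; the real logic lives inside StructOf):

func structCacheLookup(hash uint32, match func(Type) bool, build func() Type) Type {
	// Fast path: lock-free load. Cached slices are append-only, so a
	// stale slice is safe to scan.
	if ts, ok := structLookupCache.m.Load(hash); ok {
		for _, t := range ts.([]Type) {
			if match(t) {
				return t
			}
		}
	}
	// Slow path: serialize writers, re-check under the lock, then
	// store the extended slice.
	structLookupCache.Lock()
	defer structLookupCache.Unlock()
	var ts []Type
	if ti, ok := structLookupCache.m.Load(hash); ok {
		ts = ti.([]Type)
		for _, t := range ts {
			if match(t) {
				return t
			}
		}
	}
	t := build()
	structLookupCache.m.Store(hash, append(ts, t))
	return t
}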
type structTypeUncommon struct {
structType
u uncommonType
}
// isLetter reports whether a given 'rune' is classified as a Letter.
func isLetter(ch rune) bool {
return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_' || ch >= utf8.RuneSelf && unicode.IsLetter(ch)
}
// isValidFieldName checks if a string is a valid (struct) field name or not.
//
// According to the language spec, a field name should be an identifier.
//
// identifier = letter { letter | unicode_digit } .
// letter = unicode_letter | "_" .
func isValidFieldName(fieldName string) bool {
for i, c := range fieldName {
if i == 0 && !isLetter(c) {
return false
}
if !(isLetter(c) || unicode.IsDigit(c)) {
return false
}
}
return len(fieldName) > 0
}
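Some concrete inputs and the results the check above produces (shown as comments; the non-ASCII case goes through unicode.IsLetter):

// isValidFieldName("Name") == true
// isValidFieldName("_x9")  == true
// isValidFieldName("αβ")   == true  // Unicode letters are valid identifiers
// isValidFieldName("1st")  == false // must not start with a digit
// isValidFieldName("a b")  == false // space is neither letter nor digit
// isValidFieldName("")     == false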
// StructOf returns the struct type containing fields.
// The Offset and Index fields are ignored and computed as they would be
// by the compiler.
//
// StructOf currently does not generate wrapper methods for embedded
// fields and panics if passed unexported StructFields.
// These limitations may be lifted in a future version.
func StructOf(fields []StructField) Type {
var (
hash = fnv1(0, []byte("struct {")...)
size uintptr
typalign uint8
comparable = true
hashable = true
methods []method
fs = make([]structField, len(fields))
repr = make([]byte, 0, 64)
fset = map[string]struct{}{} // fields' names
hasGCProg = false // records whether a struct-field type has a GCProg
)
lastzero := uintptr(0)
repr = append(repr, "struct {"...)
for i, field := range fields {
if field.Name == "" {
panic("reflect.StructOf: field " + strconv.Itoa(i) + " has no name")
}
if !isValidFieldName(field.Name) {
panic("reflect.StructOf: field " + strconv.Itoa(i) + " has invalid name")
}
if field.Type == nil {
panic("reflect.StructOf: field " + strconv.Itoa(i) + " has no type")
}
f := runtimeStructField(field)
ft := f.typ
if ft.kind&kindGCProg != 0 {
hasGCProg = true
}
// Update string and hash
name := f.name.name()
hash = fnv1(hash, []byte(name)...)
repr = append(repr, (" " + name)...)
if f.embedded() {
// Embedded field
if f.typ.Kind() == Ptr {
// Embedded ** and *interface{} are illegal
elem := ft.Elem()
if k := elem.Kind(); k == Ptr || k == Interface {
panic("reflect.StructOf: illegal embedded field type " + ft.String())
}
}
switch f.typ.Kind() {
case Interface:
ift := (*interfaceType)(unsafe.Pointer(ft))
for im, m := range ift.methods {
if ift.nameOff(m.name).pkgPath() != "" {
// TODO(sbinet). Issue 15924.
panic("reflect: embedded interface with unexported method(s) not implemented")
}
var (
mtyp = ift.typeOff(m.typ)
ifield = i
imethod = im
ifn Value
tfn Value
)
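					// tfn is the wrapper installed in the generated
					// struct type's method table; ifn is the wrapper
					// used when the method is called through an
					// interface, where the receiver arrives as a
					// single word. For pointer-shaped (direct
					// interface) types the two can be identical;
					// otherwise ifn must indirect the receiver
					// pointer first (note Indirect in the else branch).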
if ft.kind&kindDirectIface != 0 {
tfn = MakeFunc(mtyp, func(in []Value) []Value {
var args []Value
var recv = in[0]
if len(in) > 1 {
args = in[1:]
}
return recv.Field(ifield).Method(imethod).Call(args)
})
ifn = MakeFunc(mtyp, func(in []Value) []Value {
var args []Value
var recv = in[0]
if len(in) > 1 {
args = in[1:]
}
return recv.Field(ifield).Method(imethod).Call(args)
})
} else {
tfn = MakeFunc(mtyp, func(in []Value) []Value {
var args []Value
var recv = in[0]
if len(in) > 1 {
args = in[1:]
}
return recv.Field(ifield).Method(imethod).Call(args)
})
ifn = MakeFunc(mtyp, func(in []Value) []Value {
var args []Value
var recv = Indirect(in[0])
if len(in) > 1 {
args = in[1:]
}
return recv.Field(ifield).Method(imethod).Call(args)
})
}
methods = append(methods, method{
name: resolveReflectName(ift.nameOff(m.name)),
mtyp: resolveReflectType(mtyp),
cmd/compile, etc: store method tables as offsets This CL introduces the typeOff type and a lookup method of the same name that can turn a typeOff offset into an *rtype. In a typical Go binary (built with buildmode=exe, pie, c-archive, or c-shared), there is one moduledata and all typeOff values are offsets relative to firstmoduledata.types. This makes computing the pointer cheap in typical programs. With buildmode=shared (and one day, buildmode=plugin) there are multiple modules whose relative offset is determined at runtime. We identify a type in the general case by the pair of the original *rtype that references it and its typeOff value. We determine the module from the original pointer, and then use the typeOff from there to compute the final *rtype. To ensure there is only one *rtype representing each type, the runtime initializes a typemap for each module, using any identical type from an earlier module when resolving that offset. This means that types computed from an offset match the type mapped by the pointer dynamic relocations. A series of followup CLs will replace other *rtype values with typeOff (and name/*string with nameOff). For types created at runtime by reflect, type offsets are treated as global IDs and reference into a reflect offset map kept by the runtime. darwin/amd64: cmd/go: -57KB (0.6%) jujud: -557KB (0.8%) linux/amd64 PIE: cmd/go: -361KB (3.0%) jujud: -3.5MB (4.2%) For #6853. Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96 Reviewed-on: https://go-review.googlesource.com/21285 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: David Crawshaw <crawshaw@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 10:32:27 -04:00
ifn: resolveReflectText(unsafe.Pointer(&ifn)),
tfn: resolveReflectText(unsafe.Pointer(&tfn)),
})
}
case Ptr:
ptr := (*ptrType)(unsafe.Pointer(ft))
if unt := ptr.uncommon(); unt != nil {
if i > 0 && unt.mcount > 0 {
// Issue 15924.
panic("reflect: embedded type with methods not implemented if type is not first field")
}
if len(fields) > 1 {
panic("reflect: embedded type with methods not implemented if there is more than one field")
}
for _, m := range unt.methods() {
mname := ptr.nameOff(m.name)
if mname.pkgPath() != "" {
// TODO(sbinet).
// Issue 15924.
panic("reflect: embedded interface with unexported method(s) not implemented")
}
methods = append(methods, method{
name: resolveReflectName(mname),
mtyp: resolveReflectType(ptr.typeOff(m.mtyp)),
ifn: resolveReflectText(ptr.textOff(m.ifn)),
tfn: resolveReflectText(ptr.textOff(m.tfn)),
})
}
}
if unt := ptr.elem.uncommon(); unt != nil {
for _, m := range unt.methods() {
mname := ptr.nameOff(m.name)
if mname.pkgPath() != "" {
// TODO(sbinet)
// Issue 15924.
panic("reflect: embedded interface with unexported method(s) not implemented")
}
methods = append(methods, method{
name: resolveReflectName(mname),
mtyp: resolveReflectType(ptr.elem.typeOff(m.mtyp)),
ifn: resolveReflectText(ptr.elem.textOff(m.ifn)),
tfn: resolveReflectText(ptr.elem.textOff(m.tfn)),
})
}
}
default:
if unt := ft.uncommon(); unt != nil {
if i > 0 && unt.mcount > 0 {
// Issue 15924.
panic("reflect: embedded type with methods not implemented if type is not first field")
}
if len(fields) > 1 && ft.kind&kindDirectIface != 0 {
panic("reflect: embedded type with methods not implemented for non-pointer type")
}
for _, m := range unt.methods() {
mname := ft.nameOff(m.name)
if mname.pkgPath() != "" {
// TODO(sbinet)
// Issue 15924.
panic("reflect: embedded interface with unexported method(s) not implemented")
}
methods = append(methods, method{
name: resolveReflectName(mname),
mtyp: resolveReflectType(ft.typeOff(m.mtyp)),
ifn: resolveReflectText(ft.textOff(m.ifn)),
tfn: resolveReflectText(ft.textOff(m.tfn)),
})
}
}
}
}
if _, dup := fset[name]; dup {
panic("reflect.StructOf: duplicate field " + name)
}
fset[name] = struct{}{}
hash = fnv1(hash, byte(ft.hash>>24), byte(ft.hash>>16), byte(ft.hash>>8), byte(ft.hash))
repr = append(repr, (" " + ft.String())...)
if f.name.tagLen() > 0 {
hash = fnv1(hash, []byte(f.name.tag())...)
repr = append(repr, (" " + strconv.Quote(f.name.tag()))...)
}
if i < len(fields)-1 {
repr = append(repr, ';')
}
comparable = comparable && (ft.alg.equal != nil)
hashable = hashable && (ft.alg.hash != nil)
offset := align(size, uintptr(ft.align))
if ft.align > typalign {
typalign = ft.align
}
size = offset + ft.size
f.offsetEmbed |= offset << 1
if ft.size == 0 {
lastzero = size
}
fs[i] = f
}
if size > 0 && lastzero == size {
		// This is a non-zero-sized struct that ends in a
// zero-sized field. We add an extra byte of padding,
// to ensure that taking the address of the final
// zero-sized field can't manufacture a pointer to the
// next object in the heap. See issue 9401.
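		// For example (illustrative): without the padding,
		// struct{ a int64; b struct{} } would have size 8 and &b would
		// point one past the end of the allocation.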
size++
}
var typ *structType
var ut *uncommonType
if len(methods) == 0 {
t := new(structTypeUncommon)
typ = &t.structType
ut = &t.u
} else {
// A *rtype representing a struct is followed directly in memory by an
// array of method objects representing the methods attached to the
// struct. To get the same layout for a run time generated type, we
// need an array directly following the uncommonType memory.
// A similar strategy is used for funcTypeFixed4, ...funcTypeFixedN.
tt := New(StructOf([]StructField{
{Name: "S", Type: TypeOf(structType{})},
{Name: "U", Type: TypeOf(uncommonType{})},
{Name: "M", Type: ArrayOf(len(methods), TypeOf(methods[0]))},
}))
typ = (*structType)(unsafe.Pointer(tt.Elem().Field(0).UnsafeAddr()))
ut = (*uncommonType)(unsafe.Pointer(tt.Elem().Field(1).UnsafeAddr()))
copy(tt.Elem().Field(2).Slice(0, len(methods)).Interface().([]method), methods)
}
// TODO(sbinet): Once we allow embedding multiple types,
// methods will need to be sorted like the compiler does.
// TODO(sbinet): Once we allow non-exported methods, we will
// need to compute xcount as the number of exported methods.
ut.mcount = uint16(len(methods))
ut.xcount = ut.mcount
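	// moff is the offset from the uncommonType to its method array; in the
	// layout built above, M immediately follows U, so the offset is simply
	// the size of uncommonType itself.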
ut.moff = uint32(unsafe.Sizeof(uncommonType{}))
if len(fs) > 0 {
repr = append(repr, ' ')
}
repr = append(repr, '}')
hash = fnv1(hash, '}')
str := string(repr)
// Round the size up to be a multiple of the alignment.
size = align(size, uintptr(typalign))
// Make the struct type.
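	// An interface{} value is a (type pointer, data pointer) pair, so
	// reading the first word of istruct recovers the *structType describing
	// struct{}{}, which seeds the common rtype fields below.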
var istruct interface{} = struct{}{}
prototype := *(**structType)(unsafe.Pointer(&istruct))
*typ = *prototype
typ.fields = fs
// Look in cache.
if ts, ok := structLookupCache.m.Load(hash); ok {
for _, st := range ts.([]Type) {
t := st.common()
if haveIdenticalUnderlyingType(&typ.rtype, t, true) {
return t
}
}
}
// Not in cache, lock and retry.
structLookupCache.Lock()
defer structLookupCache.Unlock()
if ts, ok := structLookupCache.m.Load(hash); ok {
for _, st := range ts.([]Type) {
t := st.common()
if haveIdenticalUnderlyingType(&typ.rtype, t, true) {
return t
}
}
}
addToCache := func(t Type) Type {
var ts []Type
if ti, ok := structLookupCache.m.Load(hash); ok {
ts = ti.([]Type)
}
structLookupCache.m.Store(hash, append(ts, t))
return t
}
// Look in known types.
for _, t := range typesByString(str) {
if haveIdenticalUnderlyingType(&typ.rtype, t, true) {
// even if 't' wasn't a structType with methods, we should be ok
// as the 'u uncommonType' field won't be accessed except when
// tflag&tflagUncommon is set.
return addToCache(t)
}
}
typ.str = resolveReflectName(newName(str, "", false))
typ.tflag = 0
typ.hash = hash
typ.size = size
typ.ptrdata = typeptrdata(typ.common())
typ.align = typalign
typ.fieldAlign = typalign
typ.ptrToThis = 0
if len(methods) > 0 {
typ.tflag |= tflagUncommon
}
if hasGCProg {
lastPtrField := 0
for i, ft := range fs {
if ft.typ.pointers() {
lastPtrField = i
}
}
prog := []byte{0, 0, 0, 0} // will be length of prog
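		// The program uses the runtime's GC program encoding (see
		// runtime/mbitmap.go): a byte b in 1..127 emits b literal
		// pointer-mask bits from the next (b+7)/8 bytes, 0x80|n followed by
		// a varint count repeats the previous n bits that many more times,
		// and 0x00 terminates. The leading four bytes hold the program
		// length and are filled in at the end.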
var off uintptr
for i, ft := range fs {
if i > lastPtrField {
				// The gcprog must not include anything for fields after
				// the last field that contains pointer data.
break
}
if !ft.typ.pointers() {
// Ignore pointerless fields.
continue
}
// Pad to start of this field with zeros.
if ft.offset() > off {
n := (ft.offset() - off) / ptrSize
prog = append(prog, 0x01, 0x00) // emit a 0 bit
if n > 1 {
prog = append(prog, 0x81) // repeat previous bit
prog = appendVarint(prog, n-1) // n-1 times
}
off = ft.offset()
}
elemGC := (*[1 << 30]byte)(unsafe.Pointer(ft.typ.gcdata))[:]
elemPtrs := ft.typ.ptrdata / ptrSize
if ft.typ.kind&kindGCProg == 0 {
// Element is small with pointer mask; use as literal bits.
mask := elemGC
				// Emit 120-bit chunks of full bytes (the literal opcode allows
				// up to 127 bits, but 120 keeps the mask byte-aligned).
var n uintptr
for n = elemPtrs; n > 120; n -= 120 {
prog = append(prog, 120)
prog = append(prog, mask[:15]...)
mask = mask[15:]
}
prog = append(prog, byte(n))
prog = append(prog, mask[:(n+7)/8]...)
} else {
// Element has GC program; emit one element.
elemProg := elemGC[4 : 4+*(*uint32)(unsafe.Pointer(&elemGC[0]))-1]
prog = append(prog, elemProg...)
}
off += ft.typ.ptrdata
}
prog = append(prog, 0)
*(*uint32)(unsafe.Pointer(&prog[0])) = uint32(len(prog) - 4)
typ.kind |= kindGCProg
typ.gcdata = &prog[0]
} else {
typ.kind &^= kindGCProg
bv := new(bitVector)
addTypeBits(bv, 0, typ.common())
if len(bv.data) > 0 {
typ.gcdata = &bv.data[0]
}
}
typ.alg = new(typeAlg)
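	// Hash and equality for the new type are composed field by field,
	// mirroring what the compiler emits for static struct types: the hash
	// folds each field's hash into the running seed in field order, and
	// equality short-circuits at the first mismatched field.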
if hashable {
typ.alg.hash = func(p unsafe.Pointer, seed uintptr) uintptr {
o := seed
for _, ft := range typ.fields {
pi := add(p, ft.offset(), "&x.field safe")
o = ft.typ.alg.hash(pi, o)
}
return o
}
}
if comparable {
typ.alg.equal = func(p, q unsafe.Pointer) bool {
for _, ft := range typ.fields {
pi := add(p, ft.offset(), "&x.field safe")
qi := add(q, ft.offset(), "&x.field safe")
if !ft.typ.alg.equal(pi, qi) {
return false
}
}
return true
}
}
switch {
case len(fs) == 1 && !ifaceIndir(fs[0].typ):
		// Structs with a single direct-interface field can themselves be
		// stored directly in an interface word.
typ.kind |= kindDirectIface
default:
typ.kind &^= kindDirectIface
}
return addToCache(&typ.rtype)
}
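// A minimal sketch of using StructOf from client code (illustrative, not
// part of this file):
//
//	t := reflect.StructOf([]reflect.StructField{
//		{Name: "Height", Type: reflect.TypeOf(float64(0)), Tag: `json:"height"`},
//		{Name: "Age", Type: reflect.TypeOf(int(0)), Tag: `json:"age"`},
//	})
//	v := reflect.New(t).Elem()
//	v.Field(0).SetFloat(1.85)
//
// Requesting the same field list again yields the identical Type, via the
// structLookupCache and typesByString lookups above.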
func runtimeStructField(field StructField) structField {
if field.PkgPath != "" {
panic("reflect.StructOf: StructOf does not allow unexported fields")
}
// Best-effort check for misuse.
// Since PkgPath is empty, not much harm done if Unicode lowercase slips through.
c := field.Name[0]
if 'a' <= c && c <= 'z' || c == '_' {
panic("reflect.StructOf: field \"" + field.Name + "\" is unexported but missing PkgPath")
}
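	// For example (illustrative): {Name: "age", Type: TypeOf(0)} trips the
	// check above, since "age" looks unexported yet has no PkgPath, while
	// {Name: "Age", Type: TypeOf(0)} passes.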
offsetEmbed := uintptr(0)
if field.Anonymous {
offsetEmbed |= 1
}
resolveReflectType(field.Type.common()) // install in runtime
return structField{
name: newName(field.Name, string(field.Tag), true),
typ: field.Type.common(),
offsetEmbed: offsetEmbed,
}
}
// typeptrdata returns the length in bytes of the prefix of t
// containing pointer data. Anything after this offset is scalar data.
// keep in sync with ../cmd/compile/internal/gc/reflect.go
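// For example (assuming 8-byte pointers), given
//	struct { a *int; b int64; c *int; d int64 }
// the last pointer-bearing field is c at offset 16, so typeptrdata returns
// 16 + 8 = 24; the trailing d is scalar data.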
func typeptrdata(t *rtype) uintptr {
switch t.Kind() {
case Struct:
st := (*structType)(unsafe.Pointer(t))
// find the last field that has pointers.
field := -1
for i := range st.fields {
ft := st.fields[i].typ
if ft.pointers() {
field = i
}
}
if field == -1 {
return 0
}
f := st.fields[field]
return f.offset() + f.typ.ptrdata
default:
panic("reflect.typeptrdata: unexpected type, " + t.String())
}
}
// See cmd/compile/internal/gc/reflect.go for derivation of constant.
const maxPtrmaskBytes = 2048
// ArrayOf returns the array type with the given count and element type.
// For example, if t represents int, ArrayOf(5, t) represents [5]int.
//
// If the resulting type would be larger than the available address space,
// ArrayOf panics.
func ArrayOf(count int, elem Type) Type {
typ := elem.(*rtype)
// Look in cache.
ckey := cacheKey{Array, typ, nil, uintptr(count)}
if array, ok := lookupCache.Load(ckey); ok {
return array.(Type)
}
// Look in known types.
s := "[" + strconv.Itoa(count) + "]" + typ.String()
for _, tt := range typesByString(s) {
array := (*arrayType)(unsafe.Pointer(tt))
if array.elem == typ {
ti, _ := lookupCache.LoadOrStore(ckey, tt)
return ti.(Type)
}
}
// Make an array type.
var iarray interface{} = [1]unsafe.Pointer{}
prototype := *(**arrayType)(unsafe.Pointer(&iarray))
array := *prototype
array.tflag = 0
array.str = resolveReflectName(newName(s, "", false))
array.hash = fnv1(typ.hash, '[')
for n := uint32(count); n > 0; n >>= 8 {
array.hash = fnv1(array.hash, byte(n))
}
array.hash = fnv1(array.hash, ']')
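// The hash folds '[', the little-endian bytes of count, and ']'
// around the element hash, so arrays differing only in length hash
// differently. As an illustrative sketch, count = 300 (0x12C) hashes
// the bytes 0x2C and then 0x01 in the loop above.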
array.elem = typ
array.ptrToThis = 0
if typ.size > 0 {
max := ^uintptr(0) / typ.size
if uintptr(count) > max {
panic("reflect.ArrayOf: array size would exceed virtual address space")
}
}
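// As an illustrative sketch on a 64-bit system, an 8-byte element
// gives max = ^uintptr(0)/8 = 2305843009213693951; any larger count
// would make the multiplication below wrap around.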
array.size = typ.size * uintptr(count)
if count > 0 && typ.ptrdata != 0 {
array.ptrdata = typ.size*uintptr(count-1) + typ.ptrdata
}
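// array.ptrdata counts bytes only up through the last pointer word,
// so the pointer-free tail of the final element need not be scanned.
// As an illustrative sketch on a 64-bit system, for an element
// struct{ p *byte; pad [3]uintptr } (size 32, ptrdata 8),
// ArrayOf(4, ...) sets ptrdata to 32*3 + 8 = 104, not 128.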
array.align = typ.align
array.fieldAlign = typ.fieldAlign
array.len = uintptr(count)
array.slice = SliceOf(elem).(*rtype)
switch {
case typ.ptrdata == 0 || array.size == 0:
// No pointers.
array.gcdata = nil
array.ptrdata = 0
case count == 1:
// In memory, a 1-element array looks just like its element.
array.kind |= typ.kind & kindGCProg
array.gcdata = typ.gcdata
array.ptrdata = typ.ptrdata
case typ.kind&kindGCProg == 0 && array.size <= maxPtrmaskBytes*8*ptrSize:
// Element is small with pointer mask; array is still small.
// Create direct pointer mask by turning each 1 bit in elem
// into count 1 bits in larger mask.
mask := make([]byte, (array.ptrdata/ptrSize+7)/8)
elemMask := (*[1 << 30]byte)(unsafe.Pointer(typ.gcdata))[:]
elemWords := typ.size / ptrSize
for j := uintptr(0); j < typ.ptrdata/ptrSize; j++ {
if (elemMask[j/8]>>(j%8))&1 != 0 {
for i := uintptr(0); i < array.len; i++ {
k := i*elemWords + j
mask[k/8] |= 1 << (k % 8)
}
}
}
array.gcdata = &mask[0]
default:
// Create program that emits one element
// and then repeats to make the array.
prog := []byte{0, 0, 0, 0} // will be length of prog
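// The byte encoding matches the runtime's GC-program decoder:
// 0x00 terminates the program; a byte n in 0x01..0x7F emits n
// literal bits copied from the next (n+7)/8 bytes; 0x80 followed
// by varint n and varint c repeats the previous n bits c more
// times; 0x81..0xFF carry n in the low seven bits, with only the
// repeat count c following as a varint.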
elemGC := (*[1 << 30]byte)(unsafe.Pointer(typ.gcdata))[:]
elemPtrs := typ.ptrdata / ptrSize
if typ.kind&kindGCProg == 0 {
// Element is small with pointer mask; use as literal bits.
mask := elemGC
// Emit 120-bit chunks of full bytes (max is 127 but we avoid using partial bytes).
var n uintptr
for n = elemPtrs; n > 120; n -= 120 {
prog = append(prog, 120)
prog = append(prog, mask[:15]...)
mask = mask[15:]
}
prog = append(prog, byte(n))
prog = append(prog, mask[:(n+7)/8]...)
} else {
// Element has GC program; emit one element.
elemProg := elemGC[4 : 4+*(*uint32)(unsafe.Pointer(&elemGC[0]))-1]
prog = append(prog, elemProg...)
}
// Pad from ptrdata to size.
elemWords := typ.size / ptrSize
if elemPtrs < elemWords {
// Emit literal 0 bit, then repeat as needed.
prog = append(prog, 0x01, 0x00)
if elemPtrs+1 < elemWords {
prog = append(prog, 0x81)
prog = appendVarint(prog, elemWords-elemPtrs-1)
}
}
// Repeat count-1 times.
if elemWords < 0x80 {
prog = append(prog, byte(elemWords|0x80))
} else {
prog = append(prog, 0x80)
prog = appendVarint(prog, elemWords)
}
prog = appendVarint(prog, uintptr(count)-1)
prog = append(prog, 0)
*(*uint32)(unsafe.Pointer(&prog[0])) = uint32(len(prog) - 4)
array.kind |= kindGCProg
array.gcdata = &prog[0]
array.ptrdata = array.size // overestimate but ok; must match program
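// As an illustrative sketch, a large pointer array such as
// [1 << 20]*byte produces the program body 01 01 (emit the
// element's single 1 bit), 81 (repeat the previous 1 bit),
// the varint of count-1, and a terminating 00, all preceded
// by the 4-byte little-endian length written above.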
}
etyp := typ.common()
esize := etyp.Size()
ealg := etyp.alg
array.alg = new(typeAlg)
if ealg.equal != nil {
eequal := ealg.equal
array.alg.equal = func(p, q unsafe.Pointer) bool {
for i := 0; i < count; i++ {
pi := arrayAt(p, i, esize, "i < count")
qi := arrayAt(q, i, esize, "i < count")
if !eequal(pi, qi) {
return false
}
}
return true
}
}
if ealg.hash != nil {
ehash := ealg.hash
array.alg.hash = func(ptr unsafe.Pointer, seed uintptr) uintptr {
o := seed
for i := 0; i < count; i++ {
o = ehash(arrayAt(ptr, i, esize, "i < count"), o)
}
return o
}
}
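// Note: the hash above threads the seed through each element, so for
// count = 3 it computes ehash(e2, ehash(e1, ehash(e0, seed))), while
// equal returns false at the first mismatching element.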
switch {
case count == 1 && !ifaceIndir(typ):
// array of 1 direct iface type can be direct
array.kind |= kindDirectIface
default:
array.kind &^= kindDirectIface
}
ti, _ := lookupCache.LoadOrStore(ckey, &array.rtype)
return ti.(Type)
}
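// Illustrative sketch, not part of the real runtime (which decodes
// programs directly into the heap bitmap): progBits expands one of
// the programs constructed above into one bit per byte, making the
// opcode semantics concrete. Opcodes, as emitted above:
//
//	0x00        end of program
//	0x01..0x7f  n: emit the next n bits, packed into ⌈n/8⌉ bytes
//	0x80        repeat: bit count n follows as a varint, then a
//	            varint count c
//	0x81..0xff  repeat: low 7 bits give n directly, then a varint c;
//	            either way, the last n emitted bits are appended c
//	            more times
//
// progBits and readVarint are hypothetical helpers; call
// progBits(prog[4:]) to skip the length prefix written above.
func progBits(prog []byte) (bits []byte) {
	for len(prog) > 0 {
		op := prog[0]
		prog = prog[1:]
		if op == 0 {
			break // end of program
		}
		if op < 0x80 {
			// Literal: emit the next op bits, LSB first within each byte.
			n := uintptr(op)
			for j := uintptr(0); j < n; j++ {
				bits = append(bits, (prog[j/8]>>(j%8))&1)
			}
			prog = prog[(n+7)/8:]
			continue
		}
		// Repeat: replay the last n emitted bits c more times.
		n := uintptr(op &^ 0x80)
		if n == 0 {
			n, prog = readVarint(prog)
		}
		var c uintptr
		c, prog = readVarint(prog)
		last := bits[uintptr(len(bits))-n:]
		for ; c > 0; c-- {
			bits = append(bits, last...)
		}
	}
	return bits
}

// readVarint reads one base-128 varint as written by appendVarint below.
func readVarint(p []byte) (uintptr, []byte) {
	var v uintptr
	for shift := uint(0); ; shift += 7 {
		b := p[0]
		p = p[1:]
		v |= uintptr(b&0x7f) << shift
		if b&0x80 == 0 {
			return v, p
		}
	}
}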
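// appendVarint appends v to x in base-128 varint form, as used for
// the repeat counts in the GC programs above: seven bits per byte,
// least-significant group first, high bit set on every byte but the
// last. For example, appendVarint(nil, 300) yields [0xAC, 0x02].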
func appendVarint(x []byte, v uintptr) []byte {
for ; v >= 0x80; v >>= 7 {
x = append(x, byte(v|0x80))
}
x = append(x, byte(v))
return x
}
// toType converts from a *rtype to a Type that can be returned
// to the client of package reflect. In gc, the only concern is that
// a nil *rtype must be replaced by a nil Type, but in gccgo this
// function takes care of ensuring that multiple *rtype for the same
// type are coalesced into a single Type.
func toType(t *rtype) Type {
if t == nil {
return nil
}
return t
}
type layoutKey struct {
ftyp *funcType // function signature
rcvr *rtype // receiver type, or nil if none
}
type layoutType struct {
t *rtype
argSize uintptr // size of arguments
retOffset uintptr // offset of return values.
stack *bitVector
framePool *sync.Pool
}
var layoutCache sync.Map // map[layoutKey]layoutType
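// The type caches in this file share a common sync.Map idiom: Load
// first, compute the value on a miss, then publish it (as with
// lookupCache.LoadOrStore above) so concurrent callers converge on a
// single canonical entry. A minimal sketch of the pattern, where
// computeLayout is a hypothetical stand-in for the work done below:
//
//	if v, ok := layoutCache.Load(k); ok {
//		return v.(layoutType)
//	}
//	lt := computeLayout(k)                  // may race; that's fine
//	vi, _ := layoutCache.LoadOrStore(k, lt) // first store wins
//	return vi.(layoutType)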
// funcLayout computes a struct type representing the layout of the
// function arguments and return values for the function type t.
// If rcvr != nil, rcvr specifies the type of the receiver.
// The returned type exists only for GC, so we only fill out GC relevant info.
// Currently, that's just size and the GC program. We also fill in
// the name for possible debugging use.
func funcLayout(t *funcType, rcvr *rtype) (frametype *rtype, argSize, retOffset uintptr, stk *bitVector, framePool *sync.Pool) {
if t.Kind() != Func {
panic("reflect: funcLayout of non-func type")
}
if rcvr != nil && rcvr.Kind() == Interface {
panic("reflect: funcLayout with interface receiver " + rcvr.String())
}
k := layoutKey{t, rcvr}
if lti, ok := layoutCache.Load(k); ok {
lt := lti.(layoutType)
return lt.t, lt.argSize, lt.retOffset, lt.stack, lt.framePool
}
// compute gc program & stack bitmap for arguments
ptrmap := new(bitVector)
var offset uintptr
if rcvr != nil {
// Reflect uses the "interface" calling convention for
// methods, where receivers take one word of argument
// space no matter how big they actually are.
-20.43% (p=0.000) RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000) RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000) RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000) RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000) RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000) RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000) RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000) Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083) Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000) TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000) TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000) This CL is a bit larger than I would like, but the compiler, linker, runtime, and package reflect all need to be in sync about the format of these programs, so there is no easy way to split this into independent changes (at least while keeping the build working at each change). Fixes #9625. Fixes #10524. Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a Reviewed-on: https://go-review.googlesource.com/9888 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-08 01:43:18 -04:00
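To make the two operations concrete, here is a toy decoder in the spirit of the encoding described above. The byte format below (one bit per byte, fixed one-byte operands, no varint packing) is hypothetical and exists only for illustration; it is a sketch of the idea, not the runtime's actual GC program encoding:

	// runGCProg expands a toy GC program into its 1-bit pointer mask.
	// Opcodes: 0 = stop; 1 n b1..bn = emit the n literal bits b1..bn;
	// 2 n c = repeat the last n emitted bits c more times.
	func runGCProg(prog []byte) []byte {
		var out []byte
		for i := 0; i < len(prog); {
			switch prog[i] {
			case 0: // stop
				return out
			case 1: // emit the following n bits
				n := int(prog[i+1])
				out = append(out, prog[i+2:i+2+n]...)
				i += 2 + n
			case 2: // repeat the last n bits c more times
				n, c := int(prog[i+1]), int(prog[i+2])
				tail := out[len(out)-n:]
				for ; c > 0; c-- {
					out = append(out, tail...)
				}
				i += 3
			default:
				panic("runGCProg: bad opcode")
			}
		}
		return out
	}

	// Pointer mask for [4]*byte: emit a single 1 bit, then repeat it
	// 3 more times.
	runGCProg([]byte{1, 1, 1, 2, 1, 3, 0}) // -> [1 1 1 1]

	// The receiver of a method occupies one pointer-sized word of the
	// argument frame; the code below records a 1 bit for it when that
	// word holds a pointer (the receiver is stored indirectly, or the
	// pointer-shaped receiver value itself contains a pointer).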
	if ifaceIndir(rcvr) || rcvr.pointers() {
		ptrmap.append(1)
	} else {
		ptrmap.append(0)
	}
	offset += ptrSize
}
reflect: ensure correct scanning of return values

During a call to a reflect-generated function or method (via makeFuncStub or methodValueCall), when should we scan the return values?

When we're starting a reflect call, the space on the stack for the return values is not initialized yet, as it contains whatever junk was on the stack of the caller at the time. The return space must not be scanned during a GC. When we're finishing a reflect call, the return values are initialized, and must be scanned during a GC to make sure that any pointers in the return values are found and their referents retained.

When the GC stack walk comes across a reflect call in progress on the stack, it needs to know whether to scan the results or not. It doesn't know the progress of the reflect call, so it can't decide by itself. The reflect package needs to tell it.

This CL adds another slot in the frame of makeFuncStub and methodValueCall so we can put a boolean in there which tells the runtime whether to scan the results or not. This CL also adds the args length to reflectMethodValue so the runtime can restrict its scanning to only the args section (not the results) if the reflect package says the results aren't ready yet.

Do a delicate dance in the reflect package to set the "results are valid" bit. We need to make sure we set the bit only after we've copied the results back to the stack. But we must set the bit before we drop reflect's copy of the results. Otherwise, we might have a state where (temporarily) no one has a live copy of the results. That's the state we were observing in issue #27695 before this CL.

The bitmap used by the runtime currently contains only the args. (Actually, it contains all the bits, but the size is set so we use only the args portion.) This is safe for early in a reflect call, but unsafe late in a reflect call. The test issue27695.go demonstrates this unsafety. We change the bitmap to always include both args and results, and decide at runtime which portion to use.

issue27695.go only has a test for method calls. Function calls were ok because there wasn't a safepoint between when reflect dropped its copy of the return values and when the caller is resumed. This may change when we introduce safepoints everywhere.

This truncate-to-only-the-args was part of CL 9888 (in 2015). That part of the CL fixed the problem demonstrated in issue27695b.go but introduced the problem demonstrated in issue27695.go.

TODO, in another CL: simplify FuncLayout and its test. stack return value is now identical to frametype.ptrdata + frametype.gcdata.

Fixes #27695

Change-Id: I2d49b34e34a82c6328b34f02610587a291b25c5f
Reviewed-on: https://go-review.googlesource.com/137440
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

2018-09-25 15:54:11 -07:00
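// Lay out the input parameters: round offset up to each argument's
// alignment, then have addTypeBits record one bit per pointer-sized
// word of the argument in the frame's pointer bitmap.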
for _, arg := range t.in() {
	offset += -offset & uintptr(arg.align-1)
	addTypeBits(ptrmap, offset, arg)
	offset += arg.size
}
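// The expression -offset & uintptr(arg.align-1) above is the usual
// round-up-to-a-power-of-two idiom: it is the distance from offset to
// the next multiple of arg.align (zero if offset is already aligned).
// A minimal standalone sketch, with a hypothetical helper name:
//
//	func alignUp(offset, align uintptr) uintptr {
//		return offset + (-offset & (align - 1)) // align must be a power of two
//	}
//
//	alignUp(5, 8) == 8, alignUp(16, 8) == 16, alignUp(17, 4) == 20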
argSize = offset
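// On amd64p32, where ptrSize is 4, pad the end of the argument frame
// up to an 8-byte boundary.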
if runtime.GOARCH == "amd64p32" {
	offset += -offset & (8 - 1)
}
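For a self-contained picture of what this layout loop computes, here is a small runnable sketch: it places a list of arguments, aligning each one, while growing a pointer bitmap with one entry per pointer-sized word. All names in it (field, layoutArgs, the ptrSize constant) are hypothetical; this illustrates the scheme, it is not the funcLayout implementation itself:

	package main

	import "fmt"

	const ptrSize = 8 // assume a 64-bit target

	// field is one argument: its size, alignment, and a per-word
	// pointer mask (true = that word holds a pointer).
	type field struct {
		size, align uintptr
		ptrs        []bool
	}

	// layoutArgs assigns each field an aligned offset and extends the
	// frame's pointer bitmap, marking padding words as pointer-free.
	func layoutArgs(fields []field) (offsets []uintptr, bitmap []bool) {
		var offset uintptr
		for _, f := range fields {
			offset += -offset & (f.align - 1) // round up to f.align
			offsets = append(offsets, offset)
			for len(bitmap) < int(offset/ptrSize) {
				bitmap = append(bitmap, false) // padding holds no pointers
			}
			bitmap = append(bitmap, f.ptrs...)
			offset += f.size
		}
		return offsets, bitmap
	}

	func main() {
		// An int64 argument followed by a *byte argument.
		offsets, bitmap := layoutArgs([]field{
			{size: 8, align: 8, ptrs: []bool{false}},
			{size: 8, align: 8, ptrs: []bool{true}},
		})
		fmt.Println(offsets, bitmap) // [0 8] [false true]
	}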
	offset += -offset & (ptrSize - 1)
	retOffset = offset // return values start at the next pointer-aligned offset
	// Lay out the results just like the arguments: align each one, record
	// its pointer bits in ptrmap, and advance past its size.
	for _, res := range t.out() {
offset += -offset & uintptr(res.align-1)
addTypeBits(ptrmap, offset, res)
offset += res.size
}
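	// offset now marks the end of the results, and ptrmap holds one bit per
	// pointer-sized word of the frame laid out so far, set for the words
	// that hold pointers.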
	offset += -offset & (ptrSize - 1) // round the total frame size up to a pointer-word multiple
	// build dummy rtype holding gc program
	x := &rtype{
		align:   ptrSize,
		size:    offset,
		ptrdata: uintptr(ptrmap.n) * ptrSize,
	}
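	// The dummy type is never exposed as a real Go type; it only carries the
	// frame's size and pointer bitmap in the form the garbage collector
	// expects, so the frame can be scanned like an ordinary object.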
	if runtime.GOARCH == "amd64p32" {
		x.align = 8 // amd64p32 has 4-byte pointers but 8-byte worst-case alignment
	}
runtime: replace GC programs with simpler encoding, faster decoder Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) 
Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. 
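// gcdata is the frame's 1-bit pointer mask; it stays nil when no
// argument or receiver word contains a pointer.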
if ptrmap.n > 0 {
x.gcdata = &ptrmap.data[0]
}
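// Give the synthesized frame type a descriptive string, so it prints
// as methodargs(...)/funcargs(...) rather than as an anonymous type.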
var s string
if rcvr != nil {
s = "methodargs(" + rcvr.String() + ")(" + t.String() + ")"
} else {
s = "funcargs(" + t.String() + ")"
}
x.str = resolveReflectName(newName(s, "", false))
// Cache the result for future callers. The pool's New func allocates a
// fresh, zeroed argument frame of layout x whenever the pool runs empty,
// so frames are reused instead of allocated on every reflective call.
framePool = &sync.Pool{New: func() interface{} {
return unsafe_New(x)
}}
lti, _ := layoutCache.LoadOrStore(k, layoutType{
t: x,
argSize: argSize,
retOffset: retOffset,
stack: ptrmap,
framePool: framePool,
})
lt := lti.(layoutType)
return lt.t, lt.argSize, lt.retOffset, lt.stack, lt.framePool
}
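The LoadOrStore call above is the usual sync.Map compute-once idiom: every caller may build a candidate layoutType, but only the first store wins, and all callers return whichever value actually landed in the cache. A minimal, self-contained sketch of the same idiom (illustrative names only, not part of type.go):

package main

import (
	"fmt"
	"sync"
)

var sizeCache sync.Map // effectively map[string]int

// cachedSize returns len(s), computing and storing it at most once per key.
func cachedSize(s string) int {
	if v, ok := sizeCache.Load(s); ok {
		return v.(int) // fast path: already cached
	}
	// Slow path: compute a candidate, then publish it. If another
	// goroutine stored first, LoadOrStore returns that value and
	// ours is discarded, so all callers agree on one result.
	v, _ := sizeCache.LoadOrStore(s, len(s))
	return v.(int)
}

func main() {
	fmt.Println(cachedSize("reflect"), cachedSize("reflect")) // 7 7
}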
// ifaceIndir reports whether t is stored indirectly in an interface value.
func ifaceIndir(t *rtype) bool {
return t.kind&kindDirectIface == 0
}
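ifaceIndir is the test behind Go's two interface representations: pointer-shaped types (kindDirectIface set) live directly in the interface's data word, while every other type is boxed and the data word holds a pointer to the copy. A hedged, gc-toolchain-only illustration of the observable difference (this peeks at the current two-word interface layout and is not part of type.go):

package main

import (
	"fmt"
	"unsafe"
)

// dataWord extracts the second word of an empty interface, assuming the
// gc toolchain's (type, data) two-word layout. Illustrative only.
func dataWord(i interface{}) unsafe.Pointer {
	return (*[2]unsafe.Pointer)(unsafe.Pointer(&i))[1]
}

func main() {
	p := new(int)
	fmt.Println(dataWord(p) == unsafe.Pointer(p)) // true: *int is stored directly

	a := [4]int{1, 2, 3, 4}
	fmt.Println(dataWord(a) == unsafe.Pointer(&a)) // false: the array is boxed
}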
// Layout matches runtime.gobitvector (well enough).
type bitVector struct {
n uint32 // number of bits
data []byte
}
// append a bit to the bitmap.
func (bv *bitVector) append(bit uint8) {
if bv.n%8 == 0 {
bv.data = append(bv.data, 0)
}
	bv.data[bv.n/8] |= bit << (bv.n % 8)
	bv.n++
}
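
// A minimal sketch of the packing above, for illustration: bits fill each
// byte of bv.data from the low-order end, so appending 1, 0, 1 to a zero
// bitVector proceeds as follows.
//
//	var bv bitVector
//	bv.append(1) // bv.data[0] == 0b001, bv.n == 1
//	bv.append(0) // bv.data[0] == 0b001, bv.n == 2
//	bv.append(1) // bv.data[0] == 0b101, bv.n == 3
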
func addTypeBits(bv *bitVector, offset uintptr, t *rtype) {
	if t.ptrdata == 0 {
		return
	}
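
	// A sketch of the mask this builds, assuming 8-byte words and that the
	// Struct case later in this switch (not shown here) recurses over the
	// fields with their offsets: for struct{ n uintptr; p *int }, the
	// uintptr word appends nothing (ptrdata == 0 above), and the *int at
	// byte offset 8 pads one 0 bit and then appends a 1 bit, leaving the
	// mask bits [0 1].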
	switch Kind(t.kind & kindMask) {
	case Chan, Func, Map, Ptr, Slice, String, UnsafePointer:
		// 1 pointer at start of representation
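		// Illustrative trace, assuming 8-byte words and an initially empty
		// bitVector: a pointer word at byte offset 16 makes the loop below
		// append 0 bits for words 0 and 1; the bv.append(1) that follows
		// then marks word 2.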
		for bv.n < uint32(offset/uintptr(ptrSize)) {
			bv.append(0)
		}
		bv.append(1)

	case Interface:
		// 2 pointers
for bv.n < uint32(offset/uintptr(ptrSize)) {
bv.append(0)
}
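		// Both interface words are pointers: the first holds the
		// *rtype (empty interface) or *itab (non-empty interface),
		// the second holds the data pointer.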
bv.append(1)
bv.append(1)
case Array:
// repeat inner type
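		// Array elements are laid out contiguously, so the element
		// type's pointer bits repeat once per element, shifted by
		// i*tt.elem.size bytes each time.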
tt := (*arrayType)(unsafe.Pointer(t))
for i := 0; i < int(tt.len); i++ {
addTypeBits(bv, offset+uintptr(i)*tt.elem.size, tt.elem)
}
case Struct:
// apply fields
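		// Each field begins at offset+f.offset() within the struct,
		// so recurse on the field type at that shifted offset.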
tt := (*structType)(unsafe.Pointer(t))
for i := range tt.fields {
f := &tt.fields[i]
addTypeBits(bv, offset+f.offset(), f.typ)
}
}
}
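// Illustrative sketch, not part of the original source: for a hypothetical
// type such as
//
//	struct {
//		n uintptr // word 0: scalar, bit 0
//		p *byte   // word 1: pointer, bit 1
//		s string  // word 2: data pointer, bit 1; word 3: length, scalar
//	}
//
// addTypeBits emits the bits 0, 1, 1. The string's length word is left
// implicitly zero, since bits are only appended up through the last
// pointer-containing word.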