// Copyright 2010 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Represents JSON data structure using native Go types: booleans, floats,
// strings, arrays, and maps.

package json

import (
	"encoding"
	"encoding/base64"
	"fmt"
	"reflect"
	"strconv"
	"strings"
	"unicode"
	"unicode/utf16"
	"unicode/utf8"
)

// Unmarshal parses the JSON-encoded data and stores the result
// in the value pointed to by v. If v is nil or not a pointer,
// Unmarshal returns an InvalidUnmarshalError.
//
// Unmarshal uses the inverse of the encodings that
// Marshal uses, allocating maps, slices, and pointers as necessary,
// with the following additional rules:
//
// To unmarshal JSON into a pointer, Unmarshal first handles the case of
// the JSON being the JSON literal null. In that case, Unmarshal sets
// the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into
// the value pointed at by the pointer. If the pointer is nil, Unmarshal
// allocates a new value for it to point to.
//
// To unmarshal JSON into a value implementing the Unmarshaler interface,
// Unmarshal calls that value's UnmarshalJSON method, including
// when the input is a JSON null.
// Otherwise, if the value implements encoding.TextUnmarshaler
// and the input is a JSON quoted string, Unmarshal calls that value's
// UnmarshalText method with the unquoted form of the string.
//
// To unmarshal JSON into a struct, Unmarshal matches incoming object
// keys to the keys used by Marshal (either the struct field name or its tag),
// preferring an exact match but also accepting a case-insensitive match. By
// default, object keys which don't have a corresponding struct field are
// ignored (see Decoder.DisallowUnknownFields for an alternative).
//
// To unmarshal JSON into an interface value,
// Unmarshal stores one of these in the interface value:
//
//	bool, for JSON booleans
//	float64, for JSON numbers
//	string, for JSON strings
//	[]interface{}, for JSON arrays
//	map[string]interface{}, for JSON objects
//	nil for JSON null
//
// To unmarshal a JSON array into a slice, Unmarshal resets the slice length
// to zero and then appends each element to the slice.
// As a special case, to unmarshal an empty JSON array into a slice,
// Unmarshal replaces the slice with a new empty slice.
//
// To unmarshal a JSON array into a Go array, Unmarshal decodes
// JSON array elements into corresponding Go array elements.
// If the Go array is smaller than the JSON array,
// the additional JSON array elements are discarded.
// If the JSON array is smaller than the Go array,
// the additional Go array elements are set to zero values.
//
// To unmarshal a JSON object into a map, Unmarshal first establishes a map to
// use. If the map is nil, Unmarshal allocates a new map. Otherwise Unmarshal
// reuses the existing map, keeping existing entries. Unmarshal then stores
// key-value pairs from the JSON object into the map. The map's key type must
// either be any string type, an integer, implement json.Unmarshaler, or
// implement encoding.TextUnmarshaler.
//
// If a JSON value is not appropriate for a given target type,
// or if a JSON number overflows the target type, Unmarshal
// skips that field and completes the unmarshaling as best it can.
// If no more serious errors are encountered, Unmarshal returns
// an UnmarshalTypeError describing the earliest such error. In any
// case, it's not guaranteed that all the remaining fields following
// the problematic one will be unmarshaled into the target object.
//
// The JSON null value unmarshals into an interface, map, pointer, or slice
// by setting that Go value to nil. Because null is often used in JSON to mean
// "not present," unmarshaling a JSON null into any other Go type has no effect
// on the value and produces no error.
//
// When unmarshaling quoted strings, invalid UTF-8 or
// invalid UTF-16 surrogate pairs are not treated as an error.
// Instead, they are replaced by the Unicode replacement
// character U+FFFD.
func Unmarshal(data []byte, v interface{}) error {
	// Check for well-formedness.
	// Avoids filling out half a data structure
	// before discovering a JSON syntax error.
	var d decodeState
	err := checkValid(data, &d.scan)
	if err != nil {
		return err
	}

	d.init(data)
	return d.unmarshal(v)
}

// Unmarshaler is the interface implemented by types
// that can unmarshal a JSON description of themselves.
// The input can be assumed to be a valid encoding of
// a JSON value. UnmarshalJSON must copy the JSON data
// if it wishes to retain the data after returning.
//
// By convention, to approximate the behavior of Unmarshal itself,
// Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.
type Unmarshaler interface {
	UnmarshalJSON([]byte) error
}

// An UnmarshalTypeError describes a JSON value that was
// not appropriate for a value of a specific Go type.
type UnmarshalTypeError struct {
	Value  string       // description of JSON value - "bool", "array", "number -5"
	Type   reflect.Type // type of Go value it could not be assigned to
	Offset int64        // error occurred after reading Offset bytes
	Struct string       // name of the struct type containing the field
	Field  string       // the full path from root node to the field
}

func (e *UnmarshalTypeError) Error() string {
	if e.Struct != "" || e.Field != "" {
		return "json: cannot unmarshal " + e.Value + " into Go struct field " + e.Struct + "." + e.Field + " of type " + e.Type.String()
	}
	return "json: cannot unmarshal " + e.Value + " into Go value of type " + e.Type.String()
}

// An UnmarshalFieldError describes a JSON object key that
// led to an unexported (and therefore unwritable) struct field.
//
// Deprecated: No longer used; kept for compatibility.
type UnmarshalFieldError struct {
	Key   string
	Type  reflect.Type
	Field reflect.StructField
}

func (e *UnmarshalFieldError) Error() string {
	return "json: cannot unmarshal object key " + strconv.Quote(e.Key) + " into unexported field " + e.Field.Name + " of type " + e.Type.String()
}

// An InvalidUnmarshalError describes an invalid argument passed to Unmarshal.
// (The argument to Unmarshal must be a non-nil pointer.)
type InvalidUnmarshalError struct {
	Type reflect.Type
}

func (e *InvalidUnmarshalError) Error() string {
	if e.Type == nil {
		return "json: Unmarshal(nil)"
	}

	if e.Type.Kind() != reflect.Ptr {
		return "json: Unmarshal(non-pointer " + e.Type.String() + ")"
	}
	return "json: Unmarshal(nil " + e.Type.String() + ")"
}

func (d *decodeState) unmarshal(v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() {
		return &InvalidUnmarshalError{reflect.TypeOf(v)}
	}

	d.scan.reset()
	d.scanWhile(scanSkipSpace)
	// We decode rv not rv.Elem because the Unmarshaler interface
	// test must be applied at the top level of the value.
	if err := d.value(rv); err != nil {
		return d.addErrorContext(err)
	}
	return d.savedError
}

// A Number represents a JSON number literal.
type Number string

// String returns the literal text of the number.
func (n Number) String() string { return string(n) }

// Float64 returns the number as a float64.
func (n Number) Float64() (float64, error) {
	return strconv.ParseFloat(string(n), 64)
}

// Int64 returns the number as an int64.
func (n Number) Int64() (int64, error) {
	return strconv.ParseInt(string(n), 10, 64)
}

// decodeState represents the state while decoding a JSON value.
type decodeState struct {
	data         []byte
	off          int // next read offset in data
	opcode       int // last read result
	scan         scanner
	errorContext struct { // provides context for type errors
		Struct     reflect.Type
		FieldStack []string
	}
	savedError            error
	useNumber             bool
	disallowUnknownFields bool
	// safeUnquote is the number of current string literal bytes that don't
	// need to be unquoted. When negative, no bytes need unquoting.
	safeUnquote int
}

// readIndex returns the position of the last byte read.
func (d *decodeState) readIndex() int {
	return d.off - 1
}

// phasePanicMsg is used as a panic message when we end up with something that
// shouldn't happen. It can indicate a bug in the JSON decoder, or that
// something is editing the data slice while the decoder executes.
const phasePanicMsg = "JSON decoder out of sync - data changing underfoot?"

func (d *decodeState) init(data []byte) *decodeState {
	d.data = data
	d.off = 0
	d.savedError = nil
	d.errorContext.Struct = nil

	// Reuse the allocated space for the FieldStack slice.
	d.errorContext.FieldStack = d.errorContext.FieldStack[:0]
	return d
}

// saveError saves the first err it is called with,
// for reporting at the end of the unmarshal.
func (d *decodeState) saveError(err error) {
	if d.savedError == nil {
		d.savedError = d.addErrorContext(err)
	}
}

// addErrorContext returns a new error enhanced with information from d.errorContext
func (d *decodeState) addErrorContext(err error) error {
	if d.errorContext.Struct != nil || len(d.errorContext.FieldStack) > 0 {
		switch err := err.(type) {
		case *UnmarshalTypeError:
			err.Struct = d.errorContext.Struct.Name()
			err.Field = strings.Join(d.errorContext.FieldStack, ".")
			return err
		}
	}
	return err
}

// skip scans to the end of what was started.
func (d *decodeState) skip() {
	s, data, i := &d.scan, d.data, d.off
	depth := len(s.parseState)
	for {
		op := s.step(s, data[i])
		i++
		if len(s.parseState) < depth {
			d.off = i
			d.opcode = op
			return
		}
	}
}

// scanNext processes the byte at d.data[d.off].
func (d *decodeState) scanNext() {
	if d.off < len(d.data) {
		d.opcode = d.scan.step(&d.scan, d.data[d.off])
		d.off++
	} else {
		d.opcode = d.scan.eof()
		d.off = len(d.data) + 1 // mark processed EOF with len+1
	}
}

// scanWhile processes bytes in d.data[d.off:] until it
// receives a scan code not equal to op.
func (d *decodeState) scanWhile(op int) {
	s, data, i := &d.scan, d.data, d.off
	for i < len(data) {
		newOp := s.step(s, data[i])
		i++
		if newOp != op {
			d.opcode = newOp
			d.off = i
			return
		}
	}
	d.off = len(data) + 1 // mark processed EOF with len+1
	d.opcode = d.scan.eof()
}

// rescanLiteral is similar to scanWhile(scanContinue), but it specialises the
// common case where we're decoding a literal. The decoder scans the input
// twice, once for syntax errors and to check the length of the value, and the
// second to perform the decoding.
//
// Only in the second step do we use decodeState to tokenize literals, so we
// know there aren't any syntax errors. We can take advantage of that knowledge,
// and scan a literal's bytes much more quickly.
func (d *decodeState) rescanLiteral() {
	data, i := d.data, d.off
Switch:
	switch data[i-1] {
	case '"': // string
		// safeUnquote is initialized at -1, which means that all bytes
		// checked so far can be unquoted at a later time with no work
		// at all. When reaching the closing '"', if safeUnquote is
		// still -1, all bytes can be unquoted with no work. Otherwise,
		// only those bytes up until the first '\\' or non-ascii rune
		// can be safely unquoted.
		safeUnquote := -1
		for ; i < len(data); i++ {
			if c := data[i]; c == '\\' {
				if safeUnquote < 0 { // first unsafe byte
					safeUnquote = int(i - d.off)
				}
				i++ // escaped char
			} else if c == '"' {
				d.safeUnquote = safeUnquote
				i++ // tokenize the closing quote too
				break Switch
			} else if c >= utf8.RuneSelf {
				if safeUnquote < 0 { // first unsafe byte
					safeUnquote = int(i - d.off)
				}
			}
		}
	case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-': // number
		for ; i < len(data); i++ {
			switch data[i] {
			case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
				'.', 'e', 'E', '+', '-':
			default:
				break Switch
			}
		}
	case 't': // true
		i += len("rue")
	case 'f': // false
		i += len("alse")
	case 'n': // null
		i += len("ull")
	}
	if i < len(data) {
		d.opcode = stateEndValue(&d.scan, data[i])
	} else {
		d.opcode = scanEnd
	}
	d.off = i + 1
}
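The string case above does all of its bookkeeping in a single pass: it remembers only the offset of the first byte that would need real unquoting work (an escape or a non-ASCII rune), so plain strings can later be copied out verbatim. A simplified standalone sketch of that idea (the helper name and its early-return behavior are illustrative, not encoding/json's API):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstUnsafeByte reports -1 if every byte of the quoted string s can be
// copied out with no unquoting work, or the index of the first byte that
// needs it (a backslash escape or a non-ASCII rune). It assumes s starts
// with '"' and is syntactically valid, mirroring rescanLiteral's premise.
func firstUnsafeByte(s []byte) int {
	for i := 1; i < len(s); i++ {
		switch c := s[i]; {
		case c == '\\':
			return i // escape sequence needs decoding
		case c == '"':
			return -1 // reached closing quote: fast path
		case c >= utf8.RuneSelf:
			return i // non-ASCII rune needs validation
		}
	}
	return -1
}

func main() {
	fmt.Println(firstUnsafeByte([]byte(`"plain"`))) // -1: fast path
	fmt.Println(firstUnsafeByte([]byte(`"a\nb"`)))  // 2: escape found
}
```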

// value consumes a JSON value from d.data[d.off-1:], decoding into v, and
// reads the following byte ahead. If v is invalid, the value is discarded.
// The first byte of the value has been read already.
func (d *decodeState) value(v reflect.Value) error {
	switch d.opcode {
	default:
		panic(phasePanicMsg)

	case scanBeginArray:
		if v.IsValid() {
			if err := d.array(v); err != nil {
				return err
			}
		} else {
			d.skip()
		}
		d.scanNext()

	case scanBeginObject:
		if v.IsValid() {
			if err := d.object(v); err != nil {
				return err
			}
		} else {
			d.skip()
		}
		d.scanNext()

	case scanBeginLiteral:
		// All bytes inside literal return scanContinue op code.
		start := d.readIndex()
		d.rescanLiteral()

		if v.IsValid() {
			if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {
				return err
			}
		}
	}
	return nil
}

type unquotedValue struct{}

// valueQuoted is like value but decodes a
// quoted string literal or literal null into an interface value.
// If it finds anything other than a quoted string literal or null,
// valueQuoted returns unquotedValue{}.
func (d *decodeState) valueQuoted() interface{} {
	switch d.opcode {
	default:
		panic(phasePanicMsg)

	case scanBeginArray, scanBeginObject:
		d.skip()
		d.scanNext()

	case scanBeginLiteral:
		v := d.literalInterface()
		switch v.(type) {
		case nil, string:
			return v
		}
	}
	return unquotedValue{}
}

// indirect walks down v allocating pointers as needed,
// until it gets to a non-pointer.
// If it encounters an Unmarshaler, indirect stops and returns that.
// If decodingNull is true, indirect stops at the first settable pointer so it
// can be set to nil.
func indirect(v reflect.Value, decodingNull bool) (Unmarshaler, encoding.TextUnmarshaler, reflect.Value) {
	// Issue #24153 indicates that it is generally not a guaranteed property
	// that you may round-trip a reflect.Value by calling Value.Addr().Elem()
	// and expect the value to still be settable for values derived from
	// unexported embedded struct fields.
	//
	// The logic below effectively does this when it first addresses the value
	// (to satisfy possible pointer methods) and continues to dereference
	// subsequent pointers as necessary.
	//
	// After the first round-trip, we set v back to the original value to
	// preserve the original RW flags contained in reflect.Value.
	v0 := v
	haveAddr := false

	// If v is a named type and is addressable,
	// start with its address, so that if the type has pointer methods,
	// we find them.
	if v.Kind() != reflect.Ptr && v.Type().Name() != "" && v.CanAddr() {
		haveAddr = true
		v = v.Addr()
	}
	for {
		// Load value from interface, but only if the result will be
		// usefully addressable.
		if v.Kind() == reflect.Interface && !v.IsNil() {
			e := v.Elem()
			if e.Kind() == reflect.Ptr && !e.IsNil() && (!decodingNull || e.Elem().Kind() == reflect.Ptr) {
				haveAddr = false
				v = e
				continue
			}
		}

		if v.Kind() != reflect.Ptr {
			break
		}

		if decodingNull && v.CanSet() {
			break
		}

		// Prevent infinite loop if v is an interface pointing to its own address:
		//     var v interface{}
		//     v = &v
		if v.Elem().Kind() == reflect.Interface && v.Elem().Elem() == v {
			v = v.Elem()
			break
		}
		if v.IsNil() {
			v.Set(reflect.New(v.Type().Elem()))
		}
		if v.Type().NumMethod() > 0 && v.CanInterface() {
			if u, ok := v.Interface().(Unmarshaler); ok {
				return u, nil, reflect.Value{}
			}
			if !decodingNull {
				if u, ok := v.Interface().(encoding.TextUnmarshaler); ok {
					return nil, u, reflect.Value{}
				}
			}
		}

		if haveAddr {
			v = v0 // restore original value after round-trip Value.Addr().Elem()
			haveAddr = false
		} else {
			v = v.Elem()
		}
	}
	return nil, nil, v
}
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// array consumes an array from d.data[d.off-1:], decoding into v.
|
|
|
|
|
// The first byte of the array ('[') has been read already.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) array(v reflect.Value) error {
|
2010-04-21 16:40:53 -07:00
|
|
|
// Check for unmarshaler.
|
2017-06-29 11:51:22 +02:00
|
|
|
u, ut, pv := indirect(v, false)
|
2013-08-14 14:56:07 -04:00
|
|
|
if u != nil {
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
|
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return u.UnmarshalJSON(d.data[start:d.off])
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
if ut != nil {
|
2016-01-18 16:26:05 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
v = pv
|
encoding/json: don't reuse slice elements when decoding
The previous behavior directly contradicted the docs that have been in
place for years:
To unmarshal a JSON array into a slice, Unmarshal resets the
slice length to zero and then appends each element to the slice.
We could use reflect.New to create a new element and reflect.Append to
then append it to the destination slice, but benchmarks have shown that
reflect.Append is very slow compared to the code that manually grows a
slice in this file.
Instead, if we're decoding into an element that came from the original
backing array, zero it before decoding into it. We're going to be using
the CodeDecoder benchmark, as it has a slice of struct pointers that's
decoded very often.
Note that we still reuse existing values from arrays being decoded into,
as the documentation agrees with the existing implementation in that
case:
To unmarshal a JSON array into a Go array, Unmarshal decodes
JSON array elements into corresponding Go array elements.
The numbers with the benchmark as-is might seem catastrophic, but that's
only because the benchmark is decoding into the same variable over and
over again. Since the old decoder was happy to reuse slice elements, it
would save a lot of allocations by not having to zero and re-allocate
said elements:
name old time/op new time/op delta
CodeDecoder-8 10.4ms ± 1% 10.9ms ± 1% +4.41% (p=0.000 n=10+10)
name old speed new speed delta
CodeDecoder-8 186MB/s ± 1% 178MB/s ± 1% -4.23% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.19MB ± 0% 3.59MB ± 0% +64.09% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.8k ± 0% 92.7k ± 0% +20.71% (p=0.000 n=10+10)
We can prove this by moving 'var r codeResponse' into the loop, so that
the benchmark no longer reuses the destination pointer. And sure enough,
we no longer see the slow-down caused by the extra allocations:
name old time/op new time/op delta
CodeDecoder-8 10.9ms ± 0% 10.9ms ± 1% -0.37% (p=0.043 n=10+10)
name old speed new speed delta
CodeDecoder-8 177MB/s ± 0% 178MB/s ± 1% +0.37% (p=0.041 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 3.59MB ± 0% 3.59MB ± 0% ~ (p=0.780 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 92.7k ± 0% 92.7k ± 0% ~ (all equal)
I believe that it's useful to leave the benchmarks as they are now,
because the decoder does reuse memory in some cases. For example,
existing map elements are reused. However, subtle changes like this one
need to be benchmarked carefully.
Finally, add a couple of tests involving both a slice and an array of
structs.
Fixes #21092.
Change-Id: I8b1194f25e723a31abd146fbfe9428ac10c1389d
Reviewed-on: https://go-review.googlesource.com/c/go/+/191783
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-29 14:24:16 +02:00
|
|
|
initialSliceCap := 0
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Check type of target.
|
2011-12-19 15:32:06 -05:00
|
|
|
switch v.Kind() {
|
2013-01-14 08:44:16 +01:00
|
|
|
case reflect.Interface:
|
|
|
|
|
if v.NumMethod() == 0 {
|
2017-08-19 22:33:51 +02:00
|
|
|
// Decoding into nil interface? Switch to non-reflect code.
|
2018-09-12 09:26:31 +02:00
|
|
|
ai := d.arrayInterface()
|
2018-03-03 15:20:26 +01:00
|
|
|
v.Set(reflect.ValueOf(ai))
|
|
|
|
|
return nil
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
|
|
|
|
// Otherwise it's invalid.
|
|
|
|
|
fallthrough
|
2011-12-19 15:32:06 -05:00
|
|
|
default:
|
2016-01-18 16:26:05 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
encoding/json: don't reuse slice elements when decoding
The previous behavior directly contradicted the docs that have been in
place for years:
To unmarshal a JSON array into a slice, Unmarshal resets the
slice length to zero and then appends each element to the slice.
We could use reflect.New to create a new element and reflect.Append to
then append it to the destination slice, but benchmarks have shown that
reflect.Append is very slow compared to the code that manually grows a
slice in this file.
Instead, if we're decoding into an element that came from the original
backing array, zero it before decoding into it. We're going to be using
the CodeDecoder benchmark, as it has a slice of struct pointers that's
decoded very often.
Note that we still reuse existing values from arrays being decoded into,
as the documentation agrees with the existing implementation in that
case:
To unmarshal a JSON array into a Go array, Unmarshal decodes
JSON array elements into corresponding Go array elements.
The numbers with the benchmark as-is might seem catastrophic, but that's
only because the benchmark is decoding into the same variable over and
over again. Since the old decoder was happy to reuse slice elements, it
would save a lot of allocations by not having to zero and re-allocate
said elements:
name old time/op new time/op delta
CodeDecoder-8 10.4ms ± 1% 10.9ms ± 1% +4.41% (p=0.000 n=10+10)
name old speed new speed delta
CodeDecoder-8 186MB/s ± 1% 178MB/s ± 1% -4.23% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.19MB ± 0% 3.59MB ± 0% +64.09% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.8k ± 0% 92.7k ± 0% +20.71% (p=0.000 n=10+10)
We can prove this by moving 'var r codeResponse' into the loop, so that
the benchmark no longer reuses the destination pointer. And sure enough,
we no longer see the slow-down caused by the extra allocations:
name old time/op new time/op delta
CodeDecoder-8 10.9ms ± 0% 10.9ms ± 1% -0.37% (p=0.043 n=10+10)
name old speed new speed delta
CodeDecoder-8 177MB/s ± 0% 178MB/s ± 1% +0.37% (p=0.041 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 3.59MB ± 0% 3.59MB ± 0% ~ (p=0.780 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 92.7k ± 0% 92.7k ± 0% ~ (all equal)
I believe that it's useful to leave the benchmarks as they are now,
because the decoder does reuse memory in some cases. For example,
existing map elements are reused. However, subtle changes like this one
need to be benchmarked carefully.
Finally, add a couple of tests involving both a slice and an array of
structs.
Fixes #21092.
Change-Id: I8b1194f25e723a31abd146fbfe9428ac10c1389d
Reviewed-on: https://go-review.googlesource.com/c/go/+/191783
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-29 14:24:16 +02:00
|
|
|
case reflect.Slice:
|
|
|
|
|
initialSliceCap = v.Cap()
|
|
|
|
|
case reflect.Array:
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
i := 0
|
|
|
|
|
for {
|
|
|
|
|
// Look ahead for ] - can only happen on first iteration.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
2011-12-19 15:32:06 -05:00
|
|
|
if v.Kind() == reflect.Slice {
|
|
|
|
|
// Grow slice if necessary
|
|
|
|
|
if i >= v.Cap() {
|
|
|
|
|
newcap := v.Cap() + v.Cap()/2
|
|
|
|
|
if newcap < 4 {
|
|
|
|
|
newcap = 4
|
|
|
|
|
}
|
|
|
|
|
newv := reflect.MakeSlice(v.Type(), v.Len(), newcap)
|
|
|
|
|
reflect.Copy(newv, v)
|
|
|
|
|
v.Set(newv)
|
|
|
|
|
}
|
|
|
|
|
if i >= v.Len() {
|
|
|
|
|
v.SetLen(i + 1)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
encoding/json: don't reuse slice elements when decoding
The previous behavior directly contradicted the docs that have been in
place for years:
To unmarshal a JSON array into a slice, Unmarshal resets the
slice length to zero and then appends each element to the slice.
We could use reflect.New to create a new element and reflect.Append to
then append it to the destination slice, but benchmarks have shown that
reflect.Append is very slow compared to the code that manually grows a
slice in this file.
Instead, if we're decoding into an element that came from the original
backing array, zero it before decoding into it. We're going to be using
the CodeDecoder benchmark, as it has a slice of struct pointers that's
decoded very often.
Note that we still reuse existing values from arrays being decoded into,
as the documentation agrees with the existing implementation in that
case:
To unmarshal a JSON array into a Go array, Unmarshal decodes
JSON array elements into corresponding Go array elements.
The numbers with the benchmark as-is might seem catastrophic, but that's
only because the benchmark is decoding into the same variable over and
over again. Since the old decoder was happy to reuse slice elements, it
would save a lot of allocations by not having to zero and re-allocate
said elements:
name old time/op new time/op delta
CodeDecoder-8 10.4ms ± 1% 10.9ms ± 1% +4.41% (p=0.000 n=10+10)
name old speed new speed delta
CodeDecoder-8 186MB/s ± 1% 178MB/s ± 1% -4.23% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.19MB ± 0% 3.59MB ± 0% +64.09% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.8k ± 0% 92.7k ± 0% +20.71% (p=0.000 n=10+10)
We can prove this by moving 'var r codeResponse' into the loop, so that
the benchmark no longer reuses the destination pointer. And sure enough,
we no longer see the slow-down caused by the extra allocations:
name old time/op new time/op delta
CodeDecoder-8 10.9ms ± 0% 10.9ms ± 1% -0.37% (p=0.043 n=10+10)
name old speed new speed delta
CodeDecoder-8 177MB/s ± 0% 178MB/s ± 1% +0.37% (p=0.041 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 3.59MB ± 0% 3.59MB ± 0% ~ (p=0.780 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 92.7k ± 0% 92.7k ± 0% ~ (all equal)
I believe that it's useful to leave the benchmarks as they are now,
because the decoder does reuse memory in some cases. For example,
existing map elements are reused. However, subtle changes like this one
need to be benchmarked carefully.
Finally, add a couple of tests involving both a slice and an array of
structs.
Fixes #21092.
Change-Id: I8b1194f25e723a31abd146fbfe9428ac10c1389d
Reviewed-on: https://go-review.googlesource.com/c/go/+/191783
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-29 14:24:16 +02:00
|
|
|
var into reflect.Value
|
2011-12-19 15:32:06 -05:00
|
|
|
if i < v.Len() {
|
encoding/json: don't reuse slice elements when decoding
The previous behavior directly contradicted the docs that have been in
place for years:
To unmarshal a JSON array into a slice, Unmarshal resets the
slice length to zero and then appends each element to the slice.
We could use reflect.New to create a new element and reflect.Append to
then append it to the destination slice, but benchmarks have shown that
reflect.Append is very slow compared to the code that manually grows a
slice in this file.
Instead, if we're decoding into an element that came from the original
backing array, zero it before decoding into it. We're going to be using
			into = v.Index(i)
			if i < initialSliceCap {
				// Reusing an element from the slice's original
				// backing array; zero it before decoding.
				into.Set(reflect.Zero(v.Type().Elem()))
			}
		}
		i++
encoding/json: don't reuse slice elements when decoding
The previous behavior directly contradicted the docs that have been in
place for years:
To unmarshal a JSON array into a slice, Unmarshal resets the
slice length to zero and then appends each element to the slice.
We could use reflect.New to create a new element and reflect.Append to
then append it to the destination slice, but benchmarks have shown that
reflect.Append is very slow compared to the code that manually grows a
slice in this file.
Instead, if we're decoding into an element that came from the original
backing array, zero it before decoding into it. We're going to be using
the CodeDecoder benchmark, as it has a slice of struct pointers that's
decoded very often.
Note that we still reuse existing values from arrays being decoded into,
as the documentation agrees with the existing implementation in that
case:
To unmarshal a JSON array into a Go array, Unmarshal decodes
JSON array elements into corresponding Go array elements.
The numbers with the benchmark as-is might seem catastrophic, but that's
only because the benchmark is decoding into the same variable over and
over again. Since the old decoder was happy to reuse slice elements, it
would save a lot of allocations by not having to zero and re-allocate
said elements:
name old time/op new time/op delta
CodeDecoder-8 10.4ms ± 1% 10.9ms ± 1% +4.41% (p=0.000 n=10+10)
name old speed new speed delta
CodeDecoder-8 186MB/s ± 1% 178MB/s ± 1% -4.23% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.19MB ± 0% 3.59MB ± 0% +64.09% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.8k ± 0% 92.7k ± 0% +20.71% (p=0.000 n=10+10)
We can prove this by moving 'var r codeResponse' into the loop, so that
the benchmark no longer reuses the destination pointer. And sure enough,
we no longer see the slow-down caused by the extra allocations:
name old time/op new time/op delta
CodeDecoder-8 10.9ms ± 0% 10.9ms ± 1% -0.37% (p=0.043 n=10+10)
name old speed new speed delta
CodeDecoder-8 177MB/s ± 0% 178MB/s ± 1% +0.37% (p=0.041 n=10+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 3.59MB ± 0% 3.59MB ± 0% ~ (p=0.780 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 92.7k ± 0% 92.7k ± 0% ~ (all equal)
I believe that it's useful to leave the benchmarks as they are now,
because the decoder does reuse memory in some cases. For example,
existing map elements are reused. However, subtle changes like this one
need to be benchmarked carefully.
Finally, add a couple of tests involving both a slice and an array of
structs.
Fixes #21092.
Change-Id: I8b1194f25e723a31abd146fbfe9428ac10c1389d
Reviewed-on: https://go-review.googlesource.com/c/go/+/191783
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-29 14:24:16 +02:00
		// Note that we decode the value even if we ran past the end of
		// the fixed array. In that case, we decode into an empty value
		// and do nothing with it.
		if err := d.value(into); err != nil {
			return err
		}

		// Next token must be , or ].
		if d.opcode == scanSkipSpace {
			d.scanWhile(scanSkipSpace)
		}
		if d.opcode == scanEndArray {
			break
		}
		if d.opcode != scanArrayValue {
			panic(phasePanicMsg)
		}
	}

	if i < v.Len() {
		if v.Kind() == reflect.Array {
			// Zero the remaining elements.
			zero := reflect.Zero(v.Type().Elem())
			for ; i < v.Len(); i++ {
				v.Index(i).Set(zero)
			}
		} else {
			v.SetLen(i)
		}
	}
	if v.Kind() == reflect.Slice && v.IsNil() {
		// Don't allow the resulting slice to be nil.
		v.Set(reflect.MakeSlice(v.Type(), 0, 0))
	}
	return nil
}

var nullLiteral = []byte("null")

var textUnmarshalerType = reflect.TypeOf((*encoding.TextUnmarshaler)(nil)).Elem()

// object consumes an object from d.data[d.off-1:], decoding into v.
// The first byte ('{') of the object has been read already.
func (d *decodeState) object(v reflect.Value) error {
	// Check for unmarshaler.
	u, ut, pv := indirect(v, false)
	if u != nil {
		start := d.readIndex()
		d.skip()
		return u.UnmarshalJSON(d.data[start:d.off])
	}
	if ut != nil {
		d.saveError(&UnmarshalTypeError{Value: "object", Type: v.Type(), Offset: int64(d.off)})
		d.skip()
		return nil
	}
	v = pv
	t := v.Type()

	// Decoding into nil interface? Switch to non-reflect code.
	if v.Kind() == reflect.Interface && v.NumMethod() == 0 {
		oi := d.objectInterface()
		v.Set(reflect.ValueOf(oi))
		return nil
	}
encoding/json: index names for the struct decoder
In the common case, structs have a handful of fields and most inputs
match struct field names exactly.
The previous code would do a linear search over the fields, stopping at
the first exact match, and otherwise using the first case insensitive
match.
This is unfortunate, because it means that for the common case, we'd do
a linear search with bytes.Equal. Even for structs with only two or
three fields, that is pretty wasteful.
Worse even, up until the exact match was found via the linear search,
all previous fields would run their equalFold functions, which aren't
cheap even in the simple case.
Instead, cache a map along with the field list that indexes the fields
by their name. This way, a case sensitive field search doesn't involve a
linear search, nor does it involve any equalFold func calls.
This patch should also slightly speed up cases where there's a case
insensitive match but not a case sensitive one, as then we'd avoid
calling bytes.Equal on all the fields. Though that's not a common case,
and there are no benchmarks for it.
name old time/op new time/op delta
CodeDecoder-8 11.0ms ± 0% 10.6ms ± 1% -4.42% (p=0.000 n=9+10)
name old speed new speed delta
CodeDecoder-8 176MB/s ± 0% 184MB/s ± 1% +4.62% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.28MB ± 0% 2.28MB ± 0% ~ (p=0.725 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.9k ± 0% 76.9k ± 0% ~ (all equal)
Updates #28923.
Change-Id: I9929c1f06c76505e5b96914199315dbdaae5dc76
Reviewed-on: https://go-review.googlesource.com/c/go/+/172918
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-22 23:36:43 +07:00
	var fields structFields

	// Check type of target:
	//   struct or
	//   map[T1]T2 where T1 is string, an integer type,
	//             or an encoding.TextUnmarshaler
	switch v.Kind() {
	case reflect.Map:
		// Map key must either have string kind, have an integer kind,
		// or be an encoding.TextUnmarshaler.
		switch t.Key().Kind() {
		case reflect.String,
			reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
			reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
		default:
			if !reflect.PtrTo(t.Key()).Implements(textUnmarshalerType) {
				d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
				d.skip()
				return nil
			}
		}
		if v.IsNil() {
			v.Set(reflect.MakeMap(t))
		}
	case reflect.Struct:
		fields = cachedTypeFields(t)
		// ok
	default:
		d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
		d.skip()
		return nil
	}

	var mapElem reflect.Value
encoding/json: fix performance regression in the decoder
In golang.org/cl/145218, a feature was added where the JSON decoder
would keep track of the entire path to a field when reporting an
UnmarshalTypeError.
However, we all failed to check if this affected the benchmarks - myself
included, as a reviewer. Below are the numbers comparing the CL's parent
with itself, once it was merged:
name old time/op new time/op delta
CodeDecoder-8 12.9ms ± 1% 28.2ms ± 2% +119.33% (p=0.002 n=6+6)
name old speed new speed delta
CodeDecoder-8 151MB/s ± 1% 69MB/s ± 3% -54.40% (p=0.002 n=6+6)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.74MB ± 0% 109.39MB ± 0% +3891.83% (p=0.002 n=6+6)
name old allocs/op new allocs/op delta
CodeDecoder-8 77.5k ± 0% 168.5k ± 0% +117.30% (p=0.004 n=6+5)
The reason why the decoder got twice as slow is because it now allocated
~40x as many objects, which puts a lot of pressure on the garbage
collector.
The reason is that the CL concatenated strings every time a nested field
was decoded. In other words, practically every field generated garbage
when decoded. This is hugely wasteful, especially considering that the
vast majority of JSON decoding inputs won't return UnmarshalTypeError.
Instead, use a stack of fields, and make sure to always use the same
backing array, to ensure we only need to grow the slice to the maximum
depth once.
The original CL also introduced a bug. The field stack string wasn't
reset to its original state when reaching "d.opcode == scanEndObject",
so the last field in a decoded struct could leak. For example, an added
test decodes a list of structs, and encoding/json before this CL would
fail:
got: cannot unmarshal string into Go struct field T.Ts.Y.Y.Y of type int
want: cannot unmarshal string into Go struct field T.Ts.Y of type int
To fix that, simply reset the stack after decoding every field, even if
it's the last.
Below is the original performance versus this CL. There's a tiny
performance hit, probably due to the append for every decoded field, but
at least we're back to the usual ~150MB/s.
name old time/op new time/op delta
CodeDecoder-8 12.9ms ± 1% 13.0ms ± 1% +1.25% (p=0.009 n=6+6)
name old speed new speed delta
CodeDecoder-8 151MB/s ± 1% 149MB/s ± 1% -1.24% (p=0.009 n=6+6)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.74MB ± 0% 2.74MB ± 0% +0.00% (p=0.002 n=6+6)
name old allocs/op new allocs/op delta
CodeDecoder-8 77.5k ± 0% 77.5k ± 0% +0.00% (p=0.002 n=6+6)
Finally, make all of these benchmarks report allocs by default. The
decoder ones are pretty sensitive to generated garbage, so ReportAllocs
would have made the performance regression more obvious.
Change-Id: I67b50f86b2e72f55539429450c67bfb1a9464b67
Reviewed-on: https://go-review.googlesource.com/c/go/+/167978
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-17 22:45:30 +00:00
	origErrorContext := d.errorContext

	for {
		// Read opening " of string key or closing }.
		d.scanWhile(scanSkipSpace)
		if d.opcode == scanEndObject {
			// closing } - can only happen on first iteration.
			break
		}
		if d.opcode != scanBeginLiteral {
			panic(phasePanicMsg)
		}

		// Read key.
		start := d.readIndex()
		d.rescanLiteral()
		item := d.data[start:d.readIndex()]
encoding/json: avoid work when unquoting strings, take 2
This is a re-submission of CL 151157, since it was reverted in CL 190909
due to an introduced crash found by a fuzzer. The revert CL included
regression tests, while this CL includes a fixed version of the original
change.
In particular, what we forgot in the original optimization was that we
still need the length and trailing quote checks at the beginning of
unquoteBytes. Without those, we could end up in a crash later on.
We can work out how many bytes can be unquoted trivially in
rescanLiteral, which already iterates over a string's bytes.
Removing the extra loop in unquoteBytes simplifies the function and
speeds it up, especially when decoding simple strings, which are common.
While at it, we can remove the check that s[0]=='"', since all call
sites already meet that condition.
name old time/op new time/op delta
CodeDecoder-8 10.6ms ± 2% 10.5ms ± 1% -1.01% (p=0.004 n=20+10)
name old speed new speed delta
CodeDecoder-8 183MB/s ± 2% 185MB/s ± 1% +1.02% (p=0.003 n=20+10)
Updates #28923.
Change-Id: I8c6b13302bcd86a364bc998d72451332c0809cde
Reviewed-on: https://go-review.googlesource.com/c/go/+/190659
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
2019-08-21 18:22:24 +02:00
		key, ok := d.unquoteBytes(item)
		if !ok {
			panic(phasePanicMsg)
		}

		// Figure out field corresponding to key.
		var subv reflect.Value
		destring := false // whether the value is wrapped in a string to be decoded first

		if v.Kind() == reflect.Map {
			elemType := t.Elem()
			if !mapElem.IsValid() {
				mapElem = reflect.New(elemType).Elem()
			} else {
				mapElem.Set(reflect.Zero(elemType))
			}
			subv = mapElem
		} else {
			var f *field
			if i, ok := fields.nameIndex[string(key)]; ok {
				// Found an exact name match.
				f = &fields.list[i]
			} else {
				// Fall back to the expensive case-insensitive
				// linear search.
				for i := range fields.list {
					ff := &fields.list[i]
					if ff.equalFold(ff.nameBytes, key) {
						f = ff
						break
					}
				}
			}
			if f != nil {
				subv = v
				destring = f.quoted
				for _, i := range f.index {
					if subv.Kind() == reflect.Ptr {
						if subv.IsNil() {
							// If a struct embeds a pointer to an unexported type,
							// it is not possible to set a newly allocated value
							// since the field is unexported.
							//
							// See https://golang.org/issue/21357
							if !subv.CanSet() {
								d.saveError(fmt.Errorf("json: cannot set embedded pointer to unexported struct: %v", subv.Type().Elem()))
								// Invalidate subv to ensure d.value(subv) skips over
								// the JSON value without assigning it to subv.
								subv = reflect.Value{}
								destring = false
								break
							}
							subv.Set(reflect.New(subv.Type().Elem()))
						}
						subv = subv.Elem()
					}
					subv = subv.Field(i)
				}
				d.errorContext.FieldStack = append(d.errorContext.FieldStack, f.name)
				d.errorContext.Struct = t
			} else if d.disallowUnknownFields {
				d.saveError(fmt.Errorf("json: unknown field %q", key))
			}
		}

		// Read : before value.
		if d.opcode == scanSkipSpace {
			d.scanWhile(scanSkipSpace)
		}
		if d.opcode != scanObjectKey {
			panic(phasePanicMsg)
		}
		d.scanWhile(scanSkipSpace)

		if destring {
			switch qv := d.valueQuoted().(type) {
			case nil:
				if err := d.literalStore(nullLiteral, subv, false); err != nil {
					return err
				}
			case string:
				if err := d.literalStore([]byte(qv), subv, true); err != nil {
					return err
				}
			default:
				d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
			}
		} else {
			if err := d.value(subv); err != nil {
				return err
			}
		}

		// Write value back to map;
		// if using struct, subv points into struct already.
		if v.Kind() == reflect.Map {
			kt := t.Key()
			var kv reflect.Value
			switch {
			case reflect.PtrTo(kt).Implements(textUnmarshalerType):
				kv = reflect.New(kt)
				if err := d.literalStore(item, kv, true); err != nil {
					return err
				}
				kv = kv.Elem()
			case kt.Kind() == reflect.String:
				kv = reflect.ValueOf(key).Convert(kt)
			default:
				switch kt.Kind() {
				case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
					s := string(key)
					n, err := strconv.ParseInt(s, 10, 64)
					if err != nil || reflect.Zero(kt).OverflowInt(n) {
						d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
						break
					}
					kv = reflect.ValueOf(n).Convert(kt)
				case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
					s := string(key)
					n, err := strconv.ParseUint(s, 10, 64)
					if err != nil || reflect.Zero(kt).OverflowUint(n) {
						d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
						break
|
|
|
|
}
|
|
|
|
|
kv = reflect.ValueOf(n).Convert(kt)
|
|
|
|
|
default:
|
|
|
|
|
panic("json: unexpected key type") // should never occur
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if kv.IsValid() {
|
|
|
|
|
v.SetMapIndex(kv, subv)
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Next token must be , or }.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
2019-03-17 22:45:30 +00:00
|
|
|
// Reset errorContext to its original state.
|
|
|
|
|
// Keep the same underlying array for FieldStack, to reuse the
|
|
|
|
|
// space and avoid unnecessary allocs.
|
|
|
|
|
d.errorContext.FieldStack = d.errorContext.FieldStack[:len(origErrorContext.FieldStack)]
|
|
|
|
|
d.errorContext.Struct = origErrorContext.Struct
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2012-06-25 17:36:09 -04:00
|
|
|
// convertNumber converts the number literal s to a float64 or a Number
|
|
|
|
|
// depending on the setting of d.useNumber.
|
|
|
|
|
func (d *decodeState) convertNumber(s string) (interface{}, error) {
|
|
|
|
|
if d.useNumber {
|
|
|
|
|
return Number(s), nil
|
|
|
|
|
}
|
|
|
|
|
f, err := strconv.ParseFloat(s, 64)
|
|
|
|
|
if err != nil {
|
2016-01-18 16:26:05 +01:00
|
|
|
return nil, &UnmarshalTypeError{Value: "number " + s, Type: reflect.TypeOf(0.0), Offset: int64(d.off)}
|
2012-06-25 17:36:09 -04:00
|
|
|
}
|
|
|
|
|
return f, nil
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
var numberType = reflect.TypeOf(Number(""))
|
|
|
|
|
|
2011-08-29 12:46:32 -07:00
|
|
|
// literalStore decodes a literal stored in item into v.
|
2012-01-12 14:40:29 -08:00
|
|
|
//
|
|
|
|
|
// fromQuoted indicates whether this literal came from unwrapping a
|
|
|
|
|
// string from the ",string" struct tag option. This is used only to
|
|
|
|
|
// produce more helpful error messages.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) literalStore(item []byte, v reflect.Value, fromQuoted bool) error {
|
2010-04-21 16:40:53 -07:00
|
|
|
// Check for unmarshaler.
|
2012-05-03 17:35:44 -04:00
|
|
|
if len(item) == 0 {
|
|
|
|
|
// Empty string given.
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2012-05-03 17:35:44 -04:00
|
|
|
}
|
2016-10-12 16:54:02 -04:00
|
|
|
isNull := item[0] == 'n' // null
|
2017-06-29 11:51:22 +02:00
|
|
|
u, ut, pv := indirect(v, isNull)
|
2013-08-14 14:56:07 -04:00
|
|
|
if u != nil {
|
2018-04-19 21:56:45 +03:00
|
|
|
return u.UnmarshalJSON(item)
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
|
|
|
|
if ut != nil {
|
|
|
|
|
if item[0] != '"' {
|
|
|
|
|
if fromQuoted {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
2018-09-11 22:09:00 +02:00
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
val := "number"
|
|
|
|
|
switch item[0] {
|
|
|
|
|
case 'n':
|
|
|
|
|
val = "null"
|
|
|
|
|
case 't', 'f':
|
|
|
|
|
val = "bool"
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-09-11 22:09:00 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: val, Type: v.Type(), Offset: int64(d.readIndex())})
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
encoding/json: avoid work when unquoting strings, take 2
This is a re-submission of CL 151157, since it was reverted in CL 190909
due to an introduced crash found by a fuzzer. The revert CL included
regression tests, while this CL includes a fixed version of the original
change.
In particular, what we forgot in the original optimization was that we
still need the length and trailing quote checks at the beginning of
unquoteBytes. Without those, we could end up in a crash later on.
We can work out how many bytes can be unquoted trivially in
rescanLiteral, which already iterates over a string's bytes.
Removing the extra loop in unquoteBytes simplifies the function and
speeds it up, especially when decoding simple strings, which are common.
While at it, we can remove the check that s[0]=='"', since all call
sites already meet that condition.
name old time/op new time/op delta
CodeDecoder-8 10.6ms ± 2% 10.5ms ± 1% -1.01% (p=0.004 n=20+10)
name old speed new speed delta
CodeDecoder-8 183MB/s ± 2% 185MB/s ± 1% +1.02% (p=0.003 n=20+10)
Updates #28923.
Change-Id: I8c6b13302bcd86a364bc998d72451332c0809cde
Reviewed-on: https://go-review.googlesource.com/c/go/+/190659
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Peter Weinberger <pjw@google.com>
2019-08-21 18:22:24 +02:00
|
|
|
s, ok := d.unquoteBytes(item)
|
2013-08-14 14:56:07 -04:00
|
|
|
if !ok {
|
|
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-04-19 21:56:45 +03:00
|
|
|
return ut.UnmarshalText(s)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
v = pv
|
|
|
|
|
|
|
|
|
|
switch c := item[0]; c {
|
|
|
|
|
case 'n': // null
|
2016-10-12 15:55:02 -04:00
|
|
|
// The main parser checks that only null can reach here,
|
|
|
|
|
// but if this was a quoted string input, it could be anything.
|
|
|
|
|
if fromQuoted && string(item) != "null" {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2011-10-31 13:59:23 -04:00
|
|
|
case reflect.Interface, reflect.Ptr, reflect.Map, reflect.Slice:
|
2011-04-08 12:27:58 -04:00
|
|
|
v.Set(reflect.Zero(v.Type()))
|
2012-11-12 15:35:11 -05:00
|
|
|
// otherwise, ignore null for primitives/string
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
case 't', 'f': // true, false
|
2016-10-12 15:55:02 -04:00
|
|
|
value := item[0] == 't'
|
|
|
|
|
// The main parser checks that only true and false can reach here,
|
|
|
|
|
// but if this was a quoted string input, it could be anything.
|
|
|
|
|
if fromQuoted && string(item) != "true" && string(item) != "false" {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Bool:
|
|
|
|
|
v.SetBool(value)
|
|
|
|
|
case reflect.Interface:
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() == 0 {
|
|
|
|
|
v.Set(reflect.ValueOf(value))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
case '"': // string
|
2019-08-21 18:22:24 +02:00
|
|
|
s, ok := d.unquoteBytes(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Slice:
|
2015-04-26 23:52:42 +02:00
|
|
|
if v.Type().Elem().Kind() != reflect.Uint8 {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-02-23 11:32:29 -05:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
|
|
|
|
|
n, err := base64.StdEncoding.Decode(b, s)
|
|
|
|
|
if err != nil {
|
|
|
|
|
d.saveError(err)
|
|
|
|
|
break
|
|
|
|
|
}
|
2015-10-25 22:42:41 +01:00
|
|
|
v.SetBytes(b[:n])
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.String:
|
2019-09-16 19:46:12 +00:00
|
|
|
if v.Type() == numberType && !isValidNumber(string(s)) {
|
|
|
|
|
return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetString(string(s))
|
|
|
|
|
case reflect.Interface:
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() == 0 {
|
|
|
|
|
v.Set(reflect.ValueOf(string(s)))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
default: // number
|
|
|
|
|
if c != '-' && (c < '0' || c > '9') {
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
s := string(item)
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2012-06-25 17:36:09 -04:00
|
|
|
if v.Kind() == reflect.String && v.Type() == numberType {
|
2019-07-03 00:37:05 +02:00
|
|
|
// s must be a valid number, because it has
|
|
|
|
|
// already been tokenized.
|
2012-06-25 17:36:09 -04:00
|
|
|
v.SetString(s)
|
|
|
|
|
break
|
|
|
|
|
}
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-08-28 15:56:10 +00:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Interface:
|
2012-06-25 17:36:09 -04:00
|
|
|
n, err := d.convertNumber(s)
|
2010-04-21 16:40:53 -07:00
|
|
|
if err != nil {
|
2012-06-25 17:36:09 -04:00
|
|
|
d.saveError(err)
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() != 0 {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-25 13:39:36 -04:00
|
|
|
v.Set(reflect.ValueOf(n))
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
|
2011-12-05 15:48:46 -05:00
|
|
|
n, err := strconv.ParseInt(s, 10, 64)
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowInt(n) {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetInt(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
|
2011-12-05 15:48:46 -05:00
|
|
|
n, err := strconv.ParseUint(s, 10, 64)
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowUint(n) {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetUint(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Float32, reflect.Float64:
|
2011-12-05 15:48:46 -05:00
|
|
|
n, err := strconv.ParseFloat(s, v.Type().Bits())
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowFloat(n) {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetFloat(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// The xxxInterface routines build up a value to be stored
|
2016-03-01 23:21:55 +00:00
|
|
|
// in an empty interface. They are not strictly necessary,
|
2010-04-21 16:40:53 -07:00
|
|
|
// but they avoid the weight of reflection in this common case.
|
|
|
|
|
|
|
|
|
|
// valueInterface is like value but returns interface{}
|
2018-09-12 09:26:31 +02:00
|
|
|
func (d *decodeState) valueInterface() (val interface{}) {
|
2017-06-29 11:51:22 +02:00
|
|
|
switch d.opcode {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginArray:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.arrayInterface()
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginObject:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.objectInterface()
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginLiteral:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.literalInterface()
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
return
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// arrayInterface is like array but returns []interface{}.
|
2018-09-12 09:26:31 +02:00
|
|
|
func (d *decodeState) arrayInterface() []interface{} {
|
2013-01-30 09:10:32 -08:00
|
|
|
var v = make([]interface{}, 0)
|
2010-04-21 16:40:53 -07:00
|
|
|
for {
|
|
|
|
|
// Look ahead for ] - can only happen on first iteration.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
2018-09-12 09:26:31 +02:00
|
|
|
v = append(v, d.valueInterface())
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Next token must be , or ].
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanArrayValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return v
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// objectInterface is like object but returns map[string]interface{}.
|
2018-09-12 09:26:31 +02:00
|
|
|
func (d *decodeState) objectInterface() map[string]interface{} {
|
2010-04-21 16:40:53 -07:00
|
|
|
m := make(map[string]interface{})
|
|
|
|
|
for {
|
|
|
|
|
// Read opening " of string key or closing }.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
// closing } - can only happen on first iteration.
|
|
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanBeginLiteral {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Read string key.
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2017-06-29 11:51:22 +02:00
|
|
|
item := d.data[start:d.readIndex()]
|
2019-08-21 18:22:24 +02:00
|
|
|
key, ok := d.unquote(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Read : before value.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectKey {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Read value.
|
2018-09-12 09:26:31 +02:00
|
|
|
m[key] = d.valueInterface()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Next token must be , or }.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
|
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return m
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// literalInterface consumes and returns a literal from d.data[d.off-1:] and
|
|
|
|
|
// reads one byte ahead. The first byte of the literal has been
|
|
|
|
|
// read already (that's how the caller knows it's a literal).
|
2018-09-12 09:26:31 +02:00
|
|
|
func (d *decodeState) literalInterface() interface{} {
|
2010-04-21 16:40:53 -07:00
|
|
|
// All bytes inside literal return scanContinue op code.
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
item := d.data[start:d.readIndex()]
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
switch c := item[0]; c {
|
|
|
|
|
case 'n': // null
|
2018-09-12 09:26:31 +02:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case 't', 'f': // true, false
|
2018-09-12 09:26:31 +02:00
|
|
|
return c == 't'
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case '"': // string
|
2019-08-21 18:22:24 +02:00
|
|
|
s, ok := d.unquote(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return s
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
default: // number
|
|
|
|
|
if c != '-' && (c < '0' || c > '9') {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2012-06-25 17:36:09 -04:00
|
|
|
n, err := d.convertNumber(string(item))
|
2010-04-21 16:40:53 -07:00
|
|
|
if err != nil {
|
2012-06-25 17:36:09 -04:00
|
|
|
d.saveError(err)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return n
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// getu4 decodes \uXXXX from the beginning of s, returning the hex value,
|
|
|
|
|
// or -1 if the escape sequence is invalid.
|
2011-10-25 22:23:54 -07:00
|
|
|
func getu4(s []byte) rune {
|
2010-04-21 16:40:53 -07:00
|
|
|
if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
|
|
|
|
|
return -1
|
|
|
|
|
}
|
2017-06-03 13:36:54 -07:00
|
|
|
var r rune
|
|
|
|
|
for _, c := range s[2:6] {
|
|
|
|
|
switch {
|
|
|
|
|
case '0' <= c && c <= '9':
|
|
|
|
|
c = c - '0'
|
|
|
|
|
case 'a' <= c && c <= 'f':
|
|
|
|
|
c = c - 'a' + 10
|
|
|
|
|
case 'A' <= c && c <= 'F':
|
|
|
|
|
c = c - 'A' + 10
|
|
|
|
|
default:
|
|
|
|
|
return -1
|
|
|
|
|
}
|
|
|
|
|
r = r*16 + rune(c)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-03 13:36:54 -07:00
|
|
|
return r
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
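Since getu4 is unexported, here is an illustrative standalone copy of its hex loop (getu4Sketch is a name used only for this example), showing how a `\uXXXX` escape becomes a rune:

```go
package main

import "fmt"

// getu4Sketch mirrors getu4 above: decode \uXXXX from the start of s,
// returning -1 for any malformed or truncated escape.
func getu4Sketch(s []byte) rune {
	if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
		return -1
	}
	var r rune
	for _, c := range s[2:6] {
		switch {
		case '0' <= c && c <= '9':
			c = c - '0'
		case 'a' <= c && c <= 'f':
			c = c - 'a' + 10
		case 'A' <= c && c <= 'F':
			c = c - 'A' + 10
		default:
			return -1
		}
		r = r*16 + rune(c)
	}
	return r
}

func main() {
	fmt.Println(getu4Sketch([]byte(`\u0041`))) // 65, i.e. 'A'
	fmt.Println(getu4Sketch([]byte(`\uZZZZ`))) // -1
}
```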
|
|
|
|
|
|
|
|
|
// unquote converts a quoted JSON string literal s into an actual string t.
|
|
|
|
|
// The rules differ from Go's, so we cannot use strconv.Unquote.
|
2019-08-21 18:22:24 +02:00
|
|
|
// The first byte in s must be '"'.
|
|
|
|
|
func (d *decodeState) unquote(s []byte) (t string, ok bool) {
|
|
|
|
|
s, ok = d.unquoteBytes(s)
|
2011-02-23 11:32:29 -05:00
|
|
|
t = string(s)
|
|
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
|
2019-08-21 18:22:24 +02:00
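The idea in the change above — recording, during the one pass that rescanLiteral already makes, how many leading bytes need no unquoting work — can be sketched as a standalone helper. This is an illustrative sketch, not the decoder's actual internals; the name safePrefixLen is hypothetical, but the -1 convention mirrors how safeUnquote is consumed below.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// safePrefixLen reports how many leading bytes of a string literal's
// contents can be copied verbatim: plain ASCII with no escape, quote,
// or control character. -1 means the entire string is safe, so no
// unquoting pass is needed at all.
func safePrefixLen(s []byte) int {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c == '\\' || c == '"' || c < ' ' || c >= utf8.RuneSelf {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(safePrefixLen([]byte("hello")))   // -1: nothing to unquote
	fmt.Println(safePrefixLen([]byte(`he\tllo`))) // 2: escape starts at byte 2
	fmt.Println(safePrefixLen([]byte("héllo")))   // 1: non-ASCII needs checking
}
```

Since most real-world JSON strings are plain ASCII, the -1 fast path lets the decoder return a subslice of the input with no copying, which is where the benchmark gain above comes from.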
func (d *decodeState) unquoteBytes(s []byte) (t []byte, ok bool) {
	// We already know that s[0] == '"'. However, we don't know that the
	// closing quote exists in all cases, such as when the string is nested
	// via the ",string" option.
	if len(s) < 2 || s[len(s)-1] != '"' {
		return
	}
	s = s[1 : len(s)-1]
	// If there are no unusual characters, no unquoting is needed, so return
	// a slice of the original bytes.
	r := d.safeUnquote
	if r == -1 {
		return s, true
	}
encoding/json: don't mangle strings in an edge case when decoding
The added comment contains some context. The original optimization
assumed that each call to unquoteBytes (or unquote) followed its
corresponding call to rescanLiteral. Otherwise, unquoting a literal
might use d.safeUnquote from another re-scanned literal.
Unfortunately, this assumption is wrong. When decoding {"foo": "bar"}
into a map[T]string where T implements TextUnmarshaler, the sequence of
calls would be as follows:
1) rescanLiteral "foo"
2) unquoteBytes "foo"
3) rescanLiteral "bar"
4) unquoteBytes "foo" (for UnmarshalText)
5) unquoteBytes "bar"
Note that the call to UnmarshalText happens in literalStore, which
repeats the work to unquote the input string literal. But, since that
happens after we've re-scanned "bar", we're using the wrong safeUnquote
field value.
In the added test case, the second string had a non-zero number of safe
bytes, and the first string had none since it was all non-ASCII. Thus,
"safely" unquoting a number of the first string's bytes could cut a rune
in half, and thus mangle the runes.
A rather simple fix, without a full revert, is to only allow one use of
safeUnquote per call to unquoteBytes. Each call to rescanLiteral when
we have a string is soon followed by a call to unquoteBytes, so it's no
longer possible for us to use the wrong index.
Also add a test case from #38126, which is the same underlying bug, but
affecting the ",string" option.
Before the fix, the test would fail, just like in the original two issues:
--- FAIL: TestUnmarshalRescanLiteralMangledUnquote (0.00s)
decode_test.go:2443: Key "开源" does not exist in map: map[开���:12345开源]
decode_test.go:2458: Unmarshal unexpected error: json: invalid use of ,string struct tag, trying to unmarshal "\"aaa\tbbb\"" into string
Fixes #38105.
For #38126.
Change-Id: I761e54924e9a971a4f9eaa70bbf72014bb1476e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/226218
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-03-27 23:56:09 +00:00
	// Only perform up to one safe unquote for each re-scanned string
	// literal. In some edge cases, the decoder unquotes a literal a second
	// time, even after another literal has been re-scanned. Thus, only the
	// first unquote can safely use safeUnquote.
	d.safeUnquote = 0

	b := make([]byte, len(s)+2*utf8.UTFMax)
	w := copy(b, s[0:r])
	for r < len(s) {
		// Out of room? Can only happen if s is full of
		// malformed UTF-8 and we're replacing each
		// byte with RuneError.
		if w >= len(b)-2*utf8.UTFMax {
			nb := make([]byte, (len(b)+utf8.UTFMax)*2)
			copy(nb, b[0:w])
			b = nb
		}
		switch c := s[r]; {
		case c == '\\':
			r++
			if r >= len(s) {
				return
			}
			switch s[r] {
			default:
				return
			case '"', '\\', '/', '\'':
				b[w] = s[r]
				r++
				w++
			case 'b':
				b[w] = '\b'
				r++
				w++
			case 'f':
				b[w] = '\f'
				r++
				w++
			case 'n':
				b[w] = '\n'
				r++
				w++
			case 'r':
				b[w] = '\r'
				r++
				w++
			case 't':
				b[w] = '\t'
				r++
				w++
			case 'u':
				r--
				rr := getu4(s[r:])
				if rr < 0 {
					return
				}
				r += 6
				if utf16.IsSurrogate(rr) {
					rr1 := getu4(s[r:])
					if dec := utf16.DecodeRune(rr, rr1); dec != unicode.ReplacementChar {
						// A valid pair; consume.
						r += 6
						w += utf8.EncodeRune(b[w:], dec)
						break
					}
					// Invalid surrogate; fall back to replacement rune.
					rr = unicode.ReplacementChar
				}
				w += utf8.EncodeRune(b[w:], rr)
			}

		// Quote, control characters are invalid.
		case c == '"', c < ' ':
			return

		// ASCII
		case c < utf8.RuneSelf:
			b[w] = c
			r++
			w++

		// Coerce to well-formed UTF-8.
		default:
			rr, size := utf8.DecodeRune(s[r:])
			r += size
			w += utf8.EncodeRune(b[w:], rr)
		}
	}
	return b[0:w], true
}
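The unquoting rules implemented above — simple escapes, \uXXXX escapes, and UTF-16 surrogate pairs — can be exercised through the package's public API. The decodeLiteral helper here is only for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeLiteral decodes a single JSON string literal via the public API,
// which routes through the unquoting logic above.
func decodeLiteral(lit string) string {
	var s string
	if err := json.Unmarshal([]byte(lit), &s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	// A tab escape, a \uXXXX escape, and a surrogate pair (U+1F600).
	fmt.Printf("%q\n", decodeLiteral(`"a\tb \u00e9 \ud83d\ude00"`))
}
```

Note that per the doc comment on unquote, these rules differ from Go's own string syntax (e.g. JSON accepts \/), which is why strconv.Unquote cannot be used here.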
|