2010-04-21 16:40:53 -07:00
|
|
|
// Copyright 2010 The Go Authors. All rights reserved.
|
2009-11-30 13:55:09 -08:00
|
|
|
// Use of this source code is governed by a BSD-style
|
|
|
|
|
// license that can be found in the LICENSE file.
|
|
|
|
|
|
|
|
|
|
// Represents JSON data structure using native Go types: booleans, floats,
|
|
|
|
|
// strings, arrays, and maps.
|
|
|
|
|
|
2025-04-11 14:19:51 -07:00
|
|
|
//go:build !goexperiment.jsonv2
|
|
|
|
|
|
2009-11-30 13:55:09 -08:00
|
|
|
package json
|
|
|
|
|
|
|
|
|
|
import (
|
2013-08-14 14:56:07 -04:00
|
|
|
"encoding"
|
2011-02-23 11:32:29 -05:00
|
|
|
"encoding/base64"
|
2012-01-12 14:40:29 -08:00
|
|
|
"fmt"
|
2010-04-21 16:40:53 -07:00
|
|
|
"reflect"
|
|
|
|
|
"strconv"
|
encoding/json: fix performance regression in the decoder
In golang.org/cl/145218, a feature was added where the JSON decoder
would keep track of the entire path to a field when reporting an
UnmarshalTypeError.
However, we all failed to check if this affected the benchmarks - myself
included, as a reviewer. Below are the numbers comparing the CL's parent
with itself, once it was merged:
name           old time/op    new time/op    delta
CodeDecoder-8    12.9ms ± 1%    28.2ms ± 2%   +119.33%  (p=0.002 n=6+6)
name           old speed      new speed      delta
CodeDecoder-8   151MB/s ± 1%    69MB/s ± 3%    -54.40%  (p=0.002 n=6+6)
name           old alloc/op   new alloc/op   delta
CodeDecoder-8    2.74MB ± 0%  109.39MB ± 0%  +3891.83%  (p=0.002 n=6+6)
name           old allocs/op  new allocs/op  delta
CodeDecoder-8     77.5k ± 0%    168.5k ± 0%   +117.30%  (p=0.004 n=6+5)
The reason why the decoder got twice as slow is because it now allocated
~40x as many objects, which puts a lot of pressure on the garbage
collector.
The reason is that the CL concatenated strings every time a nested field
was decoded. In other words, practically every field generated garbage
when decoded. This is hugely wasteful, especially considering that the
vast majority of JSON decoding inputs won't return UnmarshalTypeError.
Instead, use a stack of fields, and make sure to always use the same
backing array, to ensure we only need to grow the slice to the maximum
depth once.
The original CL also introduced a bug. The field stack string wasn't
reset to its original state when reaching "d.opcode == scanEndObject",
so the last field in a decoded struct could leak. For example, an added
test decodes a list of structs, and encoding/json before this CL would
fail:
got: cannot unmarshal string into Go struct field T.Ts.Y.Y.Y of type int
want: cannot unmarshal string into Go struct field T.Ts.Y of type int
To fix that, simply reset the stack after decoding every field, even if
it's the last.
Below is the original performance versus this CL. There's a tiny
performance hit, probably due to the append for every decoded field, but
at least we're back to the usual ~150MB/s.
name           old time/op    new time/op    delta
CodeDecoder-8    12.9ms ± 1%    13.0ms ± 1%  +1.25%  (p=0.009 n=6+6)
name           old speed      new speed      delta
CodeDecoder-8   151MB/s ± 1%   149MB/s ± 1%  -1.24%  (p=0.009 n=6+6)
name           old alloc/op   new alloc/op   delta
CodeDecoder-8    2.74MB ± 0%    2.74MB ± 0%  +0.00%  (p=0.002 n=6+6)
name           old allocs/op  new allocs/op  delta
CodeDecoder-8     77.5k ± 0%     77.5k ± 0%  +0.00%  (p=0.002 n=6+6)
Finally, make all of these benchmarks report allocs by default. The
decoder ones are pretty sensitive to generated garbage, so ReportAllocs
would have made the performance regression more obvious.
Change-Id: I67b50f86b2e72f55539429450c67bfb1a9464b67
Reviewed-on: https://go-review.googlesource.com/c/go/+/167978
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-17 22:45:30 +00:00
|
|
|
"strings"
|
2010-04-21 16:40:53 -07:00
|
|
|
"unicode"
|
2011-11-08 15:40:58 -08:00
|
|
|
"unicode/utf16"
|
|
|
|
|
"unicode/utf8"
|
2024-05-21 23:24:47 -04:00
|
|
|
_ "unsafe" // for linkname
|
2009-11-30 13:55:09 -08:00
|
|
|
)
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// Unmarshal parses the JSON-encoded data and stores the result
|
2017-02-06 18:06:40 -08:00
|
|
|
// in the value pointed to by v. If v is nil or not a pointer,
|
2023-07-07 11:06:05 +02:00
|
|
|
// Unmarshal returns an [InvalidUnmarshalError].
|
2009-12-22 09:47:02 -08:00
|
|
|
//
|
2011-09-19 11:50:41 -04:00
|
|
|
// Unmarshal uses the inverse of the encodings that
|
2023-09-01 01:54:25 -07:00
|
|
|
// [Marshal] uses, allocating maps, slices, and pointers as necessary,
|
2010-04-21 16:40:53 -07:00
|
|
|
// with the following additional rules:
|
|
|
|
|
//
|
2011-09-19 13:19:07 -04:00
|
|
|
// To unmarshal JSON into a pointer, Unmarshal first handles the case of
|
2016-03-01 23:21:55 +00:00
|
|
|
// the JSON being the JSON literal null. In that case, Unmarshal sets
|
|
|
|
|
// the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into
|
|
|
|
|
// the value pointed at by the pointer. If the pointer is nil, Unmarshal
|
2011-09-19 13:19:07 -04:00
|
|
|
// allocates a new value for it to point to.
|
|
|
|
|
//
|
2023-07-07 11:06:05 +02:00
|
|
|
// To unmarshal JSON into a value implementing [Unmarshaler],
|
|
|
|
|
// Unmarshal calls that value's [Unmarshaler.UnmarshalJSON] method, including
|
2016-10-12 16:54:02 -04:00
|
|
|
// when the input is a JSON null.
|
2023-07-07 11:06:05 +02:00
|
|
|
// Otherwise, if the value implements [encoding.TextUnmarshaler]
|
|
|
|
|
// and the input is a JSON quoted string, Unmarshal calls
|
|
|
|
|
// [encoding.TextUnmarshaler.UnmarshalText] with the unquoted form of the string.
|
2016-10-12 16:54:02 -04:00
|
|
|
//
|
2025-06-26 12:19:23 -07:00
|
|
|
// To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
|
|
|
|
|
// the keys used by [Marshal] (either the struct field name or its tag),
|
|
|
|
|
// ignoring case. If multiple struct fields match an object key, an exact case
|
|
|
|
|
// match is preferred over a case-insensitive one.
|
|
|
|
|
//
|
|
|
|
|
// Incoming object members are processed in the order observed. If an object
|
|
|
|
|
// includes duplicate keys, later duplicates will replace or be merged into
|
|
|
|
|
// prior values.
|
2013-01-31 07:49:23 -08:00
|
|
|
//
|
2013-09-09 19:11:05 -04:00
|
|
|
// To unmarshal JSON into an interface value,
|
2011-09-19 13:19:07 -04:00
|
|
|
// Unmarshal stores one of these in the interface value:
|
2010-04-21 16:40:53 -07:00
|
|
|
//
|
2023-09-01 01:54:25 -07:00
|
|
|
// - bool, for JSON booleans
|
|
|
|
|
// - float64, for JSON numbers
|
|
|
|
|
// - string, for JSON strings
|
2024-05-23 20:50:25 -07:00
|
|
|
// - []any, for JSON arrays
|
|
|
|
|
// - map[string]any, for JSON objects
|
2023-09-01 01:54:25 -07:00
|
|
|
// - nil, for JSON null
|
2010-04-21 16:40:53 -07:00
|
|
|
//
|
2015-11-25 11:45:16 -05:00
|
|
|
// To unmarshal a JSON array into a slice, Unmarshal resets the slice length
|
|
|
|
|
// to zero and then appends each element to the slice.
|
|
|
|
|
// As a special case, to unmarshal an empty JSON array into a slice,
|
|
|
|
|
// Unmarshal replaces the slice with a new empty slice.
|
2015-07-14 21:32:47 -04:00
|
|
|
//
|
2015-11-25 11:45:16 -05:00
|
|
|
// To unmarshal a JSON array into a Go array, Unmarshal decodes
|
|
|
|
|
// JSON array elements into corresponding Go array elements.
|
|
|
|
|
// If the Go array is smaller than the JSON array,
|
|
|
|
|
// the additional JSON array elements are discarded.
|
|
|
|
|
// If the JSON array is smaller than the Go array,
|
|
|
|
|
// the additional Go array elements are set to zero values.
|
|
|
|
|
//
|
2016-03-08 12:41:35 -08:00
|
|
|
// To unmarshal a JSON object into a map, Unmarshal first establishes a map to
|
2016-04-13 16:51:25 -07:00
|
|
|
// use. If the map is nil, Unmarshal allocates a new map. Otherwise Unmarshal
|
2016-10-20 14:51:58 -04:00
|
|
|
// reuses the existing map, keeping existing entries. Unmarshal then stores
|
|
|
|
|
// key-value pairs from the JSON object into the map. The map's key type must
|
2024-06-18 23:43:21 +08:00
|
|
|
// be a string type, an integer type, or implement [encoding.TextUnmarshaler].
|
2015-07-14 21:32:47 -04:00
|
|
|
//
|
2023-07-07 11:06:05 +02:00
|
|
|
// If the JSON-encoded data contain a syntax error, Unmarshal returns a [SyntaxError].
|
2022-03-08 12:21:00 +01:00
|
|
|
//
|
2010-04-21 16:40:53 -07:00
|
|
|
// If a JSON value is not appropriate for a given target type,
|
|
|
|
|
// or if a JSON number overflows the target type, Unmarshal
|
2015-09-01 17:51:39 +10:00
|
|
|
// skips that field and completes the unmarshaling as best it can.
|
2010-04-21 16:40:53 -07:00
|
|
|
// If no more serious errors are encountered, Unmarshal returns
|
2023-09-01 01:54:25 -07:00
|
|
|
// an [UnmarshalTypeError] describing the earliest such error. In any
|
2017-06-03 21:36:51 +02:00
|
|
|
// case, it's not guaranteed that all the remaining fields following
|
|
|
|
|
// the problematic one will be unmarshaled into the target object.
|
2010-04-21 16:40:53 -07:00
|
|
|
//
|
2014-05-12 23:38:26 -04:00
|
|
|
// The JSON null value unmarshals into an interface, map, pointer, or slice
|
|
|
|
|
// by setting that Go value to nil. Because null is often used in JSON to mean
|
2022-02-03 14:05:46 -05:00
|
|
|
// “not present,” unmarshaling a JSON null into any other Go type has no effect
|
2014-05-12 23:38:26 -04:00
|
|
|
// on the value and produces no error.
|
|
|
|
|
//
|
2013-02-14 14:56:01 -05:00
|
|
|
// When unmarshaling quoted strings, invalid UTF-8 or
|
|
|
|
|
// invalid UTF-16 surrogate pairs are not treated as an error.
|
|
|
|
|
// Instead, they are replaced by the Unicode replacement
|
|
|
|
|
// character U+FFFD.
|
2021-12-01 12:15:45 -05:00
|
|
|
func Unmarshal(data []byte, v any) error {
|
2013-02-14 14:46:15 -05:00
|
|
|
// Check for well-formedness.
|
2010-04-21 16:40:53 -07:00
|
|
|
// Avoids filling out half a data structure
|
|
|
|
|
// before discovering a JSON syntax error.
|
2013-01-30 17:53:48 -08:00
|
|
|
var d decodeState
|
2010-04-21 16:40:53 -07:00
|
|
|
err := checkValid(data, &d.scan)
|
|
|
|
|
if err != nil {
|
|
|
|
|
return err
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2013-01-30 17:53:48 -08:00
|
|
|
d.init(data)
|
2010-04-21 16:40:53 -07:00
|
|
|
return d.unmarshal(v)
|
2009-11-30 13:55:09 -08:00
|
|
|
}
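// The rules in the Unmarshal doc comment can be seen from the caller's side
// in a small program. This is a minimal sketch, not part of the package: the
// point type and the decodePoint and decodeAny helpers are hypothetical names
// introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// point is a hypothetical target type used only for this illustration.
type point struct {
	X    int            `json:"x"`
	Tags []string       `json:"tags"`
	Meta map[string]int `json:"meta"`
	Next *point         `json:"next"`
}

// decodePoint unmarshals src into a point that already holds data, so the
// effect of the slice, map, and pointer rules is visible in the result.
func decodePoint(src string) (point, error) {
	p := point{
		Tags: []string{"old"},
		Meta: map[string]int{"kept": 1},
		Next: &point{X: 7},
	}
	err := json.Unmarshal([]byte(src), &p)
	return p, err
}

// decodeAny unmarshals src into an empty interface value.
func decodeAny(src string) (any, error) {
	var v any
	err := json.Unmarshal([]byte(src), &v)
	return v, err
}

func main() {
	// "X" matches the field tagged "x" because keys are compared ignoring
	// case. The slice is reset before appending, the existing map is reused,
	// and the JSON null sets the pointer to nil.
	p, err := decodePoint(`{"X": 1, "tags": ["a", "b"], "meta": {"new": 2}, "next": null}`)
	fmt.Println(p.X, p.Tags, p.Meta, p.Next == nil, err)

	// Into an interface value, objects become map[string]any, arrays become
	// []any, and numbers become float64.
	v, err := decodeAny(`{"n": 1, "list": [true, "s", null]}`)
	fmt.Printf("%T %v\n", v, err)
}
```

// Pre-populating the target is a deliberate choice in this sketch: with a
// zero-valued target, the difference between resetting a slice and reusing a
// map would not show up in the output.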
|
|
|
|
|
|
2016-04-13 18:14:52 +00:00
|
|
|
// Unmarshaler is the interface implemented by types
|
2010-04-21 16:40:53 -07:00
|
|
|
// that can unmarshal a JSON description of themselves.
|
2012-10-29 20:58:24 +01:00
|
|
|
// The input can be assumed to be a valid encoding of
|
|
|
|
|
// a JSON value. UnmarshalJSON must copy the JSON data
|
2010-04-21 16:40:53 -07:00
|
|
|
// if it wishes to retain the data after returning.
|
|
|
|
|
type Unmarshaler interface {
|
2011-11-01 22:04:37 -04:00
|
|
|
UnmarshalJSON([]byte) error
|
2009-11-30 13:55:09 -08:00
|
|
|
}
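// A type might satisfy Unmarshaler roughly as follows. This is a sketch under
// stated assumptions: the level and config types and the parseConfig helper
// are hypothetical, and treating a JSON null as a no-op is a convention this
// example chooses, not something the interface requires.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// level is a hypothetical type that decodes from a JSON string such as "HIGH".
type level int

const (
	low level = iota
	high
)

// UnmarshalJSON implements json.Unmarshaler. The method is also invoked when
// the input is a JSON null; this sketch leaves the value unchanged then.
func (l *level) UnmarshalJSON(data []byte) error {
	if string(data) == "null" {
		return nil
	}
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	switch strings.ToLower(s) {
	case "low":
		*l = low
	case "high":
		*l = high
	default:
		return fmt.Errorf("level: unknown value %q", s)
	}
	return nil
}

// config is a hypothetical struct holding a level field.
type config struct {
	Level level `json:"level"`
}

// parseConfig decodes src into a config whose Level starts out as high.
func parseConfig(src string) (config, error) {
	c := config{Level: high}
	err := json.Unmarshal([]byte(src), &c)
	return c, err
}

func main() {
	for _, src := range []string{`{"level": "LOW"}`, `{"level": null}`, `{"level": "mid"}`} {
		c, err := parseConfig(src)
		fmt.Println(c.Level, err)
	}
}
```

// The method here converts data into a string before returning, so it keeps
// no reference to the input. An implementation that stores the raw bytes
// would need to copy them first, as the interface comment requires.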
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// An UnmarshalTypeError describes a JSON value that was
|
|
|
|
|
// not appropriate for a value of a specific Go type.
|
|
|
|
|
type UnmarshalTypeError struct {
|
2015-01-26 11:51:43 +01:00
|
|
|
Value string // description of JSON value - "bool", "array", "number -5"
|
|
|
|
|
Type reflect.Type // type of Go value it could not be assigned to
|
|
|
|
|
Offset int64 // error occurred after reading Offset bytes
|
2016-01-18 16:26:05 +01:00
|
|
|
Struct string // name of the struct type containing the field
|
2024-08-27 14:35:59 +00:00
|
|
|
Field string // the full path from the root node to the field, including embedded structs
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
|
|
|
|
|
2011-11-01 22:04:37 -04:00
|
|
|
func (e *UnmarshalTypeError) Error() string {
|
2016-01-18 16:26:05 +01:00
|
|
|
if e.Struct != "" || e.Field != "" {
|
|
|
|
|
return "json: cannot unmarshal " + e.Value + " into Go struct field " + e.Struct + "." + e.Field + " of type " + e.Type.String()
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
return "json: cannot unmarshal " + e.Value + " into Go value of type " + e.Type.String()
|
|
|
|
|
}
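// Callers usually recover an UnmarshalTypeError with errors.As. The sketch
// below assumes hypothetical item and batch types and a decodeBatch helper;
// it prints the fields of the error without asserting their exact values,
// since the Struct and Field contents depend on the decoder version.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// item and batch are hypothetical types used only for this illustration.
type item struct {
	Y int
}

type batch struct {
	Name  string
	Items []item
}

// decodeBatch decodes src and, if the returned error is (or wraps) an
// *json.UnmarshalTypeError, also returns it in typed form.
func decodeBatch(src string) (batch, *json.UnmarshalTypeError, error) {
	var b batch
	err := json.Unmarshal([]byte(src), &b)
	var typeErr *json.UnmarshalTypeError
	errors.As(err, &typeErr) // typeErr stays nil when err is not a type error
	return b, typeErr, err
}

func main() {
	b, typeErr, err := decodeBatch(`{"Items": [{"Y": 1}, {"Y": "bad"}], "Name": "n"}`)
	fmt.Println(err)
	if typeErr != nil {
		fmt.Println("value: ", typeErr.Value)
		fmt.Println("struct:", typeErr.Struct)
		fmt.Println("field: ", typeErr.Field)
		fmt.Println("type:  ", typeErr.Type)
		fmt.Println("offset:", typeErr.Offset)
	}
	// Values decoded before the mismatch are still stored in the target.
	fmt.Printf("%+v\n", b)
}
```

// Because Unmarshal skips the mismatched field and keeps going, the target
// may be partially filled even when an error is returned. Code that needs
// all-or-nothing behavior should decode into a scratch value and copy it
// over only on success.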
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2010-09-28 14:40:23 -04:00
|
|
|
// An UnmarshalFieldError describes a JSON object key that
|
|
|
|
|
// led to an unexported (and therefore unwritable) struct field.
|
2017-10-11 14:41:25 -07:00
|
|
|
//
|
|
|
|
|
// Deprecated: No longer used; kept for compatibility.
|
2010-09-28 14:40:23 -04:00
|
|
|
type UnmarshalFieldError struct {
|
|
|
|
|
Key string
|
2011-04-08 12:27:58 -04:00
|
|
|
Type reflect.Type
|
2010-09-28 14:40:23 -04:00
|
|
|
Field reflect.StructField
|
|
|
|
|
}
|
|
|
|
|
|
2011-11-01 22:04:37 -04:00
|
|
|
func (e *UnmarshalFieldError) Error() string {
|
2010-09-28 14:40:23 -04:00
|
|
|
return "json: cannot unmarshal object key " + strconv.Quote(e.Key) + " into unexported field " + e.Field.Name + " of type " + e.Type.String()
|
|
|
|
|
}
|
|
|
|
|
|
2023-09-01 01:54:25 -07:00
|
|
|
// An InvalidUnmarshalError describes an invalid argument passed to [Unmarshal].
|
|
|
|
|
// (The argument to [Unmarshal] must be a non-nil pointer.)
|
2010-04-21 16:40:53 -07:00
|
|
|
type InvalidUnmarshalError struct {
|
|
|
|
|
Type reflect.Type
|
|
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2011-11-01 22:04:37 -04:00
|
|
|
func (e *InvalidUnmarshalError) Error() string {
|
2010-04-21 16:40:53 -07:00
|
|
|
if e.Type == nil {
|
|
|
|
|
return "json: Unmarshal(nil)"
|
|
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2021-10-25 23:00:56 +07:00
|
|
|
if e.Type.Kind() != reflect.Pointer {
|
2010-04-21 16:40:53 -07:00
|
|
|
return "json: Unmarshal(non-pointer " + e.Type.String() + ")"
|
|
|
|
|
}
|
|
|
|
|
return "json: Unmarshal(nil " + e.Type.String() + ")"
|
|
|
|
|
}
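// The three message forms produced by the Error method above can be
// triggered as follows. This is an illustrative sketch; invalidTargetMessage
// is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// invalidTargetMessage unmarshals a valid JSON document into target and
// returns the InvalidUnmarshalError message, or "" if there was none.
func invalidTargetMessage(target any) string {
	err := json.Unmarshal([]byte(`1`), target)
	var invalid *json.InvalidUnmarshalError
	if errors.As(err, &invalid) {
		return invalid.Error()
	}
	return ""
}

func main() {
	var nilPtr *int
	var n int

	fmt.Println(invalidTargetMessage(nil))    // json: Unmarshal(nil)
	fmt.Println(invalidTargetMessage(0))      // json: Unmarshal(non-pointer int)
	fmt.Println(invalidTargetMessage(nilPtr)) // json: Unmarshal(nil *int)
	fmt.Printf("%q\n", invalidTargetMessage(&n)) // "" (a non-nil pointer is valid)
}
```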
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) unmarshal(v any) error {
|
2011-04-25 13:39:36 -04:00
|
|
|
rv := reflect.ValueOf(v)
|
2021-10-25 23:00:56 +07:00
|
|
|
if rv.Kind() != reflect.Pointer || rv.IsNil() {
|
2011-04-25 13:39:36 -04:00
|
|
|
return &InvalidUnmarshalError{reflect.TypeOf(v)}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
d.scan.reset()
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
2012-12-17 02:34:49 +01:00
|
|
|
// We decode rv not rv.Elem because the Unmarshaler interface
|
2010-11-08 15:33:00 -08:00
|
|
|
// test must be applied at the top level of the value.
|
2020-07-01 11:31:15 +00:00
|
|
|
err := d.value(rv)
|
|
|
|
|
if err != nil {
|
2018-09-03 11:20:23 +01:00
|
|
|
return d.addErrorContext(err)
|
2018-03-03 15:20:26 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
return d.savedError
|
|
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2012-06-25 17:36:09 -04:00
|
|
|
// A Number represents a JSON number literal.
|
|
|
|
|
type Number string
|
|
|
|
|
|
|
|
|
|
// String returns the literal text of the number.
|
|
|
|
|
func (n Number) String() string { return string(n) }
|
|
|
|
|
|
|
|
|
|
// Float64 returns the number as a float64.
|
|
|
|
|
func (n Number) Float64() (float64, error) {
|
|
|
|
|
return strconv.ParseFloat(string(n), 64)
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Int64 returns the number as an int64.
|
|
|
|
|
func (n Number) Int64() (int64, error) {
|
|
|
|
|
return strconv.ParseInt(string(n), 10, 64)
|
|
|
|
|
}
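// Number is most useful when an integer would lose precision as a float64.
// The sketch below assumes the Decoder.UseNumber option, which is defined
// elsewhere in this package, and a hypothetical decodeID helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeID reads the "id" member of a JSON object as an int64 without going
// through float64. With UseNumber, numbers decoded into interface values are
// stored as json.Number, which keeps the literal text.
func decodeID(src string) (int64, error) {
	dec := json.NewDecoder(strings.NewReader(src))
	dec.UseNumber()

	var v map[string]any
	if err := dec.Decode(&v); err != nil {
		return 0, err
	}
	n, ok := v["id"].(json.Number)
	if !ok {
		return 0, fmt.Errorf("id is %T, not a JSON number", v["id"])
	}
	return n.Int64()
}

func main() {
	// 2^53 + 1 cannot be represented exactly as a float64.
	id, err := decodeID(`{"id": 9007199254740993}`)
	fmt.Println(id, err)

	// Int64 reports an error for a literal that is not an integer.
	_, err = decodeID(`{"id": 1.5}`)
	fmt.Println(err)
}
```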
|
|
|
|
|
|
2020-11-18 12:50:29 -08:00
|
|
|
// An errorContext provides context for type errors during decoding.
|
|
|
|
|
type errorContext struct {
|
|
|
|
|
Struct reflect.Type
|
|
|
|
|
FieldStack []string
|
|
|
|
|
}
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// decodeState represents the state while decoding a JSON value.
|
|
|
|
|
type decodeState struct {
|
2020-11-18 12:50:29 -08:00
|
|
|
data []byte
|
|
|
|
|
off int // next read offset in data
|
|
|
|
|
opcode int // last read result
|
|
|
|
|
scan scanner
|
|
|
|
|
errorContext *errorContext
|
2017-10-31 13:16:38 -07:00
|
|
|
savedError error
|
|
|
|
|
useNumber bool
|
|
|
|
|
disallowUnknownFields bool
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// readIndex returns the position of the last byte read.
|
|
|
|
|
func (d *decodeState) readIndex() int {
|
|
|
|
|
return d.off - 1
|
|
|
|
|
}
|
|
|
|
|
|
2018-09-12 09:26:31 +02:00
|
|
|
// phasePanicMsg is used as a panic message when we end up with something that
|
|
|
|
|
// shouldn't happen. It can indicate a bug in the JSON decoder, or that
|
|
|
|
|
// something is editing the data slice while the decoder executes.
|
|
|
|
|
const phasePanicMsg = "JSON decoder out of sync - data changing underfoot?"
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
func (d *decodeState) init(data []byte) *decodeState {
|
|
|
|
|
d.data = data
|
|
|
|
|
d.off = 0
|
|
|
|
|
d.savedError = nil
|
2020-11-18 12:50:29 -08:00
|
|
|
if d.errorContext != nil {
|
|
|
|
|
d.errorContext.Struct = nil
|
|
|
|
|
// Reuse the allocated space for the FieldStack slice.
|
|
|
|
|
d.errorContext.FieldStack = d.errorContext.FieldStack[:0]
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
return d
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// saveError saves the first err it is called with,
|
|
|
|
|
// for reporting at the end of the unmarshal.
|
2011-11-01 22:04:37 -04:00
|
|
|
func (d *decodeState) saveError(err error) {
|
2010-04-21 16:40:53 -07:00
|
|
|
if d.savedError == nil {
|
2016-01-18 16:26:05 +01:00
|
|
|
d.savedError = d.addErrorContext(err)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// addErrorContext returns err, enhanced in place with information from d.errorContext.
|
|
|
|
|
func (d *decodeState) addErrorContext(err error) error {
|
2020-11-18 12:50:29 -08:00
|
|
|
if d.errorContext != nil && (d.errorContext.Struct != nil || len(d.errorContext.FieldStack) > 0) {
|
2016-01-18 16:26:05 +01:00
|
|
|
switch err := err.(type) {
|
|
|
|
|
case *UnmarshalTypeError:
|
2018-07-08 13:17:56 +01:00
|
|
|
err.Struct = d.errorContext.Struct.Name()
|
2024-08-15 02:11:44 +00:00
|
|
|
fieldStack := d.errorContext.FieldStack
|
|
|
|
|
if err.Field != "" {
|
|
|
|
|
fieldStack = append(fieldStack, err.Field)
|
|
|
|
|
}
|
|
|
|
|
err.Field = strings.Join(fieldStack, ".")
|
2016-01-18 16:26:05 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2016-01-18 16:26:05 +01:00
|
|
|
return err
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// skip scans to the end of what was started.
|
|
|
|
|
func (d *decodeState) skip() {
|
|
|
|
|
s, data, i := &d.scan, d.data, d.off
|
|
|
|
|
depth := len(s.parseState)
|
|
|
|
|
for {
|
|
|
|
|
op := s.step(s, data[i])
|
|
|
|
|
i++
|
|
|
|
|
if len(s.parseState) < depth {
|
|
|
|
|
d.off = i
|
|
|
|
|
d.opcode = op
|
|
|
|
|
return
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// scanNext processes the byte at d.data[d.off].
|
|
|
|
|
func (d *decodeState) scanNext() {
|
2018-07-08 16:14:35 +01:00
|
|
|
if d.off < len(d.data) {
|
|
|
|
|
d.opcode = d.scan.step(&d.scan, d.data[d.off])
|
|
|
|
|
d.off++
|
2010-04-21 16:40:53 -07:00
|
|
|
} else {
|
2018-07-08 16:14:35 +01:00
|
|
|
d.opcode = d.scan.eof()
|
|
|
|
|
d.off = len(d.data) + 1 // mark processed EOF with len+1
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// scanWhile processes bytes in d.data[d.off:] until it
|
|
|
|
|
// receives a scan code not equal to op.
|
2017-06-29 11:51:22 +02:00
|
|
|
func (d *decodeState) scanWhile(op int) {
|
|
|
|
|
s, data, i := &d.scan, d.data, d.off
|
2018-07-08 16:14:35 +01:00
|
|
|
for i < len(data) {
|
2017-06-29 11:51:22 +02:00
|
|
|
newOp := s.step(s, data[i])
|
|
|
|
|
i++
|
2010-04-21 16:40:53 -07:00
|
|
|
if newOp != op {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.opcode = newOp
|
|
|
|
|
d.off = i
|
|
|
|
|
return
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
|
2018-07-08 16:14:35 +01:00
|
|
|
d.off = len(data) + 1 // mark processed EOF with len+1
|
2017-06-29 11:51:22 +02:00
|
|
|
d.opcode = d.scan.eof()
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2018-11-23 16:56:23 +00:00
|
|
|
// rescanLiteral is similar to scanWhile(scanContinue), but it specialises the
|
|
|
|
|
// common case where we're decoding a literal. The decoder scans the input
|
|
|
|
|
// twice, once for syntax errors and to check the length of the value, and the
|
|
|
|
|
// second to perform the decoding.
|
|
|
|
|
//
|
|
|
|
|
// Only in the second step do we use decodeState to tokenize literals, so we
|
|
|
|
|
// know there aren't any syntax errors. We can take advantage of that knowledge,
|
|
|
|
|
// and scan a literal's bytes much more quickly.
|
|
|
|
|
func (d *decodeState) rescanLiteral() {
|
|
|
|
|
data, i := d.data, d.off
|
|
|
|
|
Switch:
|
|
|
|
|
switch data[i-1] {
|
|
|
|
|
case '"': // string
|
|
|
|
|
for ; i < len(data); i++ {
|
encoding/json: revert "avoid work when unquoting strings, take 2"
This reverts golang.org/cl/190659 and golang.org/cl/226218, minus the
regression tests in the latter.
The original work happened in golang.org/cl/151157, which was reverted
in golang.org/cl/190909 due to a crash found by fuzzing.
We tried a second time in golang.org/cl/190659, which shipped with Go
1.14. A bug was found, where strings would be mangled in certain edge
cases. The fix for that was golang.org/cl/226218, which was backported
into Go 1.14.4.
Unfortunately, a second regression was just reported in #39555, which is
a similar case of strings getting mangled when decoding under certain
conditions. It would be possible to come up with another small patch to
fix that edge case, but instead, let's just revert the entire
optimization, as it has proved to do more harm than good. Moreover, it's
hard to argue or prove that there will be no more such regressions.
However, all the work wasn't for nothing. First, we learned that the way
the decoder unquotes tokenized strings isn't simple; initially, we had
wrongly assumed that each string was unquoted exactly once and in order.
Second, we have gained a number of regression tests which will be useful
to prevent the same mistakes in the future, including the test cases we
add in this CL.
Fixes #39555.
Change-Id: I66a6919c2dd6d9789232482ba6cf3814eaa70f61
Reviewed-on: https://go-review.googlesource.com/c/go/+/237838
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2020-06-14 22:09:18 +01:00
|
|
|
switch data[i] {
|
|
|
|
|
case '\\':
|
2018-11-23 16:56:23 +00:00
|
|
|
i++ // escaped char
|
2020-06-14 22:09:18 +01:00
|
|
|
case '"':
|
2018-11-23 16:56:23 +00:00
|
|
|
i++ // tokenize the closing quote too
|
|
|
|
|
break Switch
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-': // number
|
|
|
|
|
for ; i < len(data); i++ {
|
|
|
|
|
switch data[i] {
|
|
|
|
|
case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
|
|
|
|
|
'.', 'e', 'E', '+', '-':
|
|
|
|
|
default:
|
|
|
|
|
break Switch
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
case 't': // true
|
|
|
|
|
i += len("rue")
|
|
|
|
|
case 'f': // false
|
|
|
|
|
i += len("alse")
|
|
|
|
|
case 'n': // null
|
|
|
|
|
i += len("ull")
|
|
|
|
|
}
|
|
|
|
|
if i < len(data) {
|
|
|
|
|
d.opcode = stateEndValue(&d.scan, data[i])
|
|
|
|
|
} else {
|
|
|
|
|
d.opcode = scanEnd
|
|
|
|
|
}
|
|
|
|
|
d.off = i + 1
|
|
|
|
|
}
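// The fast path in rescanLiteral can be studied in isolation. The sketch
// below is a standalone approximation, not the decoder's code: literalEnd is
// a hypothetical function that mirrors the same byte-level logic and, like
// rescanLiteral, assumes the input has already been checked for syntax errors.

```go
package main

import "fmt"

// literalEnd returns the index just past the JSON literal that begins at
// data[start]. It performs no validation: data must already be known to be
// well-formed JSON, and start must be the first byte of a literal.
func literalEnd(data []byte, start int) int {
	i := start + 1
	switch data[start] {
	case '"': // string
		for ; i < len(data); i++ {
			switch data[i] {
			case '\\':
				i++ // skip the escaped character
			case '"':
				return i + 1 // include the closing quote
			}
		}
	case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-': // number
		for ; i < len(data); i++ {
			switch data[i] {
			case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
				'.', 'e', 'E', '+', '-':
				// still inside the number
			default:
				return i
			}
		}
	case 't': // true
		i += len("rue")
	case 'f': // false
		i += len("alse")
	case 'n': // null
		i += len("ull")
	}
	return i
}

func main() {
	data := []byte(`["a\"b", -1.5e3, true, null]`)
	// Each start index is the first byte of a literal in data.
	for _, start := range []int{1, 9, 17, 23} {
		end := literalEnd(data, start)
		fmt.Printf("%s\n", data[start:end])
	}
}
```

// Skipping validation is only safe because the first pass has already
// rejected malformed input. On unvalidated data this approach would accept
// text such as "trux" as a four-byte literal.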
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// value consumes a JSON value from d.data[d.off-1:], decoding into v, and
|
|
|
|
|
// reads the following byte ahead. If v is invalid, the value is discarded.
|
|
|
|
|
// The first byte of the value has been read already.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) value(v reflect.Value) error {
|
2017-06-29 11:51:22 +02:00
|
|
|
switch d.opcode {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case scanBeginArray:
|
2017-06-29 11:51:22 +02:00
|
|
|
if v.IsValid() {
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.array(v); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
} else {
|
|
|
|
|
d.skip()
|
|
|
|
|
}
|
|
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case scanBeginObject:
|
2017-06-29 11:51:22 +02:00
|
|
|
if v.IsValid() {
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.object(v); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
} else {
|
|
|
|
|
d.skip()
|
|
|
|
|
}
|
|
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case scanBeginLiteral:
|
2017-06-29 11:51:22 +02:00
|
|
|
// All bytes inside literal return scanContinue op code.
|
|
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2017-06-29 11:51:22 +02:00
|
|
|
|
|
|
|
|
if v.IsValid() {
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2014-10-07 11:07:04 -04:00
|
|
|
type unquotedValue struct{}
|
|
|
|
|
|
|
|
|
|
// valueQuoted is like value but decodes a
|
|
|
|
|
// quoted string literal or literal null into an interface value.
|
|
|
|
|
// If it finds anything other than a quoted string literal or null,
|
|
|
|
|
// valueQuoted returns unquotedValue{}.
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) valueQuoted() any {
|
2017-06-29 11:51:22 +02:00
|
|
|
switch d.opcode {
|
2014-10-07 11:07:04 -04:00
|
|
|
default:
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2014-10-07 11:07:04 -04:00
|
|
|
|
2018-07-08 16:14:35 +01:00
|
|
|
case scanBeginArray, scanBeginObject:
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
|
|
|
|
d.scanNext()
|
2014-10-07 11:07:04 -04:00
|
|
|
|
|
|
|
|
case scanBeginLiteral:
|
2018-09-12 09:26:31 +02:00
|
|
|
v := d.literalInterface()
|
2018-03-03 15:20:26 +01:00
|
|
|
switch v.(type) {
|
2014-10-07 11:07:04 -04:00
|
|
|
case nil, string:
|
2018-09-12 09:26:31 +02:00
|
|
|
return v
|
2014-10-07 11:07:04 -04:00
|
|
|
}
|
|
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return unquotedValue{}
|
2014-10-07 11:07:04 -04:00
|
|
|
}
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// indirect walks down v allocating pointers as needed,
|
|
|
|
|
// until it gets to a non-pointer.
|
2019-05-01 14:52:57 +02:00
|
|
|
// If it encounters an Unmarshaler, indirect stops and returns that.
|
|
|
|
|
// If decodingNull is true, indirect stops at the first settable pointer so it
|
|
|
|
|
// can be set to nil.
|
2017-06-29 11:51:22 +02:00
|
|
|
func indirect(v reflect.Value, decodingNull bool) (Unmarshaler, encoding.TextUnmarshaler, reflect.Value) {
|
2018-02-28 13:45:06 -08:00
|
|
|
// Issue #24153 indicates that it is generally not a guaranteed property
|
|
|
|
|
// that you may round-trip a reflect.Value by calling Value.Addr().Elem()
|
|
|
|
|
// and expect the value to still be settable for values derived from
|
|
|
|
|
// unexported embedded struct fields.
|
|
|
|
|
//
|
|
|
|
|
// The logic below effectively does this when it first addresses the value
|
|
|
|
|
// (to satisfy possible pointer methods) and continues to dereference
|
|
|
|
|
// subsequent pointers as necessary.
|
|
|
|
|
//
|
|
|
|
|
// After the first round-trip, we set v back to the original value to
|
|
|
|
|
// preserve the original RW flags contained in reflect.Value.
|
|
|
|
|
v0 := v
|
|
|
|
|
haveAddr := false
|
|
|
|
|
|
2011-08-10 09:26:51 -04:00
|
|
|
// If v is a named type and is addressable,
|
|
|
|
|
// start with its address, so that if the type has pointer methods,
|
|
|
|
|
// we find them.
|
2021-10-25 23:00:56 +07:00
|
|
|
if v.Kind() != reflect.Pointer && v.Type().Name() != "" && v.CanAddr() {
|
2018-02-28 13:45:06 -08:00
|
|
|
haveAddr = true
|
2011-08-10 09:26:51 -04:00
|
|
|
v = v.Addr()
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
for {
|
2012-06-07 01:48:55 -04:00
|
|
|
// Load value from interface, but only if the result will be
|
|
|
|
|
// usefully addressable.
|
2012-06-25 16:03:18 -04:00
|
|
|
if v.Kind() == reflect.Interface && !v.IsNil() {
|
|
|
|
|
e := v.Elem()
|
2021-10-25 23:00:56 +07:00
|
|
|
if e.Kind() == reflect.Pointer && !e.IsNil() && (!decodingNull || e.Elem().Kind() == reflect.Pointer) {
|
2018-02-28 13:45:06 -08:00
|
|
|
haveAddr = false
|
2012-06-07 01:48:55 -04:00
|
|
|
v = e
|
|
|
|
|
continue
|
|
|
|
|
}
|
2010-04-27 10:24:00 -07:00
|
|
|
}
|
2011-04-18 14:36:22 -04:00
|
|
|
|
2021-10-25 23:00:56 +07:00
|
|
|
if v.Kind() != reflect.Pointer {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
|
2019-05-01 14:52:57 +02:00
|
|
|
if decodingNull && v.CanSet() {
|
2012-06-25 16:03:18 -04:00
|
|
|
break
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2019-04-29 22:57:25 +07:00
|
|
|
|
|
|
|
|
// Prevent infinite loop if v is an interface pointing to its own address:
|
2024-05-23 20:50:25 -07:00
|
|
|
// var v any
|
2019-04-29 22:57:25 +07:00
|
|
|
// v = &v
|
2024-11-07 17:40:08 -08:00
|
|
|
if v.Elem().Kind() == reflect.Interface && v.Elem().Elem().Equal(v) {
|
2019-04-29 22:57:25 +07:00
|
|
|
v = v.Elem()
|
|
|
|
|
break
|
|
|
|
|
}
|
2012-06-25 16:03:18 -04:00
|
|
|
if v.IsNil() {
|
|
|
|
|
v.Set(reflect.New(v.Type().Elem()))
|
2010-04-27 10:24:00 -07:00
|
|
|
}
|
2018-10-11 14:43:47 +01:00
|
|
|
if v.Type().NumMethod() > 0 && v.CanInterface() {
|
2013-08-14 14:56:07 -04:00
|
|
|
if u, ok := v.Interface().(Unmarshaler); ok {
|
|
|
|
|
return u, nil, reflect.Value{}
|
|
|
|
|
}
|
2016-10-12 16:54:02 -04:00
|
|
|
if !decodingNull {
|
|
|
|
|
if u, ok := v.Interface().(encoding.TextUnmarshaler); ok {
|
|
|
|
|
return nil, u, reflect.Value{}
|
|
|
|
|
}
|
2012-06-25 16:03:18 -04:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-02-28 13:45:06 -08:00
|
|
|
|
|
|
|
|
if haveAddr {
|
|
|
|
|
v = v0 // restore original value after round-trip Value.Addr().Elem()
|
|
|
|
|
haveAddr = false
|
|
|
|
|
} else {
|
|
|
|
|
v = v.Elem()
|
|
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
return nil, nil, v
|
2009-11-30 13:55:09 -08:00
|
|
|
}
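// The effect of indirect is visible from outside the package when the target
// holds nested pointers. This is a minimal sketch using only the public
// Unmarshal function; decodeNested is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeNested unmarshals src into a **int that starts out as start and
// returns the resulting pointer.
func decodeNested(src string, start **int) (**int, error) {
	pp := start
	err := json.Unmarshal([]byte(src), &pp)
	return pp, err
}

func main() {
	// Starting from a nil **int, every pointer level is allocated on the way
	// down to the int that receives the value.
	pp, err := decodeNested(`5`, nil)
	fmt.Println(**pp, err)

	// For a JSON null, the walk stops at the first settable pointer and sets
	// it to nil; the values it used to point at are left alone.
	n := 1
	p := &n
	pp, err = decodeNested(`null`, &p)
	fmt.Println(pp == nil, p == &n, n, err)
}
```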
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// array consumes an array from d.data[d.off-1:], decoding into v.
|
|
|
|
|
// The first byte of the array ('[') has been read already.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) array(v reflect.Value) error {
|
2010-04-21 16:40:53 -07:00
|
|
|
// Check for unmarshaler.
|
2017-06-29 11:51:22 +02:00
|
|
|
u, ut, pv := indirect(v, false)
|
2013-08-14 14:56:07 -04:00
|
|
|
if u != nil {
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
|
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return u.UnmarshalJSON(d.data[start:d.off])
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
if ut != nil {
|
2016-01-18 16:26:05 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
v = pv
|
|
|
|
|
|
|
|
|
|
// Check type of target.
|
2011-12-19 15:32:06 -05:00
|
|
|
switch v.Kind() {
|
2013-01-14 08:44:16 +01:00
|
|
|
case reflect.Interface:
|
|
|
|
|
if v.NumMethod() == 0 {
|
2017-08-19 22:33:51 +02:00
|
|
|
// Decoding into nil interface? Switch to non-reflect code.
|
2018-09-12 09:26:31 +02:00
|
|
|
ai := d.arrayInterface()
|
2018-03-03 15:20:26 +01:00
|
|
|
v.Set(reflect.ValueOf(ai))
|
|
|
|
|
return nil
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
|
|
|
|
// Otherwise it's invalid.
|
|
|
|
|
fallthrough
|
2011-12-19 15:32:06 -05:00
|
|
|
default:
|
2016-01-18 16:26:05 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2020-07-01 11:31:15 +00:00
|
|
|
case reflect.Array, reflect.Slice:
|
|
|
|
|
break
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
i := 0
|
|
|
|
|
for {
|
|
|
|
|
// Look ahead for ] - can only happen on first iteration.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
2023-02-23 13:32:10 -08:00
|
|
|
// Expand slice length, growing the slice if necessary.
|
2011-12-19 15:32:06 -05:00
|
|
|
if v.Kind() == reflect.Slice {
|
|
|
|
|
if i >= v.Cap() {
|
2023-02-23 13:32:10 -08:00
|
|
|
v.Grow(1)
|
2011-12-19 15:32:06 -05:00
|
|
|
}
|
|
|
|
|
if i >= v.Len() {
|
|
|
|
|
v.SetLen(i + 1)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2020-07-01 11:31:15 +00:00
|
|
|
|
2011-12-19 15:32:06 -05:00
|
|
|
if i < v.Len() {
|
2020-07-01 11:31:15 +00:00
|
|
|
// Decode into element.
|
|
|
|
|
if err := d.value(v.Index(i)); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
} else {
|
|
|
|
|
// Ran out of fixed array: skip.
|
|
|
|
|
if err := d.value(reflect.Value{}); err != nil {
|
|
|
|
|
return err
|
2018-03-03 15:20:26 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
i++
|
|
|
|
|
|
|
|
|
|
// Next token must be , or ].
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanArrayValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2011-11-14 16:03:23 -05:00
|
|
|
|
2011-12-19 15:32:06 -05:00
|
|
|
if i < v.Len() {
|
|
|
|
|
if v.Kind() == reflect.Array {
|
|
|
|
|
for ; i < v.Len(); i++ {
|
2023-02-23 13:28:48 -08:00
|
|
|
v.Index(i).SetZero() // zero remainder of array
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
} else {
|
2023-02-23 13:28:48 -08:00
|
|
|
v.SetLen(i) // truncate the slice
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2020-07-01 11:31:15 +00:00
|
|
|
if i == 0 && v.Kind() == reflect.Slice {
|
2011-12-19 15:32:06 -05:00
|
|
|
v.Set(reflect.MakeSlice(v.Type(), 0, 0))
|
2011-11-14 16:03:23 -05:00
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
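// The array rules above can be observed through the public API. This is a
// sketch, not part of the package: decodeArrays is a hypothetical helper.
// It shows that a short JSON array zeroes the remainder of a Go array, that
// extra JSON elements are skipped once a fixed array is full, and that an
// empty JSON array yields a non-nil, zero-length slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeArrays is a hypothetical helper for illustration only.
func decodeArrays() ([3]int, [3]int, []int, bool) {
	short := [3]int{9, 9, 9}
	// Only one element is supplied; the other two are reset to zero.
	if err := json.Unmarshal([]byte(`[1]`), &short); err != nil {
		panic(err)
	}

	var long [3]int
	// Five elements are supplied; the last two are read and discarded.
	if err := json.Unmarshal([]byte(`[1,2,3,4,5]`), &long); err != nil {
		panic(err)
	}

	var empty []int
	// An empty JSON array produces an empty, non-nil slice.
	if err := json.Unmarshal([]byte(`[]`), &empty); err != nil {
		panic(err)
	}
	return short, long, empty, empty != nil
}

func main() {
	short, long, empty, nonNil := decodeArrays()
	fmt.Println(short, long, len(empty), nonNil)
}
```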
|
|
|
|
|
|
2014-10-07 11:07:04 -04:00
|
|
|
var nullLiteral = []byte("null")
|
2023-07-31 15:18:12 -07:00
|
|
|
var textUnmarshalerType = reflect.TypeFor[encoding.TextUnmarshaler]()
|
2014-10-07 11:07:04 -04:00
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// object consumes an object from d.data[d.off-1:], decoding into v.
|
|
|
|
|
// The first byte ('{') of the object has been read already.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) object(v reflect.Value) error {
|
2010-04-21 16:40:53 -07:00
|
|
|
// Check for unmarshaler.
|
2017-06-29 11:51:22 +02:00
|
|
|
u, ut, pv := indirect(v, false)
|
2013-08-14 14:56:07 -04:00
|
|
|
if u != nil {
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
|
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return u.UnmarshalJSON(d.data[start:d.off])
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
if ut != nil {
|
2016-01-18 16:26:05 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "object", Type: v.Type(), Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
v = pv
|
2018-07-08 16:14:35 +01:00
|
|
|
t := v.Type()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2017-08-19 22:33:51 +02:00
|
|
|
// Decoding into nil interface? Switch to non-reflect code.
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.Kind() == reflect.Interface && v.NumMethod() == 0 {
|
2018-09-12 09:26:31 +02:00
|
|
|
oi := d.objectInterface()
|
2018-03-03 15:20:26 +01:00
|
|
|
v.Set(reflect.ValueOf(oi))
|
|
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
encoding/json: index names for the struct decoder
In the common case, structs have a handful of fields and most inputs
match struct field names exactly.
The previous code would do a linear search over the fields, stopping at
the first exact match, and otherwise using the first case insensitive
match.
This is unfortunate, because it means that for the common case, we'd do
a linear search with bytes.Equal. Even for structs with only two or
three fields, that is pretty wasteful.
Worse even, up until the exact match was found via the linear search,
all previous fields would run their equalFold functions, which aren't
cheap even in the simple case.
Instead, cache a map along with the field list that indexes the fields
by their name. This way, a case sensitive field search doesn't involve a
linear search, nor does it involve any equalFold func calls.
This patch should also slightly speed up cases where there's a case
insensitive match but not a case sensitive one, as then we'd avoid
calling bytes.Equal on all the fields. Though that's not a common case,
and there are no benchmarks for it.
name old time/op new time/op delta
CodeDecoder-8 11.0ms ± 0% 10.6ms ± 1% -4.42% (p=0.000 n=9+10)
name old speed new speed delta
CodeDecoder-8 176MB/s ± 0% 184MB/s ± 1% +4.62% (p=0.000 n=9+10)
name old alloc/op new alloc/op delta
CodeDecoder-8 2.28MB ± 0% 2.28MB ± 0% ~ (p=0.725 n=10+10)
name old allocs/op new allocs/op delta
CodeDecoder-8 76.9k ± 0% 76.9k ± 0% ~ (all equal)
Updates #28923.
Change-Id: I9929c1f06c76505e5b96914199315dbdaae5dc76
Reviewed-on: https://go-review.googlesource.com/c/go/+/172918
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-22 23:36:43 +07:00
|
|
|
var fields structFields
|
2018-07-08 16:14:35 +01:00
|
|
|
|
2016-03-08 12:41:35 -08:00
|
|
|
// Check type of target:
|
|
|
|
|
// struct or
|
2016-04-13 16:51:25 -07:00
|
|
|
// map[T1]T2 where T1 is string, an integer type,
|
|
|
|
|
// or an encoding.TextUnmarshaler
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
|
|
|
|
case reflect.Map:
|
2016-04-13 16:51:25 -07:00
|
|
|
// Map key must either have string kind, have an integer kind,
|
|
|
|
|
// or be an encoding.TextUnmarshaler.
|
|
|
|
|
switch t.Key().Kind() {
|
|
|
|
|
case reflect.String,
|
|
|
|
|
reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
|
|
|
|
|
reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
|
|
|
|
|
default:
|
2021-10-25 23:00:56 +07:00
|
|
|
if !reflect.PointerTo(t.Key()).Implements(textUnmarshalerType) {
|
2018-07-08 16:14:35 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2016-04-13 16:51:25 -07:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2012-12-17 02:34:49 +01:00
|
|
|
if v.IsNil() {
|
|
|
|
|
v.Set(reflect.MakeMap(t))
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Struct:
|
2018-07-08 16:14:35 +01:00
|
|
|
fields = cachedTypeFields(t)
|
2016-01-18 16:26:05 +01:00
|
|
|
// ok
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2018-07-08 16:14:35 +01:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
|
2017-06-29 11:51:22 +02:00
|
|
|
d.skip()
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2020-05-20 17:03:31 +00:00
|
|
|
var mapElem reflect.Value
|
2020-11-18 12:50:29 -08:00
|
|
|
var origErrorContext errorContext
|
|
|
|
|
if d.errorContext != nil {
|
|
|
|
|
origErrorContext = *d.errorContext
|
|
|
|
|
}
|
2013-08-29 14:45:59 +10:00
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
for {
|
|
|
|
|
// Read opening " of string key or closing }.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
// closing } - can only happen on first iteration.
|
|
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanBeginLiteral {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2013-12-18 07:30:21 -08:00
|
|
|
// Read key.
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2017-06-29 11:51:22 +02:00
|
|
|
item := d.data[start:d.readIndex()]
|
encoding/json: revert "avoid work when unquoting strings, take 2"
This reverts golang.org/cl/190659 and golang.org/cl/226218, minus the
regression tests in the latter.
The original work happened in golang.org/cl/151157, which was reverted
in golang.org/cl/190909 due to a crash found by fuzzing.
We tried a second time in golang.org/cl/190659, which shipped with Go
1.14. A bug was found, where strings would be mangled in certain edge
cases. The fix for that was golang.org/cl/226218, which was backported
into Go 1.14.4.
Unfortunately, a second regression was just reported in #39555, which is
a similar case of strings getting mangled when decoding under certain
conditions. It would be possible to come up with another small patch to
fix that edge case, but instead, let's just revert the entire
optimization, as it has proved to do more harm than good. Moreover, it's
hard to argue or prove that there will be no more such regressions.
However, all the work wasn't for nothing. First, we learned that the way
the decoder unquotes tokenized strings isn't simple; initially, we had
wrongly assumed that each string was unquoted exactly once and in order.
Second, we have gained a number of regression tests which will be useful
to prevent the same mistakes in the future, including the test cases we
add in this CL.
Fixes #39555.
Change-Id: I66a6919c2dd6d9789232482ba6cf3814eaa70f61
Reviewed-on: https://go-review.googlesource.com/c/go/+/237838
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2020-06-14 22:09:18 +01:00
|
|
|
key, ok := unquoteBytes(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2010-04-27 10:24:00 -07:00
|
|
|
// Figure out field corresponding to key.
|
2020-05-20 17:03:31 +00:00
|
|
|
var subv reflect.Value
|
2013-08-29 14:45:59 +10:00
|
|
|
destring := false // whether the value is wrapped in a string to be decoded first
|
|
|
|
|
|
|
|
|
|
if v.Kind() == reflect.Map {
|
2020-05-20 17:03:31 +00:00
|
|
|
elemType := t.Elem()
|
|
|
|
|
if !mapElem.IsValid() {
|
|
|
|
|
mapElem = reflect.New(elemType).Elem()
|
|
|
|
|
} else {
|
2023-02-23 13:28:48 -08:00
|
|
|
mapElem.SetZero()
|
2013-08-29 14:45:59 +10:00
|
|
|
}
|
2020-05-20 17:03:31 +00:00
|
|
|
subv = mapElem
|
2013-08-29 14:45:59 +10:00
|
|
|
} else {
|
2023-02-20 11:26:10 -08:00
|
|
|
f := fields.byExactName[string(key)]
|
|
|
|
|
if f == nil {
|
|
|
|
|
f = fields.byFoldedName[string(foldName(key))]
|
2013-08-29 14:45:59 +10:00
|
|
|
}
|
|
|
|
|
if f != nil {
|
|
|
|
|
subv = v
|
|
|
|
|
destring = f.quoted
|
2024-08-27 14:35:59 +00:00
|
|
|
if d.errorContext == nil {
|
|
|
|
|
d.errorContext = new(errorContext)
|
|
|
|
|
}
|
|
|
|
|
for i, ind := range f.index {
|
2021-10-25 23:00:56 +07:00
|
|
|
if subv.Kind() == reflect.Pointer {
|
2013-08-29 14:45:59 +10:00
|
|
|
if subv.IsNil() {
|
2017-12-05 22:38:36 -08:00
|
|
|
// If a struct embeds a pointer to an unexported type,
|
|
|
|
|
// it is not possible to set a newly allocated value
|
|
|
|
|
// since the field is unexported.
|
|
|
|
|
//
|
|
|
|
|
// See https://golang.org/issue/21357
|
|
|
|
|
if !subv.CanSet() {
|
|
|
|
|
d.saveError(fmt.Errorf("json: cannot set embedded pointer to unexported struct: %v", subv.Type().Elem()))
|
|
|
|
|
// Invalidate subv to ensure d.value(subv) skips over
|
|
|
|
|
// the JSON value without assigning it to subv.
|
|
|
|
|
subv = reflect.Value{}
|
|
|
|
|
destring = false
|
|
|
|
|
break
|
|
|
|
|
}
|
2013-08-29 14:45:59 +10:00
|
|
|
subv.Set(reflect.New(subv.Type().Elem()))
|
|
|
|
|
}
|
|
|
|
|
subv = subv.Elem()
|
|
|
|
|
}
|
2024-08-27 14:35:59 +00:00
|
|
|
if i < len(f.index)-1 {
|
|
|
|
|
d.errorContext.FieldStack = append(
|
|
|
|
|
d.errorContext.FieldStack,
|
|
|
|
|
subv.Type().Field(ind).Name,
|
|
|
|
|
)
|
|
|
|
|
}
|
|
|
|
|
subv = subv.Field(ind)
|
2020-11-18 12:50:29 -08:00
|
|
|
}
|
2018-07-08 16:14:35 +01:00
|
|
|
d.errorContext.Struct = t
|
2024-08-27 14:35:59 +00:00
|
|
|
d.errorContext.FieldStack = append(d.errorContext.FieldStack, f.name)
|
2017-10-31 13:16:38 -07:00
|
|
|
} else if d.disallowUnknownFields {
|
|
|
|
|
d.saveError(fmt.Errorf("json: unknown field %q", key))
|
2013-08-29 14:45:59 +10:00
|
|
|
}
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Read : before value.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectKey {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-08-29 12:46:32 -07:00
|
|
|
if destring {
|
2018-09-12 09:26:31 +02:00
|
|
|
switch qv := d.valueQuoted().(type) {
|
2014-10-07 11:07:04 -04:00
|
|
|
case nil:
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.literalStore(nullLiteral, subv, false); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2014-10-07 11:07:04 -04:00
|
|
|
case string:
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.literalStore([]byte(qv), subv, true); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2014-10-07 11:07:04 -04:00
|
|
|
default:
|
2014-12-27 20:52:17 +01:00
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
|
2014-10-07 11:07:04 -04:00
|
|
|
}
|
2011-08-29 12:46:32 -07:00
|
|
|
} else {
|
2018-03-03 15:20:26 +01:00
|
|
|
if err := d.value(subv); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
2011-08-29 12:46:32 -07:00
|
|
|
}
|
2012-12-30 15:40:42 +11:00
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// Write value back to map;
|
|
|
|
|
// if using struct, subv points into struct already.
|
2020-05-20 17:03:31 +00:00
|
|
|
if v.Kind() == reflect.Map {
|
|
|
|
|
kt := t.Key()
|
|
|
|
|
var kv reflect.Value
|
2023-07-30 15:02:15 +08:00
|
|
|
if reflect.PointerTo(kt).Implements(textUnmarshalerType) {
|
2020-05-20 17:03:31 +00:00
|
|
|
kv = reflect.New(kt)
|
|
|
|
|
if err := d.literalStore(item, kv, true); err != nil {
|
|
|
|
|
return err
|
|
|
|
|
}
|
|
|
|
|
kv = kv.Elem()
|
2023-07-30 15:02:15 +08:00
|
|
|
} else {
|
2020-05-20 17:03:31 +00:00
|
|
|
switch kt.Kind() {
|
2023-07-30 15:02:15 +08:00
|
|
|
case reflect.String:
|
|
|
|
|
kv = reflect.New(kt).Elem()
|
|
|
|
|
kv.SetString(string(key))
|
2020-05-20 17:03:31 +00:00
|
|
|
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
|
|
|
|
|
s := string(key)
|
|
|
|
|
n, err := strconv.ParseInt(s, 10, 64)
|
2024-02-28 23:50:37 +00:00
|
|
|
if err != nil || kt.OverflowInt(n) {
|
2020-05-20 17:03:31 +00:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
|
|
|
|
|
break
|
|
|
|
|
}
|
2023-07-30 15:02:15 +08:00
|
|
|
kv = reflect.New(kt).Elem()
|
|
|
|
|
kv.SetInt(n)
|
2020-05-20 17:03:31 +00:00
|
|
|
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
|
|
|
|
|
s := string(key)
|
|
|
|
|
n, err := strconv.ParseUint(s, 10, 64)
|
2024-02-28 23:50:37 +00:00
|
|
|
if err != nil || kt.OverflowUint(n) {
|
2020-05-20 17:03:31 +00:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
|
|
|
|
|
break
|
|
|
|
|
}
|
2023-07-30 15:02:15 +08:00
|
|
|
kv = reflect.New(kt).Elem()
|
|
|
|
|
kv.SetUint(n)
|
2020-05-20 17:03:31 +00:00
|
|
|
default:
|
|
|
|
|
panic("json: Unexpected key type") // should never occur
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if kv.IsValid() {
|
|
|
|
|
v.SetMapIndex(kv, subv)
|
|
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Next token must be , or }.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
2020-11-18 12:50:29 -08:00
|
|
|
if d.errorContext != nil {
|
|
|
|
|
// Reset errorContext to its original state.
|
|
|
|
|
// Keep the same underlying array for FieldStack, to reuse the
|
|
|
|
|
// space and avoid unnecessary allocs.
|
|
|
|
|
d.errorContext.FieldStack = d.errorContext.FieldStack[:len(origErrorContext.FieldStack)]
|
|
|
|
|
d.errorContext.Struct = origErrorContext.Struct
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2012-06-25 17:36:09 -04:00
|
|
|
// convertNumber converts the number literal s to a float64 or a Number
|
|
|
|
|
// depending on the setting of d.useNumber.
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) convertNumber(s string) (any, error) {
|
2012-06-25 17:36:09 -04:00
|
|
|
if d.useNumber {
|
|
|
|
|
return Number(s), nil
|
|
|
|
|
}
|
|
|
|
|
f, err := strconv.ParseFloat(s, 64)
|
|
|
|
|
if err != nil {
|
2023-07-31 15:18:12 -07:00
|
|
|
return nil, &UnmarshalTypeError{Value: "number " + s, Type: reflect.TypeFor[float64](), Offset: int64(d.off)}
|
2012-06-25 17:36:09 -04:00
|
|
|
}
|
|
|
|
|
return f, nil
|
|
|
|
|
}
|
|
|
|
|
|
2023-07-31 15:18:12 -07:00
|
|
|
var numberType = reflect.TypeFor[Number]()
|
2012-06-25 17:36:09 -04:00
|
|
|
|
2011-08-29 12:46:32 -07:00
|
|
|
// literalStore decodes a literal stored in item into v.
|
2012-01-12 14:40:29 -08:00
|
|
|
//
|
|
|
|
|
// fromQuoted indicates whether this literal came from unwrapping a
|
|
|
|
|
// string from the ",string" struct tag option. This is used only to
|
|
|
|
|
// produce more helpful error messages.
|
2018-03-03 15:20:26 +01:00
|
|
|
func (d *decodeState) literalStore(item []byte, v reflect.Value, fromQuoted bool) error {
|
2010-04-21 16:40:53 -07:00
|
|
|
// Check for unmarshaler.
|
2012-05-03 17:35:44 -04:00
|
|
|
if len(item) == 0 {
|
2023-08-02 12:30:56 +00:00
|
|
|
// Empty string given.
|
2012-05-03 17:35:44 -04:00
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2012-05-03 17:35:44 -04:00
|
|
|
}
|
2016-10-12 16:54:02 -04:00
|
|
|
isNull := item[0] == 'n' // null
|
2017-06-29 11:51:22 +02:00
|
|
|
u, ut, pv := indirect(v, isNull)
|
2013-08-14 14:56:07 -04:00
|
|
|
if u != nil {
|
2018-04-19 21:56:45 +03:00
|
|
|
return u.UnmarshalJSON(item)
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
|
|
|
|
if ut != nil {
|
|
|
|
|
if item[0] != '"' {
|
|
|
|
|
if fromQuoted {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
2018-09-11 22:09:00 +02:00
|
|
|
return nil
|
|
|
|
|
}
|
|
|
|
|
val := "number"
|
|
|
|
|
switch item[0] {
|
|
|
|
|
case 'n':
|
|
|
|
|
val = "null"
|
|
|
|
|
case 't', 'f':
|
|
|
|
|
val = "bool"
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-09-11 22:09:00 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: val, Type: v.Type(), Offset: int64(d.readIndex())})
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2020-06-14 22:09:18 +01:00
|
|
|
s, ok := unquoteBytes(item)
|
2013-08-14 14:56:07 -04:00
|
|
|
if !ok {
|
|
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2013-08-14 14:56:07 -04:00
|
|
|
}
|
2018-04-19 21:56:45 +03:00
|
|
|
return ut.UnmarshalText(s)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2013-08-14 14:56:07 -04:00
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
v = pv
|
|
|
|
|
|
|
|
|
|
switch c := item[0]; c {
|
|
|
|
|
case 'n': // null
|
2016-10-12 15:55:02 -04:00
|
|
|
// The main parser checks that only null can reach here,
|
|
|
|
|
// but if this was a quoted string input, it could be anything.
|
|
|
|
|
if fromQuoted && string(item) != "null" {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2021-10-25 23:00:56 +07:00
|
|
|
case reflect.Interface, reflect.Pointer, reflect.Map, reflect.Slice:
|
2023-02-23 13:28:48 -08:00
|
|
|
v.SetZero()
|
2012-11-12 15:35:11 -05:00
|
|
|
// otherwise, ignore null for primitives/string
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
case 't', 'f': // true, false
|
2016-10-12 15:55:02 -04:00
|
|
|
value := item[0] == 't'
|
|
|
|
|
// The main parser checks that only true and false can reach here,
|
|
|
|
|
// but if this was a quoted string input, it could be anything.
|
|
|
|
|
if fromQuoted && string(item) != "true" && string(item) != "false" {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
|
|
|
|
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Bool:
|
|
|
|
|
v.SetBool(value)
|
|
|
|
|
case reflect.Interface:
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() == 0 {
|
|
|
|
|
v.Set(reflect.ValueOf(value))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
case '"': // string
|
2020-06-14 22:09:18 +01:00
|
|
|
s, ok := unquoteBytes(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Slice:
|
2015-04-26 23:52:42 +02:00
|
|
|
if v.Type().Elem().Kind() != reflect.Uint8 {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-02-23 11:32:29 -05:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
|
|
|
|
|
n, err := base64.StdEncoding.Decode(b, s)
|
|
|
|
|
if err != nil {
|
|
|
|
|
d.saveError(err)
|
|
|
|
|
break
|
|
|
|
|
}
|
2015-10-25 22:42:41 +01:00
|
|
|
v.SetBytes(b[:n])
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.String:
|
2024-03-30 03:56:30 +00:00
|
|
|
t := string(s)
|
|
|
|
|
if v.Type() == numberType && !isValidNumber(t) {
|
2019-09-16 19:46:12 +00:00
|
|
|
return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
|
|
|
|
|
}
|
2024-03-30 03:56:30 +00:00
|
|
|
v.SetString(t)
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Interface:
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() == 0 {
|
|
|
|
|
v.Set(reflect.ValueOf(string(s)))
|
|
|
|
|
} else {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
}
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
default: // number
|
|
|
|
|
if c != '-' && (c < '0' || c > '9') {
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
switch v.Kind() {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2012-06-25 17:36:09 -04:00
|
|
|
if v.Kind() == reflect.String && v.Type() == numberType {
|
2019-07-03 00:37:05 +02:00
|
|
|
// item must be a valid number, because it's
|
|
|
|
|
// already been tokenized.
|
2023-08-11 17:20:36 +08:00
|
|
|
v.SetString(string(item))
|
2012-06-25 17:36:09 -04:00
|
|
|
break
|
|
|
|
|
}
|
2012-01-12 14:40:29 -08:00
|
|
|
if fromQuoted {
|
2018-03-03 15:20:26 +01:00
|
|
|
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
|
2012-01-12 14:40:29 -08:00
|
|
|
}
|
2018-08-28 15:56:10 +00:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Interface:
|
2023-08-11 17:20:36 +08:00
|
|
|
n, err := d.convertNumber(string(item))
|
2010-04-21 16:40:53 -07:00
|
|
|
if err != nil {
|
2012-06-25 17:36:09 -04:00
|
|
|
d.saveError(err)
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2013-01-14 08:44:16 +01:00
|
|
|
if v.NumMethod() != 0 {
|
2017-06-29 11:51:22 +02:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
|
2013-01-14 08:44:16 +01:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-25 13:39:36 -04:00
|
|
|
v.Set(reflect.ValueOf(n))
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
|
2023-08-11 17:20:36 +08:00
|
|
|
n, err := strconv.ParseInt(string(item), 10, 64)
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowInt(n) {
|
2023-08-11 17:20:36 +08:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + string(item), Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetInt(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
|
2023-08-11 17:20:36 +08:00
|
|
|
n, err := strconv.ParseUint(string(item), 10, 64)
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowUint(n) {
|
2023-08-11 17:20:36 +08:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + string(item), Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetUint(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2011-04-08 12:27:58 -04:00
|
|
|
case reflect.Float32, reflect.Float64:
|
2023-08-11 17:20:36 +08:00
|
|
|
n, err := strconv.ParseFloat(string(item), v.Type().Bits())
|
2011-04-08 12:27:58 -04:00
|
|
|
if err != nil || v.OverflowFloat(n) {
|
2023-08-11 17:20:36 +08:00
|
|
|
d.saveError(&UnmarshalTypeError{Value: "number " + string(item), Type: v.Type(), Offset: int64(d.readIndex())})
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2011-04-08 12:27:58 -04:00
|
|
|
v.SetFloat(n)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2018-03-03 15:20:26 +01:00
|
|
|
return nil
|
2009-11-30 13:55:09 -08:00
|
|
|
}
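The numeric branches above can be exercised through the public API. The following is a minimal sketch, assuming the default (non-jsonv2) build of encoding/json; the helper names `decodeInt8` and `isTypeError` are hypothetical and exist only for this illustration. It shows that a number which does not fit the destination kind is expected to surface as an `*json.UnmarshalTypeError` rather than being truncated.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// decodeInt8 (hypothetical helper) unmarshals a JSON document into an int8.
func decodeInt8(input string) (int8, error) {
	var n int8
	err := json.Unmarshal([]byte(input), &n)
	return n, err
}

// isTypeError (hypothetical helper) reports whether err is, or wraps,
// a *json.UnmarshalTypeError.
func isTypeError(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr)
}

func main() {
	// Fits in int8: decoded normally.
	n, err := decodeInt8("42")
	fmt.Println(n, err)

	// Parses as an integer but overflows int8.
	_, err = decodeInt8("300")
	fmt.Println(isTypeError(err), err)

	// Not an integer at all, so integer parsing fails.
	_, err = decodeInt8("1.5")
	fmt.Println(isTypeError(err), err)
}
```

Both failure cases go through the same error path in the integer branch, which is why callers that need to distinguish them have to inspect the error's fields or message.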
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
// The xxxInterface routines build up a value to be stored
|
2016-03-01 23:21:55 +00:00
|
|
|
// in an empty interface. They are not strictly necessary,
|
2010-04-21 16:40:53 -07:00
|
|
|
// but they avoid the weight of reflection in this common case.
|
|
|
|
|
|
2024-05-23 20:50:25 -07:00
|
|
|
// valueInterface is like value but returns any.
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) valueInterface() (val any) {
|
2017-06-29 11:51:22 +02:00
|
|
|
switch d.opcode {
|
2010-04-21 16:40:53 -07:00
|
|
|
default:
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginArray:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.arrayInterface()
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginObject:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.objectInterface()
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanNext()
|
2010-04-21 16:40:53 -07:00
|
|
|
case scanBeginLiteral:
|
2018-09-12 09:26:31 +02:00
|
|
|
val = d.literalInterface()
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
return
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
2024-05-23 20:50:25 -07:00
|
|
|
// arrayInterface is like array but returns []any.
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) arrayInterface() []any {
|
|
|
|
|
var v = make([]any, 0)
|
2010-04-21 16:40:53 -07:00
|
|
|
for {
|
|
|
|
|
// Look ahead for ] - can only happen on first iteration.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
2018-09-12 09:26:31 +02:00
|
|
|
v = append(v, d.valueInterface())
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Next token must be , or ].
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
|
|
|
|
if d.opcode == scanEndArray {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanArrayValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return v
|
2010-04-21 16:40:53 -07:00
|
|
|
}
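Because arrayInterface starts from `make([]any, 0)`, an empty JSON array should decode into an empty but non-nil slice. The sketch below checks this through `json.Unmarshal`, assuming the default (non-jsonv2) build; `decodeArray` is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeArray (hypothetical helper) unmarshals input into an empty interface
// and returns the resulting []any, or an error if input is not a JSON array.
func decodeArray(input string) ([]any, error) {
	var v any
	if err := json.Unmarshal([]byte(input), &v); err != nil {
		return nil, err
	}
	arr, ok := v.([]any)
	if !ok {
		return nil, fmt.Errorf("not a JSON array: %T", v)
	}
	return arr, nil
}

func main() {
	arr, err := decodeArray(`[]`)
	// The slice is expected to be non-nil with length zero.
	fmt.Println(arr != nil, len(arr), err)

	arr, err = decodeArray(`[1, "two", [3]]`)
	fmt.Println(len(arr), err)
}
```

The distinction matters when the value is re-encoded: a nil slice marshals as `null`, while an empty non-nil slice marshals as `[]`.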
|
|
|
|
|
|
2024-05-23 20:50:25 -07:00
|
|
|
// objectInterface is like object but returns map[string]any.
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) objectInterface() map[string]any {
|
|
|
|
|
m := make(map[string]any)
|
2010-04-21 16:40:53 -07:00
|
|
|
for {
|
|
|
|
|
// Read opening " of string key or closing }.
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
// closing } - can only happen on first iteration.
|
|
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanBeginLiteral {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Read string key.
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2017-06-29 11:51:22 +02:00
|
|
|
item := d.data[start:d.readIndex()]
|
encoding/json: revert "avoid work when unquoting strings, take 2"
This reverts golang.org/cl/190659 and golang.org/cl/226218, minus the
regression tests in the latter.
The original work happened in golang.org/cl/151157, which was reverted
in golang.org/cl/190909 due to a crash found by fuzzing.
We tried a second time in golang.org/cl/190659, which shipped with Go
1.14. A bug was found, where strings would be mangled in certain edge
cases. The fix for that was golang.org/cl/226218, which was backported
into Go 1.14.4.
Unfortunately, a second regression was just reported in #39555, which is
a similar case of strings getting mangled when decoding under certain
conditions. It would be possible to come up with another small patch to
fix that edge case, but instead, let's just revert the entire
optimization, as it has proved to do more harm than good. Moreover, it's
hard to argue or prove that there will be no more such regressions.
However, all the work wasn't for nothing. First, we learned that the way
the decoder unquotes tokenized strings isn't simple; initially, we had
wrongly assumed that each string was unquoted exactly once and in order.
Second, we have gained a number of regression tests which will be useful
to prevent the same mistakes in the future, including the test cases we
add in this CL.
Fixes #39555.
Change-Id: I66a6919c2dd6d9789232482ba6cf3814eaa70f61
Reviewed-on: https://go-review.googlesource.com/c/go/+/237838
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2020-06-14 22:09:18 +01:00
|
|
|
key, ok := unquote(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Read : before value.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectKey {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
d.scanWhile(scanSkipSpace)
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Read value.
|
2018-09-12 09:26:31 +02:00
|
|
|
m[key] = d.valueInterface()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
// Next token must be , or }.
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode == scanSkipSpace {
|
|
|
|
|
d.scanWhile(scanSkipSpace)
|
|
|
|
|
}
|
|
|
|
|
if d.opcode == scanEndObject {
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
2017-06-29 11:51:22 +02:00
|
|
|
if d.opcode != scanObjectValue {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return m
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
|
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
// literalInterface consumes and returns a literal from d.data[d.off-1:] and
|
|
|
|
|
// it reads the following byte ahead. The first byte of the literal has been
|
|
|
|
|
// read already (that's how the caller knows it's a literal).
|
2021-12-01 12:15:45 -05:00
|
|
|
func (d *decodeState) literalInterface() any {
|
2010-04-21 16:40:53 -07:00
|
|
|
// All bytes inside literal return scanContinue op code.
|
2017-06-29 11:51:22 +02:00
|
|
|
start := d.readIndex()
|
2018-11-23 16:56:23 +00:00
|
|
|
d.rescanLiteral()
|
2010-04-21 16:40:53 -07:00
|
|
|
|
2017-06-29 11:51:22 +02:00
|
|
|
item := d.data[start:d.readIndex()]
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
switch c := item[0]; c {
|
|
|
|
|
case 'n': // null
|
2018-09-12 09:26:31 +02:00
|
|
|
return nil
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case 't', 'f': // true, false
|
2018-09-12 09:26:31 +02:00
|
|
|
return c == 't'
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
case '"': // string
|
2020-06-14 22:09:18 +01:00
|
|
|
s, ok := unquote(item)
|
2010-04-21 16:40:53 -07:00
|
|
|
if !ok {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return s
|
2010-04-21 16:40:53 -07:00
|
|
|
|
|
|
|
|
default: // number
|
|
|
|
|
if c != '-' && (c < '0' || c > '9') {
|
2018-09-12 09:26:31 +02:00
|
|
|
panic(phasePanicMsg)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2012-06-25 17:36:09 -04:00
|
|
|
n, err := d.convertNumber(string(item))
|
2010-04-21 16:40:53 -07:00
|
|
|
if err != nil {
|
2012-06-25 17:36:09 -04:00
|
|
|
d.saveError(err)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2018-09-12 09:26:31 +02:00
|
|
|
return n
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
}
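Taken together, the xxxInterface routines determine which Go types appear when decoding into an empty interface. The sketch below prints the dynamic type produced for each kind of JSON value, with and without `Decoder.UseNumber`; `dynamicType` is a hypothetical helper written for this illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// dynamicType (hypothetical helper) decodes input into an empty interface and
// returns the Go type of the result as formatted by %T.
func dynamicType(input string, useNumber bool) (string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	if useNumber {
		// Keep numbers as json.Number instead of converting to float64.
		dec.UseNumber()
	}
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	return fmt.Sprintf("%T", v), nil
}

func main() {
	inputs := []string{`null`, `true`, `"text"`, `1.5`, `[1, 2]`, `{"k": 1}`}
	for _, in := range inputs {
		plain, err := dynamicType(in, false)
		if err != nil {
			panic(err)
		}
		withNumber, err := dynamicType(in, true)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-10s plain=%s useNumber=%s\n", in, plain, withNumber)
	}
}
```

Only the number case differs between the two modes, since convertNumber is the single place where the UseNumber setting is consulted.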
|
|
|
|
|
|
|
|
|
|
// getu4 decodes \uXXXX from the beginning of s, returning the hex value,
|
|
|
|
|
// or it returns -1 if s does not begin with a valid \uXXXX escape.
|
2011-10-25 22:23:54 -07:00
|
|
|
func getu4(s []byte) rune {
|
2010-04-21 16:40:53 -07:00
|
|
|
if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
|
|
|
|
|
return -1
|
|
|
|
|
}
|
2017-06-03 13:36:54 -07:00
|
|
|
var r rune
|
|
|
|
|
for _, c := range s[2:6] {
|
|
|
|
|
switch {
|
|
|
|
|
case '0' <= c && c <= '9':
|
|
|
|
|
c = c - '0'
|
|
|
|
|
case 'a' <= c && c <= 'f':
|
|
|
|
|
c = c - 'a' + 10
|
|
|
|
|
case 'A' <= c && c <= 'F':
|
|
|
|
|
c = c - 'A' + 10
|
|
|
|
|
default:
|
|
|
|
|
return -1
|
|
|
|
|
}
|
|
|
|
|
r = r*16 + rune(c)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2017-06-03 13:36:54 -07:00
|
|
|
return r
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// unquote converts a quoted JSON string literal s into an actual string t.
|
|
|
|
|
// The rules differ from Go's, so strconv.Unquote cannot be used.
|
2020-06-14 22:09:18 +01:00
|
|
|
func unquote(s []byte) (t string, ok bool) {
|
|
|
|
|
s, ok = unquoteBytes(s)
|
2011-02-23 11:32:29 -05:00
|
|
|
t = string(s)
|
|
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
|
2024-05-21 23:24:47 -04:00
|
|
|
// unquoteBytes should be an internal detail,
|
|
|
|
|
// but widely used packages access it using linkname.
|
|
|
|
|
// Notable members of the hall of shame include:
|
|
|
|
|
// - github.com/bytedance/sonic
|
|
|
|
|
//
|
|
|
|
|
// Do not remove or change the type signature.
|
|
|
|
|
// See go.dev/issue/67401.
|
|
|
|
|
//
|
|
|
|
|
//go:linkname unquoteBytes
|
2020-06-14 22:09:18 +01:00
|
|
|
func unquoteBytes(s []byte) (t []byte, ok bool) {
|
|
|
|
|
if len(s) < 2 || s[0] != '"' || s[len(s)-1] != '"' {
|
2019-08-20 17:29:04 -04:00
|
|
|
return
|
|
|
|
|
}
|
2011-02-23 11:32:29 -05:00
|
|
|
s = s[1 : len(s)-1]
|
|
|
|
|
|
2020-06-14 22:09:18 +01:00
|
|
|
// Check for unusual characters. If there are none,
|
|
|
|
|
// then no unquoting is needed, so return a slice of the
|
|
|
|
|
// original bytes.
|
|
|
|
|
r := 0
|
|
|
|
|
for r < len(s) {
|
|
|
|
|
c := s[r]
|
|
|
|
|
if c == '\\' || c == '"' || c < ' ' {
|
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
rr, size := utf8.DecodeRune(s[r:])
|
|
|
|
|
if rr == utf8.RuneError && size == 1 {
|
|
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
r += size
|
|
|
|
|
}
|
|
|
|
|
if r == len(s) {
|
2011-02-23 11:32:29 -05:00
|
|
|
return s, true
|
|
|
|
|
}
|
|
|
|
|
|
2010-04-21 16:40:53 -07:00
|
|
|
b := make([]byte, len(s)+2*utf8.UTFMax)
|
2011-02-23 11:32:29 -05:00
|
|
|
w := copy(b, s[0:r])
|
|
|
|
|
for r < len(s) {
|
2017-08-19 22:33:51 +02:00
|
|
|
// Out of room? Can only happen if s is full of
|
2010-04-21 16:40:53 -07:00
|
|
|
// malformed UTF-8 and we're replacing each
|
|
|
|
|
// byte with RuneError.
|
|
|
|
|
if w >= len(b)-2*utf8.UTFMax {
|
|
|
|
|
nb := make([]byte, (len(b)+utf8.UTFMax)*2)
|
|
|
|
|
copy(nb, b[0:w])
|
|
|
|
|
b = nb
|
|
|
|
|
}
|
|
|
|
|
switch c := s[r]; {
|
|
|
|
|
case c == '\\':
|
|
|
|
|
r++
|
2011-02-23 11:32:29 -05:00
|
|
|
if r >= len(s) {
|
2010-04-21 16:40:53 -07:00
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
switch s[r] {
|
|
|
|
|
default:
|
|
|
|
|
return
|
|
|
|
|
case '"', '\\', '/', '\'':
|
|
|
|
|
b[w] = s[r]
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 'b':
|
|
|
|
|
b[w] = '\b'
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 'f':
|
|
|
|
|
b[w] = '\f'
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 'n':
|
|
|
|
|
b[w] = '\n'
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 'r':
|
|
|
|
|
b[w] = '\r'
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 't':
|
|
|
|
|
b[w] = '\t'
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
case 'u':
|
|
|
|
|
r--
|
2011-10-25 22:23:54 -07:00
|
|
|
rr := getu4(s[r:])
|
|
|
|
|
if rr < 0 {
|
2010-04-21 16:40:53 -07:00
|
|
|
return
|
|
|
|
|
}
|
|
|
|
|
r += 6
|
2011-10-25 22:23:54 -07:00
|
|
|
if utf16.IsSurrogate(rr) {
|
|
|
|
|
rr1 := getu4(s[r:])
|
|
|
|
|
if dec := utf16.DecodeRune(rr, rr1); dec != unicode.ReplacementChar {
|
2010-04-21 16:40:53 -07:00
|
|
|
// A valid pair; consume.
|
|
|
|
|
r += 6
|
2010-11-30 16:59:43 -05:00
|
|
|
w += utf8.EncodeRune(b[w:], dec)
|
2010-04-21 16:40:53 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
// Invalid surrogate; fall back to replacement rune.
|
2011-10-25 22:23:54 -07:00
|
|
|
rr = unicode.ReplacementChar
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2011-10-25 22:23:54 -07:00
|
|
|
w += utf8.EncodeRune(b[w:], rr)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Unescaped quotes and control characters are invalid.
|
|
|
|
|
case c == '"', c < ' ':
|
|
|
|
|
return
|
|
|
|
|
|
|
|
|
|
// ASCII
|
|
|
|
|
case c < utf8.RuneSelf:
|
|
|
|
|
b[w] = c
|
|
|
|
|
r++
|
|
|
|
|
w++
|
|
|
|
|
|
|
|
|
|
// Coerce to well-formed UTF-8.
|
|
|
|
|
default:
|
2011-10-25 22:23:54 -07:00
|
|
|
rr, size := utf8.DecodeRune(s[r:])
|
2010-04-21 16:40:53 -07:00
|
|
|
r += size
|
2011-10-25 22:23:54 -07:00
|
|
|
w += utf8.EncodeRune(b[w:], rr)
|
2010-04-21 16:40:53 -07:00
|
|
|
}
|
2009-11-30 13:55:09 -08:00
|
|
|
}
|
2011-02-23 11:32:29 -05:00
|
|
|
return b[0:w], true
|
2009-11-30 13:55:09 -08:00
|
|
|
}
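The \u handling in unquoteBytes can be observed from outside the package. The following sketch, assuming the default (non-jsonv2) build, decodes a valid surrogate pair, a lone surrogate, and a simple escape through `json.Unmarshal`; `decodeString` is a hypothetical helper. A valid pair is expected to combine into one code point, while an unpaired surrogate is expected to become U+FFFD.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeString (hypothetical helper) unmarshals a JSON string literal,
// including its surrounding quotes, into a Go string.
func decodeString(input string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(input), &s)
	return s, err
}

func main() {
	// A valid UTF-16 surrogate pair, combined into a single rune.
	pair, err := decodeString(`"\ud83d\ude00"`)
	fmt.Printf("%+q %v\n", pair, err)

	// A lone high surrogate, replaced by the Unicode replacement character.
	lone, err := decodeString(`"\ud83d"`)
	fmt.Printf("%+q %v\n", lone, err)

	// A single-character escape, translated to the byte it names.
	tab, err := decodeString(`"a\tb"`)
	fmt.Printf("%+q %v\n", tab, err)
}
```

Replacing invalid surrogates instead of failing keeps decoding lenient, at the cost of the result no longer round-tripping to the original input.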
|