The new Map implementation introduced in golang.org/cl/33201
did not differentiate if an invalid UTF-8 sequence was decoded
or the RuneError rune. It would therefore always advance by
3 bytes (which is the length of the RuneError rune) instead
of 1 for an invalid sequences. This cl adds a check to correctly
determine the length of bytes needed to advance to the next rune.
Fixes#19330.
Change-Id: I1e7f9333f3ef6068ffc64015bb0a9f32b0b7111d
Reviewed-on: https://go-review.googlesource.com/37597
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Using 'sep' as parameter name for strings functions that take a
separator argument is fine, but for functions like Index or Count that
look for a substring it's better to use 'substr' (like Contains
already does).
Fixes#19039
Change-Id: Idd557409c8fea64ce830ab0e3fec37d3d56a79f0
Reviewed-on: https://go-review.googlesource.com/36874
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In a large codebase within Google, there are thousands of uses of:
ContainsAny|IndexAny|LastIndexAny|Trim|TrimLeft|TrimRight
An analysis of their usage shows that over 97% of them only use character
sets consisting of only ASCII symbols.
Uses of ContainsAny|IndexAny|LastIndexAny:
6% are 1 character (e.g., "\n" or " ")
58% are 2-4 characters (e.g., "<>" or "\r\n\t ")
24% are 5-9 characters (e.g., "()[]*^$")
10% are 10+ characters (e.g., "+-=&|><!(){}[]^\"~*?:\\/ ")
We optimize for ASCII sets, which are commonly used to search for
"control" characters in some string. We don't optimize for the
single character scenario since IndexRune or IndexByte could be used.
Uses of Trim|TrimLeft|TrimRight:
71% are 1 character (e.g., "\n" or " ")
14% are 2 characters (e.g., "\r\n")
10% are 3-4 characters (e.g., " \t\r\n")
5% are 10+ characters (e.g., "0123456789abcdefABCDEF")
We optimize for the single character case with a simple closured function
that only checks for that character's value. We optimize for the medium
and larger sets using a 16-byte bit-map representing a set of ASCII characters.
The benchmarks below have the following suffix name "%d:%d" where the first
number is the length of the input and the second number is the length
of the charset.
== bytes package ==
benchmark old ns/op new ns/op delta
BenchmarkIndexAnyASCII/1:1-4 5.09 5.23 +2.75%
BenchmarkIndexAnyASCII/1:2-4 5.81 5.85 +0.69%
BenchmarkIndexAnyASCII/1:4-4 7.22 7.50 +3.88%
BenchmarkIndexAnyASCII/1:8-4 11.0 11.1 +0.91%
BenchmarkIndexAnyASCII/1:16-4 17.5 17.8 +1.71%
BenchmarkIndexAnyASCII/16:1-4 36.0 34.0 -5.56%
BenchmarkIndexAnyASCII/16:2-4 46.6 36.5 -21.67%
BenchmarkIndexAnyASCII/16:4-4 78.0 40.4 -48.21%
BenchmarkIndexAnyASCII/16:8-4 136 47.4 -65.15%
BenchmarkIndexAnyASCII/16:16-4 254 61.5 -75.79%
BenchmarkIndexAnyASCII/256:1-4 542 388 -28.41%
BenchmarkIndexAnyASCII/256:2-4 705 382 -45.82%
BenchmarkIndexAnyASCII/256:4-4 1089 386 -64.55%
BenchmarkIndexAnyASCII/256:8-4 1994 394 -80.24%
BenchmarkIndexAnyASCII/256:16-4 3843 411 -89.31%
BenchmarkIndexAnyASCII/4096:1-4 8522 5873 -31.08%
BenchmarkIndexAnyASCII/4096:2-4 11253 5861 -47.92%
BenchmarkIndexAnyASCII/4096:4-4 17824 5883 -66.99%
BenchmarkIndexAnyASCII/4096:8-4 32053 5871 -81.68%
BenchmarkIndexAnyASCII/4096:16-4 60512 5888 -90.27%
BenchmarkTrimASCII/1:1-4 79.5 70.8 -10.94%
BenchmarkTrimASCII/1:2-4 79.0 105 +32.91%
BenchmarkTrimASCII/1:4-4 79.6 109 +36.93%
BenchmarkTrimASCII/1:8-4 78.8 118 +49.75%
BenchmarkTrimASCII/1:16-4 80.2 132 +64.59%
BenchmarkTrimASCII/16:1-4 243 116 -52.26%
BenchmarkTrimASCII/16:2-4 243 171 -29.63%
BenchmarkTrimASCII/16:4-4 243 176 -27.57%
BenchmarkTrimASCII/16:8-4 241 184 -23.65%
BenchmarkTrimASCII/16:16-4 238 199 -16.39%
BenchmarkTrimASCII/256:1-4 2580 840 -67.44%
BenchmarkTrimASCII/256:2-4 2603 1175 -54.86%
BenchmarkTrimASCII/256:4-4 2572 1188 -53.81%
BenchmarkTrimASCII/256:8-4 2550 1191 -53.29%
BenchmarkTrimASCII/256:16-4 2585 1208 -53.27%
BenchmarkTrimASCII/4096:1-4 39773 12181 -69.37%
BenchmarkTrimASCII/4096:2-4 39946 17231 -56.86%
BenchmarkTrimASCII/4096:4-4 39641 17179 -56.66%
BenchmarkTrimASCII/4096:8-4 39835 17175 -56.88%
BenchmarkTrimASCII/4096:16-4 40229 17215 -57.21%
== strings package ==
benchmark old ns/op new ns/op delta
BenchmarkIndexAnyASCII/1:1-4 5.94 4.97 -16.33%
BenchmarkIndexAnyASCII/1:2-4 5.94 5.55 -6.57%
BenchmarkIndexAnyASCII/1:4-4 7.45 7.21 -3.22%
BenchmarkIndexAnyASCII/1:8-4 10.8 10.6 -1.85%
BenchmarkIndexAnyASCII/1:16-4 17.4 17.2 -1.15%
BenchmarkIndexAnyASCII/16:1-4 36.4 32.2 -11.54%
BenchmarkIndexAnyASCII/16:2-4 49.6 34.6 -30.24%
BenchmarkIndexAnyASCII/16:4-4 77.5 37.9 -51.10%
BenchmarkIndexAnyASCII/16:8-4 138 45.5 -67.03%
BenchmarkIndexAnyASCII/16:16-4 241 59.1 -75.48%
BenchmarkIndexAnyASCII/256:1-4 509 378 -25.74%
BenchmarkIndexAnyASCII/256:2-4 720 381 -47.08%
BenchmarkIndexAnyASCII/256:4-4 1142 384 -66.37%
BenchmarkIndexAnyASCII/256:8-4 1999 391 -80.44%
BenchmarkIndexAnyASCII/256:16-4 3735 403 -89.21%
BenchmarkIndexAnyASCII/4096:1-4 7973 5824 -26.95%
BenchmarkIndexAnyASCII/4096:2-4 11432 5809 -49.19%
BenchmarkIndexAnyASCII/4096:4-4 18327 5819 -68.25%
BenchmarkIndexAnyASCII/4096:8-4 33059 5828 -82.37%
BenchmarkIndexAnyASCII/4096:16-4 59703 5817 -90.26%
BenchmarkTrimASCII/1:1-4 71.9 71.8 -0.14%
BenchmarkTrimASCII/1:2-4 73.3 103 +40.52%
BenchmarkTrimASCII/1:4-4 71.8 106 +47.63%
BenchmarkTrimASCII/1:8-4 71.2 113 +58.71%
BenchmarkTrimASCII/1:16-4 71.6 128 +78.77%
BenchmarkTrimASCII/16:1-4 152 116 -23.68%
BenchmarkTrimASCII/16:2-4 160 168 +5.00%
BenchmarkTrimASCII/16:4-4 172 170 -1.16%
BenchmarkTrimASCII/16:8-4 200 177 -11.50%
BenchmarkTrimASCII/16:16-4 254 193 -24.02%
BenchmarkTrimASCII/256:1-4 1438 864 -39.92%
BenchmarkTrimASCII/256:2-4 1551 1195 -22.95%
BenchmarkTrimASCII/256:4-4 1770 1200 -32.20%
BenchmarkTrimASCII/256:8-4 2195 1216 -44.60%
BenchmarkTrimASCII/256:16-4 3054 1224 -59.92%
BenchmarkTrimASCII/4096:1-4 21726 12557 -42.20%
BenchmarkTrimASCII/4096:2-4 23586 17508 -25.77%
BenchmarkTrimASCII/4096:4-4 26898 17510 -34.90%
BenchmarkTrimASCII/4096:8-4 33714 17595 -47.81%
BenchmarkTrimASCII/4096:16-4 47429 17700 -62.68%
The benchmarks added test the worst case. For IndexAny, that is when the
charset matches none of the input. For Trim, it is when the charset matches
all of the input.
Change-Id: I970874d101a96b33528fc99b165379abe58cf6ea
Reviewed-on: https://go-review.googlesource.com/31593
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Martin Möhrmann <martisch@uos.de>
In all previous versions of Go, the behavior of IndexRune(s, r)
where r was utf.RuneError was that it would effectively return the
index of any invalid UTF-8 byte sequence (include RuneError).
Optimizations made in http://golang.org/cl/28537 and
http://golang.org/cl/28546 altered this undocumented behavior such
that RuneError would only match on the RuneError rune itself.
Although, the new behavior is arguably reasonable, it did break code
that depended on the previous behavior. Thus, we add special checks
to ensure that we preserve the old behavior.
There is a slight performance hit for correctness:
benchmark old ns/op new ns/op delta
BenchmarkIndexRune/10-4 19.3 21.6 +11.92%
BenchmarkIndexRune/32-4 33.6 35.2 +4.76%
This only occurs on small strings. The performance hit for larger strings
is neglible and not shown.
Fixes#17611
Change-Id: I1d863a741213d46c40b2e1724c41245df52502a5
Reviewed-on: https://go-review.googlesource.com/32123
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Panic if Repeat is given a negative count or
if the value of (len(*) * count) is detected
to overflow.
We panic because we cannot change the
signature of Repeat to return an error.
Fixes#16237
Change-Id: I9f5ba031a5b8533db0582d7a672ffb715143f3fb
Reviewed-on: https://go-review.googlesource.com/29954
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We already had special cases for 0 and 1. Add 2 and 3 for now too.
To be removed if the compiler is improved later (#6714).
This halves the number of allocations and total bytes allocated via
common filepath.Join calls, improving filepath.Walk performance.
Noticed as part of investigating filepath.Walk in #16399.
Change-Id: If7b1bb85606d4720f3ebdf8de7b1e12ad165079d
Reviewed-on: https://go-review.googlesource.com/25005
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Merges explodetests into splittests which already contain
some of the tests that cover explode.
Adds a test to cover the utf8.RuneError branch in explode.
name old time/op new time/op delta
Split1-2 14.9ms ± 0% 14.2ms ± 0% -4.06% (p=0.000 n=47+49)
Change-Id: I00f796bd2edab70e926ea9e65439d820c6a28254
Reviewed-on: https://go-review.googlesource.com/21609
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The previous commit (git 2ae77376) just did golang.org. This one
includes golang.org subdomains like blog, play, and build.
Change-Id: I4469f7b307ae2a12ea89323422044e604c5133ae
Reviewed-on: https://go-review.googlesource.com/12071
Reviewed-by: Rob Pike <r@golang.org>
Also add a reference to the strings blog post.
Fixes#11045.
Change-Id: Ic0a8908cbd7b51a36d104849fa0e8abfd54de2b9
Reviewed-on: https://go-review.googlesource.com/10662
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently the packages have the following index functions:
func Index(s, sep []byte) int
func IndexAny(s []byte, chars string) int
func IndexByte(s []byte, c byte) int
func IndexFunc(s []byte, f func(r rune) bool) int
func IndexRune(s []byte, r rune) int
func LastIndex(s, sep []byte) int
func LastIndexAny(s []byte, chars string) int
func LastIndexFunc(s []byte, f func(r rune) bool) int
Searching for the last occurrence of a byte is quite common
for string parsing algorithms (e.g. find the last paren on a line).
Also addition of LastIndexByte makes the set more orthogonal.
Change-Id: Ida168849acacf8e78dd70c1354bef9eac5effafe
Reviewed-on: https://go-review.googlesource.com/9500
Reviewed-by: Rob Pike <r@golang.org>
Regular expression noteMarker requires the definition of a (who) section
when reading note from a sequence of comments.
Change-Id: I9635de9b86f00d20ec108097fee4d4a8f76237b2
Reviewed-on: https://go-review.googlesource.com/1952
Reviewed-by: Russ Cox <rsc@golang.org>
Noticed while reviewing https://golang.org/cl/147690043/
I'd never seen anybody use IndexRune before, and
unsurprisingly it doesn't use the other fast paths in the
strings/bytes packages. IndexByte uses assembly.
Also, less code this way.
LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/147700043