strings: optimize Repeat for common substrings

According to static analysis of Go source code known to the module proxy,
spaces, dashes, zeros, equals signs, and tabs are the most commonly repeated
string literals.

Out of ~69k total calls to Repeat:
* ~25k calls are repeats of " "
* ~7k calls are repeats of "-"
* ~4k calls are repeats of "0"
* ~2k calls are repeats of "="
* ~2k calls are repeats of "\t"

After this optimization, ~60% of Repeat calls will go through the fast path.

These are often used for padding in fixed-width terminal UIs or
in the presentation of human-readable text
(e.g., indentation made of spaces or tabs).

Optimize for this case by handling short repeated sequences of common literals.

Performance:

	name             old time/op    new time/op    delta
	RepeatSpaces-24    19.3ns ± 1%     5.0ns ± 1%   -74.27%  (p=0.000 n=8+9)

	name             old alloc/op   new alloc/op   delta
	RepeatSpaces-24     2.00B ± 0%     0.00B       -100.00%  (p=0.000 n=10+10)

	name             old allocs/op  new allocs/op  delta
	RepeatSpaces-24      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
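
The drop to zero allocations follows from returning a substring of an existing
string rather than building a new one. A minimal way to observe that difference
outside the benchmark harness, assuming hypothetical helpers bySlicing and
byBuilding that are not part of this change:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// spaces is a precomputed run of spaces, built once at initialization.
var spaces = strings.Repeat(" ", 16)

// sink keeps results reachable so the compiler cannot discard the calls.
var sink string

// bySlicing returns a substring of spaces; no new memory is needed.
func bySlicing(n int) string {
	return spaces[:n]
}

// byBuilding constructs a fresh string of n spaces with a strings.Builder.
func byBuilding(n int) string {
	var sb strings.Builder
	sb.Grow(n)
	for i := 0; i < n; i++ {
		sb.WriteByte(' ')
	}
	return sb.String()
}

func main() {
	// testing.AllocsPerRun reports the average number of heap allocations
	// per call of the given function.
	sliced := testing.AllocsPerRun(1000, func() { sink = bySlicing(2) })
	built := testing.AllocsPerRun(1000, func() { sink = byBuilding(2) })
	fmt.Println("allocs when slicing: ", sliced)
	fmt.Println("allocs when building:", built)
}
```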

Change-Id: Id1cafd0cc509e835c8241a626489eb206e0adc3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/536615
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Joe Tsai 2023-10-01 12:15:14 -07:00 committed by Joseph Tsai
parent 0d7afc2ebf
commit 3c78ace24f
2 changed files with 58 additions and 0 deletions


@@ -1111,6 +1111,13 @@ func TestCaseConsistency(t *testing.T) {
}
var longString = "a" + string(make([]byte, 1<<16)) + "z"
var longSpaces = func() string {
	b := make([]byte, 200)
	for i := range b {
		b[i] = ' '
	}
	return string(b)
}()
var RepeatTests = []struct {
	in, out string
@@ -1123,6 +1130,12 @@ var RepeatTests = []struct {
	{"-", "-", 1},
	{"-", "----------", 10},
	{"abc ", "abc abc abc ", 3},
	{" ", " ", 1},
	{"--", "----", 2},
	{"===", "======", 2},
	{"000", "000000000", 3},
	{"\t\t\t\t", "\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t", 4},
	{" ", longSpaces, len(longSpaces)},
	// Tests for results over the chunkLimit
	{string(rune(0)), string(make([]byte, 1<<16)), 1 << 16},
	{longString, longString + longString, 2},
@@ -1925,6 +1938,13 @@ func BenchmarkRepeatLarge(b *testing.B) {
	}
}
func BenchmarkRepeatSpaces(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		Repeat(" ", 2)
	}
}
func BenchmarkIndexAnyASCII(b *testing.B) {
	x := Repeat("#", 2048) // Never matches set
	cs := "0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz"