mirror of
https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
350 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
97da068774 |
cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary is never nil, but it gets a nil check becuase the dict arg is treated as a slice during construction. cmp.Compare[go.shape.int] was: 00006 (+41) TESTB AX, (AX) 00007 (+52) CMPQ CX, BX 00008 (52) JGT 14 00009 (+55) JGE 12 00010 (+56) MOVL $1, AX 00011 (56) RET 00012 (+58) XORL AX, AX 00013 (58) RET 00014 (+53) MOVQ $-1, AX 00015 (53) RET Note how the function begins with a TESTB that loads the dict to perform the nil check. This CL eliminates that nil check. For most generic functions, this doesn't matter too much, but not infrequently are generic functions written which never actually use the dictionary (like cmp.Compare), so I suspect this might help in hot code to avoid repeatedly touching the dictionary in memory, and in cases where the generic function is not inlined (and thus the dict dropped). compilecmp shows these changes (deduped): cmp.Compare[go.shape.float64] 73 -> 72 (-1.37%) cmp.Compare[go.shape.int] 26 -> 24 (-7.69%) cmp.Compare[go.shape.int32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.int64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.string] 142 -> 141 (-0.70%) cmp.Compare[go.shape.uint16] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint32] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uint64] 26 -> 24 (-7.69%) cmp.Compare[go.shape.uint8] 25 -> 23 (-8.00%) cmp.Compare[go.shape.uintptr] 26 -> 24 (-7.69%) cmp.Less[go.shape.float64] 35 -> 34 (-2.86%) cmp.Less[go.shape.int32] 8 -> 6 (-25.00%) cmp.Less[go.shape.int64] 9 -> 7 (-22.22%) cmp.Less[go.shape.int] 9 -> 7 (-22.22%) cmp.Less[go.shape.string] 112 -> 110 (-1.79%) cmp.Less[go.shape.uint16] 9 -> 7 (-22.22%) cmp.Less[go.shape.uint32] 8 -> 6 (-25.00%) cmp.Less[go.shape.uint64] 9 -> 7 (-22.22%) internal/synctest.Associate[go.shape.struct 114 -> 113 (-0.88%) internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791 (-1.74%) internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852 (-0.70%) main.(*gState[go.shape.int64]).stop 2111 -> 2085 (-1.23%) main.(*gState[go.shape.int64]).unblock 941 -> 923 (-1.91%) runtime.fmax[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmax[go.shape.float64] 89 -> 87 (-2.25%) runtime.fmin[go.shape.float32] 85 -> 83 (-2.35%) runtime.fmin[go.shape.float64] 89 -> 87 (-2.25%) slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337 (-2.60%) slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453 (-1.95%) slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169 (-0.59%) slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181 (-2.69%) slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219 (-1.79%) slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219 (-1.79%) slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42 (-4.55%) slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42 (-4.55%) slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170 (-1.60%) slices.medianCmpFunc[go.shape.struct 1118 -> 1113 (-0.45%) slices.medianCmpFunc[go.shape.struct 1214 -> 1209 (-0.41%) slices.medianCmpFunc[go.shape.struct 889 -> 887 (-0.22%) slices.medianCmpFunc[go.shape.struct 901 -> 874 (-3.00%) slices.order2Ordered[go.shape.float64] 89 -> 87 (-2.25%) slices.order2Ordered[go.shape.uint16] 75 -> 70 (-6.67%) slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110 (-0.45%) slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352 (-1.68%) slices.partitionEqualOrdered[go.shape.int] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.int32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.int64] 208 -> 203 (-2.40%) slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198 (-4.81%) slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203 (-2.40%) slices.partitionOrdered[go.shape.float64] 538 -> 533 (-0.93%) slices.partitionOrdered[go.shape.int] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.int64] 437 -> 427 (-2.29%) slices.partitionOrdered[go.shape.uint16] 447 -> 442 (-1.12%) slices.partitionOrdered[go.shape.uint64] 437 -> 427 (-2.29%) slices.rotateCmpFunc[go.shape.struct 1045 -> 1029 (-1.53%) slices.rotateCmpFunc[go.shape.struct 1205 -> 1163 (-3.49%) slices.rotateCmpFunc[go.shape.struct 1226 -> 1176 (-4.08%) slices.rotateCmpFunc[go.shape.struct 1322 -> 1272 (-3.78%) slices.rotateCmpFunc[go.shape.struct 1419 -> 1400 (-1.34%) slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538 (-2.00%) slices.rotateLeft[go.shape.string] 603 -> 588 (-2.49%) slices.rotateLeft[go.shape.uint8] 255 -> 250 (-1.96%) slices.siftDownOrdered[go.shape.int] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.int64] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.string] 614 -> 592 (-3.58%) slices.siftDownOrdered[go.shape.uint32] 181 -> 171 (-5.52%) slices.siftDownOrdered[go.shape.uint64] 181 -> 171 (-5.52%) time.parseRFC3339[go.shape.string] 1774 -> 1758 (-0.90%) unique.(*canonMap[go.shape.struct 280 -> 276 (-1.43%) unique.clone[go.shape.struct 311 -> 293 (-5.79%) weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134 (-1.47%) weak.Make[go.shape.struct 136 -> 134 (-1.47%) weak.Make[go.shape.uint8] 136 -> 134 (-1.47%) Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7 Reviewed-on: https://go-review.googlesource.com/c/go/+/706655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
||
|
|
a5fa5ea51c |
cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.
goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
│ old │ new │
│ sec/op │ sec/op vs base │
StringPrefix3-8 11.125n ± 2% 8.539n ± 1% -23.25% (p=0.000 n=20)
StringPrefix5-8 11.170n ± 2% 8.700n ± 1% -22.11% (p=0.000 n=20)
StringPrefix6-8 11.190n ± 2% 8.655n ± 1% -22.65% (p=0.000 n=20)
StringPrefix7-8 11.095n ± 1% 8.878n ± 1% -19.98% (p=0.000 n=20)
Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
|
||
|
|
b915e14490 |
cmd/compile: consolidate logic for rewriting fixed loads
Many CLs have worked with this bit of code, extending the cases more and more for various fixed addresses and constants. But, I find that it's getting duplicitive, and I don't find the current setup very clear that something like isFixed32 _only_ works for a specific element within the type data. This CL rewrites these rules (pun unintended) into a single set of rewrite rules with shared logic, which stops hardcoding offsets and type compatibility checks. This should open the door to optimizing further type:... field loads, of which most can be done entirely statically but are not yet today outside Hash and Elem. Passes toolstash -cmp. Change-Id: I754138ce1785c6036eada9ed53f0ce2ad2a58b63 Reviewed-on: https://go-review.googlesource.com/c/go/+/701297 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Florian Lehner <lehner.florian86@gmail.com> |
||
|
|
d767064170 |
cmd/compile: mark abi.PtrType.Elem sym as used
CL 700336 let the compiler see into the abi.PtrType.Elem field, but forgot the MarkTypeSymUsedInInterface to ensure that the symbol is marked as referenced. I am not sure how to write a test for this, but I noticed this when working on further optimizations where I "fixed" this issue and confusingly failed toolstash -cmp, with diffs like: @@ -70582,6 +70582,7 @@ reflect.groupAndSlotOf<1> STEXT size=696 args=0x20 locals=0x1e0 funcid=0x0 align rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0 rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0 rel 3+0 t=R_USEIFACE type:*uint64<0>+0 + rel 3+0 t=R_USEIFACE type:uint64<0>+0 rel 71+0 t=R_CALLIND +0 rel 92+4 t=R_PCREL go:itab.*reflect.rtype,reflect.Type<0>+0 rel 114+4 t=R_CALL reflect.(*rtype).ptrTo<1>+0 Updates #75203 Change-Id: Ib8de8a32aeb8a7ea6fcf5d728a2e4944ef227ab2 Reviewed-on: https://go-review.googlesource.com/c/go/+/701296 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
df29038486 |
cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org> |
||
|
|
84b070bfb1 |
cmd/compile/internal/ssa: make oneBit function generic
Allows rewrite rules using oneBit to be made more compact. Change-Id: I986715f77db5b548759d809fe668e1893048f25c Reviewed-on: https://go-review.googlesource.com/c/go/+/699295 Reviewed-by: Youlin Feng <fengyoulin@live.com> Commit-Queue: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
74421a305b |
Revert "cmd/compile: allow multi-field structs to be stored directly in interfaces"
This reverts commit
|
||
|
|
7248995b60 |
Revert "cmd/compile: allow more args in StructMake folding rule"
This reverts commit
|
||
|
|
72e8237cc1 |
cmd/compile: allow more args in StructMake folding rule
imakeOfStructMake does the right thing, but we never call it when the StructMake has more than one argument. Fixes #74908 Change-Id: Ib4b1a025bfb1fa69a325207e47b74bd6217092bf Reviewed-on: https://go-review.googlesource.com/c/go/+/693615 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
cd55f86b8d |
cmd/compile: allow multi-field structs to be stored directly in interfaces
If the struct is a bunch of 0-sized fields and one pointer field. Fixes #74092 Change-Id: I87c5d162c8c9fdba812420d7f9d21de97295b62c Reviewed-on: https://go-review.googlesource.com/c/go/+/681937 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
bd94ae8903 |
cmd/compile: use unsigned power-of-two detector for unsigned mod
Same as CL 689815, but for modulus instead of division. Updates #74485 Change-Id: I73000231c886a987a1093669ff207fd9117a8160 Reviewed-on: https://go-review.googlesource.com/c/go/+/689895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
f3582fc80e |
cmd/compile: add unsigned power-of-two detector
Fixes #74485 Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca Reviewed-on: https://go-review.googlesource.com/c/go/+/689815 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> |
||
|
|
4569255f8c |
cmd/compile: cleanup SelectN rules by indexing into args
Change-Id: I7b8e8cd88c4d6d562aa25df91593d35d331ef63c Reviewed-on: https://go-review.googlesource.com/c/go/+/690595 Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
94645d2413 |
cmd/compile: rewrite cmov(x, x, cond) into x
I don't think branchelim will intentionally generate theses. But at the time where branchelim is generating them they might different, and through opt process they become the same value. Change-Id: I4a19f1db14c08057b7e782a098f4c18ca36ab7fe Reviewed-on: https://go-review.googlesource.com/c/go/+/690519 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <mark@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
ce05ad448f |
cmd/compile: rewrite condselects into doublings and halvings
For performance see CL 685676.
This allows something like:
if y { x *= 2 }
To be compiled to:
SHLXQ BX, AX, AX
Instead of:
MOVQ AX, CX
SHLQ $1, CX
MOVBLZX BL, DX
TESTQ DX, DX
CMOVQNE CX, AX
While ./make.bash uniqued per LOC, there is 2 doublings and 4 halvings.
Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
||
|
|
fcd28070fe |
cmd/compile: add opt branchelim to rewrite some CondSelect into math
This allows something like:
if y { x++ }
To be compiled to:
MOVBLZX BX, CX
ADDQ CX, AX
Instead of:
LEAQ 1(AX), CX
MOVBLZX BL, DX
TESTQ DX, DX
CMOVQNE CX, AX
While ./make.bash uniqued per LOC, there is 100 additions and 75 substractions.
See benchmark here: https://go.dev/play/p/DJf5COjwhd_s
Either it's a performance no-op or it is faster:
goos: linux
goarch: amd64
cpu: AMD Ryzen 5 3600 6-Core Processor
│ /tmp/old.logs │ /tmp/new.logs │
│ sec/op │ sec/op vs base │
CmovInlineConditionAddLatency-12 0.5443n ± 5% 0.5339n ± 3% -1.90% (p=0.004 n=10)
CmovInlineConditionAddThroughputBy6-12 1.492n ± 1% 1.494n ± 1% ~ (p=0.955 n=10)
CmovInlineConditionSubLatency-12 0.5419n ± 3% 0.5282n ± 3% -2.52% (p=0.019 n=10)
CmovInlineConditionSubThroughputBy6-12 1.587n ± 1% 1.584n ± 2% ~ (p=0.492 n=10)
CmovOutlineConditionAddLatency-12 0.5223n ± 1% 0.2639n ± 4% -49.47% (p=0.000 n=10)
CmovOutlineConditionAddThroughputBy6-12 1.159n ± 1% 1.097n ± 2% -5.35% (p=0.000 n=10)
CmovOutlineConditionSubLatency-12 0.5271n ± 3% 0.2654n ± 2% -49.66% (p=0.000 n=10)
CmovOutlineConditionSubThroughputBy6-12 1.053n ± 1% 1.050n ± 1% ~ (p=1.000 n=10)
geomean
There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form has usually less register pressure*
- the math form can sometimes be optimized further
*regalloc rarely find how it can use less registers
As far as pass ordering goes there are many possible options,
I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim,
some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file of only the stuff,
that can trigger after branchelim.
- rerunning all of opt a third time increase compilation time for little gains.
By elimination moving branchelim seems fine.
Change-Id: I869adf57e4d109948ee157cfc47144445146bafd
Reviewed-on: https://go-review.googlesource.com/c/go/+/685676
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
|
||
|
|
27ff0f249c |
cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.
This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).
Fixes #71926
Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
|
||
|
|
7d0cb2a2ad |
cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
16a6b71f18 |
cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.
Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.
Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.
This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.
Local sweet run result on an x86_64 laptop:
│ Orig.res │ Hopt.res │
│ sec/op │ sec/op vs base │
BiogoIgor-8 5.303 ± 1% 5.322 ± 1% ~ (p=0.190 n=10)
BiogoKrishna-8 7.894 ± 1% 7.828 ± 2% ~ (p=0.190 n=10)
BleveIndexBatch100-8 2.257 ± 1% 2.248 ± 2% ~ (p=0.529 n=10)
EtcdPut-8 30.12m ± 1% 30.03m ± 1% ~ (p=0.796 n=10)
EtcdSTM-8 127.1m ± 1% 126.2m ± 0% -0.74% (p=0.023 n=10)
GoBuildKubelet-8 52.21 ± 0% 52.05 ± 1% ~ (p=0.063 n=10)
GoBuildKubeletLink-8 4.342 ± 1% 4.305 ± 0% -0.85% (p=0.000 n=10)
GoBuildIstioctl-8 43.33 ± 0% 43.24 ± 0% -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8 4.604 ± 1% 4.598 ± 0% ~ (p=0.063 n=10)
GoBuildFrontend-8 15.33 ± 0% 15.29 ± 0% ~ (p=0.143 n=10)
GoBuildFrontendLink-8 740.0m ± 1% 737.7m ± 1% ~ (p=0.912 n=10)
GopherLuaKNucleotide-8 9.590 ± 1% 9.656 ± 1% ~ (p=0.165 n=10)
MarkdownRenderXHTML-8 96.97m ± 1% 97.26m ± 2% ~ (p=0.105 n=10)
Tile38QueryLoad-8 335.9µ ± 1% 335.6µ ± 1% ~ (p=0.481 n=10)
geomean 1.336 1.333 -0.22%
Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
|
||
|
|
b60b9cf21f |
cmd/compile: add constant folding for bits.Add64
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd Reviewed-on: https://go-review.googlesource.com/c/go/+/656275 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> |
||
|
|
4ff70cf868 |
cmd/compile: add MakeTuple generic SSA op to remove duplicate Select[01] rules
Change-Id: Id94a5e503f02aa29dc1e334b521770107d4261db Reviewed-on: https://go-review.googlesource.com/c/go/+/656615 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> |
||
|
|
3e033b7553 |
cmd/compile: add constant folding for PopCount
Change-Id: I6ea3f75ddd5c7af114ef77bc48f28c7f8570997b Reviewed-on: https://go-review.googlesource.com/c/go/+/656156 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
43e6525986 |
cmd/compile: load properly constant values from itabs
While looking at the SSA of following code, i noticed
that these rules do not work properly, and the types
are loaded indirectly through an itab, instead of statically.
type M interface{ M() }
type A interface{ A() }
type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}
func main() {
var a M = &Impl{}
a.(A).A()
}
Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev:
|
||
|
|
89c2f282dc |
cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes) before calling map access or delete functions with the resulting string as the key, then we can just use the ptr/len of the slicebytetostring argument as the key. This avoids an allocation. Fixes #44898 Update #71132 There's old code in cmd/compile/internal/walk/order.go that handles some of these cases. 1. m[string(b)] 2. s := string(b); m[s] 3. m[[2]string{string(b1),string(b2)}] The old code handled cases 1&3. The new code handles cases 1&2. We'll leave the old code around to keep 3 working, although it seems not terribly common. Case 2 happens particularly after inlining, so it is pretty common. Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43 Reviewed-on: https://go-review.googlesource.com/c/go/+/640656 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
072eea9b3b |
cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct. Fixes #70738 Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f Reviewed-on: https://go-review.googlesource.com/c/go/+/635435 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
f7dbbf2519 |
cmd/compile: distribute 8 and 16-bit multiplication
Expand the existing rule to cover 8 and 16 bit variants. compilecmp linux/amd64: time time.parseStrictRFC3339.func1 80 -> 70 (-12.50%) time.Time.appendStrictRFC3339.func1 80 -> 70 (-12.50%) time.Time.appendStrictRFC3339 439 -> 428 (-2.51%) time [cmd/compile] time.parseStrictRFC3339.func1 80 -> 70 (-12.50%) time.Time.appendStrictRFC3339.func1 80 -> 70 (-12.50%) time.Time.appendStrictRFC3339 439 -> 428 (-2.51%) linux/arm64: time time.parseStrictRFC3339.func1 changed time.Time.appendStrictRFC3339.func1 changed time.Time.appendStrictRFC3339 416 -> 400 (-3.85%) time [cmd/compile] time.Time.appendStrictRFC3339 416 -> 400 (-3.85%) time.parseStrictRFC3339.func1 changed time.Time.appendStrictRFC3339.func1 changed Change-Id: I0ad3b2363a9fe8c322dd05fbc13bf151a146d8cb Reviewed-on: https://go-review.googlesource.com/c/go/+/641756 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
f0b0109242 |
cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
This came up in some swissmap code. Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83 Reviewed-on: https://go-review.googlesource.com/c/go/+/629516 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
7d2e3da243 |
cmd/compile: remove redundant nil checks
Optimize them away if we can. If not, be more careful about splicing them out after scheduling. Change-Id: I660e54649d753dc456d2e25d389d375a16d76940 Reviewed-on: https://go-review.googlesource.com/c/go/+/627418 Reviewed-by: Shengwei Zhao <wingrez@126.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
32f4864216 |
cmd/compile: arithmetic optimization for shifts
Fixes #69635 Change-Id: I4f8d7dafb34ccfb943c29f96c982278ab7edcd05 Reviewed-on: https://go-review.googlesource.com/c/go/+/621357 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com> |
||
|
|
6d856a804c |
cmd/compile: generalize struct load/store
The SSA backend currently only handle struct with up to 4 fields. Thus, there are different operations corresponding to number fields of the struct. This CL generalizes these with just one OpStructMake, allow struct types with arbitrary number of fields. However, the ssa.MaxStruct is still kept as-is, and future CL will increase this value to optimize large structs. Updates #24416 Change-Id: I192ffbea881186693584476b5639394e79be45c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/611075 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
944a2ac3c7 |
cmd/compile: small cleanups to rewrite rule helpers
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c Reviewed-on: https://go-review.googlesource.com/c/go/+/615198 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
be86f09e01 |
cmd/compile: use generics for isPowerOfTwo predicates
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390 Reviewed-on: https://go-review.googlesource.com/c/go/+/615197 Auto-Submit: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
090f03fd2f |
cmd/compile: do constant folding for BitLen*
Change-Id: I56c27d606b55ea882f4db264fd4735b0cccdf7c6 Reviewed-on: https://go-review.googlesource.com/c/go/+/604015 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> |
||
|
|
9f9dd2bfd8 |
cmd/compile: fix cmpstring rewrite rule
We need to ensure that the Select0 lives in the same block as its argument. Divide up the rule into 2 so that we can put the parts in the right places. (This would be simpler if we could use @block syntax mid-rule, but that feature currently only works at the top level.) This fixes the ssacheck builder after CL 578835 Change-Id: Id26a01d9fac0684e0b732d35d0f7999f6de07825 Reviewed-on: https://go-review.googlesource.com/c/go/+/580815 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
1a0b86375f |
cmd/compile: remove redundant calls to cmpstring
The results of cmpstring are reuseable if the second call has the same arguments and memory. Note that this gets rid of cmpstring, but we still generate a redundant </<= test and branch afterwards, because the compiler doesn't know that cmpstring only ever returns -1,0,1. Update #61725 Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b Reviewed-on: https://go-review.googlesource.com/c/go/+/578835 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> |
||
|
|
bbd0bc22ea |
cmd/compile: improve integer comparisons with numeric bounds
This do: - Fold always false or always true comparisons for ints and uint. - Reduce < and <= where the true set is only one value to == with such value. Change-Id: Ie9e3f70efd1845bef62db56543f051a50ad2532e Reviewed-on: https://go-review.googlesource.com/c/go/+/555135 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
962ccbef91 |
cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of that pointer can use the result, ensuring that they are ordered after the nil check itself. The order dependence goes away after scheduling, when we've fixed an order. At that point we move uses back to the original pointer so it doesn't change regalloc any. This prevents pointer arithmetic on nil from being spilled to the stack and then observed by a stack scan. Fixes #63657 Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea Reviewed-on: https://go-review.googlesource.com/c/go/+/537775 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
45d3d10071 |
cmd/compile/internal/ssa: rename ssagen.TypeOK as CanSSA
No need to indirect through Frontend for this. Change-Id: I5812eb4dadfda79267cabc9d13aeab126c1479e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/526517 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> |
||
|
|
d0964e172b |
cmd/compile: optimize s==s for strings
s==s is always true for strings. This comes up in NaN testing in generic code, where we want x==x to compile completely away except for float types. Fixes #60777 Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637 Reviewed-on: https://go-review.googlesource.com/c/go/+/503795 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> |
||
|
|
bd3f44e4ff |
cmd/compile: constant-fold loads from constant dictionaries and types
Retrying the original CL with a small modification. The original CL did not handle the case of reading an itab out of a dictionary correctly. When we read an itab out of a dictionary, we must treat the type inside that itab as maybe being put in an interface. Original CL: 486895 Revert CL: 490156 Change-Id: Id2dc1699d184cd8c63dac83986a70b60b4e6cbd7 Reviewed-on: https://go-review.googlesource.com/c/go/+/491495 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
|
|
95c4f320d5 |
cmd/compile: add De Morgan's rewrite rule
Adds rules that rewrites statements such as ~P&~Q as ~(P|Q) and ~P|~Q as ~(P&Q), removing an extraneous instruction.
Change-Id: Icedb97df741680ddf9799df79df78657173aa500
GitHub-Last-Rev:
|
||
|
|
f046180890 |
Revert "cmd/compile: constant-fold loads from constant dictionaries and types"
This reverts CL 486895. Reason for revert: This breaks internal tests at Google, see b/280035614. Change-Id: I48772a44f5f6070a7f06b5704e9f9aa272b5f978 Reviewed-on: https://go-review.googlesource.com/c/go/+/490156 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Stapelberg <stapelberg@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Michael Stapelberg <stapelberg@google.com> Reviewed-by: David Chase <drchase@google.com> |
||
|
|
635839a17a |
cmd/compile: constant-fold loads from constant dictionaries and types
Update #59591 Change-Id: Id250a7779c5b53776fff73f3e678fec54d92a8e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/486895 Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
|
|
6b165577fe |
cmd/compile: remove memequal call from string compares in more cases
Add more rules to ensure that order doesn't matter. Add memequal 0 rule. Try to use a constant argument to memequal when one is available. Fixes #59684 Change-Id: I36e85ffbd949396ed700ed6e8ec2bc3ae013f5d2 Reviewed-on: https://go-review.googlesource.com/c/go/+/485535 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
|
|
da4687923b |
cmd/compile: add rewrite rules for arithmetic operations
Add the following common local transformations
(t + x) - (t + y) == x - y
(t + x) - (y + t) == x - y
(x + t) - (y + t) == x - y
(x + t) - (t + y) == x - y
(x - t) + (t + y) == x + y
(x - t) + (y + t) == x + y
The compiler itself matches such patterns many times. This also aligns with other popular compilers.
Fixes #59111
Change-Id: Ibdfdb414782f8fcaa20b84ac5d43d0d9ae2c7b60
GitHub-Last-Rev:
|
||
|
|
85d54a7667 |
cmd/compile: use zero constants in comparisons where possible
Some integer comparisons with 1 and -1 can be rewritten as comparisons with 0. For example, x < 1 is equivalent to x <= 0. This is an advantageous transformation on riscv64 because comparisons with zero do not require a constant to be loaded into a register. Other architectures will likely benefit too and the transformation is relatively benign on architectures that do not benefit. Change-Id: I2ce9821dd7605a660eb71d76e83a61f9bae1bf25 Reviewed-on: https://go-review.googlesource.com/c/go/+/350831 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org> |
||
|
|
ebe49f98c8 |
cmd/compile: inline constant sized memclrNoHeapPointers calls on PPC64
Update the function isInlinableMemclr for ppc64 and ppc64le to enable inlining for the constant sized cases < 512. Larger cases can use dcbz which performs better but requires alignment checking so it is best to continue using memclrNoHeapPointers for those cases. Results on p10: MemclrKnownSize1 2.07ns ± 0% 0.57ns ± 0% -72.59% MemclrKnownSize2 2.56ns ± 5% 0.57ns ± 0% -77.82% MemclrKnownSize4 5.15ns ± 0% 0.57ns ± 0% -89.00% MemclrKnownSize8 2.23ns ± 0% 0.57ns ± 0% -74.57% MemclrKnownSize16 2.23ns ± 0% 0.50ns ± 0% -77.74% MemclrKnownSize32 2.28ns ± 0% 0.56ns ± 0% -75.28% MemclrKnownSize64 2.49ns ± 0% 0.72ns ± 0% -70.95% MemclrKnownSize112 2.97ns ± 2% 1.14ns ± 0% -61.72% MemclrKnownSize128 4.64ns ± 6% 2.45ns ± 1% -47.17% MemclrKnownSize192 5.45ns ± 5% 2.79ns ± 0% -48.87% MemclrKnownSize248 4.51ns ± 0% 2.83ns ± 0% -37.12% MemclrKnownSize256 6.34ns ± 1% 3.58ns ± 0% -43.53% MemclrKnownSize512 3.64ns ± 0% 3.64ns ± 0% -0.03% MemclrKnownSize1024 4.73ns ± 0% 4.73ns ± 0% +0.01% MemclrKnownSize4096 17.1ns ± 0% 17.1ns ± 0% +0.07% MemclrKnownSize512KiB 2.12µs ± 0% 2.12µs ± 0% ~ (all equal) Change-Id: If1abf5749f4802c64523a41fe0058bd144d0ea46 Reviewed-on: https://go-review.googlesource.com/c/go/+/464340 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> |
||
|
|
0d8d181bd5 |
cmd/compile: use MakeResult in empty MakeSlice elimination
This gets eliminated by thoses rules above: // for rewriting results of some late-expanded rewrites (below) (SelectN [0] (MakeResult x ___)) => x (SelectN [1] (MakeResult x y ___)) => y (SelectN [2] (MakeResult x y z ___)) => z Fixes #58161 Change-Id: I4fbfd52c72c06b6b3db906bd9910b6dbb7fe8975 Reviewed-on: https://go-review.googlesource.com/c/go/+/463846 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> |
||
|
|
1fc585dc2f |
cmd/compile: inline known-size memclrNoHeapPointers calls
This patch rewrites known-size calls to memclrNoHeapPointers with an OpZero.
This significantly improves performance and lets some clears get DSE'd.
One of the cases where this applies is zeroing a known-size array, example:
var x [256]int8
...
for a := range x {
x[a] = 0
}
Other cases can be found in the runtime itself where memclrNoHeapPointers is sometimes directly invoked with a constant.
It seems that for some sized-clears on some architectures (AMD64, maybe others) the memcrlNoHeapPointers is more performant than OpZero.
See the issue #56997 for more details.
Benches ARM (M1 Pro):
name old time/op new time/op delta
MemclrKnownSize1-10 2.03ns ± 0% 0.31ns ± 0% -84.69% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.97ns ± 0% 0.31ns ± 0% -84.19% (p=0.000 n=12+19)
MemclrKnownSize4-10 2.02ns ± 0% 0.31ns ± 0% -84.56% (p=0.000 n=12+20)
MemclrKnownSize8-10 2.02ns ± 0% 0.31ns ± 0% -84.59% (p=0.000 n=14+19)
MemclrKnownSize16-10 2.15ns ± 0% 0.31ns ± 0% -85.50% (p=0.000 n=18+19)
MemclrKnownSize32-10 2.48ns ± 0% 0.31ns ± 0% -87.48% (p=0.000 n=20+19)
MemclrKnownSize64-10 1.93ns ± 0% 0.62ns ± 0% -67.88% (p=0.000 n=20+19)
MemclrKnownSize112-10 2.48ns ± 0% 1.80ns ± 0% -27.74% (p=0.000 n=19+20)
MemclrKnownSize128-10 10.0ns ±112% 2.0ns ± 0% -79.76% (p=0.000 n=18+17)
MemclrKnownSize192-10 27.4ns ±103% 2.6ns ± 0% -90.38% (p=0.000 n=16+19)
MemclrKnownSize248-10 9.67ns ±43% 3.26ns ± 0% -66.29% (p=0.000 n=19+19)
MemclrKnownSize256-10 85.4ns ±148% 3.3ns ± 0% -96.18% (p=0.000 n=20+20)
MemclrKnownSize512-10 223ns ±54% 6ns ± 0% -97.42% (p=0.000 n=18+20)
MemclrKnownSize1024-10 216ns ±26% 11ns ± 0% -95.00% (p=0.000 n=18+15)
MemclrKnownSize4096-10 265ns ± 2% 88ns ± 0% -66.84% (p=0.000 n=19+17)
MemclrKnownSize512KiB-10 9.91µs ± 1% 10.23µs ± 2% +3.14% (p=0.000 n=19+19)
[Geo mean] 15.6ns 2.5ns -83.62%
name old speed new speed delta
MemclrKnownSize1-10 493MB/s ± 0% 3216MB/s ± 0% +553.04% (p=0.000 n=18+19)
MemclrKnownSize2-10 1.02GB/s ± 0% 6.43GB/s ± 0% +532.33% (p=0.000 n=16+19)
MemclrKnownSize4-10 1.99GB/s ± 0% 12.86GB/s ± 0% +547.67% (p=0.000 n=18+20)
MemclrKnownSize8-10 3.96GB/s ± 0% 25.72GB/s ± 0% +548.81% (p=0.000 n=19+19)
MemclrKnownSize16-10 7.46GB/s ± 0% 51.43GB/s ± 0% +589.72% (p=0.000 n=20+19)
MemclrKnownSize32-10 12.9GB/s ± 0% 102.9GB/s ± 0% +698.60% (p=0.000 n=20+18)
MemclrKnownSize64-10 33.1GB/s ± 0% 103.0GB/s ± 0% +211.34% (p=0.000 n=19+19)
MemclrKnownSize112-10 45.1GB/s ± 0% 62.4GB/s ± 0% +38.38% (p=0.000 n=19+20)
MemclrKnownSize128-10 13.3GB/s ±107% 63.5GB/s ± 0% +378.03% (p=0.000 n=19+18)
MemclrKnownSize192-10 6.97GB/s ±139% 72.72GB/s ± 0% +943.44% (p=0.000 n=19+19)
MemclrKnownSize248-10 25.9GB/s ±46% 76.1GB/s ± 0% +194.16% (p=0.000 n=20+17)
MemclrKnownSize256-10 8.64GB/s ±196% 78.51GB/s ± 0% +808.19% (p=0.000 n=20+20)
MemclrKnownSize512-10 2.33GB/s ±86% 89.13GB/s ± 0% +3719.50% (p=0.000 n=17+20)
MemclrKnownSize1024-10 4.85GB/s ±32% 94.93GB/s ± 0% +1856.74% (p=0.000 n=18+19)
MemclrKnownSize4096-10 15.4GB/s ± 2% 46.6GB/s ± 0% +201.55% (p=0.000 n=19+18)
MemclrKnownSize512KiB-10 52.9GB/s ± 1% 51.3GB/s ± 2% -3.04% (p=0.000 n=19+19)
[Geo mean] 7.54GB/s 42.86GB/s +468.76%
Intel Alder Lake 12600k:
name old time/op new time/op delta
MemclrKnownSize1-16 0.59ns ± 3% 0.38ns ± 6% -36.00% (p=0.000 n=19+18)
MemclrKnownSize2-16 0.57ns ± 1% 0.19ns ± 5% -66.27% (p=0.000 n=19+19)
MemclrKnownSize4-16 0.66ns ± 2% 0.36ns ±21% -45.12% (p=0.000 n=19+20)
MemclrKnownSize8-16 0.74ns ± 1% 0.30ns ±26% -59.81% (p=0.000 n=18+20)
MemclrKnownSize16-16 1.00ns ± 7% 0.21ns ± 8% -79.51% (p=0.000 n=20+19)
MemclrKnownSize32-16 0.95ns ± 1% 0.40ns ± 1% -57.61% (p=0.000 n=20+18)
MemclrKnownSize64-16 1.20ns ± 2% 0.41ns ± 0% -65.82% (p=0.000 n=20+18)
MemclrKnownSize112-16 1.27ns ± 2% 1.03ns ± 0% -19.35% (p=0.000 n=20+18)
MemclrKnownSize128-16 1.34ns ± 2% 1.03ns ± 0% -23.02% (p=0.000 n=20+20)
MemclrKnownSize192-16 1.92ns ± 2% 1.44ns ± 0% -24.89% (p=0.000 n=20+16)
MemclrKnownSize248-16 2.77ns ± 1% 3.29ns ± 0% +18.81% (p=0.000 n=20+16)
MemclrKnownSize256-16 1.92ns ± 1% 1.86ns ± 0% -3.49% (p=0.000 n=19+15)
MemclrKnownSize512-16 2.81ns ± 2% 3.49ns ± 0% +24.15% (p=0.000 n=20+17)
MemclrKnownSize1024-16 4.02ns ± 1% 6.78ns ± 0% +68.44% (p=0.000 n=20+18)
MemclrKnownSize4096-16 17.2ns ± 2% 14.4ns ± 0% -16.73% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 6.71µs ± 1% 6.52µs ± 0% -2.85% (p=0.000 n=20+18)
[Geo mean] 2.60ns 1.71ns -34.06%
name old speed new speed delta
MemclrKnownSize1-16 1.71GB/s ± 3% 2.67GB/s ± 6% +56.39% (p=0.000 n=19+18)
MemclrKnownSize2-16 3.52GB/s ± 2% 10.43GB/s ± 6% +196.04% (p=0.000 n=20+20)
MemclrKnownSize4-16 6.06GB/s ± 1% 10.83GB/s ±11% +78.63% (p=0.000 n=19+18)
MemclrKnownSize8-16 10.7GB/s ± 1% 27.0GB/s ±21% +151.49% (p=0.000 n=18+20)
MemclrKnownSize16-16 16.0GB/s ± 8% 78.1GB/s ± 7% +387.24% (p=0.000 n=20+19)
MemclrKnownSize32-16 33.6GB/s ± 1% 79.4GB/s ± 1% +135.89% (p=0.000 n=20+18)
MemclrKnownSize64-16 53.3GB/s ± 2% 155.9GB/s ± 0% +192.58% (p=0.000 n=20+18)
MemclrKnownSize112-16 88.0GB/s ± 2% 109.1GB/s ± 0% +23.97% (p=0.000 n=20+18)
MemclrKnownSize128-16 95.3GB/s ± 2% 123.8GB/s ± 0% +29.88% (p=0.000 n=20+20)
MemclrKnownSize192-16 100GB/s ± 2% 133GB/s ± 0% +33.12% (p=0.000 n=20+17)
MemclrKnownSize248-16 89.7GB/s ± 1% 75.5GB/s ± 0% -15.84% (p=0.000 n=20+19)
MemclrKnownSize256-16 133GB/s ± 1% 138GB/s ± 0% +3.61% (p=0.000 n=19+14)
MemclrKnownSize512-16 182GB/s ± 2% 147GB/s ± 0% -19.46% (p=0.000 n=20+17)
MemclrKnownSize1024-16 254GB/s ± 1% 151GB/s ± 0% -40.64% (p=0.000 n=20+18)
MemclrKnownSize4096-16 237GB/s ± 2% 285GB/s ± 0% +20.09% (p=0.000 n=20+17)
MemclrKnownSize512KiB-16 78.2GB/s ± 1% 80.4GB/s ± 0% +2.93% (p=0.000 n=20+18)
[Geo mean] 42.1GB/s 63.8GB/s +51.53%
compilecmp linux/amd64:
runtime
runtime.(*pallocData).allocAll 85 -> 45 (-47.06%)
runtime.(*pageAlloc).allocRange 942 -> 923 (-2.02%)
runtime.(*pageAlloc).free 798 -> 774 (-3.01%)
runtime.(*pageBits).clearAll 66 -> 20 (-69.70%)
runtime.startCheckmarks 255 -> 246 (-3.53%)
runtime.(*pallocData).freeAll 86 -> 46 (-46.51%)
runtime.(*pallocBits).freeAll 66 -> 20 (-69.70%)
runtime.(*consistentHeapStats).unsafeClear 66 -> 19 (-71.21%)
runtime.newproc1 965 -> 933 (-3.32%)
crypto/rc4
crypto/rc4.(*Cipher).Reset 78 -> 69 (-11.54%)
compress/bzip2
compress/bzip2.(*reader).readBlock 2973 -> 2941 (-1.08%)
image/jpeg
image/jpeg.(*decoder).processDHT 1179 -> 1166 (-1.10%)
index/suffixarray
index/suffixarray.bucketMax_8_32 394 -> 241 (-38.83%)
index/suffixarray.freq_8_32 317 -> 185 (-41.64%)
index/suffixarray.freq_8_64 317 -> 178 (-43.85%)
index/suffixarray.bucketMin_8_32 394 -> 243 (-38.32%)
index/suffixarray.bucketMin_8_64 398 -> 234 (-41.21%)
index/suffixarray.bucketMax_8_64 398 -> 234 (-41.21%)
compress/flate
compress/flate.(*huffmanBitWriter).generateCodegen 965 -> 838 (-13.16%)
compress/flate.(*compressor).reset 429 -> 409 (-4.66%)
cmd/vendor/golang.org/x/sys/unix
cmd/vendor/golang.org/x/sys/unix.(*FdSet).Zero 66 -> 60 (-9.09%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetInet4Addr 211 -> 129 (-38.86%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint32 98 -> 14 (-85.71%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).clear 66 -> 11 (-83.33%)
cmd/vendor/golang.org/x/sys/unix.(*Ifreq).SetUint16 101 -> 15 (-85.15%)
cmd/vendor/golang.org/x/sys/unix.(*CPUSet).Zero 66 -> 60 (-9.09%)
internal/coverage/decodemeta
internal/coverage/decodemeta.(*CoverageMetaFileReader).rdUint64 325 -> 293 (-9.85%)
crypto/tls
crypto/tls.(*halfConn).setTrafficSecret 253 -> 247 (-2.37%)
crypto/tls.(*Conn).readRecordOrCCS 10315 -> 10283 (-0.31%)
crypto/tls.(*halfConn).changeCipherSpec 271 -> 261 (-3.69%)
crypto/tls.(*Conn).writeRecordLocked 1765 -> 1748 (-0.96%)
file before after Δ %
runtime.s 512467 512164 -303 -0.059%
crypto/rc4.s 955 946 -9 -0.942%
compress/bzip2.s 9586 9554 -32 -0.334%
image/jpeg.s 32122 32109 -13 -0.040%
index/suffixarray.s 38547 37644 -903 -2.343%
compress/flate.s 46668 46521 -147 -0.315%
cmd/vendor/golang.org/x/sys/unix.s 118620 118301 -319 -0.269%
internal/coverage/decodemeta.s 7224 7192 -32 -0.443%
crypto/tls.s 288762 288697 -65 -0.023%
cmd/compile/internal/ssa.s 3639799 3640727 +928 +0.025%
total 20790248 20789353 -895 -0.004%
src/runtime benchmarks (Linux Alder Lake 12600k):
name old time/op new time/op delta
MakeChan/Byte-16 26.2ns ± 2% 25.6ns ± 3% -2.05% (p=0.003 n=9+10)
MakeChan/Int-16 33.9ns ± 2% 33.3ns ± 4% -1.99% (p=0.015 n=10+10)
MakeChan/Ptr-16 54.2ns ± 2% 53.7ns ± 1% -0.90% (p=0.016 n=10+9)
MakeChan/Struct/0-16 23.8ns ± 3% 23.4ns ± 1% -1.72% (p=0.009 n=10+8)
MakeChan/Struct/32-16 55.9ns ± 2% 53.9ns ± 1% -3.48% (p=0.000 n=10+10)
MakeChan/Struct/40-16 63.5ns ± 1% 61.1ns ± 2% -3.79% (p=0.000 n=10+9)
ChanNonblocking-16 0.22ns ± 0% 0.22ns ± 0% +0.40% (p=0.011 n=9+8)
SelectUncontended-16 4.63ns ± 1% 4.62ns ± 0% -0.35% (p=0.001 n=10+8)
SelectSyncContended-16 1.58µs ± 2% 1.59µs ± 1% ~ (p=0.540 n=10+10)
SelectAsyncContended-16 290ns ± 0% 291ns ± 0% +0.14% (p=0.012 n=8+9)
SelectNonblock-16 0.95ns ± 1% 0.95ns ± 1% ~ (p=0.546 n=9+9)
ChanUncontended-16 239ns ± 3% 242ns ± 6% ~ (p=0.886 n=9+10)
ChanContended-16 17.7µs ± 1% 18.2µs ± 1% +2.87% (p=0.000 n=10+9)
ChanSync-16 109ns ± 2% 109ns ± 1% ~ (p=0.342 n=10+10)
ChanSyncWork-16 6.55µs ± 1% 6.53µs ± 1% ~ (p=0.101 n=10+10)
ChanProdCons0-16 502ns ± 1% 499ns ± 0% -0.55% (p=0.001 n=10+9)
ChanProdCons10-16 373ns ± 2% 377ns ± 1% ~ (p=0.095 n=10+9)
ChanProdCons100-16 224ns ± 2% 223ns ± 3% ~ (p=0.150 n=9+10)
ChanProdConsWork0-16 491ns ± 1% 484ns ± 0% -1.26% (p=0.000 n=10+9)
ChanProdConsWork10-16 451ns ± 2% 448ns ± 2% ~ (p=0.210 n=8+10)
ChanProdConsWork100-16 406ns ± 0% 407ns ± 1% ~ (p=0.138 n=8+8)
SelectProdCons-16 509ns ± 0% 509ns ± 0% ~ (p=0.917 n=9+9)
ReceiveDataFromClosedChan-16 12.1ns ± 0% 12.1ns ± 0% ~ (p=0.780 n=10+10)
ChanCreation-16 22.6ns ± 1% 22.4ns ± 0% -0.72% (p=0.001 n=10+8)
ChanSem-16 165ns ± 1% 166ns ± 1% +0.72% (p=0.002 n=10+10)
ChanPopular-16 500µs ± 2% 498µs ± 1% ~ (p=0.218 n=10+10)
ChanClosed-16 0.29ns ± 0% 0.29ns ± 0% +0.09% (p=0.019 n=9+8)
CallClosure-16 1.28ns ± 0% 1.27ns ± 0% -0.51% (p=0.000 n=9+9)
CallClosure1-16 1.50ns ± 0% 1.50ns ± 0% ~ (p=0.123 n=9+9)
CallClosure2-16 8.86ns ± 1% 8.86ns ± 3% ~ (p=0.590 n=9+10)
CallClosure3-16 8.75ns ± 2% 8.69ns ± 2% ~ (p=0.247 n=10+10)
CallClosure4-16 8.65ns ± 2% 8.56ns ± 2% ~ (p=0.105 n=10+10)
Complex128DivNormal-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.790 n=10+9)
Complex128DivNisNaN-16 4.44ns ± 0% 4.43ns ± 0% ~ (p=0.564 n=10+10)
Complex128DivDisNaN-16 4.48ns ± 0% 4.48ns ± 0% ~ (p=0.101 n=10+10)
Complex128DivNisInf-16 2.58ns ± 0% 2.58ns ± 0% ~ (p=0.808 n=10+10)
Complex128DivDisInf-16 6.30ns ± 0% 6.31ns ± 0% ~ (p=0.305 n=10+10)
SetTypePtr-16 0.73ns ± 1% 0.73ns ± 3% ~ (p=0.644 n=10+10)
SetTypePtr8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.127 n=10+10)
SetTypePtr16-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.109 n=10+10)
SetTypePtr32-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.203 n=9+10)
SetTypePtr64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.696 n=10+10)
SetTypePtr126-16 6.91ns ± 0% 6.91ns ± 0% ~ (p=0.469 n=10+10)
SetTypePtr128-16 6.66ns ± 0% 6.67ns ± 0% ~ (p=0.246 n=9+10)
SetTypePtrSlice-16 54.1ns ± 1% 54.1ns ± 1% ~ (p=0.509 n=9+10)
SetTypeNode1-16 4.13ns ± 1% 4.12ns ± 0% ~ (p=0.342 n=10+10)
SetTypeNode1Slice-16 10.1ns ± 1% 10.0ns ± 1% -1.18% (p=0.000 n=10+10)
SetTypeNode8-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.137 n=8+8)
SetTypeNode8Slice-16 22.6ns ± 0% 22.6ns ± 0% ~ (p=0.423 n=10+10)
SetTypeNode64-16 6.90ns ± 0% 6.91ns ± 0% ~ (p=0.275 n=10+10)
SetTypeNode64Slice-16 173ns ± 0% 173ns ± 0% ~ (p=0.610 n=9+10)
SetTypeNode64Dead-16 5.53ns ± 0% 5.52ns ± 0% ~ (p=0.123 n=10+6)
SetTypeNode64DeadSlice-16 150ns ± 0% 150ns ± 0% ~ (p=0.398 n=10+10)
SetTypeNode124-16 6.90ns ± 0% 6.90ns ± 0% ~ (p=0.779 n=10+10)
SetTypeNode124Slice-16 222ns ± 5% 217ns ± 0% ~ (p=0.302 n=10+10)
SetTypeNode126-16 6.66ns ± 0% 6.66ns ± 0% ~ (p=0.324 n=10+9)
SetTypeNode126Slice-16 218ns ± 0% 218ns ± 0% ~ (p=0.119 n=9+10)
SetTypeNode128-16 9.76ns ± 0% 9.73ns ± 0% -0.31% (p=0.003 n=9+10)
SetTypeNode128Slice-16 279ns ± 0% 278ns ± 0% ~ (p=0.112 n=10+9)
SetTypeNode130-16 9.77ns ± 0% 9.73ns ± 0% -0.33% (p=0.002 n=10+10)
SetTypeNode130Slice-16 284ns ± 0% 284ns ± 0% ~ (p=0.668 n=10+10)
SetTypeNode1024-16 51.2ns ± 0% 51.6ns ± 1% ~ (p=0.080 n=9+9)
SetTypeNode1024Slice-16 1.83µs ± 0% 1.82µs ± 0% ~ (p=0.115 n=10+10)
Allocation-16 4.64µs ± 1% 4.37µs ± 1% -5.69% (p=0.000 n=9+9)
ReadMemStats-16 5.62µs ± 2% 5.55µs ± 5% -1.36% (p=0.050 n=10+10)
WriteBarrier-16 4.95ns ± 3% 4.99ns ± 3% ~ (p=0.255 n=10+10)
BulkWriteBarrier-16 1.69ns ± 2% 1.63ns ± 4% -3.77% (p=0.001 n=10+10)
ScanStackNoLocals-16 12.8ms ± 2% 12.9ms ± 1% +0.72% (p=0.019 n=10+10)
MSpanCountAlloc/bits=64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.124 n=10+10)
MSpanCountAlloc/bits=128-16 2.08ns ± 1% 2.06ns ± 1% -0.87% (p=0.000 n=10+10)
MSpanCountAlloc/bits=256-16 2.71ns ± 1% 2.69ns ± 1% -0.74% (p=0.001 n=10+9)
MSpanCountAlloc/bits=512-16 4.15ns ± 0% 4.23ns ± 2% +2.15% (p=0.000 n=10+10)
MSpanCountAlloc/bits=1024-16 7.89ns ± 1% 7.89ns ± 1% ~ (p=0.867 n=10+10)
Hash5-16 1.93ns ± 1% 2.01ns ± 0% +3.99% (p=0.000 n=10+8)
Hash16-16 2.04ns ± 1% 2.21ns ± 1% +8.61% (p=0.000 n=10+10)
Hash64-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.154 n=9+9)
Hash1024-16 16.4ns ± 0% 16.4ns ± 0% +0.17% (p=0.020 n=9+10)
Hash65536-16 886ns ± 0% 885ns ± 0% ~ (p=0.725 n=10+10)
AlignedLoad-16 0.96ns ± 2% 0.95ns ± 3% ~ (p=0.123 n=10+10)
UnalignedLoad-16 0.95ns ± 2% 1.01ns ± 2% +6.31% (p=0.000 n=10+10)
EqEfaceConcrete-16 0.31ns ± 3% 0.33ns ± 5% +8.10% (p=0.000 n=10+10)
EqIfaceConcrete-16 0.31ns ±13% 0.28ns ± 2% -9.23% (p=0.001 n=10+10)
NeEfaceConcrete-16 0.29ns ± 1% 0.31ns ± 7% +5.59% (p=0.010 n=8+8)
NeIfaceConcrete-16 0.28ns ± 2% 0.29ns ± 1% +4.49% (p=0.000 n=9+8)
ConvT2EByteSized/bool-16 0.53ns ± 1% 0.52ns ± 1% -2.18% (p=0.000 n=10+10)
ConvT2EByteSized/uint8-16 0.53ns ± 1% 0.53ns ± 0% +1.22% (p=0.000 n=10+10)
ConvT2ESmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.774 n=9+9)
ConvT2EUintptr-16 1.03ns ± 0% 1.04ns ± 0% +0.50% (p=0.000 n=10+8)
ConvT2ELarge-16 14.4ns ± 2% 14.4ns ± 1% ~ (p=0.726 n=10+10)
ConvT2ISmall-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.693 n=9+10)
ConvT2IUintptr-16 1.03ns ± 0% 1.03ns ± 0% +0.44% (p=0.000 n=10+10)
ConvT2ILarge-16 14.2ns ± 1% 14.4ns ± 1% +0.85% (p=0.007 n=9+10)
ConvI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.39% (p=0.037 n=10+8)
ConvI2I-16 2.68ns ± 0% 2.70ns ± 1% +0.73% (p=0.000 n=9+10)
AssertE2T-16 0.28ns ± 1% 0.39ns ± 5% +37.38% (p=0.000 n=10+10)
AssertE2TLarge-16 0.42ns ± 2% 0.48ns ± 1% +14.92% (p=0.000 n=9+10)
AssertE2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.352 n=9+9)
AssertI2T-16 0.37ns ± 3% 0.34ns ± 1% -6.16% (p=0.000 n=10+10)
AssertI2I-16 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.286 n=10+10)
AssertI2E-16 0.54ns ± 1% 0.54ns ± 0% -0.94% (p=0.000 n=10+10)
AssertE2E-16 0.41ns ± 0% 0.41ns ± 1% ~ (p=0.880 n=9+9)
AssertE2T2-16 0.41ns ± 1% 0.41ns ± 1% ~ (p=0.725 n=10+10)
AssertE2T2Blank-16 0.24ns ± 5% 0.21ns ± 1% -14.79% (p=0.000 n=10+9)
AssertI2E2-16 0.69ns ± 0% 0.69ns ± 1% ~ (p=0.541 n=10+10)
AssertI2E2Blank-16 0.26ns ± 9% 0.21ns ± 1% -18.86% (p=0.000 n=10+9)
AssertE2E2-16 0.53ns ± 1% 0.53ns ± 1% +0.72% (p=0.004 n=10+10)
AssertE2E2Blank-16 0.23ns ± 4% 0.21ns ± 1% -8.42% (p=0.000 n=10+10)
ConvT2Ezero/zero/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.583 n=9+10)
ConvT2Ezero/zero/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.417 n=10+10)
ConvT2Ezero/zero/64-16 1.03ns ± 1% 1.03ns ± 0% ~ (p=0.051 n=10+10)
ConvT2Ezero/zero/str-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.132 n=10+10)
ConvT2Ezero/zero/slice-16 1.14ns ± 0% 1.15ns ± 0% +0.49% (p=0.001 n=10+10)
ConvT2Ezero/zero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.171 n=10+10)
ConvT2Ezero/nonzero/str-16 19.4ns ± 1% 19.3ns ± 3% ~ (p=0.548 n=9+10)
ConvT2Ezero/nonzero/slice-16 22.2ns ± 2% 22.0ns ± 2% ~ (p=0.109 n=10+10)
ConvT2Ezero/nonzero/big-16 123ns ± 1% 123ns ± 1% ~ (p=0.446 n=10+8)
ConvT2Ezero/smallint/16-16 1.13ns ± 0% 1.14ns ± 1% ~ (p=0.362 n=10+10)
ConvT2Ezero/smallint/32-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.907 n=10+9)
ConvT2Ezero/smallint/64-16 1.04ns ± 0% 1.03ns ± 0% -0.38% (p=0.002 n=10+10)
ConvT2Ezero/largeint/16-16 6.65ns ± 1% 6.65ns ± 2% ~ (p=0.618 n=10+9)
ConvT2Ezero/largeint/32-16 6.75ns ± 3% 6.63ns ± 2% -1.77% (p=0.015 n=10+10)
ConvT2Ezero/largeint/64-16 9.19ns ± 1% 9.26ns ± 2% ~ (p=0.123 n=10+10)
Malloc8-16 8.66ns ± 1% 8.89ns ± 2% +2.74% (p=0.000 n=10+10)
Malloc16-16 13.7ns ± 1% 13.8ns ± 1% +0.71% (p=0.022 n=10+8)
MallocTypeInfo8-16 11.7ns ± 3% 11.6ns ± 2% ~ (p=0.469 n=10+10)
MallocTypeInfo16-16 18.3ns ± 1% 18.2ns ± 2% ~ (p=0.251 n=9+10)
MallocLargeStruct-16 195ns ± 1% 198ns ± 1% +1.65% (p=0.000 n=9+10)
GoroutineSelect-16 1.10ms ± 1% 1.12ms ± 1% +1.36% (p=0.000 n=10+8)
GoroutineBlocking-16 986µs ± 1% 998µs ± 1% +1.23% (p=0.002 n=10+10)
GoroutineForRange-16 985µs ± 1% 1001µs ± 1% +1.68% (p=0.000 n=10+10)
GoroutineIdle-16 679µs ± 1% 691µs ± 0% +1.74% (p=0.000 n=10+9)
HashStringSpeed-16 5.33ns ± 5% 5.19ns ± 4% ~ (p=0.113 n=9+9)
HashBytesSpeed-16 8.20ns ± 3% 8.24ns ± 1% ~ (p=0.497 n=10+9)
HashInt32Speed-16 4.01ns ± 2% 3.90ns ± 4% -2.63% (p=0.011 n=9+10)
HashInt64Speed-16 3.94ns ± 4% 3.79ns ± 1% -3.74% (p=0.003 n=10+9)
HashStringArraySpeed-16 12.5ns ± 4% 12.3ns ± 1% ~ (p=0.055 n=10+10)
MegMap-16 3.72ns ± 1% 3.73ns ± 1% ~ (p=0.484 n=9+10)
MegOneMap-16 2.28ns ± 1% 2.27ns ± 1% ~ (p=0.287 n=10+10)
MegEqMap-16 22.0µs ± 3% 22.3µs ± 2% +1.48% (p=0.028 n=10+9)
MegEmptyMap-16 0.93ns ± 1% 0.92ns ± 1% -0.52% (p=0.030 n=10+10)
SmallStrMap-16 3.77ns ± 0% 3.77ns ± 0% ~ (p=0.324 n=10+10)
MapStringKeysEight_16-16 3.91ns ± 0% 3.91ns ± 0% ~ (p=0.088 n=9+9)
MapStringKeysEight_32-16 3.58ns ± 1% 3.50ns ± 0% -2.11% (p=0.000 n=10+10)
MapStringKeysEight_64-16 3.58ns ± 1% 3.50ns ± 0% -2.23% (p=0.000 n=10+10)
MapStringKeysEight_1M-16 3.57ns ± 1% 3.50ns ± 0% -1.92% (p=0.000 n=10+10)
IntMap-16 2.89ns ± 1% 2.89ns ± 0% ~ (p=0.381 n=10+10)
MapFirst/1-16 1.60ns ± 1% 1.59ns ± 2% -0.49% (p=0.020 n=10+9)
MapFirst/2-16 1.61ns ± 0% 1.59ns ± 1% -1.17% (p=0.001 n=10+10)
MapFirst/3-16 1.61ns ± 1% 1.59ns ± 1% -1.45% (p=0.000 n=10+10)
MapFirst/4-16 1.60ns ± 1% 1.59ns ± 1% -1.16% (p=0.000 n=10+10)
MapFirst/5-16 1.60ns ± 1% 1.58ns ± 1% -0.98% (p=0.000 n=10+10)
MapFirst/6-16 1.60ns ± 1% 1.59ns ± 1% -0.87% (p=0.001 n=10+10)
MapFirst/7-16 1.60ns ± 1% 1.59ns ± 1% -0.79% (p=0.002 n=10+10)
MapFirst/8-16 1.60ns ± 1% 1.59ns ± 1% -0.67% (p=0.017 n=9+10)
MapFirst/9-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.492 n=10+10)
MapFirst/10-16 2.83ns ± 0% 2.84ns ± 0% +0.24% (p=0.017 n=10+10)
MapFirst/11-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.445 n=10+10)
MapFirst/12-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.564 n=10+10)
MapFirst/13-16 2.83ns ± 0% 2.84ns ± 0% ~ (p=0.175 n=9+10)
MapFirst/14-16 2.83ns ± 0% 2.83ns ± 0% ~ (p=0.322 n=10+9)
MapFirst/15-16 2.83ns ± 0% 2.84ns ± 1% ~ (p=0.209 n=10+10)
MapFirst/16-16 2.83ns ± 1% 2.84ns ± 0% ~ (p=0.238 n=10+10)
MapMid/1-16 1.64ns ± 0% 1.64ns ± 0% ~ (p=0.453 n=10+9)
MapMid/2-16 1.86ns ± 1% 1.86ns ± 0% ~ (p=0.764 n=10+9)
MapMid/3-16 1.86ns ± 0% 1.86ns ± 1% ~ (p=1.000 n=10+10)
MapMid/4-16 2.06ns ± 0% 2.06ns ± 0% -0.27% (p=0.014 n=10+9)
MapMid/5-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.075 n=9+10)
MapMid/6-16 2.27ns ± 0% 2.27ns ± 1% ~ (p=0.898 n=10+10)
MapMid/7-16 2.27ns ± 1% 2.26ns ± 0% -0.23% (p=0.049 n=10+10)
MapMid/8-16 2.47ns ± 0% 2.47ns ± 1% ~ (p=0.840 n=10+10)
MapMid/9-16 4.21ns ± 7% 4.13ns ±19% ~ (p=0.315 n=10+10)
MapMid/10-16 4.17ns ± 7% 4.31ns ± 5% +3.37% (p=0.021 n=10+9)
MapMid/11-16 4.18ns ± 7% 4.32ns ± 6% +3.50% (p=0.015 n=10+10)
MapMid/12-16 4.34ns ± 7% 4.30ns ± 5% ~ (p=0.858 n=9+10)
MapMid/13-16 4.25ns ± 6% 4.28ns ± 6% ~ (p=0.489 n=9+9)
MapMid/14-16 3.75ns ±23% 3.90ns ±16% ~ (p=0.353 n=10+10)
MapMid/15-16 3.87ns ±25% 3.95ns ±26% ~ (p=0.315 n=10+10)
MapMid/16-16 4.06ns ±19% 3.94ns ±16% ~ (p=0.796 n=10+10)
MapLast/1-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.607 n=10+10)
MapLast/2-16 1.86ns ± 0% 1.86ns ± 0% +0.26% (p=0.029 n=10+10)
MapLast/3-16 2.06ns ± 1% 2.06ns ± 0% ~ (p=0.689 n=8+9)
MapLast/4-16 2.27ns ± 1% 2.26ns ± 0% ~ (p=0.148 n=10+9)
MapLast/5-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.385 n=9+10)
MapLast/6-16 2.67ns ± 0% 2.68ns ± 0% ~ (p=0.202 n=9+10)
MapLast/7-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.751 n=10+10)
MapLast/8-16 3.08ns ± 0% 3.08ns ± 0% ~ (p=0.826 n=10+9)
MapLast/9-16 4.31ns ± 6% 4.54ns ± 5% ~ (p=0.070 n=9+8)
MapLast/10-16 4.25ns ± 5% 4.42ns ± 6% ~ (p=0.321 n=9+8)
MapLast/11-16 4.59ns ±16% 5.42ns ±44% +17.99% (p=0.019 n=10+10)
MapLast/12-16 5.04ns ±19% 6.11ns ±28% +21.35% (p=0.005 n=9+10)
MapLast/13-16 6.00ns ±35% 5.76ns ± 3% ~ (p=0.173 n=10+8)
MapLast/14-16 4.27ns ± 5% 4.53ns ± 6% +6.14% (p=0.007 n=10+10)
MapLast/15-16 4.41ns ± 1% 4.44ns ± 7% ~ (p=0.515 n=8+10)
MapLast/16-16 4.18ns ± 6% 4.99ns ±18% +19.48% (p=0.000 n=10+10)
MapCycle-16 7.48ns ± 2% 7.46ns ± 1% ~ (p=0.699 n=10+10)
RepeatedLookupStrMapKey32-16 6.98ns ± 3% 6.73ns ± 2% -3.63% (p=0.000 n=10+10)
RepeatedLookupStrMapKey1M-16 14.7µs ± 5% 14.7µs ± 4% ~ (p=0.604 n=9+10)
MakeMap/[Byte]Byte-16 58.5ns ± 1% 58.5ns ± 1% ~ (p=0.780 n=10+9)
MakeMap/[Int]Int-16 113ns ± 0% 113ns ± 1% ~ (p=0.100 n=8+10)
NewEmptyMap-16 2.47ns ± 0% 2.47ns ± 0% ~ (p=0.638 n=10+10)
NewSmallMap-16 11.5ns ± 1% 11.6ns ± 0% +1.18% (p=0.000 n=10+10)
MapIter-16 42.2ns ± 0% 42.8ns ± 1% +1.50% (p=0.000 n=10+10)
MapIterEmpty-16 1.85ns ± 0% 1.85ns ± 0% ~ (p=0.651 n=10+10)
SameLengthMap-16 1.85ns ± 1% 1.85ns ± 0% ~ (p=0.247 n=10+10)
BigKeyMap-16 7.18ns ± 1% 7.42ns ± 4% +3.33% (p=0.004 n=10+10)
BigValMap-16 7.03ns ± 2% 7.19ns ± 1% +2.33% (p=0.000 n=10+9)
SmallKeyMap-16 5.32ns ± 1% 5.24ns ± 1% -1.41% (p=0.000 n=10+10)
MapPopulate/1-16 6.30ns ± 0% 6.41ns ± 1% +1.81% (p=0.000 n=8+10)
MapPopulate/10-16 239ns ± 2% 234ns ± 2% -2.05% (p=0.001 n=9+10)
MapPopulate/100-16 4.19µs ± 2% 4.22µs ± 2% ~ (p=0.171 n=10+10)
MapPopulate/1000-16 52.3µs ± 1% 52.5µs ± 1% ~ (p=0.133 n=9+10)
MapPopulate/10000-16 459µs ± 1% 466µs ± 2% +1.45% (p=0.005 n=10+10)
MapPopulate/100000-16 4.22ms ± 2% 4.25ms ± 2% ~ (p=0.393 n=10+10)
ComplexAlgMap-16 12.5ns ± 1% 12.4ns ± 1% -0.95% (p=0.022 n=10+10)
GoMapClear/Reflexive/1-16 9.61ns ± 1% 9.58ns ± 0% -0.27% (p=0.027 n=10+10)
GoMapClear/Reflexive/10-16 10.0ns ± 1% 10.0ns ± 1% ~ (p=0.648 n=9+9)
GoMapClear/Reflexive/100-16 31.4ns ± 0% 31.4ns ± 1% ~ (p=0.305 n=9+10)
GoMapClear/Reflexive/1000-16 147ns ± 0% 149ns ± 2% +1.21% (p=0.000 n=10+10)
GoMapClear/Reflexive/10000-16 3.99µs ± 0% 4.00µs ± 0% +0.21% (p=0.018 n=9+10)
GoMapClear/NonReflexive/1-16 41.4ns ± 2% 41.7ns ± 1% +0.55% (p=0.043 n=9+10)
GoMapClear/NonReflexive/10-16 50.3ns ± 1% 50.9ns ± 1% +1.16% (p=0.000 n=10+10)
GoMapClear/NonReflexive/100-16 125ns ± 0% 126ns ± 0% +0.96% (p=0.000 n=8+10)
GoMapClear/NonReflexive/1000-16 1.08µs ± 0% 1.08µs ± 1% ~ (p=0.097 n=10+10)
GoMapClear/NonReflexive/10000-16 8.18µs ± 2% 8.10µs ± 0% -0.91% (p=0.019 n=10+8)
MapStringConversion/32/simple-16 4.66ns ± 1% 4.69ns ± 3% ~ (p=0.905 n=9+10)
MapStringConversion/32/struct-16 4.65ns ± 3% 4.94ns ± 2% +6.23% (p=0.000 n=10+10)
MapStringConversion/32/array-16 4.69ns ± 3% 4.72ns ± 3% ~ (p=0.631 n=10+10)
MapStringConversion/64/simple-16 4.14ns ± 0% 4.14ns ± 1% ~ (p=0.342 n=10+10)
MapStringConversion/64/struct-16 4.13ns ± 0% 4.13ns ± 0% ~ (p=0.809 n=10+10)
MapStringConversion/64/array-16 4.13ns ± 1% 4.13ns ± 1% ~ (p=0.752 n=10+10)
MapInterfaceString-16 7.90ns ±23% 8.51ns ±33% ~ (p=0.604 n=9+10)
MapInterfacePtr-16 7.68ns ±29% 7.10ns ±36% ~ (p=0.353 n=10+10)
NewEmptyMapHintLessThan8-16 3.70ns ± 0% 3.70ns ± 0% ~ (p=0.209 n=10+10)
NewEmptyMapHintGreaterThan8-16 270ns ± 1% 272ns ± 1% +0.71% (p=0.005 n=10+9)
MapPop100-16 6.45µs ± 0% 6.50µs ± 1% +0.77% (p=0.000 n=10+10)
MapPop1000-16 114µs ± 1% 114µs ± 1% ~ (p=0.190 n=10+10)
MapPop10000-16 2.28ms ± 2% 2.28ms ± 2% ~ (p=0.912 n=10+10)
MapAssign/Int32/256-16 4.75ns ± 2% 4.82ns ± 4% ~ (p=0.101 n=10+10)
MapAssign/Int32/65536-16 16.4ns ± 1% 16.7ns ± 0% +1.44% (p=0.000 n=10+9)
MapAssign/Int64/256-16 4.79ns ± 5% 4.79ns ± 1% ~ (p=0.616 n=10+8)
MapAssign/Int64/65536-16 17.1ns ± 1% 16.8ns ± 0% -1.28% (p=0.000 n=10+8)
MapAssign/Str/256-16 6.07ns ± 6% 6.24ns ± 2% +2.84% (p=0.035 n=10+9)
MapAssign/Str/65536-16 21.4ns ± 0% 21.4ns ± 3% ~ (p=0.300 n=7+10)
MapOperatorAssign/Int32/256-16 4.82ns ± 3% 4.81ns ± 3% ~ (p=0.684 n=10+10)
MapOperatorAssign/Int32/65536-16 16.8ns ± 1% 16.5ns ± 1% -1.68% (p=0.000 n=9+10)
MapOperatorAssign/Int64/256-16 4.74ns ± 1% 4.77ns ± 3% ~ (p=0.563 n=10+9)
MapOperatorAssign/Int64/65536-16 16.9ns ± 1% 17.2ns ± 1% +1.88% (p=0.000 n=10+10)
MapOperatorAssign/Str/256-16 1.09µs ± 1% 1.10µs ± 2% ~ (p=0.210 n=10+10)
MapOperatorAssign/Str/65536-16 184ns ± 9% 184ns ± 8% ~ (p=0.922 n=10+9)
MapAppendAssign/Int32/256-16 13.8ns ±10% 14.4ns ±11% ~ (p=0.190 n=10+10)
MapAppendAssign/Int32/65536-16 28.9ns ± 5% 30.7ns ± 6% +6.13% (p=0.001 n=9+10)
MapAppendAssign/Int64/256-16 14.5ns ±12% 13.8ns ± 8% -5.02% (p=0.037 n=10+10)
MapAppendAssign/Int64/65536-16 30.9ns ± 1% 30.4ns ± 2% -1.56% (p=0.001 n=10+10)
MapAppendAssign/Str/256-16 30.2ns ± 6% 30.0ns ±10% ~ (p=0.645 n=10+10)
MapAppendAssign/Str/65536-16 44.5ns ± 4% 46.8ns ± 3% +5.17% (p=0.001 n=8+9)
MapDelete/Int32/100-16 18.7ns ± 0% 18.7ns ± 0% -0.27% (p=0.017 n=10+10)
MapDelete/Int32/1000-16 17.6ns ± 1% 17.5ns ± 1% -0.85% (p=0.000 n=9+10)
MapDelete/Int32/10000-16 18.7ns ± 0% 18.3ns ± 1% -1.92% (p=0.000 n=10+10)
MapDelete/Int64/100-16 19.1ns ± 0% 19.2ns ± 0% +0.68% (p=0.000 n=10+9)
MapDelete/Int64/1000-16 17.7ns ± 2% 18.3ns ± 1% +3.00% (p=0.000 n=10+10)
MapDelete/Int64/10000-16 18.8ns ± 1% 19.2ns ± 0% +2.01% (p=0.000 n=10+9)
MapDelete/Str/100-16 26.5ns ± 0% 26.4ns ± 1% -0.73% (p=0.000 n=10+10)
MapDelete/Str/1000-16 23.5ns ± 2% 23.4ns ± 1% ~ (p=0.425 n=10+10)
MapDelete/Str/10000-16 25.1ns ± 0% 25.1ns ± 1% +0.28% (p=0.037 n=10+10)
MapDelete/Pointer/100-16 20.6ns ± 1% 20.6ns ± 0% ~ (p=0.117 n=10+10)
MapDelete/Pointer/1000-16 19.2ns ± 1% 19.4ns ± 1% +0.97% (p=0.004 n=10+10)
MapDelete/Pointer/10000-16 20.0ns ± 0% 20.1ns ± 1% +0.52% (p=0.022 n=10+10)
Memmove/0-16 0.21ns ± 2% 0.21ns ± 1% ~ (p=0.671 n=10+10)
Memmove/1-16 0.93ns ± 0% 0.93ns ± 0% +0.21% (p=0.034 n=10+10)
Memmove/2-16 0.93ns ± 0% 0.93ns ± 0% ~ (p=0.101 n=10+10)
Memmove/3-16 0.93ns ± 1% 0.93ns ± 1% +0.49% (p=0.004 n=10+10)
Memmove/4-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.260 n=10+10)
Memmove/5-16 1.13ns ± 0% 1.13ns ± 0% +0.20% (p=0.034 n=10+10)
Memmove/6-16 1.13ns ± 0% 1.13ns ± 1% ~ (p=0.126 n=10+10)
Memmove/7-16 1.13ns ± 0% 1.13ns ± 1% +0.22% (p=0.028 n=10+10)
Memmove/8-16 1.13ns ± 0% 1.13ns ± 0% ~ (p=0.545 n=9+10)
Memmove/9-16 1.25ns ± 0% 1.35ns ± 0% +7.98% (p=0.000 n=10+10)
Memmove/10-16 1.25ns ± 0% 1.35ns ± 0% +7.96% (p=0.000 n=9+9)
Memmove/11-16 1.25ns ± 0% 1.35ns ± 0% +8.53% (p=0.000 n=10+9)
Memmove/12-16 1.25ns ± 0% 1.35ns ± 1% +8.24% (p=0.000 n=10+10)
Memmove/13-16 1.25ns ± 0% 1.34ns ± 0% +7.75% (p=0.000 n=10+10)
Memmove/14-16 1.25ns ± 0% 1.35ns ± 1% +8.28% (p=0.000 n=10+9)
Memmove/15-16 1.25ns ± 0% 1.35ns ± 0% +8.07% (p=0.000 n=10+9)
Memmove/16-16 1.25ns ± 0% 1.35ns ± 1% +8.35% (p=0.000 n=9+10)
Memmove/32-16 1.34ns ± 0% 1.36ns ± 1% +1.22% (p=0.000 n=10+10)
Memmove/64-16 1.45ns ± 0% 1.64ns ± 0% +13.07% (p=0.000 n=10+9)
Memmove/128-16 1.86ns ± 0% 2.02ns ± 0% +8.64% (p=0.000 n=10+10)
Memmove/256-16 2.47ns ± 0% 2.49ns ± 1% +1.14% (p=0.000 n=10+10)
Memmove/512-16 3.96ns ± 1% 3.96ns ± 0% ~ (p=0.182 n=10+10)
Memmove/1024-16 5.90ns ± 1% 5.87ns ± 1% ~ (p=0.258 n=9+9)
Memmove/2048-16 9.62ns ± 1% 9.62ns ± 2% ~ (p=0.963 n=8+9)
Memmove/4096-16 16.4ns ± 0% 17.1ns ± 4% +4.19% (p=0.003 n=8+9)
MemmoveOverlap/32-16 1.62ns ± 1% 1.68ns ± 1% +3.53% (p=0.000 n=10+10)
MemmoveOverlap/64-16 1.64ns ± 0% 1.65ns ± 0% +0.29% (p=0.002 n=9+9)
MemmoveOverlap/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.070 n=10+10)
MemmoveOverlap/256-16 2.67ns ± 0% 2.67ns ± 0% +0.26% (p=0.012 n=10+10)
MemmoveOverlap/512-16 6.20ns ±18% 5.74ns ± 0% ~ (p=0.645 n=10+8)
MemmoveOverlap/1024-16 7.28ns ± 0% 7.30ns ± 0% +0.28% (p=0.006 n=8+10)
MemmoveOverlap/2048-16 11.9ns ± 0% 12.0ns ± 1% +0.37% (p=0.014 n=9+9)
MemmoveOverlap/4096-16 23.3ns ± 1% 23.1ns ± 1% -0.84% (p=0.000 n=8+10)
MemmoveUnalignedDst/0-16 1.03ns ± 0% 1.03ns ± 0% +0.19% (p=0.007 n=10+10)
MemmoveUnalignedDst/1-16 1.24ns ± 0% 1.25ns ± 1% +0.52% (p=0.022 n=10+10)
MemmoveUnalignedDst/2-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.051 n=10+10)
MemmoveUnalignedDst/3-16 1.23ns ± 0% 1.23ns ± 0% +0.14% (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16 1.23ns ± 0% 1.24ns ± 1% +0.37% (p=0.004 n=10+10)
MemmoveUnalignedDst/5-16 1.35ns ± 0% 1.35ns ± 0% ~ (p=0.075 n=10+10)
MemmoveUnalignedDst/6-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.779 n=10+10)
MemmoveUnalignedDst/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16 1.34ns ± 0% 1.35ns ± 1% +0.39% (p=0.024 n=10+10)
MemmoveUnalignedDst/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.849 n=10+10)
MemmoveUnalignedDst/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.255 n=10+10)
MemmoveUnalignedDst/11-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.304 n=10+10)
MemmoveUnalignedDst/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.672 n=10+10)
MemmoveUnalignedDst/13-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.435 n=10+10)
MemmoveUnalignedDst/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.340 n=10+10)
MemmoveUnalignedDst/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.911 n=10+9)
MemmoveUnalignedDst/16-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.074 n=10+10)
MemmoveUnalignedDst/32-16 1.62ns ± 0% 1.63ns ± 0% ~ (p=0.059 n=10+10)
MemmoveUnalignedDst/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.234 n=10+10)
MemmoveUnalignedDst/128-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.709 n=10+9)
MemmoveUnalignedDst/256-16 3.69ns ± 0% 3.70ns ± 0% ~ (p=0.144 n=10+10)
MemmoveUnalignedDst/512-16 4.15ns ± 1% 4.14ns ± 0% ~ (p=0.778 n=10+8)
MemmoveUnalignedDst/1024-16 7.52ns ± 0% 7.53ns ± 1% ~ (p=0.650 n=9+9)
MemmoveUnalignedDst/2048-16 12.9ns ± 0% 12.9ns ± 1% ~ (p=0.548 n=8+8)
MemmoveUnalignedDst/4096-16 25.4ns ± 0% 25.4ns ± 0% ~ (p=0.947 n=9+9)
MemmoveUnalignedDstOverlap/32-16 4.08ns ± 0% 4.09ns ± 0% ~ (p=0.360 n=10+10)
MemmoveUnalignedDstOverlap/64-16 4.56ns ± 0% 4.56ns ± 0% ~ (p=0.705 n=10+9)
MemmoveUnalignedDstOverlap/128-16 4.67ns ± 0% 4.67ns ± 0% ~ (p=0.397 n=10+10)
MemmoveUnalignedDstOverlap/256-16 5.08ns ± 0% 5.08ns ± 0% ~ (p=0.159 n=10+9)
MemmoveUnalignedDstOverlap/512-16 8.45ns ± 5% 8.19ns ± 0% -3.10% (p=0.021 n=10+9)
MemmoveUnalignedDstOverlap/1024-16 9.55ns ± 0% 9.56ns ± 0% ~ (p=0.221 n=8+8)
MemmoveUnalignedDstOverlap/2048-16 14.0ns ± 0% 14.0ns ± 1% ~ (p=0.200 n=10+9)
MemmoveUnalignedDstOverlap/4096-16 26.5ns ± 0% 26.5ns ± 0% ~ (p=0.458 n=10+9)
MemmoveUnalignedSrc/0-16 1.02ns ± 1% 0.99ns ± 1% -2.67% (p=0.000 n=10+9)
MemmoveUnalignedSrc/1-16 1.13ns ± 0% 1.13ns ± 1% -0.25% (p=0.027 n=10+9)
MemmoveUnalignedSrc/2-16 1.13ns ± 1% 1.13ns ± 0% -0.28% (p=0.012 n=10+9)
MemmoveUnalignedSrc/3-16 1.24ns ± 1% 1.23ns ± 0% -0.25% (p=0.022 n=9+10)
MemmoveUnalignedSrc/4-16 1.24ns ± 0% 1.23ns ± 1% ~ (p=0.118 n=9+10)
MemmoveUnalignedSrc/5-16 1.34ns ± 0% 1.34ns ± 1% ~ (p=0.564 n=8+10)
MemmoveUnalignedSrc/6-16 1.34ns ± 0% 1.34ns ± 0% -0.39% (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.235 n=10+10)
MemmoveUnalignedSrc/8-16 1.34ns ± 0% 1.34ns ± 0% -0.37% (p=0.002 n=10+9)
MemmoveUnalignedSrc/9-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.579 n=10+9)
MemmoveUnalignedSrc/10-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.534 n=10+9)
MemmoveUnalignedSrc/11-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.415 n=10+10)
MemmoveUnalignedSrc/12-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16 1.44ns ± 0% 1.44ns ± 1% ~ (p=0.693 n=10+10)
MemmoveUnalignedSrc/14-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.901 n=10+10)
MemmoveUnalignedSrc/15-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.379 n=10+10)
MemmoveUnalignedSrc/16-16 1.44ns ± 1% 1.44ns ± 0% ~ (p=0.538 n=10+10)
MemmoveUnalignedSrc/32-16 1.60ns ± 1% 1.60ns ± 0% ~ (p=0.491 n=10+10)
MemmoveUnalignedSrc/64-16 1.65ns ± 0% 1.65ns ± 0% ~ (p=0.564 n=10+10)
MemmoveUnalignedSrc/128-16 2.09ns ± 0% 2.09ns ± 0% ~ (p=0.497 n=10+9)
MemmoveUnalignedSrc/256-16 2.70ns ± 0% 2.78ns ± 1% +2.81% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16 4.31ns ± 0% 4.30ns ± 0% -0.26% (p=0.031 n=8+9)
MemmoveUnalignedSrc/1024-16 7.28ns ± 0% 7.21ns ± 1% -1.05% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16 13.0ns ± 0% 13.0ns ± 0% ~ (p=0.180 n=9+8)
MemmoveUnalignedSrc/4096-16 25.4ns ± 0% 25.3ns ± 1% ~ (p=0.054 n=10+10)
MemmoveUnalignedSrcOverlap/32-16 4.04ns ± 0% 4.06ns ± 0% +0.62% (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16 4.12ns ± 0% 4.12ns ± 0% ~ (p=0.421 n=10+10)
MemmoveUnalignedSrcOverlap/128-16 4.53ns ± 0% 4.52ns ± 0% ~ (p=0.251 n=10+10)
MemmoveUnalignedSrcOverlap/256-16 6.17ns ± 0% 6.15ns ± 0% -0.35% (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16 7.43ns ± 0% 7.44ns ± 0% ~ (p=0.524 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16 8.94ns ± 0% 8.94ns ± 0% ~ (p=0.419 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16 13.2ns ± 0% 14.5ns ±21% ~ (p=0.107 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16 25.6ns ± 0% 25.6ns ± 1% ~ (p=0.650 n=9+9)
Memclr/5-16 0.86ns ± 1% 0.86ns ± 2% ~ (p=0.531 n=9+9)
Memclr/16-16 1.04ns ± 0% 1.04ns ± 0% +0.32% (p=0.013 n=9+10)
Memclr/64-16 1.23ns ± 0% 1.26ns ± 0% +2.28% (p=0.000 n=10+10)
Memclr/256-16 2.27ns ± 0% 2.27ns ± 0% ~ (p=0.127 n=10+10)
Memclr/4096-16 17.1ns ± 1% 17.3ns ± 0% +0.88% (p=0.000 n=10+10)
Memclr/65536-16 821ns ± 0% 822ns ± 0% ~ (p=0.516 n=10+10)
Memclr/1M-16 14.1µs ± 1% 14.0µs ± 1% ~ (p=0.516 n=10+10)
Memclr/4M-16 86.1µs ± 1% 85.9µs ± 0% ~ (p=0.123 n=10+10)
Memclr/8M-16 174µs ± 2% 173µs ± 0% ~ (p=0.408 n=10+8)
Memclr/16M-16 385µs ± 4% 387µs ± 0% ~ (p=0.173 n=10+8)
Memclr/64M-16 2.18ms ± 0% 2.19ms ± 0% ~ (p=0.113 n=10+9)
GoMemclr/5-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.346 n=9+10)
GoMemclr/16-16 1.02ns ± 0% 1.02ns ± 0% +0.22% (p=0.003 n=10+8)
GoMemclr/64-16 1.14ns ± 0% 1.14ns ± 0% ~ (p=0.948 n=10+9)
GoMemclr/256-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.868 n=10+10)
MemclrRange/1K_2K-16 457ns ± 0% 428ns ± 1% -6.38% (p=0.000 n=10+10)
MemclrRange/2K_8K-16 1.46µs ± 0% 1.46µs ± 0% ~ (p=0.700 n=10+10)
MemclrRange/4K_16K-16 1.16µs ± 0% 1.16µs ± 0% ~ (p=0.567 n=9+10)
MemclrRange/160K_228K-16 20.7µs ± 0% 20.7µs ± 0% ~ (p=0.160 n=10+10)
ClearFat7-16 0.38ns ± 5% 0.21ns ± 1% -45.79% (p=0.000 n=9+10)
ClearFat8-16 0.21ns ± 3% 0.12ns ± 2% -44.16% (p=0.000 n=8+9)
ClearFat11-16 0.35ns ± 3% 0.21ns ± 1% -40.46% (p=0.000 n=9+9)
ClearFat12-16 0.23ns ± 9% 0.21ns ± 1% -10.23% (p=0.000 n=10+9)
ClearFat13-16 0.22ns ± 6% 0.21ns ± 2% -6.53% (p=0.000 n=10+10)
ClearFat14-16 0.22ns ± 4% 0.21ns ± 1% -5.97% (p=0.000 n=10+10)
ClearFat15-16 0.22ns ± 4% 0.21ns ± 1% -6.96% (p=0.000 n=10+9)
ClearFat16-16 0.19ns ± 9% 0.12ns ± 6% -34.89% (p=0.000 n=9+10)
ClearFat24-16 0.23ns ± 6% 0.21ns ± 1% -10.26% (p=0.000 n=10+9)
ClearFat32-16 0.22ns ± 5% 0.21ns ± 2% -5.31% (p=0.000 n=10+10)
ClearFat40-16 0.34ns ± 4% 0.62ns ± 1% +83.00% (p=0.000 n=10+10)
ClearFat48-16 0.33ns ± 2% 0.41ns ± 0% +26.71% (p=0.000 n=10+10)
ClearFat56-16 0.41ns ± 1% 0.41ns ± 0% ~ (p=0.838 n=10+10)
ClearFat64-16 0.41ns ± 0% 0.41ns ± 0% ~ (p=0.178 n=10+8)
ClearFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.669 n=10+10)
ClearFat128-16 1.04ns ± 0% 1.04ns ± 0% ~ (p=0.679 n=10+10)
ClearFat256-16 1.86ns ± 0% 1.86ns ± 0% ~ (p=0.066 n=9+10)
ClearFat512-16 3.50ns ± 0% 3.50ns ± 0% ~ (p=0.626 n=10+10)
ClearFat1024-16 6.79ns ± 0% 6.79ns ± 0% ~ (p=0.986 n=10+10)
ClearFat1032-16 13.6ns ± 0% 13.6ns ± 0% +0.13% (p=0.044 n=10+10)
ClearFat1040-16 10.3ns ± 0% 10.3ns ± 0% ~ (p=0.175 n=10+9)
CopyFat7-16 0.37ns ±13% 0.25ns ± 1% -31.74% (p=0.000 n=10+9)
CopyFat8-16 0.17ns ± 1% 0.17ns ± 2% +1.35% (p=0.004 n=9+9)
CopyFat11-16 0.26ns ± 1% 0.30ns ± 3% +12.58% (p=0.000 n=9+10)
CopyFat12-16 0.28ns ± 2% 0.26ns ± 1% -5.66% (p=0.000 n=9+9)
CopyFat13-16 0.26ns ± 0% 0.28ns ± 4% +7.35% (p=0.000 n=8+10)
CopyFat14-16 0.29ns ± 6% 0.26ns ± 2% -10.46% (p=0.000 n=10+9)
CopyFat15-16 0.26ns ± 1% 0.30ns ± 6% +14.12% (p=0.000 n=8+10)
CopyFat16-16 0.21ns ± 1% 0.21ns ± 0% ~ (p=0.426 n=8+8)
CopyFat24-16 0.29ns ± 3% 0.25ns ± 1% -12.27% (p=0.000 n=9+10)
CopyFat32-16 0.26ns ± 4% 0.29ns ± 4% +11.71% (p=0.000 n=10+10)
CopyFat64-16 0.46ns ± 8% 0.42ns ± 1% -8.37% (p=0.002 n=10+10)
CopyFat72-16 0.82ns ± 0% 0.82ns ± 0% ~ (p=0.563 n=10+10)
CopyFat128-16 1.53ns ± 0% 1.54ns ± 0% +0.62% (p=0.000 n=10+10)
CopyFat256-16 2.68ns ± 0% 2.65ns ± 1% -1.23% (p=0.000 n=10+10)
CopyFat512-16 4.93ns ± 1% 5.19ns ± 3% +5.16% (p=0.000 n=9+9)
CopyFat520-16 6.99ns ± 0% 6.99ns ± 0% ~ (p=0.539 n=10+10)
CopyFat1024-16 11.5ns ± 1% 9.8ns ± 1% -14.98% (p=0.000 n=9+10)
CopyFat1032-16 13.6ns ± 0% 13.6ns ± 0% ~ (p=0.728 n=10+10)
CopyFat1040-16 11.0ns ± 0% 11.1ns ± 0% +0.53% (p=0.000 n=10+10)
Issue18740/2byte-16 10.1µs ± 0% 10.1µs ± 0% ~ (p=0.342 n=10+10)
Issue18740/4byte-16 2.34µs ± 0% 2.35µs ± 0% +0.30% (p=0.002 n=10+8)
Issue18740/8byte-16 1.28µs ± 0% 1.28µs ± 0% +0.32% (p=0.000 n=9+10)
Finalizer-16 345µs ± 1% 336µs ± 0% -2.55% (p=0.000 n=10+9)
FinalizerRun-16 450ns ± 3% 420ns ± 1% -6.65% (p=0.000 n=10+10)
PallocBitsSummarize/Unpacked00-16 2.88ns ± 0% 2.88ns ± 0% ~ (p=0.358 n=10+10)
PallocBitsSummarize/UnpackedFFFFFFFFFFFFFFFF-16 15.2ns ± 0% 15.2ns ± 0% ~ (p=0.925 n=10+10)
PallocBitsSummarize/UnpackedAA-16 16.4ns ± 0% 16.3ns ± 0% ~ (p=0.113 n=9+9)
PallocBitsSummarize/UnpackedAAAAAAAAAAAAAAAA-16 16.5ns ± 0% 16.6ns ± 0% ~ (p=0.238 n=10+10)
PallocBitsSummarize/Unpacked80000000AAAAAAAA-16 37.8ns ± 1% 36.4ns ± 0% -3.70% (p=0.000 n=10+9)
PallocBitsSummarize/UnpackedAAAAAAAA00000001-16 41.8ns ± 1% 39.9ns ± 0% -4.68% (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBBBBBBBBBB-16 18.3ns ± 0% 18.3ns ± 0% ~ (p=0.781 n=10+10)
PallocBitsSummarize/Unpacked80000000BBBBBBBB-16 38.8ns ± 1% 38.1ns ± 0% -1.78% (p=0.000 n=9+10)
PallocBitsSummarize/UnpackedBBBBBBBB00000001-16 37.5ns ± 0% 36.1ns ± 1% -3.88% (p=0.000 n=8+10)
PallocBitsSummarize/UnpackedCCCCCCCCCCCCCCCC-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.018 n=10+9)
PallocBitsSummarize/Unpacked4444444444444444-16 21.8ns ± 0% 21.9ns ± 0% +0.20% (p=0.029 n=10+9)
PallocBitsSummarize/Unpacked4040404040404040-16 26.5ns ± 0% 26.5ns ± 0% -0.24% (p=0.001 n=9+10)
PallocBitsSummarize/Unpacked4000400040004000-16 33.4ns ± 1% 31.3ns ± 0% -6.20% (p=0.000 n=9+10)
PallocBitsSummarize/Unpacked1000404044CCAAFF-16 36.4ns ± 1% 35.9ns ± 0% -1.50% (p=0.000 n=10+10)
FindBitRange64/Pattern00Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.80% (p=0.000 n=10+9)
FindBitRange64/Pattern00Size8-16 0.70ns ± 1% 0.70ns ± 0% -0.68% (p=0.000 n=10+10)
FindBitRange64/Pattern00Size32-16 0.70ns ± 1% 0.69ns ± 0% -0.86% (p=0.001 n=10+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize2-16 0.34ns ± 1% 0.35ns ± 1% +4.45% (p=0.000 n=9+8)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize8-16 1.54ns ± 0% 1.54ns ± 1% ~ (p=0.914 n=9+9)
FindBitRange64/PatternFFFFFFFFFFFFFFFFSize32-16 2.78ns ± 0% 2.78ns ± 0% ~ (p=0.295 n=9+10)
FindBitRange64/PatternAASize2-16 0.34ns ± 2% 0.35ns ± 2% +4.61% (p=0.000 n=10+10)
FindBitRange64/PatternAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.82% (p=0.005 n=10+10)
FindBitRange64/PatternAASize32-16 0.70ns ± 1% 0.70ns ± 0% -0.73% (p=0.003 n=10+9)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize2-16 0.34ns ± 2% 0.35ns ± 2% +3.94% (p=0.000 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 1% -0.67% (p=0.025 n=10+10)
FindBitRange64/PatternAAAAAAAAAAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.118 n=9+10)
FindBitRange64/Pattern80000000AAAAAAAASize2-16 0.34ns ± 1% 0.35ns ± 2% +3.72% (p=0.000 n=10+9)
FindBitRange64/Pattern80000000AAAAAAAASize8-16 0.70ns ± 1% 0.70ns ± 0% ~ (p=0.102 n=10+10)
FindBitRange64/Pattern80000000AAAAAAAASize32-16 0.70ns ± 1% 0.70ns ± 1% -0.55% (p=0.011 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size2-16 0.34ns ± 2% 0.35ns ± 1% +3.83% (p=0.000 n=10+9)
FindBitRange64/PatternAAAAAAAA00000001Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.065 n=10+10)
FindBitRange64/PatternAAAAAAAA00000001Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.95% (p=0.002 n=10+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize2-16 0.34ns ± 0% 0.35ns ± 1% +4.12% (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.002 n=10+9)
FindBitRange64/PatternBBBBBBBBBBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% -0.17% (p=0.023 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize2-16 0.34ns ± 1% 0.35ns ± 2% +4.82% (p=0.000 n=9+10)
FindBitRange64/Pattern80000000BBBBBBBBSize8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.063 n=10+10)
FindBitRange64/Pattern80000000BBBBBBBBSize32-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.164 n=9+10)
FindBitRange64/PatternBBBBBBBB00000001Size2-16 0.34ns ± 1% 0.35ns ± 1% +4.38% (p=0.000 n=8+10)
FindBitRange64/PatternBBBBBBBB00000001Size8-16 1.24ns ± 1% 1.24ns ± 0% ~ (p=0.052 n=10+10)
FindBitRange64/PatternBBBBBBBB00000001Size32-16 1.24ns ± 0% 1.23ns ± 0% -0.40% (p=0.000 n=10+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize2-16 0.34ns ± 0% 0.35ns ± 2% +3.96% (p=0.000 n=9+10)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize8-16 1.24ns ± 0% 1.23ns ± 0% -0.30% (p=0.000 n=10+9)
FindBitRange64/PatternCCCCCCCCCCCCCCCCSize32-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.284 n=10+10)
FindBitRange64/Pattern4444444444444444Size2-16 0.34ns ± 1% 0.35ns ± 1% +3.91% (p=0.000 n=9+9)
FindBitRange64/Pattern4444444444444444Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.617 n=10+10)
FindBitRange64/Pattern4444444444444444Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.60% (p=0.006 n=10+10)
FindBitRange64/Pattern4040404040404040Size2-16 0.34ns ± 2% 0.35ns ± 2% +3.67% (p=0.000 n=10+10)
FindBitRange64/Pattern4040404040404040Size8-16 0.70ns ± 2% 0.70ns ± 1% -0.87% (p=0.014 n=10+10)
FindBitRange64/Pattern4040404040404040Size32-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.256 n=10+10)
FindBitRange64/Pattern4000400040004000Size2-16 0.34ns ± 2% 0.35ns ± 3% +4.71% (p=0.000 n=10+10)
FindBitRange64/Pattern4000400040004000Size8-16 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.393 n=10+10)
FindBitRange64/Pattern4000400040004000Size32-16 0.70ns ± 1% 0.70ns ± 1% -0.86% (p=0.014 n=10+10)
NetpollBreak-16 1.49µs ± 1% 1.50µs ± 3% ~ (p=0.181 n=8+10)
Syscall-16 3.68ns ± 1% 3.66ns ± 2% ~ (p=0.148 n=10+10)
SyscallWork-16 5.15ns ± 1% 5.13ns ± 0% ~ (p=0.188 n=10+9)
SyscallExcess-16 3.89ns ± 2% 3.83ns ± 1% -1.52% (p=0.001 n=10+10)
SyscallExcessWork-16 5.34ns ± 1% 5.31ns ± 0% -0.64% (p=0.000 n=10+9)
PingPongHog-16 397ns ± 7% 394ns ±11% ~ (p=0.912 n=10+10)
StackGrowth-16 67.9ns ± 0% 68.8ns ± 0% +1.28% (p=0.000 n=10+8)
StackGrowthDeep-16 7.70µs ± 1% 8.48µs ± 2% +10.06% (p=0.000 n=9+10)
CreateGoroutines-16 124ns ± 1% 124ns ± 1% ~ (p=0.254 n=10+10)
CreateGoroutinesParallel-16 25.7ns ± 1% 27.6ns ± 2% +7.51% (p=0.000 n=10+10)
CreateGoroutinesCapture-16 823ns ± 1% 821ns ± 2% ~ (p=0.699 n=10+10)
CreateGoroutinesSingle-16 175ns ± 3% 172ns ± 3% -1.90% (p=0.011 n=10+10)
ClosureCall-16 0.11ns ± 7% 0.12ns ± 3% ~ (p=0.842 n=9+10)
WakeupParallelSpinning/0s-16 11.4µs ± 0% 11.4µs ± 0% ~ (p=0.325 n=9+10)
WakeupParallelSpinning/1µs-16 15.4µs ± 0% 15.4µs ± 1% ~ (p=0.955 n=10+10)
WakeupParallelSpinning/2µs-16 18.7µs ± 2% 18.9µs ± 2% ~ (p=0.052 n=10+10)
WakeupParallelSpinning/5µs-16 30.7µs ± 0% 30.7µs ± 0% -0.03% (p=0.003 n=10+10)
WakeupParallelSpinning/10µs-16 48.8µs ± 0% 48.8µs ± 0% ~ (p=0.670 n=10+10)
WakeupParallelSpinning/20µs-16 90.8µs ± 0% 90.8µs ± 0% -0.02% (p=0.004 n=10+10)
WakeupParallelSpinning/50µs-16 211µs ± 0% 211µs ± 0% ~ (p=0.194 n=10+10)
WakeupParallelSpinning/100µs-16 323µs ± 0% 323µs ± 0% ~ (p=1.000 n=10+9)
WakeupParallelSyscall/0s-16 118µs ± 0% 118µs ± 0% ~ (p=0.447 n=10+9)
WakeupParallelSyscall/1µs-16 119µs ± 2% 119µs ± 1% ~ (p=0.604 n=10+9)
WakeupParallelSyscall/2µs-16 120µs ± 1% 121µs ± 3% ~ (p=0.263 n=8+10)
WakeupParallelSyscall/5µs-16 126µs ± 2% 126µs ± 2% ~ (p=0.510 n=10+9)
WakeupParallelSyscall/10µs-16 136µs ± 1% 137µs ± 1% ~ (p=0.095 n=9+10)
WakeupParallelSyscall/20µs-16 156µs ± 2% 157µs ± 3% ~ (p=0.604 n=10+9)
WakeupParallelSyscall/50µs-16 221µs ± 1% 220µs ± 1% ~ (p=0.063 n=10+10)
WakeupParallelSyscall/100µs-16 326µs ± 0% 325µs ± 0% -0.26% (p=0.003 n=9+10)
Matmult-16 0.67ns ± 2% 0.66ns ± 2% ~ (p=0.256 n=10+10)
Fastrand-16 0.08ns ±11% 0.08ns ±13% ~ (p=0.661 n=9+10)
Fastrand64-16 0.08ns ±11% 0.08ns ± 6% ~ (p=0.631 n=10+10)
FastrandHashiter-16 1.76ns ± 1% 1.76ns ± 1% ~ (p=0.854 n=8+8)
Fastrandn/2-16 0.86ns ± 1% 0.86ns ± 1% +1.09% (p=0.000 n=10+9)
Fastrandn/3-16 0.85ns ± 1% 0.86ns ± 1% +1.23% (p=0.001 n=10+10)
Fastrandn/4-16 0.85ns ± 1% 0.87ns ± 2% +1.60% (p=0.000 n=10+10)
Fastrandn/5-16 0.85ns ± 1% 0.86ns ± 1% +1.05% (p=0.000 n=10+10)
IfaceCmp100-16 46.6ns ± 0% 46.1ns ± 0% -1.18% (p=0.000 n=10+10)
IfaceCmpNil100-16 26.8ns ± 0% 26.8ns ± 0% ~ (p=0.777 n=10+8)
EfaceCmpDiff-16 132ns ± 0% 130ns ± 0% -0.95% (p=0.000 n=10+9)
EfaceCmpDiffIndirect-16 209ns ± 0% 211ns ± 0% +1.14% (p=0.000 n=10+9)
Defer-16 3.40ns ± 1% 3.04ns ± 0% -10.67% (p=0.000 n=10+10)
Defer10-16 29.4ns ± 2% 27.2ns ± 3% -7.26% (p=0.000 n=10+10)
DeferMany-16 110ns ± 6% 113ns ± 2% +3.45% (p=0.017 n=9+9)
PanicRecover-16 67.6ns ± 0% 67.7ns ± 2% ~ (p=0.436 n=9+9)
GoroutineProfile/small-nil/idle-16 3.90µs ± 4% 3.86µs ± 2% ~ (p=0.305 n=10+9)
GoroutineProfile/small-nil/loaded-16 4.82µs ± 6% 4.82µs ± 4% ~ (p=0.905 n=10+9)
GoroutineProfile/small/idle-16 103µs ± 3% 102µs ± 3% ~ (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16 432µs ± 5% 440µs ±13% ~ (p=0.604 n=9+10)
GoroutineProfile/large-nil/idle-16 3.86µs ± 3% 3.82µs ± 3% ~ (p=0.210 n=10+10)
GoroutineProfile/large-nil/loaded-16 4.90µs ± 2% 4.90µs ± 5% ~ (p=0.780 n=10+9)
GoroutineProfile/large/idle-16 2.58ms ± 1% 2.52ms ± 1% -2.38% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 8.62ms ± 9% 8.90ms ±11% ~ (p=0.400 n=9+10)
GoroutineProfile/sparse-nil/idle-16 3.85µs ± 4% 3.81µs ± 3% ~ (p=0.470 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 4.82µs ± 4% 4.69µs ± 5% ~ (p=0.052 n=10+10)
GoroutineProfile/sparse/idle-16 102µs ± 4% 102µs ± 2% ~ (p=0.497 n=10+9)
GoroutineProfile/sparse/loaded-16 438µs ± 7% 437µs ± 6% ~ (p=0.796 n=10+10)
RWMutexUncontended-16 6.79ns ± 0% 6.78ns ± 0% ~ (p=0.228 n=10+8)
RWMutexWrite100-16 85.4ns ± 0% 87.1ns ± 0% +2.00% (p=0.000 n=10+8)
RWMutexWrite10-16 168ns ±25% 152ns ±11% ~ (p=0.063 n=10+10)
RWMutexWorkWrite100-16 106ns ± 0% 106ns ± 3% ~ (p=0.136 n=10+10)
RWMutexWorkWrite10-16 567ns ± 3% 571ns ± 1% ~ (p=0.326 n=10+9)
SemTable/OneAddrCollision/n=1000-16 15.9µs ± 1% 16.0µs ± 1% +0.50% (p=0.031 n=9+9)
SemTable/ManyAddrCollision/n=1000-16 56.2µs ± 1% 56.8µs ± 1% +1.06% (p=0.000 n=10+10)
SemTable/OneAddrCollision/n=2000-16 32.6µs ± 2% 32.9µs ± 4% ~ (p=0.156 n=10+9)
SemTable/ManyAddrCollision/n=2000-16 118µs ± 0% 119µs ± 0% +0.75% (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=4000-16 65.3µs ± 1% 65.6µs ± 3% ~ (p=0.497 n=9+10)
SemTable/ManyAddrCollision/n=4000-16 245µs ± 0% 248µs ± 2% +1.36% (p=0.000 n=9+10)
SemTable/OneAddrCollision/n=8000-16 131µs ± 1% 130µs ± 1% -1.01% (p=0.002 n=9+10)
SemTable/ManyAddrCollision/n=8000-16 503µs ± 1% 508µs ± 0% +0.97% (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Byte-16 67.6ns ± 1% 64.1ns ± 2% -5.20% (p=0.000 n=10+10)
MakeSliceCopy/mallocmove/Int-16 65.0ns ± 7% 61.7ns ± 4% -5.08% (p=0.009 n=10+10)
MakeSliceCopy/mallocmove/Ptr-16 88.1ns ± 1% 79.9ns ± 1% -9.29% (p=0.000 n=10+10)
MakeSliceCopy/makecopy/Byte-16 65.2ns ± 6% 63.4ns ± 0% ~ (p=0.500 n=10+8)
MakeSliceCopy/makecopy/Int-16 63.2ns ± 1% 64.1ns ± 1% +1.34% (p=0.001 n=9+9)
MakeSliceCopy/makecopy/Ptr-16 88.1ns ± 1% 80.1ns ± 1% -9.09% (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Byte-16 69.8ns ± 1% 65.7ns ± 3% -5.80% (p=0.000 n=10+10)
MakeSliceCopy/nilappend/Int-16 69.6ns ± 2% 67.2ns ± 1% -3.50% (p=0.000 n=10+9)
MakeSliceCopy/nilappend/Ptr-16 91.5ns ± 1% 83.8ns ± 1% -8.42% (p=0.000 n=9+10)
MakeSlice/Byte-16 6.64ns ± 3% 6.58ns ± 2% ~ (p=0.393 n=10+10)
MakeSlice/Int16-16 8.60ns ± 1% 8.38ns ± 3% -2.48% (p=0.001 n=9+10)
MakeSlice/Int-16 17.7ns ± 3% 16.9ns ± 1% -4.67% (p=0.000 n=10+9)
MakeSlice/Ptr-16 24.0ns ± 3% 23.3ns ± 2% -3.25% (p=0.000 n=10+9)
MakeSlice/Struct/24-16 34.1ns ± 1% 32.0ns ± 1% -6.11% (p=0.000 n=10+10)
MakeSlice/Struct/32-16 39.1ns ± 4% 38.2ns ± 1% ~ (p=0.829 n=10+8)
MakeSlice/Struct/40-16 47.0ns ± 5% 43.0ns ± 2% -8.55% (p=0.000 n=10+9)
GrowSlice/Byte-16 15.3ns ± 3% 15.0ns ± 2% -1.75% (p=0.005 n=9+9)
GrowSlice/Int16-16 18.9ns ± 2% 18.4ns ± 2% -2.71% (p=0.000 n=10+9)
GrowSlice/Int-16 33.9ns ± 1% 32.2ns ± 1% -4.89% (p=0.000 n=10+9)
GrowSlice/Ptr-16 45.3ns ± 2% 43.5ns ± 1% -4.12% (p=0.000 n=10+10)
GrowSlice/Struct/24-16 61.9ns ± 2% 60.0ns ± 4% -3.10% (p=0.002 n=10+10)
GrowSlice/Struct/32-16 79.9ns ± 2% 72.3ns ± 3% -9.58% (p=0.000 n=8+10)
GrowSlice/Struct/40-16 97.1ns ± 7% 88.8ns ± 5% -8.49% (p=0.000 n=10+10)
ExtendSlice/IntSlice-16 21.1ns ± 2% 20.3ns ± 2% -3.71% (p=0.000 n=10+10)
ExtendSlice/PointerSlice-16 26.8ns ± 2% 26.3ns ± 2% -1.86% (p=0.004 n=10+10)
ExtendSlice/NoGrow-16 1.23ns ± 0% 1.30ns ± 1% +5.03% (p=0.000 n=10+10)
Append-16 4.58ns ± 1% 4.53ns ± 0% -1.11% (p=0.000 n=10+10)
AppendGrowByte-16 1.46ms ± 8% 1.42ms ± 7% -3.24% (p=0.035 n=10+10)
AppendGrowString-16 27.8ms ± 4% 27.2ms ± 5% ~ (p=0.052 n=10+10)
AppendSlice/1Bytes-16 1.03ns ± 1% 1.04ns ± 1% ~ (p=0.303 n=10+10)
AppendSlice/4Bytes-16 1.04ns ± 0% 1.05ns ± 0% +0.79% (p=0.000 n=9+10)
AppendSlice/7Bytes-16 1.23ns ± 1% 1.24ns ± 0% +0.45% (p=0.001 n=10+10)
AppendSlice/8Bytes-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.183 n=10+10)
AppendSlice/15Bytes-16 1.37ns ± 1% 1.43ns ± 1% +3.88% (p=0.000 n=10+10)
AppendSlice/16Bytes-16 1.37ns ± 1% 1.42ns ± 1% +3.63% (p=0.000 n=9+10)
AppendSlice/32Bytes-16 1.44ns ± 0% 1.47ns ± 1% +1.83% (p=0.000 n=10+10)
AppendSliceLarge/1024Bytes-16 257ns ± 2% 234ns ± 1% -8.96% (p=0.000 n=8+9)
AppendSliceLarge/4096Bytes-16 871ns ± 6% 812ns ± 1% -6.80% (p=0.000 n=10+10)
AppendSliceLarge/16384Bytes-16 3.15µs ± 6% 3.04µs ± 5% ~ (p=0.052 n=10+10)
AppendSliceLarge/65536Bytes-16 10.7µs ± 7% 10.8µs ± 2% ~ (p=0.278 n=10+9)
AppendSliceLarge/262144Bytes-16 42.9µs ± 1% 39.6µs ± 5% -7.75% (p=0.000 n=9+10)
AppendSliceLarge/1048576Bytes-16 147µs ± 4% 144µs ± 4% -2.21% (p=0.035 n=10+10)
AppendStr/1Bytes-16 1.20ns ± 0% 1.20ns ± 0% ~ (p=0.755 n=10+10)
AppendStr/4Bytes-16 1.13ns ± 0% 1.14ns ± 1% +1.20% (p=0.000 n=10+10)
AppendStr/8Bytes-16 1.24ns ± 0% 1.25ns ± 0% +0.93% (p=0.000 n=10+10)
AppendStr/16Bytes-16 1.40ns ± 0% 1.42ns ± 0% +2.10% (p=0.000 n=9+10)
AppendStr/32Bytes-16 1.44ns ± 0% 1.45ns ± 0% +0.99% (p=0.000 n=10+10)
AppendSpecialCase-16 8.64ns ± 1% 8.89ns ± 2% +2.90% (p=0.000 n=10+10)
Copy/1Byte-16 1.24ns ± 1% 1.24ns ± 0% -0.28% (p=0.000 n=10+6)
Copy/1String-16 1.24ns ± 0% 1.23ns ± 0% ~ (p=0.160 n=10+10)
Copy/2Byte-16 1.24ns ± 0% 1.24ns ± 0% ~ (p=0.115 n=10+10)
Copy/2String-16 1.24ns ± 0% 1.24ns ± 1% ~ (p=0.954 n=10+10)
Copy/4Byte-16 1.24ns ± 0% 1.24ns ± 0% -0.44% (p=0.001 n=10+10)
Copy/4String-16 1.23ns ± 0% 1.23ns ± 0% ~ (p=0.081 n=10+10)
Copy/8Byte-16 1.37ns ± 0% 1.34ns ± 0% -1.79% (p=0.000 n=9+9)
Copy/8String-16 1.34ns ± 0% 1.34ns ± 0% -0.58% (p=0.000 n=9+10)
Copy/12Byte-16 1.44ns ± 0% 1.44ns ± 0% ~ (p=0.149 n=9+9)
Copy/12String-16 1.44ns ± 0% 1.45ns ± 0% ~ (p=0.124 n=9+9)
Copy/16Byte-16 1.44ns ± 0% 1.44ns ± 0% -0.19% (p=0.004 n=10+9)
Copy/16String-16 1.44ns ± 0% 1.45ns ± 0% +0.30% (p=0.008 n=10+10)
Copy/32Byte-16 1.63ns ± 1% 1.62ns ± 1% -0.72% (p=0.002 n=10+10)
Copy/32String-16 1.60ns ± 1% 1.64ns ± 0% +2.23% (p=0.000 n=10+10)
Copy/128Byte-16 2.06ns ± 0% 2.06ns ± 0% ~ (p=0.757 n=9+10)
Copy/128String-16 2.07ns ± 0% 2.07ns ± 0% +0.36% (p=0.004 n=10+10)
Copy/1024Byte-16 6.07ns ± 2% 6.00ns ± 1% -1.20% (p=0.000 n=9+10)
Copy/1024String-16 6.05ns ± 0% 5.95ns ± 1% -1.54% (p=0.000 n=10+9)
AppendInPlace/NoGrow/Byte-16 288ns ± 1% 284ns ± 1% -1.58% (p=0.000 n=10+10)
AppendInPlace/NoGrow/1Ptr-16 844ns ± 1% 809ns ± 3% -4.13% (p=0.000 n=9+10)
AppendInPlace/NoGrow/2Ptr-16 1.47µs ± 1% 1.46µs ± 1% ~ (p=0.388 n=9+10)
AppendInPlace/NoGrow/3Ptr-16 1.87µs ± 7% 1.91µs ± 1% ~ (p=0.166 n=10+8)
AppendInPlace/NoGrow/4Ptr-16 2.66µs ± 1% 2.67µs ± 3% ~ (p=0.968 n=9+10)
AppendInPlace/Grow/Byte-16 126ns ± 2% 121ns ± 2% -4.06% (p=0.000 n=10+10)
AppendInPlace/Grow/1Ptr-16 132ns ± 2% 127ns ± 2% -4.28% (p=0.000 n=10+9)
AppendInPlace/Grow/2Ptr-16 196ns ± 2% 188ns ± 1% -4.20% (p=0.000 n=10+8)
AppendInPlace/Grow/3Ptr-16 264ns ± 1% 260ns ± 1% -1.51% (p=0.000 n=9+10)
AppendInPlace/Grow/4Ptr-16 297ns ± 2% 294ns ± 2% ~ (p=0.085 n=10+10)
StackCopyPtr-16 36.4ms ± 2% 36.7ms ± 2% ~ (p=0.481 n=10+10)
StackCopy-16 33.9ms ± 3% 32.6ms ± 1% -3.87% (p=0.000 n=10+8)
StackCopyNoCache-16 1.00ms ± 5% 1.01ms ± 5% ~ (p=0.143 n=10+10)
StackCopyWithStkobj-16 11.0ms ± 3% 10.9ms ± 4% ~ (p=0.579 n=10+10)
Issue18138-16 49.2µs ± 5% 49.0µs ± 4% ~ (p=1.000 n=10+9)
CompareStringEqual-16 1.39ns ± 1% 1.45ns ± 2% +3.80% (p=0.000 n=8+10)
CompareStringIdentical-16 0.55ns ± 1% 0.55ns ± 0% +0.42% (p=0.007 n=10+10)
CompareStringSameLength-16 1.03ns ± 0% 1.03ns ± 0% ~ (p=0.430 n=9+10)
CompareStringDifferentLength-16 0.11ns ± 2% 0.11ns ± 3% ~ (p=0.139 n=9+10)
CompareStringBigUnaligned-16 23.9µs ± 1% 24.0µs ± 1% ~ (p=0.370 n=9+8)
CompareStringBig-16 22.0µs ± 3% 22.2µs ± 3% ~ (p=0.243 n=9+10)
ConcatStringAndBytes-16 10.7ns ± 1% 10.0ns ± 2% -6.33% (p=0.000 n=10+10)
SliceByteToString/1-16 1.34ns ± 0% 1.34ns ± 0% ~ (p=0.057 n=10+10)
SliceByteToString/2-16 6.67ns ± 2% 6.60ns ± 3% ~ (p=0.101 n=10+10)
SliceByteToString/4-16 7.76ns ± 2% 7.56ns ± 3% -2.59% (p=0.001 n=10+10)
SliceByteToString/8-16 9.81ns ± 4% 9.57ns ± 2% -2.48% (p=0.005 n=10+10)
SliceByteToString/16-16 14.0ns ± 3% 13.7ns ± 2% -2.31% (p=0.009 n=10+10)
SliceByteToString/32-16 17.3ns ± 1% 16.7ns ± 2% -3.41% (p=0.000 n=10+10)
SliceByteToString/64-16 25.1ns ± 1% 24.1ns ± 2% -3.93% (p=0.000 n=9+10)
SliceByteToString/128-16 38.6ns ± 1% 36.5ns ± 1% -5.60% (p=0.000 n=10+10)
RuneCount/lenruneslice/ASCII-16 4.12ns ± 0% 4.11ns ± 0% ~ (p=0.382 n=10+10)
RuneCount/lenruneslice/Japanese-16 25.4ns ± 2% 25.6ns ± 2% ~ (p=0.138 n=9+10)
RuneCount/lenruneslice/MixedLength-16 17.1ns ± 0% 17.2ns ± 0% +0.59% (p=0.000 n=9+9)
RuneCount/rangeloop/ASCII-16 3.30ns ± 1% 3.29ns ± 0% ~ (p=0.267 n=10+10)
RuneCount/rangeloop/Japanese-16 20.1ns ± 1% 24.9ns ± 1% +24.31% (p=0.000 n=9+9)
RuneCount/rangeloop/MixedLength-16 16.5ns ± 1% 16.7ns ± 1% +1.34% (p=0.000 n=10+10)
RuneCount/utf8.RuneCountInString/ASCII-16 5.71ns ± 1% 5.73ns ± 2% ~ (p=0.579 n=10+10)
RuneCount/utf8.RuneCountInString/Japanese-16 22.0ns ± 6% 18.4ns ± 3% -16.41% (p=0.000 n=9+10)
RuneCount/utf8.RuneCountInString/MixedLength-16 15.0ns ± 1% 14.9ns ± 1% -1.01% (p=0.004 n=9+10)
RuneIterate/range/ASCII-16 2.69ns ± 1% 2.72ns ± 0% +0.94% (p=0.026 n=10+9)
RuneIterate/range/Japanese-16 24.5ns ± 2% 25.3ns ± 2% +3.23% (p=0.000 n=9+10)
RuneIterate/range/MixedLength-16 17.0ns ± 1% 17.1ns ± 1% +0.85% (p=0.000 n=10+10)
RuneIterate/range1/ASCII-16 2.70ns ± 1% 2.72ns ± 0% ~ (p=0.058 n=9+9)
RuneIterate/range1/Japanese-16 24.1ns ± 2% 25.2ns ± 3% +4.30% (p=0.000 n=10+10)
RuneIterate/range1/MixedLength-16 16.9ns ± 1% 17.7ns ± 0% +5.04% (p=0.000 n=10+8)
RuneIterate/range2/ASCII-16 2.84ns ± 8% 2.72ns ± 1% -4.28% (p=0.003 n=10+9)
RuneIterate/range2/Japanese-16 22.7ns ± 4% 25.2ns ± 3% +10.97% (p=0.000 n=10+10)
RuneIterate/range2/MixedLength-16 17.0ns ± 1% 17.2ns ± 0% +0.95% (p=0.000 n=10+10)
ArrayEqual-16 0.40ns ± 5% 0.35ns ± 2% -11.83% (p=0.000 n=10+10)
Func/Name-16 8.05ns ± 1% 8.09ns ± 1% +0.40% (p=0.025 n=8+10)
Func/Entry-16 1.73ns ± 1% 1.66ns ± 1% -3.93% (p=0.000 n=10+10)
Func/FileLine-16 27.5ns ± 2% 26.0ns ± 0% -5.50% (p=0.000 n=10+10)
[Geo mean] 16.7ns 15.7ns -6.08%
name old speed new speed delta
SetTypePtr-16 11.0GB/s ± 1% 11.0GB/s ± 3% ~ (p=0.684 n=10+10)
SetTypePtr8-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr16-16 31.0GB/s ± 1% 31.1GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr32-16 62.1GB/s ± 0% 62.2GB/s ± 0% ~ (p=0.123 n=10+10)
SetTypePtr64-16 124GB/s ± 0% 124GB/s ± 0% ~ (p=0.684 n=10+10)
SetTypePtr126-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.481 n=10+10)
SetTypePtr128-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.243 n=9+10)
SetTypePtrSlice-16 151GB/s ± 1% 151GB/s ± 1% ~ (p=0.497 n=9+10)
SetTypeNode1-16 5.82GB/s ± 1% 5.82GB/s ± 0% ~ (p=0.353 n=10+10)
SetTypeNode1Slice-16 76.1GB/s ± 1% 77.0GB/s ± 1% +1.19% (p=0.000 n=10+10)
SetTypeNode8-16 19.4GB/s ± 0% 19.4GB/s ± 0% ~ (p=0.130 n=8+8)
SetTypeNode8Slice-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.604 n=10+9)
SetTypeNode64-16 76.5GB/s ± 0% 76.5GB/s ± 0% ~ (p=0.190 n=10+10)
SetTypeNode64Slice-16 97.8GB/s ± 0% 97.7GB/s ± 0% ~ (p=0.549 n=9+10)
SetTypeNode64Dead-16 95.5GB/s ± 0% 95.7GB/s ± 0% ~ (p=0.118 n=10+6)
SetTypeNode64DeadSlice-16 112GB/s ± 0% 112GB/s ± 0% ~ (p=0.353 n=10+10)
SetTypeNode124-16 146GB/s ± 0% 146GB/s ± 0% ~ (p=0.853 n=10+10)
SetTypeNode124Slice-16 146GB/s ± 5% 149GB/s ± 0% ~ (p=0.315 n=10+10)
SetTypeNode126-16 154GB/s ± 0% 154GB/s ± 0% ~ (p=0.356 n=10+9)
SetTypeNode126Slice-16 150GB/s ± 0% 150GB/s ± 0% ~ (p=0.095 n=9+10)
SetTypeNode128-16 107GB/s ± 0% 107GB/s ± 0% +0.31% (p=0.003 n=9+10)
SetTypeNode128Slice-16 119GB/s ± 0% 120GB/s ± 0% ~ (p=0.156 n=10+9)
SetTypeNode130-16 108GB/s ± 0% 108GB/s ± 0% +0.33% (p=0.002 n=10+10)
SetTypeNode130Slice-16 119GB/s ± 0% 119GB/s ± 0% ~ (p=0.739 n=10+10)
SetTypeNode1024-16 160GB/s ± 0% 159GB/s ± 1% ~ (p=0.113 n=9+9)
SetTypeNode1024Slice-16 144GB/s ± 0% 144GB/s ± 0% ~ (p=0.063 n=10+10)
Hash5-16 2.59GB/s ± 1% 2.49GB/s ± 0% -3.90% (p=0.000 n=10+9)
Hash16-16 7.85GB/s ± 1% 7.23GB/s ± 1% -7.92% (p=0.000 n=10+10)
Hash64-16 24.0GB/s ± 0% 23.9GB/s ± 0% ~ (p=0.190 n=9+9)
Hash1024-16 62.4GB/s ± 0% 62.3GB/s ± 0% -0.16% (p=0.017 n=9+10)
Hash65536-16 74.0GB/s ± 0% 74.0GB/s ± 0% ~ (p=0.796 n=10+10)
Memmove/1-16 1.08GB/s ± 0% 1.08GB/s ± 0% -0.21% (p=0.035 n=10+10)
Memmove/2-16 2.16GB/s ± 0% 2.15GB/s ± 0% ~ (p=0.105 n=10+10)
Memmove/3-16 3.24GB/s ± 1% 3.22GB/s ± 1% -0.49% (p=0.004 n=10+10)
Memmove/4-16 3.89GB/s ± 0% 3.89GB/s ± 0% ~ (p=0.218 n=10+10)
Memmove/5-16 4.42GB/s ± 0% 4.42GB/s ± 0% ~ (p=0.075 n=10+10)
Memmove/6-16 5.31GB/s ± 0% 5.29GB/s ± 1% ~ (p=0.218 n=10+10)
Memmove/7-16 6.19GB/s ± 0% 6.18GB/s ± 0% -0.15% (p=0.035 n=10+9)
Memmove/8-16 7.07GB/s ± 0% 7.07GB/s ± 0% ~ (p=0.684 n=10+10)
Memmove/9-16 7.22GB/s ± 0% 6.68GB/s ± 0% -7.37% (p=0.000 n=10+10)
Memmove/10-16 8.02GB/s ± 0% 7.43GB/s ± 0% -7.38% (p=0.000 n=9+9)
Memmove/11-16 8.83GB/s ± 0% 8.13GB/s ± 0% -7.87% (p=0.000 n=10+9)
Memmove/12-16 9.62GB/s ± 0% 8.89GB/s ± 1% -7.61% (p=0.000 n=10+10)
Memmove/13-16 10.4GB/s ± 0% 9.7GB/s ± 0% -7.20% (p=0.000 n=10+10)
Memmove/14-16 11.2GB/s ± 0% 10.4GB/s ± 1% -7.64% (p=0.000 n=10+9)
Memmove/15-16 12.0GB/s ± 0% 11.1GB/s ± 0% -7.46% (p=0.000 n=10+9)
Memmove/16-16 12.8GB/s ± 0% 11.8GB/s ± 1% -7.67% (p=0.000 n=10+10)
Memmove/32-16 23.8GB/s ± 0% 23.5GB/s ± 1% -1.20% (p=0.000 n=10+10)
Memmove/64-16 44.2GB/s ± 0% 39.1GB/s ± 0% -11.56% (p=0.000 n=10+9)
Memmove/128-16 68.7GB/s ± 0% 63.2GB/s ± 0% -7.95% (p=0.000 n=10+10)
Memmove/256-16 104GB/s ± 0% 103GB/s ± 0% -1.13% (p=0.000 n=10+10)
Memmove/512-16 129GB/s ± 1% 129GB/s ± 0% ~ (p=0.165 n=10+10)
Memmove/1024-16 174GB/s ± 1% 174GB/s ± 1% ~ (p=0.258 n=9+9)
Memmove/2048-16 213GB/s ± 1% 213GB/s ± 2% ~ (p=0.963 n=8+9)
Memmove/4096-16 250GB/s ± 1% 240GB/s ± 4% -3.83% (p=0.006 n=9+9)
MemmoveOverlap/32-16 19.8GB/s ± 1% 19.1GB/s ± 1% -3.40% (p=0.000 n=10+10)
MemmoveOverlap/64-16 39.0GB/s ± 0% 38.8GB/s ± 0% -0.28% (p=0.001 n=9+9)
MemmoveOverlap/128-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.063 n=10+10)
MemmoveOverlap/256-16 96.0GB/s ± 0% 95.8GB/s ± 0% -0.26% (p=0.009 n=10+10)
MemmoveOverlap/512-16 83.6GB/s ±16% 89.2GB/s ± 0% ~ (p=0.696 n=10+8)
MemmoveOverlap/1024-16 141GB/s ± 0% 140GB/s ± 0% -0.28% (p=0.006 n=8+10)
MemmoveOverlap/2048-16 172GB/s ± 0% 171GB/s ± 1% -0.38% (p=0.008 n=9+9)
MemmoveOverlap/4096-16 176GB/s ± 1% 177GB/s ± 1% +0.84% (p=0.001 n=8+10)
MemmoveUnalignedDst/1-16 806MB/s ± 0% 802MB/s ± 1% -0.52% (p=0.023 n=10+10)
MemmoveUnalignedDst/2-16 1.62GB/s ± 0% 1.62GB/s ± 0% -0.11% (p=0.041 n=10+10)
MemmoveUnalignedDst/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% -0.14% (p=0.006 n=9+9)
MemmoveUnalignedDst/4-16 3.24GB/s ± 0% 3.23GB/s ± 1% -0.36% (p=0.007 n=10+10)
MemmoveUnalignedDst/5-16 3.71GB/s ± 0% 3.71GB/s ± 0% ~ (p=0.063 n=10+10)
MemmoveUnalignedDst/6-16 4.48GB/s ± 0% 4.47GB/s ± 0% ~ (p=0.912 n=10+10)
MemmoveUnalignedDst/7-16 5.22GB/s ± 0% 5.22GB/s ± 0% ~ (p=1.000 n=10+10)
MemmoveUnalignedDst/8-16 5.95GB/s ± 0% 5.93GB/s ± 1% -0.40% (p=0.023 n=10+10)
MemmoveUnalignedDst/9-16 6.24GB/s ± 0% 6.24GB/s ± 0% ~ (p=0.912 n=10+10)
MemmoveUnalignedDst/10-16 6.94GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDst/11-16 7.64GB/s ± 0% 7.63GB/s ± 0% ~ (p=0.393 n=10+10)
MemmoveUnalignedDst/12-16 8.33GB/s ± 0% 8.33GB/s ± 0% ~ (p=0.971 n=10+10)
MemmoveUnalignedDst/13-16 9.02GB/s ± 0% 9.01GB/s ± 0% ~ (p=0.436 n=10+10)
MemmoveUnalignedDst/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.280 n=10+10)
MemmoveUnalignedDst/15-16 10.4GB/s ± 0% 10.4GB/s ± 1% ~ (p=0.853 n=10+10)
MemmoveUnalignedDst/16-16 11.1GB/s ± 0% 11.1GB/s ± 0% ~ (p=0.089 n=10+10)
MemmoveUnalignedDst/32-16 19.7GB/s ± 1% 19.6GB/s ± 0% ~ (p=0.075 n=10+10)
MemmoveUnalignedDst/64-16 38.9GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedDst/128-16 62.1GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.549 n=10+9)
MemmoveUnalignedDst/256-16 69.4GB/s ± 0% 69.3GB/s ± 0% ~ (p=0.105 n=10+10)
MemmoveUnalignedDst/512-16 124GB/s ± 1% 124GB/s ± 0% ~ (p=0.762 n=10+8)
MemmoveUnalignedDst/1024-16 136GB/s ± 0% 136GB/s ± 1% ~ (p=0.666 n=9+9)
MemmoveUnalignedDst/2048-16 159GB/s ± 0% 159GB/s ± 0% ~ (p=0.574 n=8+8)
MemmoveUnalignedDst/4096-16 161GB/s ± 0% 161GB/s ± 0% ~ (p=1.000 n=9+9)
MemmoveUnalignedDstOverlap/32-16 7.84GB/s ± 0% 7.83GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/64-16 14.0GB/s ± 0% 14.0GB/s ± 0% ~ (p=0.661 n=10+9)
MemmoveUnalignedDstOverlap/128-16 27.4GB/s ± 0% 27.4GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedDstOverlap/256-16 50.4GB/s ± 0% 50.4GB/s ± 0% ~ (p=0.156 n=10+9)
MemmoveUnalignedDstOverlap/512-16 60.7GB/s ± 4% 62.5GB/s ± 0% +3.07% (p=0.022 n=10+9)
MemmoveUnalignedDstOverlap/1024-16 107GB/s ± 0% 107GB/s ± 0% ~ (p=0.234 n=8+8)
MemmoveUnalignedDstOverlap/2048-16 146GB/s ± 0% 146GB/s ± 1% ~ (p=0.182 n=10+9)
MemmoveUnalignedDstOverlap/4096-16 155GB/s ± 0% 155GB/s ± 0% ~ (p=0.400 n=10+9)
MemmoveUnalignedSrc/1-16 882MB/s ± 0% 884MB/s ± 1% +0.24% (p=0.033 n=10+9)
MemmoveUnalignedSrc/2-16 1.76GB/s ± 1% 1.77GB/s ± 0% +0.27% (p=0.028 n=10+9)
MemmoveUnalignedSrc/3-16 2.43GB/s ± 0% 2.43GB/s ± 0% +0.26% (p=0.027 n=9+10)
MemmoveUnalignedSrc/4-16 3.24GB/s ± 0% 3.24GB/s ± 1% ~ (p=0.079 n=9+10)
MemmoveUnalignedSrc/5-16 3.73GB/s ± 0% 3.73GB/s ± 1% ~ (p=0.829 n=8+10)
MemmoveUnalignedSrc/6-16 4.47GB/s ± 0% 4.49GB/s ± 0% +0.39% (p=0.000 n=10+10)
MemmoveUnalignedSrc/7-16 5.22GB/s ± 0% 5.23GB/s ± 0% ~ (p=0.280 n=10+10)
MemmoveUnalignedSrc/8-16 5.95GB/s ± 0% 5.98GB/s ± 0% +0.39% (p=0.001 n=10+9)
MemmoveUnalignedSrc/9-16 6.24GB/s ± 0% 6.25GB/s ± 0% ~ (p=0.549 n=10+9)
MemmoveUnalignedSrc/10-16 6.93GB/s ± 0% 6.94GB/s ± 0% ~ (p=0.604 n=10+9)
MemmoveUnalignedSrc/11-16 7.63GB/s ± 0% 7.63GB/s ± 1% ~ (p=0.353 n=10+10)
MemmoveUnalignedSrc/12-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrc/13-16 9.02GB/s ± 0% 9.00GB/s ± 1% ~ (p=0.684 n=10+10)
MemmoveUnalignedSrc/14-16 9.71GB/s ± 0% 9.71GB/s ± 0% ~ (p=0.739 n=10+10)
MemmoveUnalignedSrc/15-16 10.4GB/s ± 0% 10.4GB/s ± 0% ~ (p=0.353 n=10+10)
MemmoveUnalignedSrc/16-16 11.1GB/s ± 1% 11.1GB/s ± 0% ~ (p=0.579 n=10+10)
MemmoveUnalignedSrc/32-16 20.0GB/s ± 1% 20.0GB/s ± 0% ~ (p=0.631 n=10+10)
MemmoveUnalignedSrc/64-16 38.8GB/s ± 0% 38.8GB/s ± 0% ~ (p=0.579 n=10+10)
MemmoveUnalignedSrc/128-16 61.2GB/s ± 0% 61.2GB/s ± 0% ~ (p=0.780 n=10+9)
MemmoveUnalignedSrc/256-16 94.8GB/s ± 0% 92.2GB/s ± 1% -2.73% (p=0.000 n=10+10)
MemmoveUnalignedSrc/512-16 119GB/s ± 0% 119GB/s ± 0% +0.26% (p=0.027 n=8+9)
MemmoveUnalignedSrc/1024-16 141GB/s ± 0% 142GB/s ± 1% +1.07% (p=0.000 n=8+10)
MemmoveUnalignedSrc/2048-16 157GB/s ± 0% 157GB/s ± 0% ~ (p=0.167 n=9+8)
MemmoveUnalignedSrc/4096-16 161GB/s ± 0% 162GB/s ± 1% ~ (p=0.063 n=10+10)
MemmoveUnalignedSrcOverlap/32-16 7.93GB/s ± 0% 7.88GB/s ± 0% -0.63% (p=0.000 n=9+10)
MemmoveUnalignedSrcOverlap/64-16 15.5GB/s ± 0% 15.5GB/s ± 0% ~ (p=0.529 n=10+10)
MemmoveUnalignedSrcOverlap/128-16 28.3GB/s ± 0% 28.3GB/s ± 0% ~ (p=0.218 n=10+10)
MemmoveUnalignedSrcOverlap/256-16 41.5GB/s ± 0% 41.6GB/s ± 0% +0.35% (p=0.000 n=10+9)
MemmoveUnalignedSrcOverlap/512-16 68.9GB/s ± 0% 68.8GB/s ± 0% ~ (p=0.541 n=9+8)
MemmoveUnalignedSrcOverlap/1024-16 115GB/s ± 0% 115GB/s ± 0% ~ (p=0.382 n=8+8)
MemmoveUnalignedSrcOverlap/2048-16 155GB/s ± 0% 144GB/s ±18% ~ (p=0.101 n=8+10)
MemmoveUnalignedSrcOverlap/4096-16 160GB/s ± 0% 160GB/s ± 1% ~ (p=0.605 n=9+9)
Memclr/5-16 5.81GB/s ± 1% 5.80GB/s ± 2% ~ (p=0.546 n=9+9)
Memclr/16-16 15.5GB/s ± 0% 15.4GB/s ± 0% -0.32% (p=0.008 n=9+10)
Memclr/64-16 51.8GB/s ± 0% 50.7GB/s ± 0% -2.22% (p=0.000 n=10+10)
Memclr/256-16 113GB/s ± 0% 113GB/s ± 0% ~ (p=0.143 n=10+10)
Memclr/4096-16 239GB/s ± 1% 237GB/s ± 0% -0.87% (p=0.000 n=10+10)
Memclr/65536-16 79.8GB/s ± 0% 79.7GB/s ± 0% ~ (p=0.529 n=10+10)
Memclr/1M-16 74.6GB/s ± 1% 74.7GB/s ± 1% ~ (p=0.529 n=10+10)
Memclr/4M-16 48.7GB/s ± 1% 48.8GB/s ± 0% ~ (p=0.123 n=10+10)
Memclr/8M-16 48.2GB/s ± 2% 48.6GB/s ± 0% ~ (p=0.408 n=10+8)
Memclr/16M-16 43.6GB/s ± 4% 43.3GB/s ± 0% ~ (p=0.173 n=10+8)
Memclr/64M-16 30.7GB/s ± 0% 30.7GB/s ± 0% ~ (p=0.113 n=10+9)
GoMemclr/5-16 6.07GB/s ± 0% 6.08GB/s ± 0% ~ (p=0.367 n=9+10)
GoMemclr/16-16 15.6GB/s ± 0% 15.6GB/s ± 0% -0.22% (p=0.004 n=9+9)
GoMemclr/64-16 56.1GB/s ± 0% 56.1GB/s ± 0% ~ (p=0.968 n=10+9)
GoMemclr/256-16 125GB/s ± 0% 124GB/s ± 0% ~ (p=0.912 n=10+10)
MemclrRange/1K_2K-16 210GB/s ± 0% 224GB/s ± 1% +6.81% (p=0.000 n=10+10)
MemclrRange/2K_8K-16 228GB/s ± 0% 228GB/s ± 0% ~ (p=0.684 n=10+10)
MemclrRange/4K_16K-16 279GB/s ± 0% 279GB/s ± 0% ~ (p=0.780 n=9+10)
MemclrRange/160K_228K-16 80.3GB/s ± 0% 80.2GB/s ± 0% ~ (p=0.165 n=10+10)
Copy/1Byte-16 808MB/s ± 1% 810MB/s ± 0% +0.28% (p=0.000 n=10+8)
Copy/1String-16 810MB/s ± 0% 811MB/s ± 0% ~ (p=0.105 n=10+10)
Copy/2Byte-16 1.62GB/s ± 0% 1.62GB/s ± 0% ~ (p=0.182 n=10+9)
Copy/2String-16 1.62GB/s ± 1% 1.62GB/s ± 1% ~ (p=1.000 n=10+10)
Copy/4Byte-16 3.22GB/s ± 0% 3.24GB/s ± 0% +0.46% (p=0.000 n=10+10)
Copy/4String-16 3.24GB/s ± 0% 3.24GB/s ± 0% ~ (p=0.075 n=10+10)
Copy/8Byte-16 5.86GB/s ± 0% 5.96GB/s ± 0% +1.82% (p=0.000 n=9+9)
Copy/8String-16 5.95GB/s ± 0% 5.99GB/s ± 0% +0.59% (p=0.000 n=9+10)
Copy/12Byte-16 8.32GB/s ± 0% 8.32GB/s ± 0% ~ (p=0.190 n=9+9)
Copy/12String-16 8.31GB/s ± 0% 8.29GB/s ± 0% ~ (p=0.068 n=9+10)
Copy/16Byte-16 11.1GB/s ± 0% 11.1GB/s ± 0% +0.18% (p=0.003 n=10+9)
Copy/16String-16 11.1GB/s ± 0% 11.1GB/s ± 0% -0.31% (p=0.009 n=10+10)
Copy/32Byte-16 19.6GB/s ± 1% 19.8GB/s ± 1% +0.72% (p=0.002 n=10+10)
Copy/32String-16 20.0GB/s ± 0% 19.5GB/s ± 0% -2.19% (p=0.000 n=10+10)
Copy/128Byte-16 62.2GB/s ± 0% 62.1GB/s ± 0% ~ (p=0.661 n=9+10)
Copy/128String-16 61.9GB/s ± 0% 61.7GB/s ± 0% -0.35% (p=0.005 n=10+10)
Copy/1024Byte-16 169GB/s ± 2% 171GB/s ± 1% +1.21% (p=0.000 n=9+10)
Copy/1024String-16 169GB/s ± 0% 172GB/s ± 1% +1.57% (p=0.000 n=10+9)
CompareStringBigUnaligned-16 43.8GB/s ± 1% 43.7GB/s ± 1% ~ (p=0.370 n=9+8)
CompareStringBig-16 47.6GB/s ± 3% 47.3GB/s ± 3% ~ (p=0.243 n=9+10)
[Geo mean] 25.3GB/s 28.0GB/s +10.66%
name old p50-ns new p50-ns delta
ReadMemStatsLatency-16 98.4k ±37% 112.7k ±64% ~ (p=0.436 n=10+10)
ReadMetricsLatency-16 1.69k ± 3% 1.70k ± 2% ~ (p=0.646 n=9+10)
GoroutineProfile/small-nil/idle-16 3.75k ± 4% 3.72k ± 1% ~ (p=0.447 n=10+9)
GoroutineProfile/small-nil/loaded-16 4.33k ± 3% 4.31k ± 5% ~ (p=0.931 n=9+9)
GoroutineProfile/small/idle-16 102k ± 3% 101k ± 4% ~ (p=0.113 n=9+9)
GoroutineProfile/small/loaded-16 214k ± 3% 215k ± 6% ~ (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16 3.70k ± 2% 3.65k ± 2% ~ (p=0.075 n=10+10)
GoroutineProfile/large-nil/loaded-16 4.36k ± 3% 4.31k ±10% ~ (p=0.631 n=10+10)
GoroutineProfile/large/idle-16 2.56M ± 1% 2.51M ± 1% -2.28% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 6.77M ± 4% 6.85M ±19% ~ (p=0.536 n=7+10)
GoroutineProfile/sparse-nil/idle-16 3.66k ± 1% 3.64k ± 2% ~ (p=0.136 n=9+9)
GoroutineProfile/sparse-nil/loaded-16 4.25k ± 5% 4.15k ± 4% ~ (p=0.190 n=10+10)
GoroutineProfile/sparse/idle-16 102k ± 4% 101k ± 3% ~ (p=0.447 n=10+9)
GoroutineProfile/sparse/loaded-16 216k ± 4% 218k ± 4% ~ (p=0.549 n=9+10)
[Geo mean] 35.8k 35.9k +0.35%
name old p90-ns new p90-ns delta
ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8)
ReadMetricsLatency-16 4.01k ±35% 3.75k ±17% ~ (p=0.315 n=10+10)
GoroutineProfile/small-nil/idle-16 4.21k ± 4% 4.27k ± 8% ~ (p=0.968 n=9+10)
GoroutineProfile/small-nil/loaded-16 5.58k ± 8% 5.35k ±12% ~ (p=0.190 n=10+10)
GoroutineProfile/small/idle-16 108k ± 6% 107k ± 7% ~ (p=0.497 n=9+10)
GoroutineProfile/small/loaded-16 450k ± 5% 432k ± 3% -3.92% (p=0.002 n=9+9)
GoroutineProfile/large-nil/idle-16 4.13k ± 7% 4.04k ± 2% ~ (p=0.181 n=10+8)
GoroutineProfile/large-nil/loaded-16 5.76k ± 4% 5.67k ± 5% ~ (p=0.190 n=10+10)
GoroutineProfile/large/idle-16 2.63M ± 2% 2.58M ± 1% -1.97% (p=0.000 n=10+10)
GoroutineProfile/large/loaded-16 16.9M ± 4% 17.0M ± 6% ~ (p=0.661 n=9+10)
GoroutineProfile/sparse-nil/idle-16 4.21k ±10% 4.07k ± 7% ~ (p=0.128 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 5.55k ± 8% 5.38k ± 6% ~ (p=0.089 n=10+10)
GoroutineProfile/sparse/idle-16 106k ± 4% 106k ± 3% ~ (p=0.661 n=10+9)
GoroutineProfile/sparse/loaded-16 454k ± 6% 441k ± 6% -2.86% (p=0.043 n=10+10)
[Geo mean] 58.4k 51.0k -12.61%
name old p99-ns new p99-ns delta
ReadMemStatsLatency-16 983k ±310% 200k ±34% -79.62% (p=0.034 n=10+8)
ReadMetricsLatency-16 26.6k ±22% 26.6k ±17% ~ (p=0.971 n=10+10)
GoroutineProfile/small-nil/idle-16 5.27k ±12% 5.19k ±12% ~ (p=0.579 n=10+10)
GoroutineProfile/small-nil/loaded-16 7.28k ± 3% 7.04k ± 9% ~ (p=0.113 n=9+10)
GoroutineProfile/small/idle-16 114k ± 6% 113k ± 6% ~ (p=0.604 n=9+10)
GoroutineProfile/small/loaded-16 4.84M ±70% 6.06M ±131% ~ (p=0.842 n=9+10)
GoroutineProfile/large-nil/idle-16 5.38k ±19% 5.26k ±13% ~ (p=0.912 n=10+10)
GoroutineProfile/large-nil/loaded-16 7.38k ± 3% 7.23k ± 4% ~ (p=0.143 n=10+10)
GoroutineProfile/large/idle-16 2.79M ± 5% 2.72M ± 3% ~ (p=0.089 n=10+10)
GoroutineProfile/large/loaded-16 24.0M ±24% 24.9M ±29% ~ (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/idle-16 5.32k ±17% 5.49k ±17% ~ (p=0.684 n=10+10)
GoroutineProfile/sparse-nil/loaded-16 7.25k ± 4% 6.97k ± 5% -3.90% (p=0.005 n=9+10)
GoroutineProfile/sparse/idle-16 113k ± 5% 112k ± 6% ~ (p=0.631 n=10+10)
GoroutineProfile/sparse/loaded-16 4.00M ±66% 4.26M ±65% ~ (p=0.489 n=9+9)
[Geo mean] 107k 97k -9.55%
name old alloc/op new alloc/op delta
NewEmptyMap-16 0.00B 0.00B ~ (all equal)
NewSmallMap-16 0.00B 0.00B ~ (all equal)
MapPopulate/1-16 0.00B 0.00B ~ (all equal)
MapPopulate/10-16 179B ± 0% 179B ± 0% ~ (all equal)
MapPopulate/100-16 3.35kB ± 0% 3.35kB ± 0% ~ (p=0.294 n=10+8)
MapPopulate/1000-16 53.3kB ± 0% 53.3kB ± 0% ~ (p=1.000 n=8+10)
MapPopulate/10000-16 428kB ± 0% 428kB ± 0% ~ (p=0.469 n=10+10)
MapPopulate/100000-16 3.62MB ± 0% 3.62MB ± 0% ~ (p=0.888 n=9+10)
MapStringConversion/32/simple-16 0.00B 0.00B ~ (all equal)
MapStringConversion/32/struct-16 0.00B 0.00B ~ (all equal)
MapStringConversion/32/array-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/simple-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/struct-16 0.00B 0.00B ~ (all equal)
MapStringConversion/64/array-16 0.00B 0.00B ~ (all equal)
NewEmptyMapHintLessThan8-16 0.00B 0.00B ~ (all equal)
NewEmptyMapHintGreaterThan8-16 1.15kB ± 0% 1.15kB ± 0% ~ (all equal)
MapAppendAssign/Int32/256-16 41.7B ±15% 44.3B ±12% ~ (p=0.106 n=10+10)
MapAppendAssign/Int32/65536-16 22.6B ± 6% 23.5B ± 6% +4.19% (p=0.025 n=9+10)
MapAppendAssign/Int64/256-16 43.5B ±10% 42.9B ± 7% ~ (p=0.757 n=10+10)
MapAppendAssign/Int64/65536-16 24.7B ± 7% 21.8B ± 6% -11.74% (p=0.000 n=10+10)
MapAppendAssign/Str/256-16 87.6B ±10% 89.2B ± 9% ~ (p=0.379 n=10+10)
MapAppendAssign/Str/65536-16 45.1B ±14% 47.2B ± 8% ~ (p=0.150 n=10+9)
CreateGoroutinesCapture-16 144B ± 0% 144B ± 0% ~ (all equal)
[Geo mean] 769B 770B +0.21%
name old allocs/op new allocs/op delta
NewEmptyMap-16 0.00 0.00 ~ (all equal)
NewSmallMap-16 0.00 0.00 ~ (all equal)
MapPopulate/1-16 0.00 0.00 ~ (all equal)
MapPopulate/10-16 1.00 ± 0% 1.00 ± 0% ~ (all equal)
MapPopulate/100-16 17.0 ± 0% 17.0 ± 0% ~ (all equal)
MapPopulate/1000-16 73.0 ± 0% 73.0 ± 0% ~ (all equal)
MapPopulate/10000-16 320 ± 0% 320 ± 0% ~ (p=1.000 n=10+10)
MapPopulate/100000-16 4.00k ± 0% 4.00k ± 0% ~ (p=0.753 n=10+10)
MapStringConversion/32/simple-16 0.00 0.00 ~ (all equal)
MapStringConversion/32/struct-16 0.00 0.00 ~ (all equal)
MapStringConversion/32/array-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/simple-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/struct-16 0.00 0.00 ~ (all equal)
MapStringConversion/64/array-16 0.00 0.00 ~ (all equal)
NewEmptyMapHintLessThan8-16 0.00 0.00 ~ (all equal)
NewEmptyMapHintGreaterThan8-16 1.00 ± 0% 1.00 ± 0% ~ (all equal)
MapAppendAssign/Int32/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int32/65536-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int64/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Int64/65536-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Str/256-16 0.00 0.00 ~ (all equal)
MapAppendAssign/Str/65536-16 0.00 0.00 ~ (all equal)
CreateGoroutinesCapture-16 5.00 ± 0% 5.00 ± 0% ~ (all equal)
[Geo mean] 26.0 26.0 +0.00%
Change-Id: I5fb03e93df8b380e04795afbdcd1c94aeeecacc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/454255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Result: Gopher Robot <gobot@golang.org>
|
||
|
|
fc814056aa |
cmd/compile: rewrite empty makeslice to zerobase pointer
make\(\[\][a-zA-Z0-9]+, 0\) is seen 52 times in the go source. And at least 391 times on internet: https://grep.app/search?q=make%5C%28%5C%5B%5C%5D%5Ba-zA-Z0-9%5D%2B%2C%200%5C%29®exp=true This used to compile to calling runtime.makeslice. However we can copy what we do for []T{}, just use a zerobase pointer. On my machine this is 10x faster (from 3ns to 0.3ns). Note that an empty loop also runs in 0.3ns, so this really is free when you count superscallar execution. Change-Id: I1cfe7e69f5a7a4dabbc71912ce6a4f8a2d4a7f3c Reviewed-on: https://go-review.googlesource.com/c/go/+/454036 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> |