go/src/cmd/compile/internal/gc/range.go

537 lines
12 KiB
Go
Raw Normal View History

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package gc
import (
"cmd/compile/internal/types"
"cmd/internal/objabi"
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
"cmd/internal/sys"
"unicode/utf8"
)
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// range
func typecheckrange(n *Node) {
// Typechecking order is important here:
// 0. first typecheck range expression (slice/map/chan),
// it is evaluated only once and so logically it is not part of the loop.
// 1. typcheck produced values,
// this part can declare new vars and so it must be typechecked before body,
// because body can contain a closure that captures the vars.
// 2. decldepth++ to denote loop body.
// 3. typecheck body.
// 4. decldepth--.
typecheckrangeExpr(n)
// second half of dance, the first half being typecheckrangeExpr
n.SetTypecheck(1)
ls := n.List.Slice()
for i1, n1 := range ls {
if n1.Typecheck() == 0 {
ls[i1] = typecheck(ls[i1], Erv|Easgn)
}
}
decldepth++
typecheckslice(n.Nbody.Slice(), Etop)
decldepth--
}
func typecheckrangeExpr(n *Node) {
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n.Right = typecheck(n.Right, Erv)
t := n.Right.Type
if t == nil {
return
}
// delicate little dance. see typecheckas2
ls := n.List.Slice()
for i1, n1 := range ls {
if n1.Name == nil || n1.Name.Defn != n {
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
ls[i1] = typecheck(ls[i1], Erv|Easgn)
}
}
if t.IsPtr() && t.Elem().IsArray() {
t = t.Elem()
}
n.Type = t
var t1, t2 *types.Type
toomany := false
switch t.Etype {
default:
yyerrorl(n.Pos, "cannot range over %L", n.Right)
return
case TARRAY, TSLICE:
t1 = types.Types[TINT]
t2 = t.Elem()
case TMAP:
t1 = t.Key()
t2 = t.Val()
case TCHAN:
if !t.ChanDir().CanRecv() {
yyerrorl(n.Pos, "invalid operation: range %v (receive from send-only type %v)", n.Right, n.Right.Type)
return
}
t1 = t.Elem()
t2 = nil
if n.List.Len() == 2 {
toomany = true
}
case TSTRING:
t1 = types.Types[TINT]
t2 = types.Runetype
}
if n.List.Len() > 2 || toomany {
yyerrorl(n.Pos, "too many variables in range")
}
var v1, v2 *Node
if n.List.Len() != 0 {
v1 = n.List.First()
}
if n.List.Len() > 1 {
v2 = n.List.Second()
}
// this is not only a optimization but also a requirement in the spec.
// "if the second iteration variable is the blank identifier, the range
// clause is equivalent to the same clause with only the first variable
// present."
if v2.isBlank() {
if v1 != nil {
n.List.Set1(v1)
}
v2 = nil
}
var why string
if v1 != nil {
if v1.Name != nil && v1.Name.Defn == n {
v1.Type = t1
} else if v1.Type != nil && assignop(t1, v1.Type, &why) == 0 {
yyerrorl(n.Pos, "cannot assign type %v to %L in range%s", t1, v1, why)
}
checkassign(n, v1)
}
if v2 != nil {
if v2.Name != nil && v2.Name.Defn == n {
v2.Type = t2
} else if v2.Type != nil && assignop(t2, v2.Type, &why) == 0 {
yyerrorl(n.Pos, "cannot assign type %v to %L in range%s", t2, v2, why)
}
checkassign(n, v2)
}
}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
func cheapComputableIndex(width int64) bool {
switch thearch.LinkArch.Family {
// MIPS does not have R+R addressing
// Arm64 may lack ability to generate this code in our assembler,
// but the architecture supports it.
case sys.PPC64, sys.S390X:
return width == 1
case sys.AMD64, sys.I386, sys.ARM64, sys.ARM:
switch width {
case 1, 2, 4, 8:
return true
}
}
return false
}
// walkrange transforms various forms of ORANGE into
// simpler forms. The result must be assigned back to n.
// Node n may also be modified in place, and may also be
// the returned node.
func walkrange(n *Node) *Node {
// variable name conventions:
// ohv1, hv1, hv2: hidden (old) val 1, 2
// ha, hit: hidden aggregate, iterator
// hn, hp: hidden len, pointer
// hb: hidden bool
// a, v1, v2: not hidden aggregate, val 1, 2
t := n.Type
a := n.Right
lno := setlineno(a)
n.Right = nil
var v1, v2 *Node
l := n.List.Len()
if l > 0 {
v1 = n.List.First()
}
if l > 1 {
v2 = n.List.Second()
}
if v2.isBlank() {
v2 = nil
}
if v1.isBlank() && v2 == nil {
v1 = nil
}
if v1 == nil && v2 != nil {
Fatalf("walkrange: v2 != nil while v1 == nil")
}
// n.List has no meaning anymore, clear it
// to avoid erroneous processing by racewalk.
n.List.Set(nil)
var ifGuard *Node
translatedLoopOp := OFOR
var body []*Node
var init []*Node
switch t.Etype {
default:
Fatalf("walkrange")
case TARRAY, TSLICE:
if memclrrange(n, v1, v2, a) {
lineno = lno
return n
}
// orderstmt arranged for a copy of the array/slice variable if needed.
ha := a
hv1 := temp(types.Types[TINT])
hn := temp(types.Types[TINT])
init = append(init, nod(OAS, hv1, nil))
init = append(init, nod(OAS, hn, nod(OLEN, ha, nil)))
n.Left = nod(OLT, hv1, hn)
n.Right = nod(OAS, hv1, nod(OADD, hv1, nodintconst(1)))
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
// for range ha { body }
if v1 == nil {
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
break
}
// for v1 := range ha { body }
if v2 == nil {
body = []*Node{nod(OAS, v1, hv1)}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
break
}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
// for v1, v2 := range ha { body }
if cheapComputableIndex(n.Type.Elem().Width) {
// v1, v2 = hv1, ha[hv1]
tmp := nod(OINDEX, ha, hv1)
tmp.SetBounded(true)
// Use OAS2 to correctly handle assignments
// of the form "v1, a[v1] := range".
a := nod(OAS2, nil, nil)
a.List.Set2(v1, v2)
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
a.Rlist.Set2(hv1, tmp)
body = []*Node{a}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
break
}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
if objabi.Preemptibleloops_enabled != 0 {
// Doing this transformation makes a bounds check removal less trivial; see #20711
// TODO enhance the preemption check insertion so that this transformation is not necessary.
ifGuard = nod(OIF, nil, nil)
ifGuard.Left = nod(OLT, hv1, hn)
translatedLoopOp = OFORUNTIL
}
cmd/compile: simplify slice/array range loops for some element sizes In range loops over slices and arrays besides a variable to track the index an extra variable containing the address of the current element is used. To compute a pointer to the next element the elements size is added to the address. On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied from an array using a MOV instruction with suitable addressing mode that uses the start address of the array, the index of the element and element size as scaling factor. Thereby, for arrays and slices with suitable element size we can avoid keeping and incrementing an extra variable to compute the next elements address. Shrinks cmd/go by 4 kilobytes. AMD64: name old time/op new time/op delta BinaryTree17 2.66s ± 7% 2.54s ± 0% -4.53% (p=0.000 n=10+8) Fannkuch11 3.02s ± 1% 3.02s ± 1% ~ (p=0.579 n=10+10) FmtFprintfEmpty 45.6ns ± 1% 42.2ns ± 1% -7.46% (p=0.000 n=10+10) FmtFprintfString 69.8ns ± 1% 70.4ns ± 1% +0.84% (p=0.041 n=10+10) FmtFprintfInt 80.1ns ± 1% 79.0ns ± 1% -1.35% (p=0.000 n=10+10) FmtFprintfIntInt 127ns ± 1% 125ns ± 1% -1.00% (p=0.007 n=10+9) FmtFprintfPrefixedInt 158ns ± 2% 152ns ± 1% -4.11% (p=0.000 n=10+10) FmtFprintfFloat 218ns ± 1% 214ns ± 1% -1.61% (p=0.000 n=10+10) FmtManyArgs 508ns ± 1% 504ns ± 1% -0.93% (p=0.001 n=9+10) GobDecode 6.76ms ± 1% 6.78ms ± 1% ~ (p=0.353 n=10+10) GobEncode 5.84ms ± 1% 5.77ms ± 1% -1.31% (p=0.000 n=10+9) Gzip 223ms ± 1% 218ms ± 1% -2.39% (p=0.000 n=10+10) Gunzip 40.3ms ± 1% 40.4ms ± 3% ~ (p=0.796 n=10+10) HTTPClientServer 73.5µs ± 0% 73.3µs ± 0% -0.28% (p=0.000 n=10+9) JSONEncode 12.7ms ± 1% 12.6ms ± 8% ~ (p=0.173 n=8+10) JSONDecode 57.5ms ± 1% 56.1ms ± 2% -2.40% (p=0.000 n=10+10) Mandelbrot200 3.80ms ± 1% 3.86ms ± 6% ~ (p=0.579 n=10+10) GoParse 3.25ms ± 1% 3.23ms ± 1% ~ (p=0.052 n=10+10) RegexpMatchEasy0_32 74.4ns ± 1% 76.9ns ± 1% +3.39% (p=0.000 n=10+10) RegexpMatchEasy0_1K 243ns ± 2% 248ns ± 1% +1.86% (p=0.000 n=10+8) RegexpMatchEasy1_32 71.0ns ± 2% 72.8ns ± 1% +2.55% (p=0.000 n=10+10) RegexpMatchEasy1_1K 370ns ± 1% 383ns ± 0% +3.39% (p=0.000 n=10+9) RegexpMatchMedium_32 107ns ± 0% 113ns ± 1% +5.33% (p=0.000 n=6+10) RegexpMatchMedium_1K 35.0µs ± 1% 36.0µs ± 1% +3.13% (p=0.000 n=10+10) RegexpMatchHard_32 1.65µs ± 1% 1.69µs ± 1% +2.23% (p=0.000 n=10+9) RegexpMatchHard_1K 49.8µs ± 1% 50.6µs ± 1% +1.59% (p=0.000 n=10+10) Revcomp 398ms ± 1% 396ms ± 1% -0.51% (p=0.043 n=10+10) Template 63.4ms ± 1% 60.8ms ± 0% -4.11% (p=0.000 n=10+9) TimeParse 318ns ± 1% 322ns ± 1% +1.10% (p=0.005 n=10+10) TimeFormat 323ns ± 1% 336ns ± 1% +4.15% (p=0.000 n=10+10) Updates: #15809. Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1 Reviewed-on: https://go-review.googlesource.com/38061 Run-TryBot: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Keith Randall <khr@golang.org>
2016-12-18 20:13:58 +01:00
hp := temp(types.NewPtr(n.Type.Elem()))
tmp := nod(OINDEX, ha, nodintconst(0))
tmp.SetBounded(true)
init = append(init, nod(OAS, hp, nod(OADDR, tmp, nil)))
// Use OAS2 to correctly handle assignments
// of the form "v1, a[v1] := range".
a := nod(OAS2, nil, nil)
a.List.Set2(v1, v2)
a.Rlist.Set2(hv1, nod(OIND, hp, nil))
body = append(body, a)
// Advance pointer as part of increment.
// We used to advance the pointer before executing the loop body,
// but doing so would make the pointer point past the end of the
// array during the final iteration, possibly causing another unrelated
// piece of memory not to be garbage collected until the loop finished.
// Advancing during the increment ensures that the pointer p only points
// pass the end of the array during the final "p++; i++; if(i >= len(x)) break;",
// after which p is dead, so it cannot confuse the collector.
tmp = nod(OADD, hp, nodintconst(t.Elem().Width))
tmp.Type = hp.Type
tmp.SetTypecheck(1)
tmp.Right.Type = types.Types[types.Tptr]
tmp.Right.SetTypecheck(1)
a = nod(OAS, hp, tmp)
a = typecheck(a, Etop)
n.Right.Ninit.Set1(a)
case TMAP:
// orderstmt allocated the iterator for us.
// we only use a once, so no copy needed.
ha := a
hit := prealloc[n]
th := hit.Type
n.Left = nil
keysym := th.Field(0).Sym // depends on layout of iterator struct. See reflect.go:hiter
valsym := th.Field(1).Sym // ditto
fn := syslook("mapiterinit")
fn = substArgTypes(fn, t.Key(), t.Val(), th)
init = append(init, mkcall1(fn, nil, nil, typename(t), ha, nod(OADDR, hit, nil)))
n.Left = nod(ONE, nodSym(ODOT, hit, keysym), nodnil())
fn = syslook("mapiternext")
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
fn = substArgTypes(fn, th)
n.Right = mkcall1(fn, nil, nil, nod(OADDR, hit, nil))
key := nodSym(ODOT, hit, keysym)
key = nod(OIND, key, nil)
if v1 == nil {
body = nil
} else if v2 == nil {
body = []*Node{nod(OAS, v1, key)}
} else {
val := nodSym(ODOT, hit, valsym)
val = nod(OIND, val, nil)
a := nod(OAS2, nil, nil)
a.List.Set2(v1, v2)
a.Rlist.Set2(key, val)
body = []*Node{a}
}
case TCHAN:
// orderstmt arranged for a copy of the channel variable.
ha := a
n.Left = nil
hv1 := temp(t.Elem())
hv1.SetTypecheck(1)
if types.Haspointers(t.Elem()) {
init = append(init, nod(OAS, hv1, nil))
}
hb := temp(types.Types[TBOOL])
n.Left = nod(ONE, hb, nodbool(false))
a := nod(OAS2RECV, nil, nil)
a.SetTypecheck(1)
a.List.Set2(hv1, hb)
a.Rlist.Set1(nod(ORECV, ha, nil))
n.Left.Ninit.Set1(a)
if v1 == nil {
body = nil
} else {
body = []*Node{nod(OAS, v1, hv1)}
}
// Zero hv1. This prevents hv1 from being the sole, inaccessible
// reference to an otherwise GC-able value during the next channel receive.
// See issue 15281.
body = append(body, nod(OAS, hv1, nil))
case TSTRING:
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// Transform string range statements like "for v1, v2 = range a" into
//
// ha := a
// for hv1 := 0; hv1 < len(ha); {
// hv1t := hv1
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// hv2 := rune(ha[hv1])
// if hv2 < utf8.RuneSelf {
// hv1++
// } else {
// hv2, hv1 = decoderune(ha, hv1)
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// }
// v1, v2 = hv1t, hv2
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// // original body
// }
// orderstmt arranged for a copy of the string variable.
ha := a
hv1 := temp(types.Types[TINT])
hv1t := temp(types.Types[TINT])
hv2 := temp(types.Runetype)
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// hv1 := 0
init = append(init, nod(OAS, hv1, nil))
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// hv1 < len(ha)
n.Left = nod(OLT, hv1, nod(OLEN, ha, nil))
if v1 != nil {
// hv1t = hv1
body = append(body, nod(OAS, hv1t, hv1))
}
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// hv2 := rune(ha[hv1])
nind := nod(OINDEX, ha, hv1)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
nind.SetBounded(true)
body = append(body, nod(OAS, hv2, conv(nind, types.Runetype)))
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// if hv2 < utf8.RuneSelf
nif := nod(OIF, nil, nil)
nif.Left = nod(OLT, hv2, nodintconst(utf8.RuneSelf))
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// hv1++
nif.Nbody.Set1(nod(OAS, hv1, nod(OADD, hv1, nodintconst(1))))
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// } else {
eif := nod(OAS2, nil, nil)
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
nif.Rlist.Set1(eif)
// hv2, hv1 = decoderune(ha, hv1)
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
eif.List.Set2(hv2, hv1)
fn := syslook("decoderune")
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
eif.Rlist.Set1(mkcall1(fn, fn.Type.Results(), nil, ha, hv1))
body = append(body, nif)
if v1 != nil {
if v2 != nil {
// v1, v2 = hv1t, hv2
a := nod(OAS2, nil, nil)
a.List.Set2(v1, v2)
a.Rlist.Set2(hv1t, hv2)
body = append(body, a)
} else {
// v1 = hv1t
body = append(body, nod(OAS, v1, hv1t))
}
}
}
n.Op = translatedLoopOp
typecheckslice(init, Etop)
if ifGuard != nil {
ifGuard.Ninit.Append(init...)
ifGuard = typecheck(ifGuard, Etop)
} else {
n.Ninit.Append(init...)
}
typecheckslice(n.Left.Ninit.Slice(), Etop)
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n.Left = typecheck(n.Left, Erv)
n.Left = defaultlit(n.Left, nil)
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n.Right = typecheck(n.Right, Etop)
typecheckslice(body, Etop)
n.Nbody.Prepend(body...)
if ifGuard != nil {
ifGuard.Nbody.Set1(n)
n = ifGuard
}
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n = walkstmt(n)
lineno = lno
return n
}
// Lower n into runtime·memclr if possible, for
// fast zeroing of slices and arrays (issue 5373).
// Look for instances of
//
// for i := range a {
// a[i] = zero
// }
//
// in which the evaluation of a is side-effect-free.
//
// Parameters are as in walkrange: "for v1, v2 = range a".
func memclrrange(n, v1, v2, a *Node) bool {
if Debug['N'] != 0 || instrumenting {
return false
}
if v1 == nil || v2 != nil {
return false
}
if n.Nbody.Len() == 0 || n.Nbody.First() == nil || n.Nbody.Len() > 1 {
return false
}
stmt := n.Nbody.First() // only stmt in body
if stmt.Op != OAS || stmt.Left.Op != OINDEX {
return false
}
if !samesafeexpr(stmt.Left.Left, a) || !samesafeexpr(stmt.Left.Right, v1) {
return false
}
elemsize := n.Type.Elem().Width
if elemsize <= 0 || !isZero(stmt.Right) {
return false
}
// Convert to
// if len(a) != 0 {
// hp = &a[0]
// hn = len(a)*sizeof(elem(a))
// memclr{NoHeap,Has}Pointers(hp, hn)
// i = len(a) - 1
// }
n.Op = OIF
n.Nbody.Set(nil)
n.Left = nod(ONE, nod(OLEN, a, nil), nodintconst(0))
// hp = &a[0]
hp := temp(types.Types[TUNSAFEPTR])
tmp := nod(OINDEX, a, nodintconst(0))
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
tmp.SetBounded(true)
tmp = nod(OADDR, tmp, nil)
tmp = nod(OCONVNOP, tmp, nil)
tmp.Type = types.Types[TUNSAFEPTR]
n.Nbody.Append(nod(OAS, hp, tmp))
// hn = len(a) * sizeof(elem(a))
hn := temp(types.Types[TUINTPTR])
tmp = nod(OLEN, a, nil)
tmp = nod(OMUL, tmp, nodintconst(elemsize))
tmp = conv(tmp, types.Types[TUINTPTR])
n.Nbody.Append(nod(OAS, hn, tmp))
var fn *Node
if types.Haspointers(a.Type.Elem()) {
// memclrHasPointers(hp, hn)
fn = mkcall("memclrHasPointers", nil, nil, hp, hn)
} else {
// memclrNoHeapPointers(hp, hn)
fn = mkcall("memclrNoHeapPointers", nil, nil, hp, hn)
}
n.Nbody.Append(fn)
// i = len(a) - 1
v1 = nod(OAS, v1, nod(OSUB, nod(OLEN, a, nil), nodintconst(1)))
n.Nbody.Append(v1)
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n.Left = typecheck(n.Left, Erv)
n.Left = defaultlit(n.Left, nil)
typecheckslice(n.Nbody.Slice(), Etop)
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n = walkstmt(n)
return true
}