go/src/cmd/compile/internal/gc/range.go


// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package gc

import (
	"cmd/compile/internal/types"
	// cmd/compile: simplify slice/array range loops for some element sizes
	//
	// In range loops over slices and arrays besides a variable to track the
	// index an extra variable containing the address of the current element
	// is used. To compute a pointer to the next element the elements size is
	// added to the address.
	//
	// On 386 and amd64 an element of size 1, 2, 4 or 8 bytes can by copied
	// from an array using a MOV instruction with suitable addressing mode
	// that uses the start address of the array, the index of the element and
	// element size as scaling factor. Thereby, for arrays and slices with
	// suitable element size we can avoid keeping and incrementing an extra
	// variable to compute the next elements address.
	//
	// Shrinks cmd/go by 4 kilobytes.
	//
	// AMD64:
	// name                   old time/op  new time/op  delta
	// BinaryTree17           2.66s ± 7%  2.54s ± 0%  -4.53%  (p=0.000 n=10+8)
	// Fannkuch11             3.02s ± 1%  3.02s ± 1%  ~  (p=0.579 n=10+10)
	// FmtFprintfEmpty        45.6ns ± 1%  42.2ns ± 1%  -7.46%  (p=0.000 n=10+10)
	// FmtFprintfString       69.8ns ± 1%  70.4ns ± 1%  +0.84%  (p=0.041 n=10+10)
	// FmtFprintfInt          80.1ns ± 1%  79.0ns ± 1%  -1.35%  (p=0.000 n=10+10)
	// FmtFprintfIntInt       127ns ± 1%  125ns ± 1%  -1.00%  (p=0.007 n=10+9)
	// FmtFprintfPrefixedInt  158ns ± 2%  152ns ± 1%  -4.11%  (p=0.000 n=10+10)
	// FmtFprintfFloat        218ns ± 1%  214ns ± 1%  -1.61%  (p=0.000 n=10+10)
	// FmtManyArgs            508ns ± 1%  504ns ± 1%  -0.93%  (p=0.001 n=9+10)
	// GobDecode              6.76ms ± 1%  6.78ms ± 1%  ~  (p=0.353 n=10+10)
	// GobEncode              5.84ms ± 1%  5.77ms ± 1%  -1.31%  (p=0.000 n=10+9)
	// Gzip                   223ms ± 1%  218ms ± 1%  -2.39%  (p=0.000 n=10+10)
	// Gunzip                 40.3ms ± 1%  40.4ms ± 3%  ~  (p=0.796 n=10+10)
	// HTTPClientServer       73.5µs ± 0%  73.3µs ± 0%  -0.28%  (p=0.000 n=10+9)
	// JSONEncode             12.7ms ± 1%  12.6ms ± 8%  ~  (p=0.173 n=8+10)
	// JSONDecode             57.5ms ± 1%  56.1ms ± 2%  -2.40%  (p=0.000 n=10+10)
	// Mandelbrot200          3.80ms ± 1%  3.86ms ± 6%  ~  (p=0.579 n=10+10)
	// GoParse                3.25ms ± 1%  3.23ms ± 1%  ~  (p=0.052 n=10+10)
	// RegexpMatchEasy0_32    74.4ns ± 1%  76.9ns ± 1%  +3.39%  (p=0.000 n=10+10)
	// RegexpMatchEasy0_1K    243ns ± 2%  248ns ± 1%  +1.86%  (p=0.000 n=10+8)
	// RegexpMatchEasy1_32    71.0ns ± 2%  72.8ns ± 1%  +2.55%  (p=0.000 n=10+10)
	// RegexpMatchEasy1_1K    370ns ± 1%  383ns ± 0%  +3.39%  (p=0.000 n=10+9)
	// RegexpMatchMedium_32   107ns ± 0%  113ns ± 1%  +5.33%  (p=0.000 n=6+10)
	// RegexpMatchMedium_1K   35.0µs ± 1%  36.0µs ± 1%  +3.13%  (p=0.000 n=10+10)
	// RegexpMatchHard_32     1.65µs ± 1%  1.69µs ± 1%  +2.23%  (p=0.000 n=10+9)
	// RegexpMatchHard_1K     49.8µs ± 1%  50.6µs ± 1%  +1.59%  (p=0.000 n=10+10)
	// Revcomp                398ms ± 1%  396ms ± 1%  -0.51%  (p=0.043 n=10+10)
	// Template               63.4ms ± 1%  60.8ms ± 0%  -4.11%  (p=0.000 n=10+9)
	// TimeParse              318ns ± 1%  322ns ± 1%  +1.10%  (p=0.005 n=10+10)
	// TimeFormat             323ns ± 1%  336ns ± 1%  +4.15%  (p=0.000 n=10+10)
	//
	// Updates: #15809.
	//
	// Change-Id: I55915aaf6d26768e12247f8a8edf14e7630726d1
	// Reviewed-on: https://go-review.googlesource.com/38061
	// Run-TryBot: Martin Möhrmann <moehrmann@google.com>
	// Reviewed-by: Keith Randall <khr@golang.org>
	// 2016-12-18 20:13:58 +01:00
	"cmd/internal/sys"
	"unicode/utf8"
)
// cmd/compile: improve string iteration performance
//
// Generate a for loop for ranging over strings that only needs to call
// the runtime function charntorune for non ASCII characters.
//
// This provides faster iteration over ASCII characters and slightly
// faster iteration for other characters.
//
// The runtime function charntorune is changed to take an index from where
// to start decoding and returns the index after the last byte belonging
// to the decoded rune.
//
// All call sites of charntorune in the runtime are replaced by a for loop
// that will be transformed by the compiler instead of calling the
// charntorune function directly.
//
// go binary size decreases by 80 bytes.
// godoc binary size increases by around 4 kilobytes.
//
// runtime:
// name                           old time/op  new time/op  delta
// RuneIterate/range/ASCII-4      43.7ns ± 3%  10.3ns ± 4%  -76.33%  (p=0.000 n=44+45)
// RuneIterate/range/Japanese-4   72.5ns ± 2%  62.8ns ± 2%  -13.41%  (p=0.000 n=49+50)
// RuneIterate/range1/ASCII-4     43.5ns ± 2%  10.4ns ± 3%  -76.18%  (p=0.000 n=50+50)
// RuneIterate/range1/Japanese-4  72.5ns ± 2%  62.9ns ± 2%  -13.26%  (p=0.000 n=50+49)
// RuneIterate/range2/ASCII-4     43.5ns ± 3%  10.3ns ± 2%  -76.22%  (p=0.000 n=48+47)
// RuneIterate/range2/Japanese-4  72.4ns ± 2%  62.7ns ± 2%  -13.47%  (p=0.000 n=50+50)
//
// strings:
// name            old time/op  new time/op  delta
// IndexRune-4     64.7ns ± 5%  22.4ns ± 3%  -65.43%  (p=0.000 n=25+21)
// MapNoChanges-4  269ns ± 2%  157ns ± 2%  -41.46%  (p=0.000 n=23+24)
// Fields-4        23.0ms ± 2%  19.7ms ± 2%  -14.35%  (p=0.000 n=25+25)
// FieldsFunc-4    23.1ms ± 2%  19.6ms ± 2%  -14.94%  (p=0.000 n=25+24)
//
// name            old speed  new speed  delta
// Fields-4        45.6MB/s ± 2%  53.2MB/s ± 2%  +16.87%  (p=0.000 n=24+25)
// FieldsFunc-4    45.5MB/s ± 2%  53.5MB/s ± 2%  +17.57%  (p=0.000 n=25+24)
//
// Updates #13162
//
// Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701
// Reviewed-on: https://go-review.googlesource.com/27853
// TryBot-Result: Gobot Gobot <gobot@golang.org>
// Reviewed-by: Keith Randall <khr@golang.org>
// Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
// Run-TryBot: Martin Möhrmann <martisch@uos.de>
// 2016-08-26 15:00:46 +02:00
// range
func typecheckrange(n *Node) {
	// Typechecking order is important here:
	// 0. first typecheck range expression (slice/map/chan),
	//    it is evaluated only once and so logically it is not part of the loop.
	// 1. typecheck produced values,
	//    this part can declare new vars and so it must be typechecked before body,
	//    because body can contain a closure that captures the vars.
	// 2. decldepth++ to denote loop body.
	// 3. typecheck body.
	// 4. decldepth--.
	typecheckrangeExpr(n)
	// second half of dance, the first half being typecheckrangeExpr
	n.SetTypecheck(1)
	ls := n.List.Slice()
	for i1, n1 := range ls {
		if n1.Typecheck() == 0 {
			ls[i1] = typecheck(ls[i1], Erv|Easgn)
		}
	}
	decldepth++
	typecheckslice(n.Nbody.Slice(), Etop)
	decldepth--
}
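/*
Illustrative aside, not part of the compiler source: step 0 in the comment
inside typecheckrange says the range expression is evaluated only once. The
sketch below shows what that means for user code. The names source, calls and
sumOnce are hypothetical and exist only for this example.

```go
package main

import "fmt"

// calls counts how many times source is invoked.
var calls int

// source returns the slice to range over and records each call.
func source() []int {
	calls++
	return []int{1, 2, 3}
}

// sumOnce ranges over source() and returns the sum of the elements.
func sumOnce() int {
	total := 0
	for _, v := range source() {
		total += v
	}
	return total
}

func main() {
	// The loop body runs three times, but source runs once.
	fmt.Println(sumOnce(), calls) // prints: 6 1
}
```

Since source() runs a single time before the first iteration, it behaves like
loop setup rather than loop body, which matches the order used above: range
expression first, then the iteration variables, then the body.
*/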
func typecheckrangeExpr(n *Node) {
	// cmd/compile: reduce use of **Node parameters
	//
	// Escape analysis has a hard time with tree-like structures
	// (see #13493 and #14858). This is unlikely to change.
	// As a result, when invoking a function that accepts a **Node parameter,
	// we usually allocate a *Node on the heap. This happens a whole lot.
	//
	// This CL changes functions from taking a **Node to acting more like
	// append: It both modifies the input and returns a replacement for it.
	//
	// Because of the cascading nature of escape analysis, in order to get
	// the benefits, I had to modify almost all such functions. The remaining
	// functions are in racewalk and the backend. I would be happy to update
	// them as well in a separate CL.
	//
	// This CL was created by manually updating the function signatures and
	// the directly impacted bits of code. The callsites were then
	// automatically updated using a bespoke script:
	// https://gist.github.com/josharian/046b1be7aceae244de39
	//
	// For ease of reviewing and future understanding, this CL is also broken
	// down into four CLs, mailed separately, which show the manual and the
	// automated changes separately. They are CLs 20990, 20991, 20992, and 20993.
	//
	// Passes toolstash -cmp.
	//
	// name      old time/op  new time/op  delta
	// Template  335ms ± 5%  324ms ± 5%  -3.35%  (p=0.000 n=23+24)
	// Unicode   176ms ± 9%  165ms ± 6%  -6.12%  (p=0.000 n=23+24)
	// GoTypes   1.10s ± 4%  1.07s ± 2%  -2.77%  (p=0.000 n=24+24)
	// Compiler  5.31s ± 3%  5.15s ± 3%  -2.95%  (p=0.000 n=24+24)
	// MakeBash  41.6s ± 1%  41.7s ± 2%  ~  (p=0.586 n=23+23)
	//
	// name      old alloc/op  new alloc/op  delta
	// Template  63.3MB ± 0%  62.4MB ± 0%  -1.36%  (p=0.000 n=25+23)
	// Unicode   42.4MB ± 0%  41.6MB ± 0%  -1.99%  (p=0.000 n=24+25)
	// GoTypes   220MB ± 0%  217MB ± 0%  -1.11%  (p=0.000 n=25+25)
	// Compiler  994MB ± 0%  973MB ± 0%  -2.08%  (p=0.000 n=24+25)
	//
	// name      old allocs/op  new allocs/op  delta
	// Template  681k ± 0%  574k ± 0%  -15.71%  (p=0.000 n=24+25)
	// Unicode   518k ± 0%  413k ± 0%  -20.34%  (p=0.000 n=25+24)
	// GoTypes   2.08M ± 0%  1.78M ± 0%  -14.62%  (p=0.000 n=25+25)
	// Compiler  9.26M ± 0%  7.64M ± 0%  -17.48%  (p=0.000 n=25+25)
	//
	// name       old text-bytes  new text-bytes  delta
	// HelloSize  578k ± 0%  578k ± 0%  ~  (all samples are equal)
	// CmdGoSize  6.46M ± 0%  6.46M ± 0%  ~  (all samples are equal)
	//
	// name       old data-bytes  new data-bytes  delta
	// HelloSize  128k ± 0%  128k ± 0%  ~  (all samples are equal)
	// CmdGoSize  281k ± 0%  281k ± 0%  ~  (all samples are equal)
	//
	// name       old exe-bytes  new exe-bytes  delta
	// HelloSize  921k ± 0%  921k ± 0%  ~  (all samples are equal)
	// CmdGoSize  9.86M ± 0%  9.86M ± 0%  ~  (all samples are equal)
	//
	// Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475
	// Reviewed-on: https://go-review.googlesource.com/20959
	// Reviewed-by: Dave Cheney <dave@cheney.net>
	// Reviewed-by: Ian Lance Taylor <iant@golang.org>
	// 2016-03-20 08:03:31 -07:00
	n.Right = typecheck(n.Right, Erv)
	t := n.Right.Type
	if t == nil {
		return
	}
	// delicate little dance. see typecheckas2
	ls := n.List.Slice()
	for i1, n1 := range ls {
		if n1.Name == nil || n1.Name.Defn != n {
			ls[i1] = typecheck(ls[i1], Erv|Easgn)
		}
	}
	if t.IsPtr() && t.Elem().IsArray() {
		t = t.Elem()
	}
	n.Type = t
	var t1, t2 *types.Type
	toomany := false
	switch t.Etype {
	default:
		yyerrorl(n.Pos, "cannot range over %L", n.Right)
		return
	case TARRAY, TSLICE:
		t1 = types.Types[TINT]
		t2 = t.Elem()
	case TMAP:
		t1 = t.Key()
		t2 = t.Elem()
	case TCHAN:
		if !t.ChanDir().CanRecv() {
			yyerrorl(n.Pos, "invalid operation: range %v (receive from send-only type %v)", n.Right, n.Right.Type)
			return
		}
		t1 = t.Elem()
		t2 = nil
		if n.List.Len() == 2 {
			toomany = true
		}
	case TSTRING:
		t1 = types.Types[TINT]
		t2 = types.Runetype
	}
	if n.List.Len() > 2 || toomany {
		yyerrorl(n.Pos, "too many variables in range")
	}
	var v1, v2 *Node
	if n.List.Len() != 0 {
		v1 = n.List.First()
	}
	if n.List.Len() > 1 {
		v2 = n.List.Second()
	}
	// this is not only an optimization but also a requirement in the spec.
	// "if the second iteration variable is the blank identifier, the range
	// clause is equivalent to the same clause with only the first variable
	// present."
	if v2.isBlank() {
		if v1 != nil {
			n.List.Set1(v1)
		}
		v2 = nil
	}
	var why string
	if v1 != nil {
		if v1.Name != nil && v1.Name.Defn == n {
			v1.Type = t1
		} else if v1.Type != nil && assignop(t1, v1.Type, &why) == 0 {
			yyerrorl(n.Pos, "cannot assign type %v to %L in range%s", t1, v1, why)
		}
		checkassign(n, v1)
	}
	if v2 != nil {
		if v2.Name != nil && v2.Name.Defn == n {
			v2.Type = t2
		} else if v2.Type != nil && assignop(t2, v2.Type, &why) == 0 {
			yyerrorl(n.Pos, "cannot assign type %v to %L in range%s", t2, v2, why)
		}
		checkassign(n, v2)
	}
}
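/*
Illustrative aside, not part of the compiler source: the switch on t.Etype in
typecheckrangeExpr picks the types t1 and t2 of the iteration variables for
each kind of range expression. Seen from user code, that corresponds roughly
to the sketch below. The function rangeTypes is a hypothetical helper written
for this example; it reports the types with fmt's %T verb.

```go
package main

import "fmt"

// rangeTypes reports the types of the iteration variables for a slice,
// a map, a channel and a string, in that order.
func rangeTypes() (slice, mp, ch, str string) {
	for i, v := range []string{"a"} {
		slice = fmt.Sprintf("%T, %T", i, v) // index is int, value is the element type
	}
	for k, v := range map[string]bool{"k": true} {
		mp = fmt.Sprintf("%T, %T", k, v) // key type, element type
	}
	c := make(chan float64, 1)
	c <- 1.5
	close(c)
	for v := range c { // a channel permits only one iteration variable
		ch = fmt.Sprintf("%T", v)
	}
	for i, r := range "x" {
		str = fmt.Sprintf("%T, %T", i, r) // byte index is int; rune is an alias for int32
	}
	return slice, mp, ch, str
}

func main() {
	fmt.Println(rangeTypes())
}
```

Writing "for v, w := range c" with a channel c is rejected at compile time,
which is the case the toomany flag above records.
*/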
func cheapComputableIndex(width int64) bool {
	switch thearch.LinkArch.Family {
	// MIPS does not have R+R addressing
	// Arm64 may lack ability to generate this code in our assembler,
	// but the architecture supports it.
	case sys.PPC64, sys.S390X:
		return width == 1
	case sys.AMD64, sys.I386, sys.ARM64, sys.ARM:
		switch width {
		case 1, 2, 4, 8:
			return true
		}
	}
	return false
}
}
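/*
Illustrative aside, not part of the compiler source: cheapComputableIndex
depends on the target architecture, so it cannot be called outside the
compiler. The sketch below mirrors its decision table in a standalone form.
The function cheapIndex and the string family names are hypothetical
stand-ins for thearch.LinkArch.Family and the sys.* constants.

```go
package main

import "fmt"

// cheapIndex mirrors the switch in cheapComputableIndex, taking the
// architecture family as a plain string (an assumption of this sketch).
func cheapIndex(family string, width int64) bool {
	switch family {
	case "ppc64", "s390x":
		return width == 1
	case "amd64", "386", "arm64", "arm":
		switch width {
		case 1, 2, 4, 8:
			return true
		}
	}
	return false
}

func main() {
	for _, family := range []string{"amd64", "ppc64", "mips"} {
		for _, width := range []int64{1, 8, 16} {
			fmt.Println(family, width, cheapIndex(family, width))
		}
	}
}
```

Families not listed in the switch, such as the "mips" entry here, fall
through to false for every element width.
*/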
// walkrange transforms various forms of ORANGE into
// simpler forms. The result must be assigned back to n.
// Node n may also be modified in place, and may also be
// the returned node.
func walkrange(n *Node) *Node {
	if isMapClear(n) {
		m := n.Right
		lno := setlineno(m)
		n = mapClear(m)
		lineno = lno
		return n
	}
	// variable name conventions:
	// ohv1, hv1, hv2: hidden (old) val 1, 2
	// ha, hit: hidden aggregate, iterator
	// hn, hp: hidden len, pointer
	// hb: hidden bool
	// a, v1, v2: not hidden aggregate, val 1, 2
	t := n.Type
	a := n.Right
	lno := setlineno(a)
	n.Right = nil
	var v1, v2 *Node
	l := n.List.Len()
	if l > 0 {
		v1 = n.List.First()
	}
	if l > 1 {
		v2 = n.List.Second()
	}
	if v2.isBlank() {
		v2 = nil
	}
	if v1.isBlank() && v2 == nil {
		v1 = nil
	}
	if v1 == nil && v2 != nil {
		Fatalf("walkrange: v2 != nil while v1 == nil")
	}
	// n.List has no meaning anymore, clear it
	// to avoid erroneous processing by racewalk.
	n.List.Set(nil)
	var ifGuard *Node
	translatedLoopOp := OFOR
	var body []*Node
	var init []*Node
	switch t.Etype {
	default:
		Fatalf("walkrange")
	case TARRAY, TSLICE:
		if arrayClear(n, v1, v2, a) {
			lineno = lno
			return n
		}
		// orderstmt arranged for a copy of the array/slice variable if needed.
		ha := a
		hv1 := temp(types.Types[TINT])
		hn := temp(types.Types[TINT])
		init = append(init, nod(OAS, hv1, nil))
		init = append(init, nod(OAS, hn, nod(OLEN, ha, nil)))
		n.Left = nod(OLT, hv1, hn)
		n.Right = nod(OAS, hv1, nod(OADD, hv1, nodintconst(1)))
		// for range ha { body }
		if v1 == nil {
			break
		}
		// for v1 := range ha { body }
		if v2 == nil {
			body = []*Node{nod(OAS, v1, hv1)}
			break
		}
		// for v1, v2 := range ha { body }
		if cheapComputableIndex(n.Type.Elem().Width) {
			// v1, v2 = hv1, ha[hv1]
			tmp := nod(OINDEX, ha, hv1)
			tmp.SetBounded(true)
			// Use OAS2 to correctly handle assignments
			// of the form "v1, a[v1] := range".
			a := nod(OAS2, nil, nil)
			a.List.Set2(v1, v2)
			a.Rlist.Set2(hv1, tmp)
			body = []*Node{a}
			break
		}
		// TODO(austin): OFORUNTIL is a strange beast, but is
		// necessary for expressing the control flow we need
		// while also making "break" and "continue" work. It
		// would be nice to just lower ORANGE during SSA, but
		// racewalk needs to see many of the operations
		// involved in ORANGE's implementation. If racewalk
		// moves into SSA, consider moving ORANGE into SSA and
		// eliminating OFORUNTIL.

		// TODO(austin): OFORUNTIL inhibits bounds-check
		// elimination on the index variable (see #20711).
		// Enhance the prove pass to understand this.
		ifGuard = nod(OIF, nil, nil)
		ifGuard.Left = nod(OLT, hv1, hn)
		translatedLoopOp = OFORUNTIL
		hp := temp(types.NewPtr(n.Type.Elem()))
		tmp := nod(OINDEX, ha, nodintconst(0))
		tmp.SetBounded(true)
		init = append(init, nod(OAS, hp, nod(OADDR, tmp, nil)))
		// Use OAS2 to correctly handle assignments
		// of the form "v1, a[v1] := range".
		a := nod(OAS2, nil, nil)
		a.List.Set2(v1, v2)
		a.Rlist.Set2(hv1, nod(OIND, hp, nil))
		body = append(body, a)
		// Advance pointer as part of the late increment.
		//
		// This runs *after* the condition check, so we know
		// advancing the pointer is safe and won't go past the
		// end of the allocation.
		a = nod(OAS, hp, addptr(hp, t.Elem().Width))
		a = typecheck(a, Etop)
		n.List.Set1(a)
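		// Illustrative sketch (not part of the original source): for element
		// sizes that take the pointer path above, the loop ends up shaped
		// roughly like the following, assuming hv1 and hn were initialized
		// to 0 and len(ha) earlier in this case.
		//
		//	hp := &ha[0]
		//	if hv1 < hn { goto body } else { goto end } // ifGuard
		// top:
		//	hp += sizeof(elem) // late increment, stored in n.List
		// body:
		//	v1, v2 = hv1, *hp
		//	<original loop body>
		//	hv1++
		//	if hv1 < hn { goto top } else { goto end }
		// end:
		//
		// Because hp only advances after the condition has succeeded, it
		// never points past the last element.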
	case TMAP:
		// orderstmt allocated the iterator for us.
		// we only use a once, so no copy needed.
		ha := a
		hit := prealloc[n]
		th := hit.Type
		n.Left = nil
		keysym := th.Field(0).Sym // depends on layout of iterator struct. See reflect.go:hiter
		valsym := th.Field(1).Sym // ditto
		fn := syslook("mapiterinit")
		fn = substArgTypes(fn, t.Key(), t.Elem(), th)
		init = append(init, mkcall1(fn, nil, nil, typename(t), ha, nod(OADDR, hit, nil)))
		n.Left = nod(ONE, nodSym(ODOT, hit, keysym), nodnil())
		fn = syslook("mapiternext")
		fn = substArgTypes(fn, th)
		n.Right = mkcall1(fn, nil, nil, nod(OADDR, hit, nil))
		key := nodSym(ODOT, hit, keysym)
		key = nod(OIND, key, nil)
		if v1 == nil {
			body = nil
		} else if v2 == nil {
			body = []*Node{nod(OAS, v1, key)}
		} else {
			val := nodSym(ODOT, hit, valsym)
			val = nod(OIND, val, nil)
			a := nod(OAS2, nil, nil)
			a.List.Set2(v1, v2)
			a.Rlist.Set2(key, val)
			body = []*Node{a}
		}
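		// Illustrative sketch (not part of the original source): the nodes
		// built above correspond roughly to the loop below. Here typ stands
		// for typename(t), and hit.key / hit.val are placeholder names for
		// the iterator fields selected by keysym and valsym.
		//
		//	mapiterinit(typ, ha, &hit)
		//	for ; hit.key != nil; mapiternext(&hit) {
		//		v1, v2 = *hit.key, *hit.val // only the variables actually requested
		//		<original loop body>
		//	}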
	case TCHAN:
		// orderstmt arranged for a copy of the channel variable.
		ha := a
		n.Left = nil
		hv1 := temp(t.Elem())
		hv1.SetTypecheck(1)
		if types.Haspointers(t.Elem()) {
			init = append(init, nod(OAS, hv1, nil))
		}
		hb := temp(types.Types[TBOOL])
		n.Left = nod(ONE, hb, nodbool(false))
		a := nod(OAS2RECV, nil, nil)
		a.SetTypecheck(1)
		a.List.Set2(hv1, hb)
		a.Rlist.Set1(nod(ORECV, ha, nil))
		n.Left.Ninit.Set1(a)
		if v1 == nil {
			body = nil
		} else {
			body = []*Node{nod(OAS, v1, hv1)}
		}
		// Zero hv1. This prevents hv1 from being the sole, inaccessible
		// reference to an otherwise GC-able value during the next channel receive.
		// See issue 15281.
		body = append(body, nod(OAS, hv1, nil))
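		// Illustrative sketch (not part of the original source): the nodes
		// built above correspond roughly to the loop below. The receive is
		// attached to the condition's init list, so it runs each time the
		// condition is evaluated; the for/break form here only approximates
		// that.
		//
		//	for {
		//		hv1, hb = <-ha
		//		if hb == false { break } // loop condition is hb != false
		//		v1 = hv1                 // omitted when v1 == nil
		//		hv1 = <zero value>
		//		<original loop body>
		//	}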
	case TSTRING:
cmd/compile: improve string iteration performance Generate a for loop for ranging over strings that only needs to call the runtime function charntorune for non ASCII characters. This provides faster iteration over ASCII characters and slightly faster iteration for other characters. The runtime function charntorune is changed to take an index from where to start decoding and returns the index after the last byte belonging to the decoded rune. All call sites of charntorune in the runtime are replaced by a for loop that will be transformed by the compiler instead of calling the charntorune function directly. go binary size decreases by 80 bytes. godoc binary size increases by around 4 kilobytes. runtime: name old time/op new time/op delta RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45) RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50) RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50) RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49) RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47) RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50) strings: name old time/op new time/op delta IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21) MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24) Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25) FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24) name old speed new speed delta Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25) FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24) Updates #13162 Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701 Reviewed-on: https://go-review.googlesource.com/27853 TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <martisch@uos.de>
2016-08-26 15:00:46 +02:00
// Transform string range statements like "for v1, v2 = range a" into
//
// ha := a
// for hv1 := 0; hv1 < len(ha); {
//   hv1t := hv1
//   hv2 := rune(ha[hv1])
//   if hv2 < utf8.RuneSelf {
//      hv1++
//   } else {
//      hv2, hv1 = decoderune(ha, hv1)
//   }
//   v1, v2 = hv1t, hv2
//   // original body
// }
// orderstmt arranged for a copy of the string variable.
ha := a
hv1 := temp(types.Types[TINT])
hv1t := temp(types.Types[TINT])
hv2 := temp(types.Runetype)
// hv1 := 0
init = append(init, nod(OAS, hv1, nil))
// hv1 < len(ha)
n.Left = nod(OLT, hv1, nod(OLEN, ha, nil))
if v1 != nil {
// hv1t = hv1
body = append(body, nod(OAS, hv1t, hv1))
}
// hv2 := rune(ha[hv1])
nind := nod(OINDEX, ha, hv1)
cmd/compile: pack bool fields in Node, Name, Func and Type structs to bitsets This reduces compiler memory usage by up to 4% - see compilebench results below. name old time/op new time/op delta Template 245ms ± 4% 241ms ± 2% -1.88% (p=0.029 n=10+10) Unicode 126ms ± 3% 124ms ± 3% ~ (p=0.105 n=10+10) GoTypes 805ms ± 2% 813ms ± 3% ~ (p=0.515 n=8+10) Compiler 3.95s ± 2% 3.83s ± 1% -2.96% (p=0.000 n=9+10) MakeBash 47.4s ± 4% 46.6s ± 1% -1.59% (p=0.028 n=9+10) name old user-ns/op new user-ns/op delta Template 324M ± 5% 326M ± 3% ~ (p=0.935 n=10+10) Unicode 186M ± 5% 178M ±10% ~ (p=0.067 n=9+10) GoTypes 1.08G ± 7% 1.09G ± 4% ~ (p=0.956 n=10+10) Compiler 5.34G ± 4% 5.31G ± 1% ~ (p=0.501 n=10+8) name old alloc/op new alloc/op delta Template 41.0MB ± 0% 39.8MB ± 0% -3.03% (p=0.000 n=10+10) Unicode 32.3MB ± 0% 31.0MB ± 0% -4.13% (p=0.000 n=10+10) GoTypes 119MB ± 0% 116MB ± 0% -2.39% (p=0.000 n=10+10) Compiler 499MB ± 0% 487MB ± 0% -2.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Template 380k ± 1% 379k ± 1% ~ (p=0.436 n=10+10) Unicode 324k ± 1% 324k ± 0% ~ (p=0.853 n=10+10) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.481 n=10+10) Compiler 4.41M ± 0% 4.41M ± 0% -0.12% (p=0.007 n=10+10) name old text-bytes new text-bytes delta HelloSize 623k ± 0% 623k ± 0% ~ (all equal) CmdGoSize 6.64M ± 0% 6.64M ± 0% ~ (all equal) name old data-bytes new data-bytes delta HelloSize 5.81k ± 0% 5.81k ± 0% ~ (all equal) CmdGoSize 238k ± 0% 238k ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 134k ± 0% 134k ± 0% ~ (all equal) CmdGoSize 152k ± 0% 152k ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 967k ± 0% 967k ± 0% ~ (all equal) CmdGoSize 10.2M ± 0% 10.2M ± 0% ~ (all equal) Change-Id: I1f40af738254892bd6c8ba2eb43390b175753d52 Reviewed-on: https://go-review.googlesource.com/37445 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-27 19:56:38 +02:00
// The loop condition guarantees hv1 < len(ha), so the index needs no bounds check.
nind.SetBounded(true)
body = append(body, nod(OAS, hv2, conv(nind, types.Runetype)))
// if hv2 < utf8.RuneSelf
nif := nod(OIF, nil, nil)
nif.Left = nod(OLT, hv2, nodintconst(utf8.RuneSelf))
// hv1++
nif.Nbody.Set1(nod(OAS, hv1, nod(OADD, hv1, nodintconst(1))))
// } else {
eif := nod(OAS2, nil, nil)
nif.Rlist.Set1(eif)
// hv2, hv1 = decoderune(ha, hv1)
eif.List.Set2(hv2, hv1)
fn := syslook("decoderune")
eif.Rlist.Set1(mkcall1(fn, fn.Type.Results(), nil, ha, hv1))
body = append(body, nif)
if v1 != nil {
if v2 != nil {
// v1, v2 = hv1t, hv2
a := nod(OAS2, nil, nil)
a.List.Set2(v1, v2)
a.Rlist.Set2(hv1t, hv2)
body = append(body, a)
} else {
// v1 = hv1t
body = append(body, nod(OAS, v1, hv1t))
}
}
}
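/*
Illustration only, not part of the compiler: a minimal user-level sketch of
the loop shape that the TSTRING case above builds, compared against the
ordinary range statement. The names step, loweredRange, builtinRange and
sameSteps are hypothetical. utf8.DecodeRuneInString stands in for the
runtime's decoderune, which user code cannot call; that substitution is an
assumption of this sketch, not what the compiler emits.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// step is one (index, rune) pair produced while iterating over a string.
type step struct {
	index int
	r     rune
}

// loweredRange mirrors the hand-written loop from the comment in the
// TSTRING case: an ASCII fast path plus a decoding call for other bytes.
func loweredRange(s string) []step {
	var out []step
	ha := s
	for hv1 := 0; hv1 < len(ha); {
		hv1t := hv1
		hv2 := rune(ha[hv1])
		if hv2 < utf8.RuneSelf {
			hv1++ // single-byte rune: no decoding call needed
		} else {
			// Stand-in for decoderune(ha, hv1), which returns the rune
			// and the index just past it.
			var size int
			hv2, size = utf8.DecodeRuneInString(ha[hv1:])
			hv1 += size
		}
		out = append(out, step{hv1t, hv2}) // v1, v2 = hv1t, hv2
	}
	return out
}

// builtinRange collects the pairs produced by the ordinary range statement.
func builtinRange(s string) []step {
	var out []step
	for i, r := range s {
		out = append(out, step{i, r})
	}
	return out
}

// sameSteps reports whether two step slices hold identical pairs.
func sameSteps(a, b []step) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"", "go", "aé", "日本語", "bad\xffbyte"} {
		fmt.Printf("%q: equal=%v\n", s, sameSteps(loweredRange(s), builtinRange(s)))
	}
}
```

The inputs include an empty string, multi-byte runes and an invalid byte, so
the sketch exercises both branches of the if statement as well as the
zero-iteration case.
*/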
n.Op = translatedLoopOp
typecheckslice(init, Etop)
if ifGuard != nil {
ifGuard.Ninit.Append(init...)
ifGuard = typecheck(ifGuard, Etop)
} else {
n.Ninit.Append(init...)
}
typecheckslice(n.Left.Ninit.Slice(), Etop)
cmd/compile: reduce use of **Node parameters Escape analysis has a hard time with tree-like structures (see #13493 and #14858). This is unlikely to change. As a result, when invoking a function that accepts a **Node parameter, we usually allocate a *Node on the heap. This happens a whole lot. This CL changes functions from taking a **Node to acting more like append: It both modifies the input and returns a replacement for it. Because of the cascading nature of escape analysis, in order to get the benefits, I had to modify almost all such functions. The remaining functions are in racewalk and the backend. I would be happy to update them as well in a separate CL. This CL was created by manually updating the function signatures and the directly impacted bits of code. The callsites were then automatically updated using a bespoke script: https://gist.github.com/josharian/046b1be7aceae244de39 For ease of reviewing and future understanding, this CL is also broken down into four CLs, mailed separately, which show the manual and the automated changes separately. They are CLs 20990, 20991, 20992, and 20993. Passes toolstash -cmp. 
name old time/op new time/op delta Template 335ms ± 5% 324ms ± 5% -3.35% (p=0.000 n=23+24) Unicode 176ms ± 9% 165ms ± 6% -6.12% (p=0.000 n=23+24) GoTypes 1.10s ± 4% 1.07s ± 2% -2.77% (p=0.000 n=24+24) Compiler 5.31s ± 3% 5.15s ± 3% -2.95% (p=0.000 n=24+24) MakeBash 41.6s ± 1% 41.7s ± 2% ~ (p=0.586 n=23+23) name old alloc/op new alloc/op delta Template 63.3MB ± 0% 62.4MB ± 0% -1.36% (p=0.000 n=25+23) Unicode 42.4MB ± 0% 41.6MB ± 0% -1.99% (p=0.000 n=24+25) GoTypes 220MB ± 0% 217MB ± 0% -1.11% (p=0.000 n=25+25) Compiler 994MB ± 0% 973MB ± 0% -2.08% (p=0.000 n=24+25) name old allocs/op new allocs/op delta Template 681k ± 0% 574k ± 0% -15.71% (p=0.000 n=24+25) Unicode 518k ± 0% 413k ± 0% -20.34% (p=0.000 n=25+24) GoTypes 2.08M ± 0% 1.78M ± 0% -14.62% (p=0.000 n=25+25) Compiler 9.26M ± 0% 7.64M ± 0% -17.48% (p=0.000 n=25+25) name old text-bytes new text-bytes delta HelloSize 578k ± 0% 578k ± 0% ~ (all samples are equal) CmdGoSize 6.46M ± 0% 6.46M ± 0% ~ (all samples are equal) name old data-bytes new data-bytes delta HelloSize 128k ± 0% 128k ± 0% ~ (all samples are equal) CmdGoSize 281k ± 0% 281k ± 0% ~ (all samples are equal) name old exe-bytes new exe-bytes delta HelloSize 921k ± 0% 921k ± 0% ~ (all samples are equal) CmdGoSize 9.86M ± 0% 9.86M ± 0% ~ (all samples are equal) Change-Id: I277d95bd56d51c166ef7f560647aeaa092f3f475 Reviewed-on: https://go-review.googlesource.com/20959 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-20 08:03:31 -07:00
n.Left = typecheck(n.Left, Erv)
n.Left = defaultlit(n.Left, nil)
	n.Right = typecheck(n.Right, Etop)
	typecheckslice(body, Etop)
	n.Nbody.Prepend(body...)

	if ifGuard != nil {
		ifGuard.Nbody.Set1(n)
		n = ifGuard
	}
	n = walkstmt(n)

	lineno = lno
	return n
}

// isMapClear checks if n is of the form:
//
// for k := range m {
//   delete(m, k)
// }
//
// where == for keys of map m is reflexive.
func isMapClear(n *Node) bool {
	if Debug['N'] != 0 || instrumenting {
		return false
	}

	if n.Op != ORANGE || n.Type.Etype != TMAP || n.List.Len() != 1 {
		return false
	}

	k := n.List.First()
	if k == nil || k.isBlank() {
		return false
	}

	// Require k to be a new variable name.
	if k.Name == nil || k.Name.Defn != n {
		return false
	}

	if n.Nbody.Len() != 1 {
		return false
	}

	stmt := n.Nbody.First() // only stmt in body
	if stmt == nil || stmt.Op != ODELETE {
		return false
	}

	m := n.Right
	if !samesafeexpr(stmt.List.First(), m) || !samesafeexpr(stmt.List.Second(), k) {
		return false
	}

	// Keys where equality is not reflexive can not be deleted from maps.
	if !isreflexive(m.Type.Key()) {
		return false
	}

	return true
}

// mapClear constructs a call to runtime.mapclear for the map m.
func mapClear(m *Node) *Node {
	t := m.Type

	// instantiate mapclear(typ *type, hmap map[any]any)
	fn := syslook("mapclear")
	fn = substArgTypes(fn, t.Key(), t.Elem())
	n := mkcall1(fn, nil, nil, typename(t), m)

	n = typecheck(n, Etop)
	n = walkstmt(n)

	return n
}

// Lower n into runtime·memclr if possible, for
// fast zeroing of slices and arrays (issue 5373).
// Look for instances of
//
// for i := range a {
// 	a[i] = zero
// }
//
// in which the evaluation of a is side-effect-free.
//
// Parameters are as in walkrange: "for v1, v2 = range a".
func arrayClear(n, v1, v2, a *Node) bool {
	if Debug['N'] != 0 || instrumenting {
		return false
	}

	if v1 == nil || v2 != nil {
		return false
	}

	if n.Nbody.Len() != 1 || n.Nbody.First() == nil {
		return false
	}

	stmt := n.Nbody.First() // only stmt in body
	if stmt.Op != OAS || stmt.Left.Op != OINDEX {
		return false
	}

	if !samesafeexpr(stmt.Left.Left, a) || !samesafeexpr(stmt.Left.Right, v1) {
		return false
	}

	elemsize := n.Type.Elem().Width
	if elemsize <= 0 || !isZero(stmt.Right) {
		return false
	}

	// Convert to
	// if len(a) != 0 {
	// 	hp = &a[0]
	// 	hn = len(a)*sizeof(elem(a))
	// 	memclr{NoHeap,Has}Pointers(hp, hn)
	// 	i = len(a) - 1
	// }
	n.Op = OIF

	n.Nbody.Set(nil)
	n.Left = nod(ONE, nod(OLEN, a, nil), nodintconst(0))

	// hp = &a[0]
	hp := temp(types.Types[TUNSAFEPTR])

	tmp := nod(OINDEX, a, nodintconst(0))
	tmp.SetBounded(true)
	tmp = nod(OADDR, tmp, nil)
	tmp = convnop(tmp, types.Types[TUNSAFEPTR])
	n.Nbody.Append(nod(OAS, hp, tmp))

	// hn = len(a) * sizeof(elem(a))
	hn := temp(types.Types[TUINTPTR])

	tmp = nod(OLEN, a, nil)
	tmp = nod(OMUL, tmp, nodintconst(elemsize))
	tmp = conv(tmp, types.Types[TUINTPTR])
	n.Nbody.Append(nod(OAS, hn, tmp))

	var fn *Node
	if types.Haspointers(a.Type.Elem()) {
		// memclrHasPointers(hp, hn)
		fn = mkcall("memclrHasPointers", nil, nil, hp, hn)
	} else {
		// memclrNoHeapPointers(hp, hn)
		fn = mkcall("memclrNoHeapPointers", nil, nil, hp, hn)
	}

	n.Nbody.Append(fn)

	// i = len(a) - 1
	v1 = nod(OAS, v1, nod(OSUB, nod(OLEN, a, nil), nodintconst(1)))

	n.Nbody.Append(v1)

	n.Left = typecheck(n.Left, Erv)
	n.Left = defaultlit(n.Left, nil)
	typecheckslice(n.Nbody.Slice(), Etop)
	n = walkstmt(n)
	return true
}
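
// The lowering in arrayClear ends with "i = len(a) - 1" inside the
// "len(a) != 0" guard. The standalone sketch below (not compiler code;
// zero is a hypothetical helper) shows the source-level behavior that
// this preserves: after the loop the index variable holds the last
// index, and it is left untouched when the slice is empty.

```go
package main

import "fmt"

// zero clears a with the loop shape arrayClear looks for and returns
// the final value of the index variable, which starts out as start.
func zero(a []int, start int) int {
	i := start
	for i = range a {
		a[i] = 0
	}
	return i
}

func main() {
	a := []int{3, 1, 4}
	fmt.Println(zero(a, -1), a) // 2 [0 0 0]
	fmt.Println(zero(nil, -1))  // -1
}
```

// Any rewrite of the loop into a single memory clear has to leave the
// index variable in the same state, which is what the trailing
// assignment and the guard are for.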

// addptr returns (*T)(uintptr(p) + n).
func addptr(p *Node, n int64) *Node {
	t := p.Type

	p = nod(OCONVNOP, p, nil)
	p.Type = types.Types[TUINTPTR]

	p = nod(OADD, p, nodintconst(n))

	p = nod(OCONVNOP, p, nil)
	p.Type = t

	return p
}