cmd/compile: use MOVBQZX for OpAMD64LoweredHasCPUFeature

In the commit message of CL 212360, I wrote:

> This new intrinsic ... generates MOVB+TESTB+NE.
> (It is possible that MOVBQZX+TESTQ+NE would be better.)

I should have tested. MOVBQZX+TESTQ+NE does in fact appear to be better.

For the benchmark in #36196, on my machine:

name      old time/op  new time/op  delta
FMA-8     0.86ns ± 6%  0.70ns ± 5%  -18.79%  (p=0.000 n=98+97)
NonFMA-8  0.61ns ± 5%  0.60ns ± 4%   -0.74%  (p=0.001 n=100+97)

Interestingly, these are both considerably faster than
the measurements I took a couple of months ago (1.4ns/2ns).
It appears that CL 219131 (clearing VZEROUPPER in asyncPreempt) helped a lot.
And FMA is now once again slower than NonFMA, although this change
helps it regain some ground.

Updates #15808
Updates #36351
Updates #36196

Change-Id: I8a326289a963b1939aaa7eaa2fab2ec536467c7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227238
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2025-12-08 06:10:04 +00:00 · 2020-04-04 19:22:28 -07:00 · 2020-04-04 19:22:28 -07:00 · 7ee8467b27
commit 7ee8467b27
parent 64f19d7080
4 changed files with 22 additions and 5 deletions
--- a/src/cmd/compile/internal/amd64/ssa.go
+++ b/src/cmd/compile/internal/amd64/ssa.go
@@ -903,7 +903,7 @@ func ssaGenValue(s *gc.SSAGenState, v *ssa.Value) {
 		p.From.Reg = v.Args[0].Reg()
 		gc.AddrAuto(&p.To, v)
 	case ssa.OpAMD64LoweredHasCPUFeature:
-		p := s.Prog(x86.AMOVB)
+		p := s.Prog(x86.AMOVBQZX)
 		p.From.Type = obj.TYPE_MEM
 		gc.AddAux(&p.From, v)
 		p.To.Type = obj.TYPE_REG