Commit graph

621 commits

Author SHA1 Message Date
Cherry Mui
d7a0c45642 [dev.simd] all: merge master (57362e9) into dev.simd
Conflicts:

- src/cmd/compile/internal/ir/symtab.go
- src/cmd/compile/internal/ssa/prove.go
- src/cmd/compile/internal/ssa/rewriteAMD64.go
- src/cmd/compile/internal/ssagen/intrinsics.go
- src/cmd/compile/internal/typecheck/builtin.go
- src/internal/buildcfg/exp.go
- src/internal/strconv/ftoa.go
- test/codegen/stack.go

Manually resolved some conflicts:

- Use internal/strconv for simd.String, remove internal/ftoa
- prove.go is just copied from the one on the main branch. We
  have cherry-picked the changes to prove.go to main branch, so
  our copy is identical to an old version of the one on the main
  branch. There are CLs landed after our cherry-picks. Just copy
  it over to adopt the new code.

Merge List:

+ 2025-11-13 57362e9814 go/types, types2: check for direct cycles as a separate phase
+ 2025-11-13 099e0027bd cmd/go/internal/modfetch: consolidate global vars
+ 2025-11-13 028375323f cmd/go/internal/modfetch/codehost: fix flaky TestReadZip
+ 2025-11-13 4ebf295b0b runtime: prefer to restart Ps on the same M after STW
+ 2025-11-13 625d8e9b9c runtime/pprof: fix goroutine leak profile tests for noopt
+ 2025-11-13 4684a26c26 spec: remove cycle restriction for type parameters
+ 2025-11-13 0f9c8fb29d cmd/asm,cmd/internal/obj/riscv: add support for riscv compressed instructions
+ 2025-11-13 a15d036ce2 cmd/internal/obj/riscv: implement better bit pattern encoding
+ 2025-11-12 abb241a789 cmd/internal/obj/loong64: add {,X}VS{ADD,SUB}.{B/H/W/V}{,U} instructions support
+ 2025-11-12 0929d21978 cmd/go: keep objects alive while stopping cleanups
+ 2025-11-12 f03d06ec1a runtime: fix list test memory management for mayMoreStack
+ 2025-11-12 48127f656b crypto/internal/fips140/sha3: remove outdated TODO
+ 2025-11-12 c3d1d42764 sync/atomic: amend comments for Value.{Swap,CompareAndSwap}
+ 2025-11-12 e0807ba470 cmd/compile: don't clear ptrmask in fillptrmask
+ 2025-11-12 66318d2b4b internal/abi: correctly describe result in Name.Name doc comment
+ 2025-11-12 34aef89366 cmd/compile: use FCLASSD for subnormal checks on riscv64
+ 2025-11-12 0c28789bd7 net/url: disallow raw IPv6 addresses in host
+ 2025-11-12 4e761b9a18 cmd/compile: optimize liveness in stackalloc
+ 2025-11-12 956909ff84 crypto/x509: move BetterTLS suite from crypto/tls
+ 2025-11-12 6525f46707 cmd/link: change shdr and phdr from arrays to slices
+ 2025-11-12 d3aeba1670 runtime: switch p.gcFractionalMarkTime to atomic.Int64
+ 2025-11-12 8873e8bea2 runtime,runtime/pprof: clean up goroutine leak profile writing
+ 2025-11-12 b8b84b789e cmd/go: clarify the -o testflag is only for copying the binary
+ 2025-11-12 c761b26b56 mime: parse media types that contain braces
+ 2025-11-12 65858a146e os/exec: include Cmd.Start in the list of methods that run Cmd
+ 2025-11-11 4bfc3a9d14 std,cmd: go fix -any std cmd
+ 2025-11-11 2263d4aabd runtime: doubly-linked sched.midle list
+ 2025-11-11 046dce0e54 runtime: use new list type for spanSPMCs
+ 2025-11-11 5f11275457 runtime: reusable intrusive doubly-linked list
+ 2025-11-11 951cf0501b internal/trace/testtrace: fix flag name typos
+ 2025-11-11 2750f95291 cmd/go: implement accurate pseudo-versions for Mercurial
+ 2025-11-11 b709a3e8b4 cmd/go/internal/vcweb: cache hg servers
+ 2025-11-11 426ef30ecf cmd/go: implement -reuse for Mercurial repos
+ 2025-11-10 5241d114f5 spec: more precise prose for special case of append
+ 2025-11-10 cdf64106f6 go/types, types2: first argument to append must never be be nil
+ 2025-11-10 a0eb4548cf .gitignore: ignore go test artifacts
+ 2025-11-10 bf58e7845e internal/trace: add "command" to convert text traces to raw
+ 2025-11-10 052c192a4c runtime: fix lock rank for work.spanSPMCs.lock
+ 2025-11-10 bc5ffe5c79 internal/runtime/sys,math/bits: eliminate bounds checks on len8tab
+ 2025-11-10 32f8d6486f runtime: document that tracefpunwindoff applies to some profilers
+ 2025-11-10 1c1c1942ba cmd/go: remove redundant AVX regex in security flag checks
+ 2025-11-10 3b3d6b9e5d cmd/internal/obj/arm64: shorten constant integer loads
+ 2025-11-10 5f4b5f1a19 runtime/msan: use different msan routine for copying
+ 2025-11-10 0fe6c8e8c8 runtime: tweak wording for comment of mcache.flushGen
+ 2025-11-10 95a0e5adc1 sync: don't call Done when f() panics in WaitGroup.Go
+ 2025-11-08 e8ed85d6c2 cmd/go: update goSum if necessary
+ 2025-11-08 b76103c08e cmd/go: output missing GoDebug entries
+ 2025-11-07 47a63a331d cmd/go: rewrite hgrepo1 test repo to be deterministic
+ 2025-11-07 7995751d3a cmd/go: copy git reuse and support repos to hg
+ 2025-11-07 66c7ca7fb3 cmd/go: improve TestScript/reuse_git
+ 2025-11-07 de84ac55c6 cmd/link: clean up some comments to Go standards
+ 2025-11-07 5cd1b73772 runtime: correctly print panics before fatal-ing on defer
+ 2025-11-07 91ca80f970 runtime/cgo: improve error messages after pointer panic
+ 2025-11-07 d36e88f21f runtime: tweak wording for doc
+ 2025-11-06 ad3ccd92e4 cmd/link: move pclntab out of relro section
+ 2025-11-06 43b91e7abd iter: fix a tiny doc comment bug
+ 2025-11-06 48c7fa13c6 Revert "runtime: remove the pc field of _defer struct"
+ 2025-11-05 8111104a21 cmd/internal/obj/loong64: add {,X}VSHUF.{B/H/W/V} instructions support
+ 2025-11-05 2e2072561c cmd/internal/obj/loong64: add {,X}VEXTRINS.{B,H,W,V} instruction support
+ 2025-11-05 01c29d1f0b internal/chacha8rand: replace VORV with instruction VMOVQ on loong64
+ 2025-11-05 f01a1841fd cmd/compile: fix error message on loong64
+ 2025-11-05 8cf7a0b4c9 cmd/link: support weak binding on darwin
+ 2025-11-05 2dd7e94e16 cmd/go: use go.dev instead of golang.org in flag errors
+ 2025-11-05 28f1ad5782 cmd/go: fix TestScript/govcs
+ 2025-11-05 daa220a1c9 cmd/go: silence TLS handshake errors during test
+ 2025-11-05 3ae9e95002 cmd/go: fix TestCgoPkgConfig on darwin with pkg-config installed
+ 2025-11-05 a494a26bc2 cmd/go: fix TestScript/vet_flags
+ 2025-11-05 a8fb94969c cmd/go: fix TestScript/tool_build_as_needed
+ 2025-11-05 04f05219c4 cmd/cgo: skip escape checks if call site has no argument
+ 2025-11-04 9f3a108ee0 os: ignore O_TRUNC errors on named pipes and terminal devices on Windows
+ 2025-11-04 0e1bd8b5f1 cmd/link, runtime: don't store text start in pcHeader
+ 2025-11-04 7347b54727 cmd/link: don't generate .gosymtab section
+ 2025-11-04 6914dd11c0 cmd/link: add and use new SymKind SFirstUnallocated
+ 2025-11-04 f5f14262d0 cmd/link: remove misleading comment
+ 2025-11-04 61de3a9dae cmd/link: remove unused SFILEPATH symbol kind
+ 2025-11-04 8e2bd267b5 cmd/link: add comments for SymKind values
+ 2025-11-04 16705b962e cmd/compile: faster liveness analysis in regalloc
+ 2025-11-04 a5fe6791d7 internal/syscall/windows: fix ReOpenFile sentinel error value
+ 2025-11-04 a7d174ccaa cmd/compile/internal/ssa: simplify riscv64 FCLASSD rewrite rules
+ 2025-11-04 856238615d runtime: amend doc for setPinned
+ 2025-11-04 c7ccbddf22 cmd/compile/internal/ssa: more aggressive on dead auto elim
+ 2025-11-04 75b2bb1d1a cmd/cgo: drop pre-1.18 support
+ 2025-11-04 dd839f1d00 internal/strconv: handle %f with fixedFtoa when possible
+ 2025-11-04 6e165b4d17 cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
+ 2025-11-04 9f6590f333 encoding/pem: don't reslice in failure modes
+ 2025-11-03 34fec512ce internal/strconv: extract fixed-precision ftoa from ftoaryu.go
+ 2025-11-03 162ba6cc40 internal/strconv: add tests and benchmarks for ftoaFixed
+ 2025-11-03 9795c7ba22 internal/strconv: fix pow10 off-by-one in exponent result
+ 2025-11-03 ad5e941a45 cmd/internal/obj/loong64: using {xv,v}slli.d to perform copying between vector registers
+ 2025-11-03 dadbac0c9e cmd/internal/obj/loong64: add VPERMI.W, XVPERMI.{W,V,Q} instruction support
+ 2025-11-03 e2c6a2024c runtime: avoid append in printint, printuint
+ 2025-11-03 c93cc603cd runtime: allow Stack to traceback goroutines in syscall _Grunning window
+ 2025-11-03 b5353fd90a runtime: don't panic in castogscanstatus
+ 2025-11-03 43491f8d52 cmd/cgo: use the export'ed file/line in error messages
+ 2025-11-03 aa94fdf0cc cmd/go: link to go.dev/doc/godebug for removed GODEBUG settings
+ 2025-11-03 4d2b03d2fc crypto/tls: add BetterTLS test coverage
+ 2025-11-03 0c4444e13d cmd/internal/obj: support arm64 FMOVQ large offset encoding
+ 2025-11-03 85bec791a0 cmd/go/testdata/script: loosen list_empty_importpath for freebsd
+ 2025-11-03 17b57078ab internal/runtime/cgobench: add cgo callback benchmark
+ 2025-11-03 5f8fdb720c cmd/go: move functions to methods
+ 2025-11-03 0a95856b95 cmd/go: eliminate additional global variable
+ 2025-11-03 f93186fb44 cmd/go/internal/telemetrystats: count cgo usage
+ 2025-11-03 eaf28a27fd runtime: update outdated comments for deferprocStack
+ 2025-11-03 e12d8a90bf all: remove extra space in the comments
+ 2025-11-03 c5559344ac internal/profile: optimize Parse allocs
+ 2025-11-03 5132158ac2 bytes: add Buffer.Peek
+ 2025-11-03 361d51a6b5 runtime: remove the pc field of _defer struct
+ 2025-11-03 00ee1860ce crypto/internal/constanttime: expose intrinsics to the FIPS 140-3 packages
+ 2025-11-02 388c41c412 cmd/go: skip git sha256 tests if git < 2.29
+ 2025-11-01 385dc33250 runtime: prevent time.Timer.Reset(0) from deadlocking testing/synctest tests
+ 2025-10-31 99b724f454 cmd/go: document purego convention
+ 2025-10-31 27937289dc runtime: avoid zeroing scavenged memory
+ 2025-10-30 89dee70484 runtime: prioritize panic output over racefini
+ 2025-10-30 8683bb846d runtime: optimistically CAS atomicstatus directly in enter/exitsyscall
+ 2025-10-30 5b8e850340 runtime: don't track scheduling latency for _Grunning <-> _Gsyscall
+ 2025-10-30 251814e580 runtime: document tracer invariants explicitly
+ 2025-10-30 7244e9221f runtime: eliminate _Psyscall
+ 2025-10-30 5ef19c0d0c strconv: delete divmod1e9
+ 2025-10-30 d32b1f02c3 runtime: delete timediv
+ 2025-10-30 cbbd385cb8 strconv: remove arch-specific decision in formatBase10
+ 2025-10-30 6aca04a73a reflect: correct internal docs for uncommonType
+ 2025-10-30 235b4e729d cmd/compile/internal/ssa: model right shift more precisely
+ 2025-10-30 d44db293f9 go/token: fix a typo in a comment
+ 2025-10-30 cdc6b559ca strconv: remove hand-written divide on 32-bit systems
+ 2025-10-30 1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
+ 2025-10-30 38317c44e7 crypto/internal/fips140/aes: fix CTR generator
+ 2025-10-29 3be9a0e014 go/types, types: proceed with correct (invalid) type in case of a selector error
+ 2025-10-29 d2c5fa0814 strconv: remove &0xFF trick in formatBase10
+ 2025-10-29 9bbda7c99d cmd/compile: make prove understand div, mod better
+ 2025-10-29 915c1839fe test/codegen: simplify asmcheck pattern matching
+ 2025-10-29 32ee3f3f73 runtime: tweak example code for gorecover
+ 2025-10-29 da3fb90b23 crypto/internal/fips140/bigmod: fix extendedGCD comment
+ 2025-10-29 9035f7aea5 runtime: use internal/strconv
+ 2025-10-29 49c1da474d internal/itoa, internal/runtime/strconv: delete
+ 2025-10-29 b2a346bbd1 strconv: move all but Quote to internal/strconv
+ 2025-10-28 041f564b3e internal/runtime/gc/scan: avoid memory destination on VPCOMPRESSQ
+ 2025-10-28 81afd3a59b cmd/compile: extend ppc64 MADDLD to match const ADDconst & MULLDconst
+ 2025-10-28 ea50d61b66 cmd/compile: name change isDirect -> isDirectAndComparable
+ 2025-10-28 bd4dc413cd cmd/compile: don't optimize away a panicing interface comparison
+ 2025-10-28 30c047d0d0 cmd/compile: extend loong MOV*idx rules to match ADDshiftLLV
+ 2025-10-28 46e5e2b09a runtime: define PanicBounds in funcdata.h
+ 2025-10-28 3da0356685 crypto/internal/fips140test: collect 300M entropy samples for ESV
+ 2025-10-28 d5953185d5 runtime: amend comments a bit
+ 2025-10-28 12c8d14d94 errors: document that the target of Is must be comparable
+ 2025-10-28 1f4d14e493 go/types, types2: pull up package-level object sort to a separate phase
+ 2025-10-28 b8aa1ee442 go/types, types2: reduce locks held at once in resolveUnderlying
+ 2025-10-28 24af441437 cmd/compile: rewrite proved multiplies by 0 or 1 into CondSelect
+ 2025-10-28 2d33a456c6 cmd/compile: move branchelim supported arches to Config
+ 2025-10-27 2c91c33e88 crypto/subtle,cmd/compile: add intrinsics for ConstantTimeSelect and *Eq
+ 2025-10-27 73d7635fae cmd/compile: add generic rules to remove bool → int → bool roundtrips
+ 2025-10-27 1662d55247 cmd/compile: do not Zext bools to 64bits in amd64 CMOV generation rules
+ 2025-10-27 b8468d8c4e cmd/compile: introduce bytesizeToConst to cleanup switches in prove
+ 2025-10-27 9e25c2f6de cmd/link: internal linking support for windows/arm64
+ 2025-10-27 ff2ebf69c4 internal/runtime/gc/scan: correct size class size check
+ 2025-10-27 9a77aa4f08 cmd/compile: add position info to sccp debug messages
+ 2025-10-27 77dc138030 cmd/compile: teach prove about unsigned rounding-up divide
+ 2025-10-27 a0f33b2887 cmd/compile: change !l.nonzero() into l.maybezero()
+ 2025-10-27 5453b788fd cmd/compile: optimize Add64carry with unused carries into plain Add64
+ 2025-10-27 2ce5aab79e cmd/compile: remove 68857 ModU flowLimit workaround in prove
+ 2025-10-27 a50de4bda7 cmd/compile: remove 68857 min & max flowLimit workaround in prove
+ 2025-10-27 53be78630a cmd/compile: use topo-sort in prove to correctly learn facts while walking once
+ 2025-10-27 dec2b4c83d runtime: avoid bound check in freebsd binuptime
+ 2025-10-27 916e682d51 cmd/internal/obj, cmd/asm: reclassify the offset of memory access operations on loong64
+ 2025-10-27 2835b994fb cmd/go: remove global loader state variable
+ 2025-10-27 139f89226f cmd/go: use local state for telemetry
+ 2025-10-27 8239156571 cmd/go: use tagged switch
+ 2025-10-27 d741483a1f cmd/go: increase stmt threshold on amd64
+ 2025-10-27 a6929cf4a7 cmd/go: removed unused code in toolchain.Exec
+ 2025-10-27 180c07e2c1 go/types, types2: clarify docs for resolveUnderlying
+ 2025-10-27 d8a32f3d4b go/types, types2: wrap Named.fromRHS into Named.rhs
+ 2025-10-27 b2af92270f go/types, types2: verify stateMask transitions in debug mode
+ 2025-10-27 92decdcbaa net/url: further speed up escape and unescape
+ 2025-10-27 5f4ec3541f runtime: remove unused cgoCheckUsingType function
+ 2025-10-27 189f2c08cc time: rewrite IsZero method to use wall and ext fields
+ 2025-10-27 f619b4a00d cmd/go: reorder parameters so that context is first
+ 2025-10-27 f527994c61 sync: update comments for Once.done
+ 2025-10-26 5dcaf9a01b runtime: add GOEXPERIMENT=runtimefree
+ 2025-10-26 d7a52f9369 cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
+ 2025-10-26 6f04a92be3 internal/chacha8rand: provide vector implementation for riscv64
+ 2025-10-26 54e3adc533 cmd/go: use local loader state in test
+ 2025-10-26 ca379b1c56 cmd/go: remove loaderstate dependency
+ 2025-10-26 83a44bde64 cmd/go: remove unused loader state
+ 2025-10-26 7e7cd9de68 cmd/go: remove temporary rf cleanup script
+ 2025-10-26 53ad68de4b cmd/compile: allow unaligned load/store on Wasm
+ 2025-10-25 12ec09f434 cmd/go: use local state object in work.runBuild and work.runInstall
+ 2025-10-24 643f80a11f runtime: add ppc and s390 to 32 build constraints for gccgo
+ 2025-10-24 0afbeb5102 runtime: add ppc and s390 to linux 32 bits syscall build constraints for gccgo
+ 2025-10-24 7b506d106f cmd/go: use local state object in `generate.runGenerate`
+ 2025-10-24 26a8a21d7f cmd/go: use local state object in `env.runEnv`
+ 2025-10-24 f2dd3d7e31 cmd/go: use local state object in `vet.runVet`
+ 2025-10-24 784700439a cmd/go: use local state object in pkg `workcmd`
+ 2025-10-24 69673e9be2 cmd/go: use local state object in `tool.runTool`
+ 2025-10-24 2e12c5db11 cmd/go: use local state object in `test.runTest`
+ 2025-10-24 fe345ff2ae cmd/go: use local state object in `modget.runGet`
+ 2025-10-24 d312e27e8b cmd/go: use local state object in pkg `modcmd`
+ 2025-10-24 ea9cf26aa1 cmd/go: use local state object in `list.runList`
+ 2025-10-24 9926e1124e cmd/go: use local state object in `bug.runBug`
+ 2025-10-24 2c4fd7b2cd cmd/go: use local state object in `run.runRun`
+ 2025-10-24 ade9f33e1f cmd/go: add loaderstate as field on `mvsReqs`
+ 2025-10-24 ccf4192a31 cmd/go: make ImportMissingError work with local state
+ 2025-10-24 f5403f15f0 debug/pe: check for zdebug_gdb_scripts section in testDWARF
+ 2025-10-24 a26f860fa4 runtime: use 32-bit hash for maps on Wasm
+ 2025-10-24 747fe2efed encoding/json/v2: fix typo in documentation about errors.AsType
+ 2025-10-24 94f47fc03f cmd/link: remove pointless assignment in SetSymAlign
+ 2025-10-24 e6cff69051 crypto/x509: move constraint checking after chain building
+ 2025-10-24 f5f69a3de9 encoding/json/jsontext: avoid pinning application data in pools
+ 2025-10-24 a6a59f0762 encoding/json/v2: use slices.Sort directly
+ 2025-10-24 0d3dab9b1d crypto/x509: simplify candidate chain filtering
+ 2025-10-24 29046398bb cmd/go: refactor injection of modload.LoaderState
+ 2025-10-24 c18fa69e52 cmd/go: make ErrNoModRoot work with local state
+ 2025-10-24 296ecc918d cmd/go: add modload.State parameter to AllowedFunc
+ 2025-10-24 c445a61e52 cmd/go: add loaderstate as field on `QueryMatchesMainModulesError`
+ 2025-10-24 6ac40051d3 cmd/go: remove module loader state from ccompile
+ 2025-10-24 6a5a452528 cmd/go: inject vendor dir into builder struct
+ 2025-10-23 dfac972233 crypto/pbkdf2: add missing error return value in example
+ 2025-10-23 47bf8f073e unique: fix inconsistent panic prefix in canonmap cleanup path
+ 2025-10-23 03bd43e8bb go/types, types2: rename Named.resolve to unpack
+ 2025-10-23 9fcdc814b2 go/types, types2: rename loaded namedState to lazyLoaded
+ 2025-10-23 8401512a9b go/types, types2: rename complete namedState to hasMethods
+ 2025-10-23 cf826bfcb4 go/types, types2: set t.underlying exactly once in resolveUnderlying
+ 2025-10-23 c4e910895b net/url: speed up escape and unescape
+ 2025-10-23 3f6ac3a10f go/build: use slices.Equal
+ 2025-10-23 839da71f89 encoding/pem: properly calculate end indexes
+ 2025-10-23 39ed968832 cmd: update golang.org/x/arch for riscv64 disassembler
+ 2025-10-23 ca448191c9 all: replace Split in loops with more efficient SplitSeq
+ 2025-10-23 107fcb70de internal/goroot: replace HasPrefix+TrimPrefix with CutPrefix
+ 2025-10-23 8378276d66 strconv: optimize int-to-decimal and use consistently
+ 2025-10-23 e5688d0bdd cmd/internal/obj/riscv: simplify validation and encoding of raw instructions
+ 2025-10-22 77fc27972a doc/next: improve new(expr) release note
+ 2025-10-22 d94a8c56ad runtime: cleanup pagetrace
+ 2025-10-22 02728a2846 crypto/internal/fips140test: add entropy SHA2-384 testing
+ 2025-10-22 f92e01c117 runtime/cgo: fix cgoCheckArg description
+ 2025-10-22 50586182ab runtime: use backoff and ISB instruction to reduce contention in (*lfstack).pop and (*spanSet).pop on arm64
+ 2025-10-22 1ff59f3dd3 strconv: clean up powers-of-10 table, tests
+ 2025-10-22 7c9fa4d5e9 cmd/go: check if build output should overwrite files with renames
+ 2025-10-22 557b4d6e0f comment: change slice to string in function comment/help
+ 2025-10-22 d09a8c8ef4 go/types, types2: simplify locking in Named.resolveUnderlying
+ 2025-10-22 5a42af7f6c go/types, types2: in resolveUnderlying, only compute path when needed
+ 2025-10-22 4bdb55b5b8 go/types, types2: rename Named.under to Named.resolveUnderlying
+ 2025-10-21 29d43df8ab go/build, cmd/go: use ast.ParseDirective for go:embed
+ 2025-10-21 4e695dd634 go/ast: add ParseDirective for parsing directive comments
+ 2025-10-21 06e57e60a7 go/types, types2: only report version errors if new(expr) is ok otherwise
+ 2025-10-21 6c3d0d259f path/filepath: reword documentation for Rel
+ 2025-10-21 39fd61ddb0 go/types, types2: guard Named.underlying with Named.mu
+ 2025-10-21 4a0115c886 runtime,syscall: implement and use syscalln on darwin
+ 2025-10-21 261c561f5a all: gofmt -w
+ 2025-10-21 c9c78c06ef strconv: embed testdata in test
+ 2025-10-21 8f74f9daf4 sync: re-enable race even when panicking
+ 2025-10-21 8a6c64f4fe syscall: use rawSyscall6 to call ptrace in forkAndExecInChild
+ 2025-10-21 4620db72d2 runtime: use timer_settime64 on 32-bit Linux
+ 2025-10-21 b31dc77cea os: support deleting read-only files in RemoveAll on older Windows versions
+ 2025-10-21 46cc532900 cmd/compile/internal/ssa: fix typo in comment
+ 2025-10-21 2163a58021 crypto/internal/fips140/entropy: increase AllocsPerRun iterations
+ 2025-10-21 306eacbc11 cmd/go/testdata/script: disable list_empty_importpath test on Windows
+ 2025-10-21 a5a249d6a6 all: eliminate unnecessary type conversions
+ 2025-10-21 694182d77b cmd/internal/obj/ppc64: improve large prologue generation
+ 2025-10-21 b0dcb95542 cmd/compile: leave the horses alone
+ 2025-10-21 9a5a1202f4 runtime: clean dead architectures from go:build constraint
+ 2025-10-21 8539691d0c crypto/internal/fips140/entropy: move to crypto/internal/entropy/v1.0.0
+ 2025-10-20 99cf4d671c runtime: save lasx and lsx registers in loong64 async preemption
+ 2025-10-20 79ae97fe9b runtime: make procyieldAsm no longer loop infinitely if passed 0
+ 2025-10-20 f838faffe2 runtime: wrap procyield assembly and check for 0
+ 2025-10-20 ee4d2c312d runtime/trace: dump test traces on validation failure
+ 2025-10-20 7b81a1e107 net/url: reduce allocs in Encode
+ 2025-10-20 e425176843 cmd/asm: fix typo in comment
+ 2025-10-20 dc9a3e2a65 runtime: fix generation skew with trace reentrancy
+ 2025-10-20 df33c17091 runtime: add _Gdeadextra status
+ 2025-10-20 7503856d40 cmd/go: inject loaderstate into matcher function
+ 2025-10-20 d57c3fd743 cmd/go: inject State parameter into `work.runInstall`
+ 2025-10-20 e94a5008f6 cmd/go: inject State parameter into `work.runBuild`
+ 2025-10-20 d9e6f95450 cmd/go: inject State parameter into `workcmd.runSync`
+ 2025-10-20 9769a61e64 cmd/go: inject State parameter into `modget.runGet`
+ 2025-10-20 f859799ccf cmd/go: inject State parameter into `modcmd.runVerify`
+ 2025-10-20 0f820aca29 cmd/go: inject State parameter into `modcmd.runVendor`
+ 2025-10-20 92aa3e9e98 cmd/go: inject State parameter into `modcmd.runInit`
+ 2025-10-20 e176dff41c cmd/go: inject State parameter into `modcmd.runDownload`
+ 2025-10-20 e7c66a58d5 cmd/go: inject State parameter into `toolchain.Select`
+ 2025-10-20 4dc3dd9a86 cmd/go: add loaderstate to Switcher
+ 2025-10-20 bcf7da1595 cmd/go: convert functions to methods
+ 2025-10-20 0d3044f965 cmd/go: make Reset work with any State instance
+ 2025-10-20 386d81151d cmd/go: make setState work with any State instance
+ 2025-10-20 a420aa221e cmd/go: inject State parameter into `tool.runTool`
+ 2025-10-20 441e7194a4 cmd/go: inject State parameter into `test.runTest`
+ 2025-10-20 35e8309be2 cmd/go: inject State parameter into `list.runList`
+ 2025-10-20 29a81624f7 cmd/go: inject state parameter into `fmtcmd.runFmt`
+ 2025-10-20 f7eaea02fd cmd/go: inject state parameter into `clean.runClean`
+ 2025-10-20 58a8fdb6cf cmd/go: inject State parameter into `bug.runBug`
+ 2025-10-20 8d0bef7ffe runtime: add linkname documentation and guidance
+ 2025-10-20 3e43f48cb6 encoding/asn1: use reflect.TypeAssert to improve performance
+ 2025-10-20 4ad5585c2c runtime: fix _rt0_ppc64x_lib on aix
+ 2025-10-17 a5f55a441e cmd/fix: add modernize and inline analyzers
+ 2025-10-17 80876f4b42 cmd/go/internal/vet: tweak help doc
+ 2025-10-17 b5aefe07e5 all: remove unnecessary loop variable copies in tests
+ 2025-10-17 5137c473b6 go/types, types2: remove references to under function in comments
+ 2025-10-17 dbbb1bfc91 all: correct name for comments
+ 2025-10-17 0983090171 encoding/pem: properly decode strange PEM data
+ 2025-10-17 36863d6194 runtime: unify riscv64 library entry point
+ 2025-10-16 0c14000f87 go/types, types2: remove under(Type) in favor of Type.Underlying()
+ 2025-10-16 1099436f1b go/types, types2: change and enforce lifecycle of Named.fromRHS and Named.underlying fields
+ 2025-10-16 41f5659347 go/types, types2: remove superfluous unalias call (minor cleanup)
+ 2025-10-16 e7351c03c8 runtime: use DC ZVA instead of its encoding in WORD in arm64 memclr
+ 2025-10-16 6cbe0920c4 cmd: update to x/tools@7d9453cc
+ 2025-10-15 45eee553e2 cmd/internal/obj: move ARM64RegisterExtension from cmd/asm/internal/arch
+ 2025-10-15 27f9a6705c runtime: increase repeat count for alloc test
+ 2025-10-15 b68cebd809 net/http/httptest: record failed ResponseWriter writes
+ 2025-10-15 f1fed742eb cmd: fix three printf problems reported by newest vet
+ 2025-10-15 0984dcd757 cmd/compile: fix an error in comments
+ 2025-10-15 31f82877e8 go/types, types2: fix misleading internal comment
+ 2025-10-15 6346349f56 cmd/compile: replace angle brackets with square
+ 2025-10-15 284379cdfc cmd/compile: remove rematerializable values from live set across calls
+ 2025-10-15 519ae514ab cmd/compile: eliminate bound check for slices of the same length
+ 2025-10-15 b5a29cca48 cmd/distpack: add fix tool to inventory
+ 2025-10-15 bb5eb51715 runtime/pprof: fix errors in pprof_test
+ 2025-10-15 5c9a26c7f8 cmd/compile: use arm64 neon in LoweredMemmove/LoweredMemmoveLoop
+ 2025-10-15 61d1ff61ad cmd/compile: use block starting position for phi line number
+ 2025-10-15 5b29875c8e cmd/go: inject State parameter into `run.runRun`
+ 2025-10-15 5113496805 runtime/pprof: skip flaky test TestProfilerStackDepth/heap for now
+ 2025-10-15 36086e85f8 cmd/go: create temporary cleanup script
+ 2025-10-14 7056c71d32 cmd/compile: disable use of new saturating float-to-int conversions
+ 2025-10-14 6d5b13793f Revert "cmd/compile: make 386 float-to-int conversions match amd64"
+ 2025-10-14 bb2a14252b Revert "runtime: adjust softfloat corner cases to match amd64/arm64"
+ 2025-10-14 3bc9d9fa83 Revert "cmd/compile: make wasm match other platforms for FP->int32/64 conversions"
+ 2025-10-14 ee5af46172 encoding/json: avoid misleading errors under goexperiment.jsonv2
+ 2025-10-14 11d3d2f77d cmd/internal/obj/arm64: add support for PAC instructions
+ 2025-10-14 4dbf1a5a4c cmd/compile/internal/devirtualize: do not track assignments to non-PAUTO
+ 2025-10-14 0ddb5ed465 cmd/compile/internal/devirtualize: use FatalfAt instead of Fatalf where possible
+ 2025-10-14 0a239bcc99 Revert "net/url: disallow raw IPv6 addresses in host"
+ 2025-10-14 5a9ef44bc0 cmd/compile/internal/devirtualize: fix OCONVNOP assertion
+ 2025-10-14 3765758b96 go/types, types2: minor cleanup (remove TODO)
+ 2025-10-14 f6b9d56aff crypto/internal/fips140/entropy: fix benign race
+ 2025-10-14 60f6d2f623 crypto/internal/fips140/entropy: support SHA-384 sizes for ACVP tests
+ 2025-10-13 6fd8e88d07 encoding/json/v2: restrict presence of default options
+ 2025-10-13 1abc6b0204 go/types, types2: permit type cycles through type parameter lists
+ 2025-10-13 9fdd6904da strconv: add tests that Java once mishandled
+ 2025-10-13 9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler
+ 2025-10-13 0e64ee1286 encoding/json/v2: report EOF for top-level values in UnmarshalDecode
+ 2025-10-13 6bcd97d9f4 all: replace calls to errors.As with errors.AsType
+ 2025-10-11 1cd71689f2 crypto/x509: rework fix for CVE-2025-58187
+ 2025-10-11 8aa1efa223 cmd/link: in TestFallocate, only check number of blocks on Darwin
+ 2025-10-10 b497a29d25 encoding/json: fix regression in quoted numbers under goexperiment.jsonv2
+ 2025-10-10 48bb7a6114 cmd/compile: repair bisection behavior for float-to-unsigned conversion
+ 2025-10-10 e8a53538b4 runtime: fail TestGoroutineLeakProfile on data race
+ 2025-10-10 e3be2d1b2b net/url: disallow raw IPv6 addresses in host
+ 2025-10-10 aced4c79a2 net/http: strip request body headers on POST to GET redirects
+ 2025-10-10 584a89fe74 all: omit unnecessary reassignment
+ 2025-10-10 69e8279632 net/http: set cookie host to Request.Host when available
+ 2025-10-10 6f4c63ba63 cmd/go: unify "go fix" and "go vet"
+ 2025-10-10 955a5a0dc5 runtime: support arm64 Neon in async preemption
+ 2025-10-10 5368e77429 net/http: run TestRequestWriteTransport with fake time to avoid flakes
+ 2025-10-09 c53cb642de internal/buildcfg: enable greenteagc experiment for loong64
+ 2025-10-09 954fdcc51a cmd/compile: declare no output register for loong64 LoweredAtomic{And,Or}32 ops
+ 2025-10-09 19a30ea3f2 cmd/compile: call generated size-specialized malloc functions directly
+ 2025-10-09 80f3bb5516 reflect: remove timeout in TestChanOfGC
+ 2025-10-09 9db7e30bb4 net/url: allow IP-literals with IPv4-mapped IPv6 addresses
+ 2025-10-09 8d810286b3 cmd/compile: make wasm match other platforms for FP->int32/64 conversions
+ 2025-10-09 b9f3accdcf runtime: adjust softfloat corner cases to match amd64/arm64
+ 2025-10-09 78d75b3799 cmd/compile: make 386 float-to-int conversions match amd64
+ 2025-10-09 0e466a8d1d cmd/compile: modify float-to-[u]int so that amd64 and arm64 match
+ 2025-10-08 4837fbe414 net/http/httptest: check whether response bodies are allowed
+ 2025-10-08 ee163197a8 path/filepath: return cleaned path from Rel
+ 2025-10-08 de9da0de30 cmd/compile/internal/devirtualize: improve concrete type analysis
+ 2025-10-08 ae094a1397 crypto/internal/fips140test: make entropy file pair names match
+ 2025-10-08 941e5917c1 runtime: cleanup comments from asm_ppc64x.s improvements
+ 2025-10-08 d945600d06 cmd/gofmt: change -d to exit 1 if diffs exist
+ 2025-10-08 d4830c6130 cmd/internal/obj: fix Link.Diag printf errors
+ 2025-10-08 e1ca1de123 net/http: format pprof.go
+ 2025-10-08 e5d004c7a8 net/http: update HTTP/2 documentation to reference new config features
+ 2025-10-08 97fd6bdecc cmd/compile: fuse NaN checks with other comparisons
+ 2025-10-07 78b43037dc cmd/go: refactor usage of `workFilePath`
+ 2025-10-07 bb1ca7ae81 cmd/go, testing: add TB.ArtifactDir and -artifacts flag
+ 2025-10-07 1623927730 cmd/go: refactor usage of `requirements`
+ 2025-10-07 a1661e776f Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mips64x"
+ 2025-10-07 cb81270113 Revert "crypto/internal/fips140/subtle: add assembly implementation of xorBytes for mipsx"
+ 2025-10-07 f2d0d05d28 cmd/go: refactor usage of `MainModules`
+ 2025-10-07 f7a68d3804 archive/tar: set a limit on the size of GNU sparse file 1.0 regions
+ 2025-10-07 463165699d net/mail: avoid quadratic behavior in mail address parsing
+ 2025-10-07 5ede095649 net/textproto: avoid quadratic complexity in Reader.ReadResponse
+ 2025-10-07 5ce8cd16f3 encoding/pem: make Decode complexity linear
+ 2025-10-07 f6f4e8b3ef net/url: enforce stricter parsing of bracketed IPv6 hostnames
+ 2025-10-07 7dd54e1fd7 runtime: make work.spanSPMCs.all doubly-linked
+ 2025-10-07 3ee761739b runtime: free spanQueue on P destroy
+ 2025-10-07 8709a41d5e encoding/asn1: prevent memory exhaustion when parsing using internal/saferio
+ 2025-10-07 9b9d02c5a0 net/http: add httpcookiemaxnum GODEBUG option to limit number of cookies parsed
+ 2025-10-07 3fc4c79fdb crypto/x509: improve domain name verification
+ 2025-10-07 6e4007e8cf crypto/x509: mitigate DoS vector when intermediate certificate contains DSA public key
+ 2025-10-07 6f7926589d cmd/go: refactor usage of `modRoots`
+ 2025-10-07 11d5484190 runtime: fix self-deadlock on sbrk platforms
+ 2025-10-07 2e52060084 cmd/go: refactor usage of `RootMode`
+ 2025-10-07 f86ddb54b5 cmd/go: refactor usage of `ForceUseModules`
+ 2025-10-07 c938051dd0 Revert "cmd/compile: redo arm64 LR/FP save and restore"
+ 2025-10-07 6469954203 runtime: assert p.destroy runs with GC not running
+ 2025-10-06 4c0fd3a2b4 internal/goexperiment: remove the synctest GOEXPERIMENT
+ 2025-10-06 c1e6e49d5d fmt: reduce Errorf("x") allocations to match errors.New("x")
+ 2025-10-06 7fbf54bfeb internal/buildcfg: enable greenteagc experiment by default
+ 2025-10-06 7bfeb43509 cmd/go: refactor usage of `initialized`
+ 2025-10-06 1d62e92567 test/codegen: make sure assignment results are used.
+ 2025-10-06 4fca79833f runtime: delete redundant code in the page allocator
+ 2025-10-06 719dfcf8a8 cmd/compile: redo arm64 LR/FP save and restore
+ 2025-10-06 f3312124c2 runtime: remove batching from spanSPMC free
+ 2025-10-06 24416458c2 cmd/go: export type State
+ 2025-10-06 c2fb15164b testing/synctest: remove Run
+ 2025-10-06 ac2ec82172 runtime: bump thread count slack for TestReadMetricsSched
+ 2025-10-06 e74b224b7c crypto/tls: streamline BoGo testing w/ -bogo-local-dir
+ 2025-10-06 3a05e7b032 spec: close tag
+ 2025-10-03 2a71af11fc net/url: improve URL docs
+ 2025-10-03 ee5369b003 cmd/link: add LIBRARY statement only with -buildmode=cshared
+ 2025-10-03 1bca4c1673 cmd/compile: improve slicemask removal
+ 2025-10-03 38b26f29f1 cmd/compile: remove stores to unread parameters
+ 2025-10-03 003b5ce1bc cmd/compile: fix SIMD const rematerialization condition
+ 2025-10-03 d91148c7a8 cmd/compile: enhance prove to infer bounds in slice len/cap calculations
+ 2025-10-03 20c9377e47 cmd/compile: enhance the chunked indexing case to include reslicing
+ 2025-10-03 ad3db2562e cmd/compile: handle rematerialized op for incompatible reg constraint
+ 2025-10-03 18cd4a1fc7 cmd/compile: use the right type for spill slot
+ 2025-10-03 1caa95acfa cmd/compile: enhance prove to deal with double-offset IsInBounds checks
+ 2025-10-03 ec70d19023 cmd/compile: rewrite to elide Slicemask from len==c>0 slicing
+ 2025-10-03 10e7968849 cmd/compile: accounts rematerialize ops's output reginfo
+ 2025-10-03 ab043953cb cmd/compile: minor tweak for race detector
+ 2025-10-03 ebb72bef44 cmd/compile: don't treat devel compiler as a released compiler
+ 2025-10-03 c54dc1418b runtime: support valgrind (but not asan) in specialized malloc functions
+ 2025-10-03 a7917eed70 internal/buildcfg: enable specializedmalloc experiment
+ 2025-10-03 630799c6c9 crypto/tls: add flag to render HTML BoGo report

Change-Id: I6bf904c523a77ee7d3dea9c8ae72292f8a5f2ba5
2025-11-13 16:54:07 -05:00
Michael Munday
34aef89366 cmd/compile: use FCLASSD for subnormal checks on riscv64
Only implemented for 64 bit floating point operations for now.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │       sec/op        │   sec/op     vs base                │
Acos                          154.1n ± 0%   154.1n ± 0%        ~ (p=0.303 n=10)
Acosh                         215.8n ± 6%   226.7n ± 0%        ~ (p=0.439 n=10)
Asin                          149.2n ± 1%   149.2n ± 0%        ~ (p=0.700 n=10)
Asinh                         262.1n ± 0%   258.5n ± 0%   -1.37% (p=0.000 n=10)
Atan                          99.48n ± 0%   99.49n ± 0%        ~ (p=0.836 n=10)
Atanh                         244.9n ± 0%   243.8n ± 0%   -0.43% (p=0.002 n=10)
Atan2                         158.2n ± 1%   153.3n ± 0%   -3.10% (p=0.000 n=10)
Cbrt                          186.8n ± 0%   181.1n ± 0%   -3.03% (p=0.000 n=10)
Ceil                          36.71n ± 1%   36.71n ± 0%        ~ (p=0.434 n=10)
Copysign                      6.531n ± 1%   6.526n ± 0%        ~ (p=0.268 n=10)
Cos                           98.19n ± 0%   95.40n ± 0%   -2.84% (p=0.000 n=10)
Cosh                          233.1n ± 0%   222.6n ± 0%   -4.50% (p=0.000 n=10)
Erf                           122.5n ± 0%   114.2n ± 0%   -6.78% (p=0.000 n=10)
Erfc                          126.0n ± 1%   116.6n ± 0%   -7.46% (p=0.000 n=10)
Erfinv                        138.8n ± 0%   138.6n ± 0%        ~ (p=0.082 n=10)
Erfcinv                       140.0n ± 0%   139.7n ± 0%        ~ (p=0.359 n=10)
Exp                           193.3n ± 0%   184.2n ± 0%   -4.68% (p=0.000 n=10)
ExpGo                         204.8n ± 0%   194.5n ± 0%   -5.03% (p=0.000 n=10)
Expm1                         152.5n ± 1%   145.0n ± 0%   -4.92% (p=0.000 n=10)
Exp2                          174.5n ± 0%   164.2n ± 0%   -5.85% (p=0.000 n=10)
Exp2Go                        184.4n ± 1%   175.4n ± 0%   -4.88% (p=0.000 n=10)
Abs                           4.912n ± 0%   4.914n ± 0%        ~ (p=0.283 n=10)
Dim                           15.50n ± 1%   15.52n ± 1%        ~ (p=0.331 n=10)
Floor                         36.89n ± 1%   36.76n ± 1%        ~ (p=0.325 n=10)
Max                           31.05n ± 1%   31.17n ± 1%        ~ (p=0.628 n=10)
Min                           31.01n ± 0%   31.06n ± 0%        ~ (p=0.767 n=10)
Mod                           294.1n ± 0%   245.6n ± 0%  -16.52% (p=0.000 n=10)
Frexp                         44.86n ± 1%   35.20n ± 0%  -21.53% (p=0.000 n=10)
Gamma                         195.8n ± 0%   185.4n ± 1%   -5.29% (p=0.000 n=10)
Hypot                         84.91n ± 0%   84.54n ± 1%   -0.43% (p=0.006 n=10)
HypotGo                       96.70n ± 0%   95.42n ± 1%   -1.32% (p=0.000 n=10)
Ilogb                         45.03n ± 0%   35.07n ± 1%  -22.10% (p=0.000 n=10)
J0                            634.5n ± 0%   627.2n ± 0%   -1.16% (p=0.000 n=10)
J1                            644.5n ± 0%   636.9n ± 0%   -1.18% (p=0.000 n=10)
Jn                            1.357µ ± 0%   1.344µ ± 0%   -0.92% (p=0.000 n=10)
Ldexp                         49.89n ± 0%   39.96n ± 0%  -19.90% (p=0.000 n=10)
Lgamma                        186.6n ± 0%   184.3n ± 0%   -1.21% (p=0.000 n=10)
Log                           150.4n ± 0%   141.1n ± 0%   -6.15% (p=0.000 n=10)
Logb                          46.70n ± 0%   35.89n ± 0%  -23.15% (p=0.000 n=10)
Log1p                         164.1n ± 0%   163.9n ± 0%        ~ (p=0.122 n=10)
Log10                         153.1n ± 0%   143.5n ± 0%   -6.24% (p=0.000 n=10)
Log2                          58.83n ± 0%   49.75n ± 0%  -15.43% (p=0.000 n=10)
Modf                          40.82n ± 1%   40.78n ± 0%        ~ (p=0.239 n=10)
Nextafter32                   49.15n ± 0%   48.93n ± 0%   -0.44% (p=0.011 n=10)
Nextafter64                   43.33n ± 0%   43.23n ± 0%        ~ (p=0.228 n=10)
PowInt                        269.4n ± 0%   243.8n ± 0%   -9.49% (p=0.000 n=10)
PowFrac                       618.0n ± 0%   571.7n ± 0%   -7.48% (p=0.000 n=10)
Pow10Pos                      13.09n ± 0%   13.05n ± 0%   -0.31% (p=0.003 n=10)
Pow10Neg                      30.99n ± 1%   30.99n ± 0%        ~ (p=0.173 n=10)
Round                         23.73n ± 0%   23.65n ± 0%   -0.36% (p=0.011 n=10)
RoundToEven                   27.87n ± 0%   27.73n ± 0%   -0.48% (p=0.003 n=10)
Remainder                     282.1n ± 0%   249.6n ± 0%  -11.52% (p=0.000 n=10)
Signbit                       11.46n ± 0%   11.42n ± 0%   -0.39% (p=0.003 n=10)
Sin                           115.2n ± 0%   113.2n ± 0%   -1.74% (p=0.000 n=10)
Sincos                        140.6n ± 0%   138.6n ± 0%   -1.39% (p=0.000 n=10)
Sinh                          252.0n ± 0%   241.4n ± 0%   -4.21% (p=0.000 n=10)
SqrtIndirect                  4.909n ± 0%   4.893n ± 0%   -0.34% (p=0.021 n=10)
SqrtLatency                   19.57n ± 1%   19.57n ± 0%        ~ (p=0.087 n=10)
SqrtIndirectLatency           19.64n ± 0%   19.57n ± 0%   -0.36% (p=0.025 n=10)
SqrtGoLatency                 198.1n ± 0%   197.4n ± 0%   -0.35% (p=0.014 n=10)
SqrtPrime                     5.733µ ± 0%   5.725µ ± 0%        ~ (p=0.116 n=10)
Tan                           149.1n ± 0%   146.8n ± 0%   -1.54% (p=0.000 n=10)
Tanh                          248.2n ± 1%   238.1n ± 0%   -4.05% (p=0.000 n=10)
Trunc                         36.86n ± 0%   36.70n ± 0%   -0.43% (p=0.029 n=10)
Y0                            638.2n ± 0%   633.6n ± 0%   -0.71% (p=0.000 n=10)
Y1                            641.8n ± 0%   636.1n ± 0%   -0.87% (p=0.000 n=10)
Yn                            1.358µ ± 0%   1.345µ ± 0%   -0.92% (p=0.000 n=10)
Float64bits                   5.721n ± 0%   5.709n ± 0%   -0.22% (p=0.044 n=10)
Float64frombits               4.905n ± 0%   4.893n ± 0%        ~ (p=0.266 n=10)
Float32bits                   12.27n ± 0%   12.23n ± 0%        ~ (p=0.122 n=10)
Float32frombits               4.909n ± 0%   4.893n ± 0%   -0.32% (p=0.024 n=10)
FMA                           6.556n ± 0%   6.526n ± 0%        ~ (p=0.283 n=10)
geomean                       86.82n        83.75n        -3.54%

Change-Id: I522297a79646d76543d516accce291f5a3cea337
Reviewed-on: https://go-review.googlesource.com/c/go/+/717560
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-11-12 10:03:41 -08:00
Junyang Shao
86b4fe31d9 [dev.simd] cmd/compile: add masked merging ops and optimizations
This CL generates optimizations for masked variant of AVX512
instructions for patterns:

x.Op(y).Merge(z, mask) => OpMasked(z, x, y mask), where OpMasked is
resultInArg0.

Change-Id: Ife7ccc9ddbf76ae921a085bd6a42b965da9bc179
Reviewed-on: https://go-review.googlesource.com/c/go/+/718160
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Junyang Shao <shaojunyang@google.com>
2025-11-11 13:34:39 -08:00
Junyang Shao
771a1dc216 [dev.simd] cmd/compile: add peepholes for all masked ops and bug fixes
For 512-bits they are unchanged. This CL adds the optimization rules for
128/256-bits under feature check.

This CL also fixed a bug for masked load variant of instructions and
make them zeroing by default as well.

Change-Id: I6fe395541c0cd509984a81841420e71c3af732f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/717822
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-11-10 09:53:24 -08:00
Youlin Feng
c7ccbddf22 cmd/compile/internal/ssa: more aggressive on dead auto elim
Propagate "unread" across OpMoves. If the addr of this auto is only used
by an OpMove as its source arg, and the OpMove's target arg is the addr
of another auto. If the 2nd auto can be eliminated, this one can also be
eliminated.

This CL eliminates unnecessary memory copies and makes the frame smaller
in the following code snippet:

func contains(m map[string][16]int, k string) bool {
        _, ok := m[k]
        return ok
}

These are the benchmark results followed by the benchmark code:

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │   old.txt   │              new.txt                │
                │   sec/op    │   sec/op     vs base                │
Map1Access2Ok-8   9.582n ± 2%   9.226n ± 0%   -3.72% (p=0.000 n=20)
Map2Access2Ok-8   13.79n ± 1%   10.24n ± 1%  -25.77% (p=0.000 n=20)
Map3Access2Ok-8   68.68n ± 1%   12.65n ± 1%  -81.58% (p=0.000 n=20)

package main_test

import "testing"

var (
        m1 = map[int]int{}
        m2 = map[int][16]int{}
        m3 = map[int][256]int{}
)

func init() {
        for i := range 1000 {
                m1[i] = i
                m2[i] = [16]int{15:i}
                m3[i] = [256]int{255:i}
        }
}

func BenchmarkMap1Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m1[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap2Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m2[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

func BenchmarkMap3Access2Ok(b *testing.B) {
        for i := range b.N {
                _, ok := m3[i%1000]
                if !ok {
                        b.Errorf("%d not found", i)
                }
        }
}

Fixes #75398

Change-Id: If75e9caaa50d460efc31a94565b9ba28c8158771
Reviewed-on: https://go-review.googlesource.com/c/go/+/702875
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-11-04 12:46:15 -08:00
Russ Cox
6e165b4d17 cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.

goos: wasip1
goarch: wasm
pkg: internal/strconv
                               │   old.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal              192.8n ± 1%   194.6n ± 0%   +0.91% (p=0.000 n=10)
AppendFloat/Float                328.6n ± 0%   279.6n ± 0%  -14.93% (p=0.000 n=10)
AppendFloat/Exp                  335.6n ± 1%   289.2n ± 1%  -13.80% (p=0.000 n=10)
AppendFloat/NegExp               336.0n ± 0%   289.1n ± 1%  -13.97% (p=0.000 n=10)
AppendFloat/LongExp              332.4n ± 0%   285.2n ± 1%  -14.20% (p=0.000 n=10)
AppendFloat/Big                  348.2n ± 0%   300.1n ± 0%  -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp            137.4n ± 0%   138.2n ± 0%   +0.55% (p=0.001 n=10)
AppendFloat/32Integer            193.3n ± 1%   196.5n ± 0%   +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction      283.3n ± 0%   268.9n ± 1%   -5.08% (p=0.000 n=10)
AppendFloat/32Point              279.9n ± 0%   266.5n ± 0%   -4.80% (p=0.000 n=10)
AppendFloat/32Exp                300.1n ± 0%   288.3n ± 1%   -3.90% (p=0.000 n=10)
AppendFloat/32NegExp             288.2n ± 1%   277.9n ± 1%   -3.59% (p=0.000 n=10)
AppendFloat/32Shortest           261.7n ± 0%   250.2n ± 0%   -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard         173.3n ± 1%   158.9n ± 1%   -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard         180.0n ± 0%   167.9n ± 2%   -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1             167.1n ± 0%   149.6n ± 1%  -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2             162.4n ± 1%   146.5n ± 0%   -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5           165.5n ± 0%   149.4n ± 1%   -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3             166.4n ± 1%   150.2n ± 0%   -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4             163.7n ± 0%   149.6n ± 1%   -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard         182.8n ± 1%   167.1n ± 1%   -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12            222.2n ± 0%   208.8n ± 0%   -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16            197.6n ± 1%   181.7n ± 0%   -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard        194.5n ± 0%   181.0n ± 0%   -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard        205.1n ± 1%   191.9n ± 0%   -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard        6.269µ ± 0%   6.643µ ± 0%   +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1            211.7n ± 1%   197.0n ± 0%   -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2            189.4n ± 0%   174.2n ± 0%   -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3            169.0n ± 0%   154.9n ± 0%   -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64           321.2n ± 0%   274.2n ± 1%  -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64   307.4n ± 1%   261.2n ± 0%  -15.03% (p=0.000 n=10)
AppendInt                        3.367µ ± 1%   3.376µ ± 0%        ~ (p=0.517 n=10)
AppendUint                       675.5n ± 0%   676.9n ± 0%        ~ (p=0.196 n=10)
AppendIntSmall                   28.13n ± 1%   28.17n ± 0%   +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1        20.70n ± 0%   20.51n ± 1%   -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2        20.43n ± 0%   20.27n ± 0%   -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3        38.48n ± 0%   37.93n ± 0%   -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4        41.10n ± 0%   38.78n ± 1%   -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5        42.25n ± 1%   42.11n ± 0%   -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6        45.40n ± 1%   43.14n ± 0%   -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7        46.81n ± 1%   46.03n ± 0%   -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8        48.88n ± 1%   46.59n ± 1%   -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9        49.94n ± 2%   49.41n ± 1%   -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10       57.28n ± 1%   56.92n ± 1%   -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11       60.09n ± 1%   58.11n ± 2%   -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12       62.22n ± 0%   61.85n ± 0%   -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13       64.94n ± 0%   62.92n ± 0%   -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14       65.42n ± 1%   65.19n ± 1%   -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15       68.17n ± 0%   66.13n ± 0%   -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16       70.21n ± 1%   70.09n ± 1%        ~ (p=0.517 n=10)
AppendUintVarlen/digits=17       72.93n ± 0%   70.49n ± 0%   -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18       73.01n ± 0%   72.75n ± 0%   -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19       79.27n ± 1%   79.49n ± 1%        ~ (p=0.671 n=10)
AppendUintVarlen/digits=20       82.18n ± 0%   80.43n ± 1%   -2.14% (p=0.000 n=10)
geomean                          143.4n        136.0n        -5.20%


Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-04 11:38:18 -08:00
Russ Cox
1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:

	(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
	(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
	(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:

	(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
	(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
	(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test

benchmark \ host                      local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                    vs base      vs base  vs base    vs base        vs base
DivconstI64                               ~            ~        ~    -49.66%        -21.02%
ModconstI64                               ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64                     ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64                         ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64                     ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                             ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                             ~            ~        ~    -14.10%         -3.54%
DivconstU64/37                       -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567                       ~            ~        ~    -61.55%        -56.93%
ModconstU64                               ~            ~        ~     -6.25%              ~
DivisibleconstU64                         ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64                     ~            ~        ~     +4.23%         +2.56%

pkg: math/bits

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~

pkg: strconv

benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%

pkg: crypto/internal/fips140/nistec/fiat

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%

(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)

pkg: crypto/internal/fips140/edwards25519

benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-30 08:04:20 -07:00
Russ Cox
9bbda7c99d cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)

The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better
and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:

	(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
	(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
	(Add 8 (Mul 8 (Sub i i))) ->
	(Add 8 (Mul 8 0)) ->
	(Add 8 0) ->
	8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.

benchmark \ host                          s7  linux-amd64      mac  linux-arm64  linux-ppc64le  linux-386  s7:GOARCH=386  linux-arm
                                     vs base      vs base  vs base      vs base        vs base    vs base        vs base    vs base
LoadAdd                                    ~            ~        ~            ~              ~     -1.59%              ~          ~
ExtShift                                   ~            ~  -42.14%       +0.10%              ~     +1.44%         +5.66%     +8.50%
Modify                                     ~            ~        ~            ~              ~          ~              ~     -1.53%
MullImm                                    ~            ~        ~            ~              ~    +37.90%        -21.87%     +3.05%
ConstModify                                ~            ~        ~            ~        -49.14%          ~              ~          ~
BitSet                                     ~            ~        ~            ~        -15.86%    -14.57%         +6.44%     +0.06%
BitClear                                   ~            ~        ~            ~              ~     +1.78%         +3.50%     +0.06%
BitToggle                                  ~            ~        ~            ~              ~    -16.09%         +2.91%          ~
BitSetConst                                ~            ~        ~            ~              ~          ~              ~     -0.49%
BitClearConst                              ~            ~        ~            ~        -28.29%          ~              ~     -0.40%
BitToggleConst                             ~            ~        ~       +8.89%        -31.19%          ~              ~     -0.77%
MulNeg                                     ~            ~        ~            ~              ~          ~              ~          ~
Mul2Neg                                    ~            ~   -4.83%            ~              ~    -13.75%         -5.92%          ~
DivconstI64                                ~            ~        ~            ~              ~    -30.12%              ~     +0.50%
ModconstI64                                ~            ~   -9.94%       -4.63%              ~     +3.15%              ~     +5.32%
DivisiblePow2constI64                -34.49%      -12.58%        ~            ~        -12.25%          ~              ~          ~
DivisibleconstI64                    -24.69%      -25.06%   -0.40%       -2.27%        -42.61%     -3.31%              ~     +1.63%
DivisibleWDivconstI64                      ~            ~        ~            ~              ~    -17.55%              ~     -0.60%
DivconstU64/3                              ~            ~        ~            ~              ~     +1.51%              ~          ~
DivconstU64/5                              ~            ~        ~            ~              ~          ~              ~          ~
DivconstU64/37                             ~            ~   -0.18%            ~              ~     +2.70%              ~          ~
DivconstU64/1234567                        ~            ~        ~            ~              ~          ~              ~     +0.12%
ModconstU64                                ~            ~        ~       -0.24%              ~     -5.10%         -1.07%     -1.56%
DivisibleconstU64                          ~            ~        ~            ~              ~    -29.01%        -59.13%    -50.72%
DivisibleWDivconstU64                      ~            ~  -12.18%      -18.88%              ~     -5.50%         -3.91%     +5.17%
DivconstI32                                ~            ~   -0.48%            ~        -34.69%    +89.01%         -6.01%    -16.67%
ModconstI32                                ~       +2.95%   -0.33%            ~              ~     -2.98%         -5.40%     -8.30%
DivisiblePow2constI32                      ~            ~        ~            ~              ~          ~              ~    -16.22%
DivisibleconstI32                          ~            ~        ~            ~              ~    -37.27%        -47.75%    -25.03%
DivisibleWDivconstI32                -11.59%       +5.22%  -12.99%      -23.83%              ~    +45.95%         -7.03%    -10.01%
DivconstU32                                ~            ~        ~            ~              ~    +74.71%         +4.81%          ~
ModconstU32                                ~            ~   +0.53%       +0.18%              ~    +51.16%              ~          ~
DivisibleconstU32                          ~            ~        ~       -0.62%              ~     -4.25%              ~          ~
DivisibleWDivconstU32                 -2.77%       +5.56%  +11.12%       -5.15%              ~    +48.70%        +25.11%     -4.07%
DivconstI16                           -6.06%            ~   -0.33%       +0.22%              ~          ~         -9.68%     +5.47%
ModconstI16                                ~            ~   +4.44%       +2.82%              ~          ~              ~     +5.06%
DivisiblePow2constI16                      ~            ~        ~            ~              ~          ~              ~     -0.17%
DivisibleconstI16                          ~            ~   -0.23%            ~              ~          ~         +4.60%     +6.64%
DivisibleWDivconstI16                 -1.44%       -0.43%  +13.48%       -5.76%              ~     +1.62%        -23.15%     -9.06%
DivconstU16                           +1.61%            ~   -0.35%       -0.47%              ~          ~        +15.59%          ~
ModconstU16                                ~            ~        ~            ~              ~     -0.72%              ~    +14.23%
DivisibleconstU16                          ~            ~   -0.05%       +3.00%              ~          ~              ~     +5.06%
DivisibleWDivconstU16                +52.10%       +0.75%  +17.28%       +4.79%              ~    -37.39%         +5.28%     -9.06%
DivconstI8                                 ~            ~   -0.34%       -0.96%              ~          ~         -9.20%          ~
ModconstI8                            +2.29%            ~   +4.38%       +2.96%              ~          ~              ~          ~
DivisiblePow2constI8                       ~            ~        ~            ~              ~          ~              ~          ~
DivisibleconstI8                           ~            ~        ~            ~              ~          ~         +6.04%          ~
DivisibleWDivconstI8                 -26.44%       +1.69%  +17.03%       +4.05%              ~    +32.48%        -24.90%          ~
DivconstU8                            -4.50%      +14.06%   -0.28%            ~              ~          ~         +4.16%     +0.88%
ModconstU8                                 ~            ~  +25.84%       -0.64%              ~          ~              ~          ~
DivisibleconstU8                           ~            ~   -5.70%            ~              ~          ~              ~          ~
DivisibleWDivconstU8                 +49.55%       +9.07%        ~       +4.03%        +53.87%    -40.03%        +39.72%     -3.01%
Mul2                                       ~            ~        ~            ~              ~          ~              ~          ~
MulNeg2                                    ~            ~        ~            ~        -11.73%          ~              ~     -0.02%
EfaceInteger                               ~            ~        ~            ~              ~    +18.11%              ~     +2.53%
TypeAssert                           +33.90%       +2.86%        ~            ~              ~     -1.07%         -5.29%     -1.04%
Div64UnsignedSmall                         ~            ~        ~            ~              ~          ~              ~          ~
Div64Small                                 ~            ~        ~            ~              ~     -0.88%              ~     +2.39%
Div64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +0.35%
Div64SmallNegDividend                      ~            ~        ~            ~              ~     -0.84%              ~     +3.57%
Div64SmallNegBoth                          ~            ~        ~            ~              ~     -0.86%              ~     +3.55%
Div64Unsigned                              ~            ~        ~            ~              ~          ~              ~     -0.11%
Div64                                      ~            ~        ~            ~              ~          ~              ~     +0.11%
Div64NegDivisor                            ~            ~        ~            ~              ~     -1.29%              ~          ~
Div64NegDividend                           ~            ~        ~            ~              ~     -1.44%              ~          ~
Div64NegBoth                               ~            ~        ~            ~              ~          ~              ~     +0.28%
Mod64UnsignedSmall                         ~            ~        ~            ~              ~     +0.48%              ~     +0.93%
Mod64Small                                 ~            ~        ~            ~              ~          ~              ~          ~
Mod64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +1.44%
Mod64SmallNegDividend                      ~            ~        ~            ~              ~     +0.22%              ~     +1.37%
Mod64SmallNegBoth                          ~            ~        ~            ~              ~          ~              ~     -2.22%
Mod64Unsigned                              ~            ~        ~            ~              ~     -0.95%              ~     +0.11%
Mod64                                      ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegDivisor                            ~            ~        ~            ~              ~          ~              ~     -0.02%
Mod64NegDividend                           ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegBoth                               ~            ~        ~            ~              ~          ~              ~     -0.02%
MulconstI32/3                              ~            ~        ~      -25.00%              ~          ~              ~    +47.37%
MulconstI32/5                              ~            ~        ~      +33.28%              ~          ~              ~    +32.21%
MulconstI32/12                             ~            ~        ~       -2.13%              ~          ~              ~     -0.02%
MulconstI32/120                            ~            ~        ~       +2.93%              ~          ~              ~     -0.03%
MulconstI32/-120                           ~            ~        ~       -2.17%              ~          ~              ~     -0.03%
MulconstI32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstI32/65538                          ~            ~        ~            ~              ~    -33.38%              ~     +0.04%
MulconstI64/3                              ~            ~        ~      +33.35%              ~     -0.37%              ~     -0.13%
MulconstI64/5                              ~            ~        ~      -25.00%              ~     -0.34%              ~          ~
MulconstI64/12                             ~            ~        ~       +2.13%              ~    +11.62%              ~     +2.30%
MulconstI64/120                            ~            ~        ~       -1.98%              ~          ~              ~          ~
MulconstI64/-120                           ~            ~        ~       +0.75%              ~          ~              ~          ~
MulconstI64/65537                          ~            ~        ~            ~              ~     +5.61%              ~          ~
MulconstI64/65538                          ~            ~        ~            ~              ~     +5.25%              ~          ~
MulconstU32/3                              ~       +0.81%        ~      +33.39%              ~    +77.92%              ~    -32.31%
MulconstU32/5                              ~            ~        ~      -24.97%              ~    +77.92%              ~    -24.47%
MulconstU32/12                             ~            ~        ~       +2.06%              ~          ~              ~     +0.03%
MulconstU32/120                            ~            ~        ~       -2.74%              ~          ~              ~     +0.03%
MulconstU32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstU32/65538                          ~            ~        ~            ~              ~    -33.42%              ~     -0.03%
MulconstU64/3                              ~            ~        ~      +33.33%              ~     -0.28%              ~     +1.22%
MulconstU64/5                              ~            ~        ~      -25.00%              ~          ~              ~     -0.64%
MulconstU64/12                             ~            ~        ~       +2.30%              ~    +11.59%              ~     +0.14%
MulconstU64/120                            ~            ~        ~       -2.82%              ~          ~              ~     +0.04%
MulconstU64/65537                          ~       +0.37%        ~            ~              ~     +5.58%              ~          ~
MulconstU64/65538                          ~            ~        ~            ~              ~     +5.16%              ~          ~
ShiftArithmeticRight                       ~            ~        ~            ~              ~    -10.81%              ~     +0.31%
Switch8Predictable                   +14.69%            ~        ~            ~              ~    -24.85%              ~          ~
Switch8Unpredictable                       ~       -0.58%   -3.80%            ~              ~    -11.78%              ~     -0.79%
Switch32Predictable                  -10.33%      +17.89%        ~            ~              ~     +5.76%              ~          ~
Switch32Unpredictable                 -3.15%       +1.19%   +9.42%            ~              ~    -10.30%         -5.09%     +0.44%
SwitchStringPredictable              +70.88%      +20.48%        ~            ~              ~     +2.39%              ~     +0.31%
SwitchStringUnpredictable                  ~       +3.91%   -5.06%       -0.98%              ~     +0.61%         +2.03%          ~
SwitchTypePredictable               +146.58%       -1.10%        ~      -12.45%              ~     -0.46%         -3.81%          ~
SwitchTypeUnpredictable               +0.46%       -0.83%        ~       +4.18%              ~     +0.43%              ~     +0.62%
SwitchInterfaceTypePredictable       -13.41%      -10.13%  +11.03%            ~              ~     -4.38%              ~     +0.75%
SwitchInterfaceTypeUnpredictable      -6.37%       -2.14%        ~       -3.21%              ~     -4.20%              ~     +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-29 18:49:40 -07:00
Russ Cox
915c1839fe test/codegen: simplify asmcheck pattern matching
Separate patterns in asmcheck by spaces instead of commas.
Many patterns end in comma (like "MOV [$]123,") so separating
patterns by comma is not great; they're already quoted, so spaces are fine.

Also replace all tabs in the assembly lines with spaces before matching.
Finally, replace \$ or \\$ with [$] as the matching idiom.
The effect of all these is to make the patterns look like:

  	   // amd64:"BSFQ" "ORQ [$]256"

instead of the old:

  	   // amd64:"BSFQ","ORQ\t\\$256"

Update all tests as well.

Change-Id: Ia39febe5d7f67ba115846422789e11b185d5c807
Reviewed-on: https://go-review.googlesource.com/c/go/+/716060
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-10-29 13:55:00 -07:00
Jorropo
73d7635fae cmd/compile: add generic rules to remove bool → int → bool roundtrips
Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/715200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-27 23:24:54 -07:00
Meng Zhuo
d7a52f9369 cmd/compile: use MOV(D|F) with const for Const(64|32)F on riscv64
The original Const64F using: AUIPC + LD + FMVDX to load
float64 const, we can use AUIPC + FLD instead, same as Const32F.

Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/703215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-10-26 18:35:09 -07:00
David Chase
7056c71d32 cmd/compile: disable use of new saturating float-to-int conversions
The new conversions can be activated (or bisected) with
  -gcflags=all=-d=converthash=PATTERN

where PATTERN is either a hash string or n, qn, y, qy for
no, quietly no, yes, quietly yes.

This CL makes the default pattern be "qn" instead of the
default-default which is an efficient encoding of "qy".

Updates #75834

Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/711845
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
2025-10-14 15:09:35 -07:00
Keith Randall
9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler
Leave those constant foldings for runtime, similar to how we do it
for NaN generation.

These are the only instances I could find in cmd/compile/..., using

objdump -d ../pkg/tool/darwin_arm64/compile| egrep "(fcvtz|>:)" | grep -B1 fcvt

(There are instances in other places, like runtime and reflect, but I don't
think those places would affect compiler output.)

Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27
Reviewed-on: https://go-review.googlesource.com/c/go/+/711281
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-13 12:19:32 -07:00
Michael Matloob
19a30ea3f2 cmd/compile: call generated size-specialized malloc functions directly
This change creates calls to size-specialized malloc functions instead
of calls to newObject when we know the size of the allocation at
compilation time. Most of it is a matter of calling the newObject
function (which will create calls to the size-specialized functions)
rather then the newObjectNonSpecialized function (which won't). In the
newHeapaddr, small, non-pointer case, we'll create a non specialized
newObject and transform that into the appropriate size-specialized
function when we produce the mallocgc in flushPendingHeapAllocations.

We have to update some of the rewrites in generic.rules to also apply to
the size-specialized functions when they apply to newObject.

The messiest thing is we have to adjust the offset we use to save the
memory profiler stack, because the depth of the call to profilealloc is
two frames fewer in the size-specialized malloc functions compared to
when newObject calls mallocgc. A bunch of tests have been adjusted to
account for that.

Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/707856
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-09 14:59:40 -07:00
Michael Munday
97fd6bdecc cmd/compile: fuse NaN checks with other comparisons
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
            │         sec/op         │    sec/op     vs base                │
Acos                     4.315n ± 0%    4.314n ± 0%        ~ (p=0.642 n=10)
Acosh                    8.398n ± 0%    7.779n ± 0%   -7.37% (p=0.000 n=10)
Asin                     4.203n ± 0%    4.211n ± 0%   +0.20% (p=0.001 n=10)
Asinh                   10.150n ± 0%    9.562n ± 0%   -5.79% (p=0.000 n=10)
Atan                     2.363n ± 0%    2.363n ± 0%        ~ (p=0.801 n=10)
Atanh                    8.192n ± 2%    7.685n ± 0%   -6.20% (p=0.000 n=10)
Atan2                    4.013n ± 0%    4.010n ± 0%        ~ (p=0.073 n=10)
Cbrt                     4.858n ± 0%    4.755n ± 0%   -2.12% (p=0.000 n=10)
Cos                      4.596n ± 0%    4.357n ± 0%   -5.20% (p=0.000 n=10)
Cosh                     5.071n ± 0%    5.071n ± 0%        ~ (p=0.585 n=10)
Erf                      2.802n ± 1%    2.788n ± 0%   -0.54% (p=0.002 n=10)
Erfc                     3.087n ± 1%    3.071n ± 0%        ~ (p=0.320 n=10)
Erfinv                   3.981n ± 0%    3.965n ± 0%   -0.41% (p=0.000 n=10)
Erfcinv                  3.985n ± 0%    3.977n ± 0%   -0.20% (p=0.000 n=10)
ExpGo                    8.721n ± 2%    8.252n ± 0%   -5.38% (p=0.000 n=10)
Expm1                    4.378n ± 0%    4.228n ± 0%   -3.43% (p=0.000 n=10)
Exp2                     8.313n ± 0%    7.855n ± 0%   -5.52% (p=0.000 n=10)
Exp2Go                   8.498n ± 2%    7.921n ± 0%   -6.79% (p=0.000 n=10)
Mod                      15.16n ± 4%    12.20n ± 1%  -19.58% (p=0.000 n=10)
Frexp                    1.780n ± 2%    1.496n ± 0%  -15.96% (p=0.000 n=10)
Gamma                    4.378n ± 1%    4.013n ± 0%   -8.35% (p=0.000 n=10)
HypotGo                  2.655n ± 5%    2.427n ± 1%   -8.57% (p=0.000 n=10)
Ilogb                    1.912n ± 5%    1.749n ± 0%   -8.53% (p=0.000 n=10)
J0                       22.43n ± 9%    20.46n ± 0%   -8.76% (p=0.000 n=10)
J1                       21.03n ± 4%    19.96n ± 0%   -5.09% (p=0.000 n=10)
Jn                       45.40n ± 1%    42.59n ± 0%   -6.20% (p=0.000 n=10)
Ldexp                    2.312n ± 1%    1.944n ± 0%  -15.94% (p=0.000 n=10)
Lgamma                   4.617n ± 1%    4.584n ± 0%   -0.73% (p=0.000 n=10)
Log                      4.226n ± 0%    4.213n ± 0%   -0.31% (p=0.001 n=10)
Logb                     1.771n ± 0%    1.775n ± 0%        ~ (p=0.097 n=10)
Log1p                    5.102n ± 2%    5.001n ± 0%   -1.97% (p=0.000 n=10)
Log10                    4.407n ± 0%    4.408n ± 0%        ~ (p=1.000 n=10)
Log2                     2.416n ± 1%    2.138n ± 0%  -11.51% (p=0.000 n=10)
Modf                     1.669n ± 2%    1.611n ± 0%   -3.50% (p=0.000 n=10)
Nextafter32              2.186n ± 0%    2.185n ± 0%        ~ (p=0.051 n=10)
Nextafter64              2.182n ± 0%    2.184n ± 0%   +0.09% (p=0.016 n=10)
PowInt                   11.39n ± 6%    10.68n ± 2%   -6.24% (p=0.000 n=10)
PowFrac                  26.60n ± 2%    26.12n ± 0%   -1.80% (p=0.000 n=10)
Pow10Pos                0.5067n ± 4%   0.5003n ± 1%   -1.27% (p=0.001 n=10)
Pow10Neg                0.8552n ± 0%   0.8552n ± 0%        ~ (p=0.928 n=10)
Round                    1.181n ± 0%    1.182n ± 0%   +0.08% (p=0.001 n=10)
RoundToEven              1.709n ± 0%    1.710n ± 0%        ~ (p=0.053 n=10)
Remainder                12.54n ± 5%    11.99n ± 2%   -4.46% (p=0.000 n=10)
Sin                      3.933n ± 5%    3.926n ± 0%   -0.17% (p=0.000 n=10)
Sincos                   5.672n ± 0%    5.522n ± 0%   -2.65% (p=0.000 n=10)
Sinh                     5.447n ± 1%    5.444n ± 0%   -0.06% (p=0.029 n=10)
Tan                      4.061n ± 0%    4.058n ± 0%   -0.07% (p=0.005 n=10)
Tanh                     5.599n ± 0%    5.595n ± 0%   -0.06% (p=0.042 n=10)
Y0                       20.75n ± 5%    19.73n ± 1%   -4.92% (p=0.000 n=10)
Y1                       20.87n ± 2%    19.78n ± 1%   -5.20% (p=0.000 n=10)
Yn                       44.50n ± 2%    42.04n ± 2%   -5.53% (p=0.000 n=10)
geomean                  4.989n         4.791n        -3.96%

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │     sec/op     │    sec/op     vs base                  │
Acos                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.269 n=10)
Acosh                   244.7n ±  0%   235.0n ±  0%   -3.98% (p=0.000 n=10)
Asin                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.154 n=10)
Asinh                   270.8n ±  0%   261.1n ±  0%   -3.60% (p=0.000 n=10)
Atan                    119.1n ±  0%   119.1n ±  0%        ~ (p=0.347 n=10)
Atanh                   260.2n ±  0%   261.8n ±  4%        ~ (p=0.459 n=10)
Atan2                   186.8n ±  0%   186.8n ±  0%        ~ (p=0.487 n=10)
Cbrt                    203.5n ±  0%   198.2n ±  0%   -2.60% (p=0.000 n=10)
Ceil                    31.82n ±  0%   31.81n ±  0%        ~ (p=0.714 n=10)
Copysign                4.894n ±  0%   4.893n ±  0%        ~ (p=0.161 n=10)
Cos                     107.6n ±  0%   103.6n ±  0%   -3.76% (p=0.000 n=10)
Cosh                    259.0n ±  0%   252.8n ±  0%   -2.39% (p=0.000 n=10)
Erf                     133.7n ±  0%   133.7n ±  0%        ~ (p=0.720 n=10)
Erfc                    137.9n ±  0%   137.8n ±  0%   -0.04% (p=0.033 n=10)
Erfinv                  173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Erfcinv                 173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Exp                     215.3n ±  0%   208.1n ±  0%   -3.34% (p=0.000 n=10)
ExpGo                   226.7n ±  0%   220.6n ±  0%   -2.69% (p=0.000 n=10)
Expm1                   164.8n ±  0%   159.0n ±  0%   -3.52% (p=0.000 n=10)
Exp2                    185.0n ±  0%   182.7n ±  0%   -1.22% (p=0.000 n=10)
Exp2Go                  198.9n ±  0%   196.5n ±  0%   -1.21% (p=0.000 n=10)
Abs                     4.894n ±  0%   4.893n ±  0%        ~ (p=0.262 n=10)
Dim                     16.31n ±  0%   16.31n ±  0%        ~ (p=1.000 n=10)
Floor                   31.81n ±  0%   31.81n ±  0%        ~ (p=0.067 n=10)
Max                     26.11n ±  0%   26.10n ±  0%        ~ (p=0.080 n=10)
Min                     26.10n ±  0%   26.10n ±  0%        ~ (p=0.095 n=10)
Mod                     337.7n ±  0%   291.9n ±  0%  -13.56% (p=0.000 n=10)
Frexp                   50.57n ±  0%   42.41n ±  0%  -16.13% (p=0.000 n=10)
Gamma                   206.3n ±  0%   198.1n ±  0%   -4.00% (p=0.000 n=10)
Hypot                   94.62n ±  0%   94.61n ±  0%        ~ (p=0.437 n=10)
HypotGo                 109.3n ±  0%   109.3n ±  0%        ~ (p=1.000 n=10)
Ilogb                   44.05n ±  0%   44.04n ±  0%   -0.02% (p=0.025 n=10)
J0                      663.1n ±  0%   663.9n ±  0%   +0.13% (p=0.002 n=10)
J1                      663.9n ±  0%   666.4n ±  0%   +0.38% (p=0.000 n=10)
Jn                      1.404µ ±  0%   1.407µ ±  0%   +0.21% (p=0.000 n=10)
Ldexp                   57.10n ±  0%   48.93n ±  0%  -14.30% (p=0.000 n=10)
Lgamma                  185.1n ±  0%   187.6n ±  0%   +1.32% (p=0.000 n=10)
Log                     182.7n ±  0%   170.1n ±  0%   -6.87% (p=0.000 n=10)
Logb                    46.49n ±  0%   46.49n ±  0%        ~ (p=0.675 n=10)
Log1p                   184.3n ±  0%   179.4n ±  0%   -2.63% (p=0.000 n=10)
Log10                   184.3n ±  0%   171.2n ±  0%   -7.08% (p=0.000 n=10)
Log2                    66.05n ±  0%   57.90n ±  0%  -12.34% (p=0.000 n=10)
Modf                    34.25n ±  0%   34.24n ±  0%        ~ (p=0.163 n=10)
Nextafter32             49.33n ±  1%   48.93n ±  0%   -0.81% (p=0.002 n=10)
Nextafter64             43.64n ±  0%   43.23n ±  0%   -0.93% (p=0.000 n=10)
PowInt                  267.6n ±  0%   251.2n ±  0%   -6.11% (p=0.000 n=10)
PowFrac                 672.9n ±  0%   637.9n ±  0%   -5.19% (p=0.000 n=10)
Pow10Pos                13.87n ±  0%   13.87n ±  0%        ~ (p=1.000 n=10)
Pow10Neg                19.58n ± 62%   19.59n ± 62%        ~ (p=0.355 n=10)
Round                   23.65n ±  0%   23.65n ±  0%        ~ (p=1.000 n=10)
RoundToEven             27.73n ±  0%   27.73n ±  0%        ~ (p=0.635 n=10)
Remainder               309.9n ±  0%   280.5n ±  0%   -9.49% (p=0.000 n=10)
Signbit                 13.05n ±  0%   13.05n ±  0%        ~ (p=1.000 n=10) ¹
Sin                     120.7n ±  0%   120.7n ±  0%        ~ (p=1.000 n=10) ¹
Sincos                  148.4n ±  0%   143.5n ±  0%   -3.30% (p=0.000 n=10)
Sinh                    275.6n ±  0%   267.5n ±  0%   -2.94% (p=0.000 n=10)
SqrtIndirect            3.262n ±  0%   3.262n ±  0%        ~ (p=0.263 n=10)
SqrtLatency             19.57n ±  0%   19.57n ±  0%        ~ (p=0.582 n=10)
SqrtIndirectLatency     19.57n ±  0%   19.57n ±  0%        ~ (p=1.000 n=10)
SqrtGoLatency           203.2n ±  0%   197.6n ±  0%   -2.78% (p=0.000 n=10)
SqrtPrime               4.952µ ±  0%   4.952µ ±  0%   -0.01% (p=0.025 n=10)
Tan                     153.3n ±  0%   153.3n ±  0%        ~ (p=1.000 n=10)
Tanh                    280.5n ±  0%   272.4n ±  0%   -2.91% (p=0.000 n=10)
Trunc                   31.81n ±  0%   31.81n ±  0%        ~ (p=1.000 n=10)
Y0                      680.1n ±  0%   664.8n ±  0%   -2.25% (p=0.000 n=10)
Y1                      684.2n ±  0%   669.6n ±  0%   -2.14% (p=0.000 n=10)
Yn                      1.444µ ±  0%   1.410µ ±  0%   -2.35% (p=0.000 n=10)
Float64bits             5.709n ±  0%   5.708n ±  0%        ~ (p=0.573 n=10)
Float64frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.734 n=10)
Float32bits             12.23n ±  0%   12.23n ±  0%        ~ (p=0.628 n=10)
Float32frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.971 n=10)
FMA                     4.893n ±  0%   4.893n ±  0%        ~ (p=0.736 n=10)
geomean                 88.96n         87.05n         -2.15%
¹ all samples are equal

Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
2025-10-08 08:11:17 -07:00
Cherry Mui
1d62e92567 test/codegen: make sure assignment results are used.
Some tests make assignments to an argument without reading it.
With CL 708865, they are treated as dead stores and are removed.
Make sure the results are used.

Fixes #75745.
Fixes #75746.

Change-Id: I05580beb1006505ec1550e5fa245b54dcefd10b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/708916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-06 14:51:23 -07:00
Cherry Mui
38b26f29f1 cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read.
We don't do that for arguments. But arguments and locals are
essentially the same. Arguments are passed by value, and are not
expected to be read in the caller's frame. So we can remove the
writes to them as well. One exception is the cgo_unsafe_arg
directive, which makes all the arguments effectively address-taken.
cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions'
arguments.

Cherry-picked from the dev.simd branch. This CL is not
necessarily SIMD specific. Apply early to reduce risk.

Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8
Reviewed-on: https://go-review.googlesource.com/c/go/+/705815
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/708865
2025-10-03 12:31:20 -07:00
Cherry Mui
fb1749a3fe [dev.simd] all: merge master (adce7f1) into dev.simd
Conflicts:

- src/internal/goexperiment/flags.go
- src/runtime/export_test.go

Merge List:

+ 2025-10-03 adce7f196e cmd/link: support .def file with MSVC clang toolchain
+ 2025-10-03 d5b950399d cmd/cgo: fix unaligned arguments typedmemmove crash on iOS
+ 2025-10-02 53845004d6 net/http/httputil: deprecate ReverseProxy.Director
+ 2025-10-02 bbdff9e8e1 net/http: update bundled x/net/http2 and delete obsolete http2inTests
+ 2025-10-02 4008e07080 io/fs: move path name documentation up to the package doc comment
+ 2025-10-02 0e4e2e6832 runtime: skip TestGoroutineLeakProfile under mayMoreStackPreempt
+ 2025-10-02 f03c392295 runtime: fix aix/ppc64 library initialization
+ 2025-10-02 707454b41f cmd/go: update `go help mod edit` with the tool and ignore sections
+ 2025-10-02 8c68a1c1ab runtime,net/http/pprof: goroutine leak detection by using the garbage collector
+ 2025-10-02 84db201ae1 cmd/compile: propagate len([]T{}) to make builtin to allow stack allocation
+ 2025-10-02 5799c139a7 crypto/tls: rm marshalEncryptedClientHelloConfigList dead code
+ 2025-10-01 633dd1d475 encoding/json: fix Decoder.InputOffset regression in goexperiment.jsonv2
+ 2025-10-01 8ad27fb656 doc/go_spec.html: update date
+ 2025-10-01 3f451f2c54 testing/synctest: fix inverted test failure message in TestContextAfterFunc
+ 2025-10-01 be0fed8a5f cmd/go/testdata/script/test_fuzz_fuzztime.txt: disable
+ 2025-09-30 eb1c7f6e69 runtime: move loong64 library entry point to os-agnostic file
+ 2025-09-30 c9257151e5 runtime: unify ppc64/ppc64le library entry point
+ 2025-09-30 4ff8a457db test/codegen: codify handling of floating point constants on arm64
+ 2025-09-30 fcb893fc4b cmd/compile/internal/ssa: remove redundant "type:" prefix check
+ 2025-09-30 19cc1022ba mime: reduce allocs incurred by ParseMediaType
+ 2025-09-30 08afc50bea mime: extend "builtinTypes" to include a more complete list of common types
+ 2025-09-30 97da068774 cmd/compile: eliminate nil checks on .dict arg
+ 2025-09-30 300d9d2714 runtime: initialise debug settings much earlier in startup process
+ 2025-09-30 a846bb0aa5 errors: add AsType
+ 2025-09-30 7c8166d02d cmd/link/internal/arm64: support Mach-O ARM64_RELOC_SUBTRACTOR in internal linking
+ 2025-09-30 6e95748335 cmd/link/internal/arm64: support Mach-O ARM64_RELOC_POINTER_TO_GOT in internal linking
+ 2025-09-30 742f92063e cmd/compile, runtime: always enable Wasm signext and satconv features
+ 2025-09-30 db10db6be3 internal/poll: remove operation fields from FD
+ 2025-09-29 75c87df58e internal/poll: pass the I/O mode instead of an overlapped object in execIO
+ 2025-09-29 fc88e18b4a crypto/internal/fips140/entropy: add CPU jitter-based entropy source
+ 2025-09-29 db4fade759 crypto/internal/fips140/mlkem: make CAST conditional
+ 2025-09-29 db3cb3fd9a runtime: correct reference to getStackMap in comment
+ 2025-09-29 690fc2fb05 internal/poll: remove buf field from operation
+ 2025-09-29 eaf2345256 cmd/link: use a .def file to mark exported symbols on Windows
+ 2025-09-29 4b77733565 internal/syscall/windows: regenerate GetFileSizeEx
+ 2025-09-29 4e9006a716 crypto/tls: quote protocols in ALPN error message
+ 2025-09-29 047c2ab841 cmd/link: don't pass -Wl,-S on Solaris
+ 2025-09-29 ae8eba071b cmd/link: use correct length for pcln.cutab
+ 2025-09-29 fe3ba74b9e cmd/link: skip TestFlagW on platforms without DWARF symbol table
+ 2025-09-29 d42d56b764 encoding/xml: make use of reflect.TypeAssert
+ 2025-09-29 6d51f93257 runtime: jump instead of branch in netbsd/arm64 entry point
+ 2025-09-28 5500cbf0e4 debug/elf: prevent offset overflow
+ 2025-09-27 34e67623a8 all:  fix typos
+ 2025-09-27 af6999e60d cmd/compile: implement jump table on loong64
+ 2025-09-26 63cd912083 os/user: simplify go:build
+ 2025-09-26 53009b26dd runtime: use a smaller arena size on Wasm
+ 2025-09-26 3a5df9d2b2 net/http: add HTTP2Config.StrictMaxConcurrentRequests
+ 2025-09-26 16be34df02 net/http: add more tests of transport connection pool
+ 2025-09-26 3e4540b49d os/user: use getgrouplist on illumos && cgo
+ 2025-09-26 15fbe3480b internal/poll: simplify WriteMsg and ReadMsg on Windows
+ 2025-09-26 16ae11a9e1 runtime: move TestReadMetricsSched to testprog
+ 2025-09-26 459f3a3adc cmd/link: don't pass -Wl,-S on AIX
+ 2025-09-26 4631a2d3c6 cmd/link: skip TestFlagW on AIX
+ 2025-09-26 0f31d742cd cmd/compile: fix ICE with new(<untyped expr>)
+ 2025-09-26 7d7cd6e07b internal/poll: don't call SetFilePointerEx in Seek for overlapped handles
+ 2025-09-26 41cba31e66 mime/multipart: percent-encode CR and LF in header values to avoid CRLF injection
+ 2025-09-26 dd1d597c3a Revert "cmd/internal/obj/loong64: use the MOVVP instruction to optimize prologue"
+ 2025-09-26 45d6bc76af runtime: unify arm64 entry point code
+ 2025-09-25 fdea7da3e6 runtime: use common library entry point on windows amd64/386
+ 2025-09-25 e8a4f508d1 lib/fips140: re-seal v1.0.0
+ 2025-09-25 9b7a328089 crypto/internal/fips140: remove key import PCTs, make keygen PCTs fatal
+ 2025-09-25 7f9ab7203f crypto/internal/fips140: update frozen module version to "v1.0.0"
+ 2025-09-25 fb5719cbda crypto/internal/fips140/ecdsa: make TestingOnlyNewDRBG generic
+ 2025-09-25 56067e31f2 std: remove unused declarations

Change-Id: Iecb28fd62c69fbed59da557f46d31bae55889e2c
2025-10-03 10:11:21 -04:00
Joel Sing
4ff8a457db test/codegen: codify handling of floating point constants on arm64
While here, reorder Float32ConstantStore/Float64ConstantStore for
consistency.

Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703756
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-30 14:49:25 -07:00
Jake Bailey
97da068774 cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary
is never nil, but it gets a nil check becuase the dict arg is treated as
a slice during construction.

cmp.Compare[go.shape.int] was:

00006 (+41) TESTB AX, (AX)
00007 (+52) CMPQ CX, BX
00008 (52) JGT 14
00009 (+55) JGE 12
00010 (+56) MOVL $1, AX
00011 (56) RET
00012 (+58) XORL AX, AX
00013 (58) RET
00014 (+53) MOVQ $-1, AX
00015 (53) RET

Note how the function begins with a TESTB that loads the dict to perform
the nil check.

This CL eliminates that nil check.

For most generic functions, this doesn't matter too much, but not
infrequently are generic functions written which never actually use the
dictionary (like cmp.Compare), so I suspect this might help in hot code
to avoid repeatedly touching the dictionary in memory, and in cases
where the generic function is not inlined (and thus the dict dropped).

compilecmp shows these changes (deduped):

cmp.Compare[go.shape.float64] 73 -> 72  (-1.37%)
cmp.Compare[go.shape.int] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.int32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.int64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.string] 142 -> 141  (-0.70%)
cmp.Compare[go.shape.uint16] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uint64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint8] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uintptr] 26 -> 24  (-7.69%)
cmp.Less[go.shape.float64] 35 -> 34  (-2.86%)
cmp.Less[go.shape.int32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.int64] 9 -> 7  (-22.22%)
cmp.Less[go.shape.int] 9 -> 7  (-22.22%)
cmp.Less[go.shape.string] 112 -> 110  (-1.79%)
cmp.Less[go.shape.uint16] 9 -> 7  (-22.22%)
cmp.Less[go.shape.uint32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.uint64] 9 -> 7  (-22.22%)
internal/synctest.Associate[go.shape.struct 114 -> 113  (-0.88%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791  (-1.74%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852  (-0.70%)
main.(*gState[go.shape.int64]).stop 2111 -> 2085  (-1.23%)
main.(*gState[go.shape.int64]).unblock 941 -> 923  (-1.91%)
runtime.fmax[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmax[go.shape.float64] 89 -> 87  (-2.25%)
runtime.fmin[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmin[go.shape.float64] 89 -> 87  (-2.25%)
slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337  (-2.60%)
slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453  (-1.95%)
slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181  (-2.69%)
slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42  (-4.55%)
slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170  (-1.60%)
slices.medianCmpFunc[go.shape.struct 1118 -> 1113  (-0.45%)
slices.medianCmpFunc[go.shape.struct 1214 -> 1209  (-0.41%)
slices.medianCmpFunc[go.shape.struct 889 -> 887  (-0.22%)
slices.medianCmpFunc[go.shape.struct 901 -> 874  (-3.00%)
slices.order2Ordered[go.shape.float64] 89 -> 87  (-2.25%)
slices.order2Ordered[go.shape.uint16] 75 -> 70  (-6.67%)
slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110  (-0.45%)
slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352  (-1.68%)
slices.partitionEqualOrdered[go.shape.int] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.int32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.int64] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203  (-2.40%)
slices.partitionOrdered[go.shape.float64] 538 -> 533  (-0.93%)
slices.partitionOrdered[go.shape.int] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.int64] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.uint16] 447 -> 442  (-1.12%)
slices.partitionOrdered[go.shape.uint64] 437 -> 427  (-2.29%)
slices.rotateCmpFunc[go.shape.struct 1045 -> 1029  (-1.53%)
slices.rotateCmpFunc[go.shape.struct 1205 -> 1163  (-3.49%)
slices.rotateCmpFunc[go.shape.struct 1226 -> 1176  (-4.08%)
slices.rotateCmpFunc[go.shape.struct 1322 -> 1272  (-3.78%)
slices.rotateCmpFunc[go.shape.struct 1419 -> 1400  (-1.34%)
slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538  (-2.00%)
slices.rotateLeft[go.shape.string] 603 -> 588  (-2.49%)
slices.rotateLeft[go.shape.uint8] 255 -> 250  (-1.96%)
slices.siftDownOrdered[go.shape.int] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int64] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.string] 614 -> 592  (-3.58%)
slices.siftDownOrdered[go.shape.uint32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.uint64] 181 -> 171  (-5.52%)
time.parseRFC3339[go.shape.string] 1774 -> 1758  (-0.90%)
unique.(*canonMap[go.shape.struct 280 -> 276  (-1.43%)
unique.clone[go.shape.struct 311 -> 293  (-5.79%)
weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134  (-1.47%)
weak.Make[go.shape.struct 136 -> 134  (-1.47%)
weak.Make[go.shape.uint8] 136 -> 134  (-1.47%)

Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7
Reviewed-on: https://go-review.googlesource.com/c/go/+/706655
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-30 11:22:35 -07:00
limeidan
af6999e60d cmd/compile: implement jump table on loong64
Following CL 357330, use jump tables on Loong64.

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Switch8Predictable                 2.352n ± 0%   2.101n ± 0%  -10.65% (p=0.000 n=10)
Switch8Unpredictable               11.99n ± 0%   10.25n ± 0%  -14.51% (p=0.000 n=10)
Switch32Predictable                3.153n ± 0%   1.887n ± 1%  -40.14% (p=0.000 n=10)
Switch32Unpredictable              12.47n ± 0%   10.22n ± 0%  -18.00% (p=0.000 n=10)
SwitchStringPredictable            3.162n ± 0%   3.352n ± 0%   +6.01% (p=0.000 n=10)
SwitchStringUnpredictable          14.70n ± 0%   13.31n ± 0%   -9.46% (p=0.000 n=10)
SwitchTypePredictable              3.702n ± 0%   2.201n ± 0%  -40.55% (p=0.000 n=10)
SwitchTypeUnpredictable            16.18n ± 0%   14.48n ± 0%  -10.51% (p=0.000 n=10)
SwitchInterfaceTypePredictable     7.654n ± 0%   9.680n ± 0%  +26.47% (p=0.000 n=10)
SwitchInterfaceTypeUnpredictable   22.04n ± 0%   22.44n ± 0%   +1.81% (p=0.000 n=10)
geomean                            7.441n        6.469n       -13.07%

Change-Id: Id6f30fa73349c60fac17670084daee56973a955f
Reviewed-on: https://go-review.googlesource.com/c/go/+/705396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-09-27 05:02:58 -07:00
Junyang Shao
578777bf7c [dev.simd] cmd/compile: make condtion of CanSSA smarter for SIMD fields
This CL tires to improve a situation pointed out by
https://github.com/golang/go/issues/73787#issuecomment-3305494947.

Change-Id: Ic23c80fe71344fc25383ab238ad6631e0f0cd22e
Reviewed-on: https://go-review.googlesource.com/c/go/+/705416
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-26 10:53:39 -07:00
Cherry Mui
2b50ffe172 [dev.simd] cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read.
We don't do that for arguments. But arguments and locals are
essentially the same. Arguments are passed by value, and are not
expected to be read in the caller's frame. So we can remove the
writes to them as well. One exception is the cgo_unsafe_arg
directive, which makes all the arguments effectively address-taken.
cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions'
arguments.

Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8
Reviewed-on: https://go-review.googlesource.com/c/go/+/705815
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-09-23 08:05:41 -07:00
Cherry Mui
2d8cb80d7c [dev.simd] all: merge master (9b2d39b) into dev.simd
Conflicts:

- src/internal/buildcfg/exp.go

Merge List:

+ 2025-09-22 9b2d39b75b cmd/compile/internal/ssa: match style and formatting
+ 2025-09-22 e23edf5e55 runtime: don't re-read metrics before check in TestReadMetricsSched
+ 2025-09-22 177cd8d763 log/slog: use a pooled json encoder
+ 2025-09-22 2353c15785 cmd/cgo/internal/test: skip TestMultipleAssign when using UCRT on Windows
+ 2025-09-22 32dfd69282 cmd/dist: disable FIPS 140-3 mode when testing maphash with purego
+ 2025-09-19 7f6ff5ec3e cmd/compile: fix doc word
+ 2025-09-19 9693b94be0 runtime: include stderr when objdump fails
+ 2025-09-19 8616981ce6 log/slog: optimize slog Level.String() to avoid fmt.Sprintf
+ 2025-09-19 b8af744360 testing: fix example for unexported identifier
+ 2025-09-19 51dc5bfe6c Revert "cmd/go: disable cgo by default if CC unset and  DefaultCC doesn't exist"
+ 2025-09-19 ee7bf06cb3 time: improve ParseDuration performance for invalid input
+ 2025-09-19 f9e61a9a32 cmd/compile: duplicate nil check to two branches of write barrier
+ 2025-09-18 3cf1aaf8b9 runtime: use futexes with 64-bit time on Linux
+ 2025-09-18 0ab038af62 cmd/compile/internal/abi: use clear built-in
+ 2025-09-18 00bf24fdca bytes: use clear in test
+ 2025-09-18 f9701d21d2 crypto: use clear built-in
+ 2025-09-18 a58afe44fa net: fix testHookCanceledDial race
+ 2025-09-18 3203a5da29 net/http: avoid connCount underflow race
+ 2025-09-18 8ca209ec39 context: don't return a non-nil from Err before Done is closed
+ 2025-09-18 3032894e04 runtime: make explicit nil check in heapSetTypeSmallHeader
+ 2025-09-17 ef05b66d61 cmd/internal/obj/riscv: add support for Zicond instructions
+ 2025-09-17 78ef487a6f cmd/compile: fix the issue of shift amount exceeding the valid range
+ 2025-09-17 77aac7bb75 runtime: don't enable heap randomization if MSAN or ASAN is enabled
+ 2025-09-17 465b85eb76 runtime: fix CheckScavengedBitsCleared with randomized heap base
+ 2025-09-17 909704b85e encoding/json/v2: fix typo in comment
+ 2025-09-17 3db5979e8c testing: use reflect.TypeAssert and reflect.TypeFor
+ 2025-09-17 6a8dbbecbf path/filepath: fix EvalSymlinks to return ENOTDIR on plan9
+ 2025-09-17 bffe7ad9f1 go/parser: Add TestBothLineAndLeadComment
+ 2025-09-17 02a888e820 go/ast: document that (*ast.File).Comments is sorted by position
+ 2025-09-16 594deca981 cmd/link: simplify PE relocations mapping
+ 2025-09-16 9df1a289ac go/parser: simplify expectSemi
+ 2025-09-16 72ba117bda internal/buildcfg: enable randomizedHeapBase64 by default
+ 2025-09-16 796ea3bc2e os/user: align test file name and build tags
+ 2025-09-16 a69395eab2 runtime/_mkmalloc: add a copy of cloneNode
+ 2025-09-16 cbdad4fc3c cmd/go: check pattern for utf8 validity before call regexp.MustCompile
+ 2025-09-16 c2d85eb999 cmd/go: disable cgo by default if CC unset and  DefaultCC doesn't exist
+ 2025-09-16 ac82fe68aa bytes,strings: remove reference to non-existent SplitFunc
+ 2025-09-16 0b26678db2 cmd/compile: fix mips zerorange implementation
+ 2025-09-16 e2cfc1eb3a cmd/internal/obj/riscv: improve handling of float point moves
+ 2025-09-16 281c632e6e crypto/x509/internal/macos: standardize package name
+ 2025-09-16 61dc7fe30d iter: document that calling yield after terminated range loop causes runtime panic

Change-Id: Ic06019efc855913632003f41eb10c746b3410b0a
2025-09-23 10:32:03 -04:00
Junyang Shao
e34ad6de42 [dev.simd] cmd/compile: optimize VPTEST for 2-operand cases
Change-Id: Ica2d5ee48082c69e86b12b519ba8df7a2556392f
Reviewed-on: https://go-review.googlesource.com/c/go/+/704355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-09-18 11:07:23 -07:00
Xiaolin Zhao
78ef487a6f cmd/compile: fix the issue of shift amount exceeding the valid range
Fixes #75479

Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/704135
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-09-17 18:05:31 -07:00
Meng Zhuo
2469e92d8c cmd/compile: combine doubling with shift on riscv64
Change-Id: I4bee2770fedf97e35b5a5b9187a8ba3c41f9ec2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/702697
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
2025-09-15 17:31:56 -07:00
Meng Zhuo
e5ee1f2600 test/codegen: check zerobase for newobject on 0-sized types
This CL also adds riscv64 checks

Change-Id: I693e4e606f470615f6b49085592d6d5ca61473d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703716
Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-15 07:47:55 -07:00
Jake Bailey
dc960d0bfe cmd/compile, reflect: further allow inlining of TypeFor
Previous CLs optimized direct use of abi.Type, but reflect.Type is
indirected, so was not benefiting.

For TypeFor, we can use toRType directly without a nil check because the
types are statically known.

Normally, I'd think SSA would remove the nil check, but due to some
oddity (specifically, late fuse being required to remove the nil check,
but opt doesn't run that late) means that the nil check persists and
gets in the way.

Manually writing the code in this instance seems to fix the problem.

It also exposed another problem; depending on the ordering, writeType
could get to a type symbol before SSA, thereby preventing Extra from
being created on the symbol for later lookups that don't go through
TypeLinksym directly. In writeType, for non-shape types, call
TypeLinksym to ensure that the type is set up for later callers. That
change itself passed toolstash -cmp.

All up, this stack put through compilecmp shows a lot of improvement in
various reflect-using packages, and reflect itself. It is too big to fit
in the commit message but here's some info:

compilecmp master -> HEAD
master (d767064170): cmd/compile: mark abi.PtrType.Elem sym as used
HEAD (846a94c568): cmd/compile, reflect: further allow inlining of TypeFor

file      before    after     Δ       %
addr2line 3735911   3735391   -520    -0.014%
asm       6382235   6382091   -144    -0.002%
buildid   3608568   3608360   -208    -0.006%
cgo       5951816   5951480   -336    -0.006%
compile   28362080  28339772  -22308  -0.079%
cover     6668686   6661414   -7272   -0.109%
dist      4311961   4311425   -536    -0.012%
fix       3771706   3771474   -232    -0.006%
link      8686073   8684993   -1080   -0.012%
nm        3715923   3715459   -464    -0.012%
objdump   6074366   6073774   -592    -0.010%
pack      3025653   3025277   -376    -0.012%
pprof     18269485  18261653  -7832   -0.043%
test2json 3442726   3438390   -4336   -0.126%
trace     16984831  16981767  -3064   -0.018%
vet       10701931  10696355  -5576   -0.052%
total     133693951 133639075 -54876  -0.041%

runtime
runtime.stkobjinit 240 -> 165  (-31.25%)

runtime [cmd/compile]
runtime.stkobjinit 240 -> 165  (-31.25%)

reflect
reflect.Value.Seq2.func3 309 -> 245  (-20.71%)
reflect.Value.Seq2.func1.1 281 -> 198  (-29.54%)
reflect.Value.Seq.func1.1 242 -> 165  (-31.82%)
reflect.Value.Seq2.func2 360 -> 285  (-20.83%)
reflect.Value.Seq.func4 281 -> 239  (-14.95%)
reflect.Value.Seq2.func4 399 -> 284  (-28.82%)
reflect.Value.Seq.func2 271 -> 230  (-15.13%)
reflect.TypeFor[go.shape.uint64] 33 -> 18  (-45.45%)
reflect.Value.Seq.func3 219 -> 178  (-18.72%)

reflect [cmd/compile]
reflect.Value.Seq2.func2 360 -> 285  (-20.83%)
reflect.Value.Seq.func4 281 -> 239  (-14.95%)
reflect.Value.Seq.func2 271 -> 230  (-15.13%)
reflect.Value.Seq.func1.1 242 -> 165  (-31.82%)
reflect.Value.Seq2.func1.1 281 -> 198  (-29.54%)
reflect.Value.Seq2.func3 309 -> 245  (-20.71%)
reflect.Value.Seq.func3 219 -> 178  (-18.72%)
reflect.TypeFor[go.shape.uint64] 33 -> 18  (-45.45%)
reflect.Value.Seq2.func4 399 -> 284  (-28.82%)

fmt
fmt.(*pp).fmtBytes 1723 -> 1691  (-1.86%)

database/sql/driver
reflect.TypeFor[go.shape.interface 33 -> 18  (-45.45%)
database/sql/driver.init 72 -> 57  (-20.83%)

Change-Id: I9eb750cf0b7ebf532589f939431feb0a899e42ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/701301
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-12 09:34:43 -07:00
Keith Randall
80a2aae922 Revert "cmd/compile: improve stp merging for non-sequent cases"
This reverts commit 4c63d798cb.

Reason for revert: Causes miscompilations. See issue 75365.

Change-Id: Icd1fcfeb23d2ec524b16eb556030f43875e1c90d
Reviewed-on: https://go-review.googlesource.com/c/go/+/702455
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-09-10 11:11:11 -07:00
Youlin Feng
a5fa5ea51c cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │      old     │                 new                 │
                │    sec/op    │   sec/op     vs base                │
StringPrefix3-8   11.125n ± 2%   8.539n ± 1%  -23.25% (p=0.000 n=20)
StringPrefix5-8   11.170n ± 2%   8.700n ± 1%  -22.11% (p=0.000 n=20)
StringPrefix6-8   11.190n ± 2%   8.655n ± 1%  -22.65% (p=0.000 n=20)
StringPrefix7-8   11.095n ± 1%   8.878n ± 1%  -19.98% (p=0.000 n=20)

Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-09 12:10:07 -07:00
Melnikov Denis
4c63d798cb cmd/compile: improve stp merging for non-sequent cases
Original algorithm merges stores with the first
mergeable store in the chain, but it misses some
cases. Additional reordering stores in increasing order
of memory access in the chain allows merging in these cases.

Fixes #71987

There are the results of sweet benchmarks and
the difference between sizes of sections .text

                        │ old.results │            new.results             │
                        │   sec/op    │   sec/op     vs base               │
BleveIndexBatch100-4      7.614 ± 2%    7.548 ± 1%       ~ (p=0.190 n=10)
ESBuildThreeJS-4         821.3m ± 0%   819.0m ± 1%       ~ (p=0.165 n=10)
ESBuildRomeTS-4          206.2m ± 1%   204.4m ± 1%  -0.90% (p=0.023 n=10)
EtcdPut-4                64.89m ± 1%   64.94m ± 2%       ~ (p=0.684 n=10)
EtcdSTM-4                318.4m ± 0%   319.2m ± 1%       ~ (p=0.631 n=10)
GoBuildKubelet-4          157.4 ± 0%    157.6 ± 0%       ~ (p=0.105 n=10)
GoBuildKubeletLink-4      12.42 ± 2%    12.41 ± 1%       ~ (p=0.529 n=10)
GoBuildIstioctl-4         124.4 ± 0%    124.4 ± 0%       ~ (p=0.579 n=10)
GoBuildIstioctlLink-4     8.700 ± 1%    8.693 ± 1%       ~ (p=0.912 n=10)
GoBuildFrontend-4         46.52 ± 0%    46.50 ± 0%       ~ (p=0.971 n=10)
GoBuildFrontendLink-4     2.282 ± 1%    2.272 ± 1%       ~ (p=0.529 n=10)
GoBuildTsgo-4             75.02 ± 1%    75.31 ± 1%       ~ (p=0.436 n=10)
GoBuildTsgoLink-4         1.229 ± 1%    1.219 ± 1%  -0.82% (p=0.035 n=10)
GopherLuaKNucleotide-4    34.77 ± 5%    34.31 ± 1%  -1.33% (p=0.015 n=10)
MarkdownRenderXHTML-4    286.6m ± 0%   285.7m ± 1%       ~ (p=0.315 n=10)
Tile38QueryLoad-4        657.2µ ± 1%   660.3µ ± 0%       ~ (p=0.436 n=10)
geomean                   2.570         2.563       -0.24%

Executable            Old .text  New .text     Change
-------------------------------------------------------
benchmark               6504820    6504020     -0.01%
bleve-index-bench       3903860    3903636     -0.01%
esbuild                 4801012    4801172     +0.00%
esbuild-bench           1256404    1256340     -0.01%
etcd                    9188148    9187076     -0.01%
etcd-bench              6462228    6461524     -0.01%
go                      5924468    5923892     -0.01%
go-build-bench          1282004    1281940     -0.00%
gopher-lua-bench        1639540    1639348     -0.01%
markdown-bench          1478452    1478356     -0.01%
tile38-bench            2753524    2753300     -0.01%
tile38-server          10241380   10240068     -0.01%

Change-Id: Ieb4fdfd656aca458f65fc45938de70550632bd13
Reviewed-on: https://go-review.googlesource.com/c/go/+/698097
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-09 12:10:01 -07:00
Xiaolin Zhao
f5b20689e9 cmd/compile: optimize loads from readonly globals into constants on loong64
Ref: CL 141118
Update #26498

Change-Id: I9c4ad2bedc4d50bd273bbe9119a898d4fca95e45
Reviewed-on: https://go-review.googlesource.com/c/go/+/700875
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-05 08:42:28 -07:00
Xiaolin Zhao
3492e4262b cmd/compile: simplify specific addition operations using the ADDV16 instruction
On loong64, the addi.d instruction can only directly handle 12-bit
immediate numbers. If a larger immediate number needs to be processed,
it must first be placed in a register, and then the add.d instruction
is used to complete the processing of the larger immediate number.
If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0,
then the ADDV16 instruction can be used to complete the addition operation.

Removes 164 instructions from the go binary on loong64.

Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700536
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-09-05 08:18:04 -07:00
Youlin Feng
df29038486 cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an
intrinsic function.

Fixes #75203

Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/700336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-04 07:25:26 -07:00
limeidan
bd71b94659 cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64
Reduce the number of go toolchain instructions on loong64 as follows:

	file	    before	after	    Δ 	     %
	go	    1573148	1571708	   -1,440  -0.0915%
	gofmt	    320578	320090	   -488    -0.1522%
	asm	    555066	554406	   -660    -0.1189%
	cgo	    481566	480926	   -640    -0.1329%
	compile	    2475962	2473880	   -2,082  -0.0841%
	cover	    516536	515920	   -616    -0.1193%
	link	    702172	701404	   -768    -0.1094%
	preprofile  238626	238274	   -352    -0.1475%
	vet	    792928	792100	   -828    -0.1044%

Change-Id: I61e462726835959c60e1b4e5256d4020202418ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/693877
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-25 12:30:16 -07:00
Xiaolin Zhao
44c5956bf7 test/codegen: add Mul2 and DivPow2 test for loong64
Change-Id: I29ccd105c5418955146a3f4873162963da489a70
Reviewed-on: https://go-review.googlesource.com/c/go/+/697935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-24 18:14:28 -07:00
Xiaolin Zhao
0aa8019e94 test/codegen: add Mul* test for loong64
Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697895
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-24 18:14:22 -07:00
Xiaolin Zhao
83420974b7 test/codegen: add sqrt* abs and copysign test for loong64
Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73
Reviewed-on: https://go-review.googlesource.com/c/go/+/697955
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-24 18:14:13 -07:00
limeidan
1843f1e9c0 cmd/compile: use zero register instead of specialized *zero instructions on loong64
Refer to CL 633075, loong64 has a zero(R0) register that can be used to do this.

Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4
Reviewed-on: https://go-review.googlesource.com/c/go/+/693876
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
2025-08-21 11:23:05 -07:00
Xiaolin Zhao
9632ba8160 cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64
Pattern1: (the type of c is uint16)
    c>>8 | c<<8
To:
    revb2h c

Pattern2: (the type of c is uint32)
    (c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
    revb2h c

Pattern3: (the type of c is uint64)
    (c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
    revb4h c

Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda
Reviewed-on: https://go-review.googlesource.com/c/go/+/696335
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-21 11:19:34 -07:00
Xiaolin Zhao
fa706ea50f cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38
Reviewed-on: https://go-review.googlesource.com/c/go/+/697956
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-21 11:16:49 -07:00
Michael Munday
320df537cc cmd/compile: emit classify instructions for infinity tests on riscv64
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.

The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │        sec/op        │   sec/op     vs base                │
Acos                           159.9n ± 0%   173.7n ± 0%   +8.66% (p=0.000 n=10)
Acosh                          249.8n ± 0%   254.4n ± 0%   +1.86% (p=0.000 n=10)
Asin                           159.9n ± 0%   173.7n ± 0%   +8.66% (p=0.000 n=10)
Asinh                          292.2n ± 0%   283.0n ± 0%   -3.15% (p=0.000 n=10)
Atan                           119.1n ± 0%   119.0n ± 0%   -0.08% (p=0.036 n=10)
Atanh                          265.1n ± 0%   271.6n ± 0%   +2.43% (p=0.000 n=10)
Atan2                          194.9n ± 0%   186.7n ± 0%   -4.23% (p=0.000 n=10)
Cbrt                           216.3n ± 0%   203.1n ± 0%   -6.10% (p=0.000 n=10)
Ceil                           31.82n ± 0%   31.81n ± 0%        ~ (p=0.063 n=10)
Copysign                       4.897n ± 0%   4.893n ± 3%   -0.08% (p=0.038 n=10)
Cos                            123.9n ± 0%   107.7n ± 1%  -13.03% (p=0.000 n=10)
Cosh                           293.0n ± 0%   264.6n ± 0%   -9.68% (p=0.000 n=10)
Erf                            150.0n ± 0%   133.8n ± 0%  -10.80% (p=0.000 n=10)
Erfc                           151.8n ± 0%   137.9n ± 0%   -9.16% (p=0.000 n=10)
Erfinv                         173.8n ± 0%   173.8n ± 0%        ~ (p=0.820 n=10)
Erfcinv                        173.8n ± 0%   173.8n ± 0%        ~ (p=1.000 n=10)
Exp                            247.7n ± 0%   220.4n ± 0%  -11.04% (p=0.000 n=10)
ExpGo                          261.4n ± 0%   232.5n ± 0%  -11.04% (p=0.000 n=10)
Expm1                          176.2n ± 0%   164.9n ± 0%   -6.41% (p=0.000 n=10)
Exp2                           220.4n ± 0%   190.2n ± 0%  -13.70% (p=0.000 n=10)
Exp2Go                         232.5n ± 0%   204.0n ± 0%  -12.22% (p=0.000 n=10)
Abs                            4.897n ± 0%   4.897n ± 0%        ~ (p=0.726 n=10)
Dim                            16.32n ± 0%   16.31n ± 0%        ~ (p=0.770 n=10)
Floor                          31.84n ± 0%   31.83n ± 0%        ~ (p=0.677 n=10)
Max                            26.11n ± 0%   26.13n ± 0%        ~ (p=0.290 n=10)
Min                            26.10n ± 0%   26.11n ± 0%        ~ (p=0.424 n=10)
Mod                            416.2n ± 0%   337.8n ± 0%  -18.83% (p=0.000 n=10)
Frexp                          63.65n ± 0%   50.60n ± 0%  -20.50% (p=0.000 n=10)
Gamma                          218.8n ± 0%   206.4n ± 0%   -5.62% (p=0.000 n=10)
Hypot                          92.20n ± 0%   94.69n ± 0%   +2.70% (p=0.000 n=10)
HypotGo                        107.7n ± 0%   109.3n ± 0%   +1.49% (p=0.000 n=10)
Ilogb                          59.54n ± 0%   44.04n ± 0%  -26.04% (p=0.000 n=10)
J0                             708.9n ± 0%   674.5n ± 0%   -4.86% (p=0.000 n=10)
J1                             707.6n ± 0%   676.1n ± 0%   -4.44% (p=0.000 n=10)
Jn                             1.513µ ± 0%   1.427µ ± 0%   -5.68% (p=0.000 n=10)
Ldexp                          70.20n ± 0%   57.09n ± 0%  -18.68% (p=0.000 n=10)
Lgamma                         201.5n ± 0%   185.3n ± 1%   -8.01% (p=0.000 n=10)
Log                            201.5n ± 0%   182.7n ± 0%   -9.35% (p=0.000 n=10)
Logb                           59.54n ± 0%   46.53n ± 0%  -21.86% (p=0.000 n=10)
Log1p                          178.8n ± 0%   173.9n ± 6%   -2.74% (p=0.021 n=10)
Log10                          201.4n ± 0%   184.3n ± 0%   -8.49% (p=0.000 n=10)
Log2                           79.17n ± 0%   66.07n ± 0%  -16.54% (p=0.000 n=10)
Modf                           34.27n ± 0%   34.25n ± 0%        ~ (p=0.559 n=10)
Nextafter32                    49.34n ± 0%   49.37n ± 0%   +0.05% (p=0.040 n=10)
Nextafter64                    43.66n ± 0%   43.66n ± 0%        ~ (p=0.869 n=10)
PowInt                         309.1n ± 0%   267.4n ± 0%  -13.49% (p=0.000 n=10)
PowFrac                        769.6n ± 0%   677.3n ± 0%  -11.98% (p=0.000 n=10)
Pow10Pos                       13.88n ± 0%   13.88n ± 0%        ~ (p=0.811 n=10)
Pow10Neg                       19.58n ± 0%   19.57n ± 0%        ~ (p=0.993 n=10)
Round                          23.65n ± 0%   23.66n ± 0%        ~ (p=0.354 n=10)
RoundToEven                    27.75n ± 0%   27.75n ± 0%        ~ (p=0.971 n=10)
Remainder                      380.0n ± 0%   309.9n ± 0%  -18.45% (p=0.000 n=10)
Signbit                        13.06n ± 0%   13.06n ± 0%        ~ (p=1.000 n=10)
Sin                            133.8n ± 0%   120.8n ± 0%   -9.75% (p=0.000 n=10)
Sincos                         160.7n ± 0%   147.7n ± 0%   -8.12% (p=0.000 n=10)
Sinh                           305.9n ± 0%   277.9n ± 0%   -9.17% (p=0.000 n=10)
SqrtIndirect                   3.265n ± 0%   3.264n ± 0%        ~ (p=0.546 n=10)
SqrtLatency                    19.58n ± 0%   19.58n ± 0%        ~ (p=0.973 n=10)
SqrtIndirectLatency            19.59n ± 0%   19.58n ± 0%        ~ (p=0.370 n=10)
SqrtGoLatency                  205.7n ± 0%   202.7n ± 0%   -1.46% (p=0.000 n=10)
SqrtPrime                      4.953µ ± 0%   4.954µ ± 0%        ~ (p=0.477 n=10)
Tan                            163.2n ± 0%   150.2n ± 0%   -7.99% (p=0.000 n=10)
Tanh                           312.4n ± 0%   284.2n ± 0%   -9.01% (p=0.000 n=10)
Trunc                          31.83n ± 0%   31.83n ± 0%        ~ (p=0.663 n=10)
Y0                             701.0n ± 0%   669.2n ± 0%   -4.54% (p=0.000 n=10)
Y1                             704.5n ± 0%   672.4n ± 0%   -4.55% (p=0.000 n=10)
Yn                             1.490µ ± 0%   1.422µ ± 0%   -4.60% (p=0.000 n=10)
Float64bits                    5.713n ± 0%   5.710n ± 0%        ~ (p=0.926 n=10)
Float64frombits                4.896n ± 0%   4.896n ± 0%        ~ (p=0.663 n=10)
Float32bits                    12.25n ± 0%   12.25n ± 0%        ~ (p=0.571 n=10)
Float32frombits                4.898n ± 0%   4.896n ± 0%        ~ (p=0.754 n=10)
FMA                            4.895n ± 0%   4.895n ± 0%        ~ (p=0.745 n=10)
geomean                        94.40n        89.43n        -5.27%

Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-13 20:33:56 -07:00
limeidan
90b7d7aaa2 cmd/compile/internal: optimize multiplication use new operation 'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  │     old      │                 new                  │
                  │    sec/op    │    sec/op     vs base                │
MulconstI32/3       0.8004n ± 0%   0.4247n ± 2%  -46.94% (p=0.000 n=10)
MulconstI32/5       0.8005n ± 0%   0.4256n ± 1%  -46.83% (p=0.000 n=10)
MulconstI32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstI32/120     0.8090n ± 0%   0.8067n ± 0%   -0.28% (p=0.007 n=10)
MulconstI32/-120    0.8109n ± 0%   0.8072n ± 0%   -0.47% (p=0.000 n=10)
MulconstI32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstI32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstI64/3       0.8005n ± 0%   0.4241n ± 1%  -47.02% (p=0.000 n=10)
MulconstI64/5       0.8004n ± 0%   0.4249n ± 1%  -46.91% (p=0.000 n=10)
MulconstI64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstI64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.635 n=10)
MulconstI64/-120    0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65538   0.8096n ± 0%   0.8004n ± 0%   -1.14% (p=0.000 n=10)
MulconstU32/3       0.8004n ± 0%   0.4263n ± 1%  -46.75% (p=0.000 n=10)
MulconstU32/5       0.8005n ± 0%   0.4262n ± 1%  -46.76% (p=0.000 n=10)
MulconstU32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstU32/120     0.8105n ± 0%   0.8096n ± 0%        ~ (p=0.183 n=10)
MulconstU32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstU32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=1.000 n=10)
MulconstU64/3       0.8004n ± 0%   0.4265n ± 4%  -46.71% (p=0.000 n=10)
MulconstU64/5       0.8004n ± 0%   0.4256n ± 0%  -46.82% (p=0.000 n=10)
MulconstU64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstU64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.387 n=10)
MulconstU64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstU64/65538   0.8080n ± 0%   0.8004n ± 0%   -0.93% (p=0.000 n=10)
geomean             0.8539n        0.6597n       -22.74%

Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-12 23:01:49 -07:00
Keith Randall
f04421ea9a cmd/compile: soften test for 74788
We now (as of CL 678620) use float registers other than X0 for copying.

Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25
Reviewed-on: https://go-review.googlesource.com/c/go/+/695275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-08-12 10:05:55 -07:00
Michael Munday
084c0f8494 cmd/compile: allow InlMark operations to be speculatively executed
Although InlMark takes a memory argument it ultimately becomes a
NOP and therefore is safe to speculatively execute.

Fixes #74915

Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82
Reviewed-on: https://go-review.googlesource.com/c/go/+/693795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-11 00:52:23 -07:00
Xiaolin Zhao
a552737418 cmd/compile: fold negation into multiplication on loong64
This change also add corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
        |  bench.old   |              bench.new              |
        |    sec/op    |   sec/op     vs base                |
MulNeg     828.4n ± 0%   655.9n ± 0%  -20.82% (p=0.000 n=10)
Mul2Neg   1062.0n ± 0%   826.8n ± 0%  -22.15% (p=0.000 n=10)
geomean    938.0n        736.4n       -21.49%

Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-08-05 18:02:06 -07:00
Michael Munday
fcc036f03b cmd/compile: optimise float <-> int register moves on riscv64
Use the FMV* instructions to move values between the floating point and
integer register files.

Note: I'm unsure why there is a slowdown in the Float32bits benchmark,
I've checked and an FMVXS instruction is being used as expected. There
are multiple loads and other instructions in the main loop.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │ fmv-before.txt │            fmv-after.txt            │
                    │     sec/op     │   sec/op     vs base                │
Acos                     122.7n ± 0%   122.7n ± 0%        ~ (p=1.000 n=10)
Acosh                    197.2n ± 0%   191.5n ± 0%   -2.89% (p=0.000 n=10)
Asin                     122.7n ± 0%   122.7n ± 0%        ~ (p=0.474 n=10)
Asinh                    231.0n ± 0%   224.1n ± 0%   -2.99% (p=0.000 n=10)
Atan                     91.39n ± 0%   91.41n ± 0%        ~ (p=0.465 n=10)
Atanh                    210.3n ± 0%   203.4n ± 0%   -3.26% (p=0.000 n=10)
Atan2                    149.6n ± 0%   149.6n ± 0%        ~ (p=0.721 n=10)
Cbrt                     176.5n ± 0%   165.9n ± 0%   -6.01% (p=0.000 n=10)
Ceil                     25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Copysign                 3.756n ± 0%   3.756n ± 0%        ~ (p=0.149 n=10)
Cos                      95.15n ± 0%   95.15n ± 0%        ~ (p=0.374 n=10)
Cosh                     228.6n ± 0%   224.7n ± 0%   -1.71% (p=0.000 n=10)
Erf                      115.2n ± 0%   115.2n ± 0%        ~ (p=0.474 n=10)
Erfc                     116.4n ± 0%   116.4n ± 0%        ~ (p=0.628 n=10)
Erfinv                   133.3n ± 0%   133.3n ± 0%        ~ (p=1.000 n=10)
Erfcinv                  133.3n ± 0%   133.3n ± 0%        ~ (p=1.000 n=10)
Exp                      194.1n ± 0%   190.3n ± 0%   -1.93% (p=0.000 n=10)
ExpGo                    204.7n ± 0%   200.3n ± 0%   -2.15% (p=0.000 n=10)
Expm1                    137.7n ± 0%   135.2n ± 0%   -1.82% (p=0.000 n=10)
Exp2                     173.4n ± 0%   169.0n ± 0%   -2.54% (p=0.000 n=10)
Exp2Go                   182.8n ± 0%   178.4n ± 0%   -2.41% (p=0.000 n=10)
Abs                      3.756n ± 0%   3.756n ± 0%        ~ (p=0.157 n=10)
Dim                      12.52n ± 0%   12.52n ± 0%        ~ (p=0.737 n=10)
Floor                    25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Max                      21.29n ± 0%   20.03n ± 0%   -5.92% (p=0.000 n=10)
Min                      21.28n ± 0%   20.04n ± 0%   -5.85% (p=0.000 n=10)
Mod                      344.9n ± 0%   319.2n ± 0%   -7.45% (p=0.000 n=10)
Frexp                    55.71n ± 0%   48.85n ± 0%  -12.30% (p=0.000 n=10)
Gamma                    165.9n ± 0%   167.8n ± 0%   +1.15% (p=0.000 n=10)
Hypot                    73.24n ± 0%   70.74n ± 0%   -3.41% (p=0.000 n=10)
HypotGo                  84.50n ± 0%   82.63n ± 0%   -2.21% (p=0.000 n=10)
Ilogb                    49.45n ± 0%   45.70n ± 0%   -7.59% (p=0.000 n=10)
J0                       556.5n ± 0%   544.0n ± 0%   -2.25% (p=0.000 n=10)
J1                       555.3n ± 0%   542.8n ± 0%   -2.24% (p=0.000 n=10)
Jn                       1.181µ ± 0%   1.156µ ± 0%   -2.12% (p=0.000 n=10)
Ldexp                    59.47n ± 0%   53.84n ± 0%   -9.47% (p=0.000 n=10)
Lgamma                   167.2n ± 0%   154.6n ± 0%   -7.51% (p=0.000 n=10)
Log                      160.9n ± 0%   154.6n ± 0%   -3.92% (p=0.000 n=10)
Logb                     49.45n ± 0%   45.70n ± 0%   -7.58% (p=0.000 n=10)
Log1p                    147.1n ± 0%   137.1n ± 0%   -6.80% (p=0.000 n=10)
Log10                    162.1n ± 1%   154.6n ± 0%   -4.63% (p=0.000 n=10)
Log2                     66.99n ± 0%   60.72n ± 0%   -9.36% (p=0.000 n=10)
Modf                     29.42n ± 0%   26.29n ± 0%  -10.64% (p=0.000 n=10)
Nextafter32              41.95n ± 0%   37.88n ± 0%   -9.70% (p=0.000 n=10)
Nextafter64              38.82n ± 0%   33.49n ± 0%  -13.73% (p=0.000 n=10)
PowInt                   252.3n ± 0%   237.3n ± 0%   -5.95% (p=0.000 n=10)
PowFrac                  615.5n ± 0%   589.7n ± 0%   -4.19% (p=0.000 n=10)
Pow10Pos                 10.64n ± 0%   10.64n ± 0%        ~ (p=1.000 n=10)
Pow10Neg                 24.42n ± 0%   15.02n ± 0%  -38.49% (p=0.000 n=10)
Round                    21.91n ± 0%   18.16n ± 0%  -17.12% (p=0.000 n=10)
RoundToEven              24.42n ± 0%   21.29n ± 0%  -12.84% (p=0.000 n=10)
Remainder                308.0n ± 0%   291.2n ± 0%   -5.44% (p=0.000 n=10)
Signbit                  10.02n ± 0%   10.02n ± 0%        ~ (p=1.000 n=10)
Sin                      102.7n ± 0%   102.7n ± 0%        ~ (p=0.211 n=10)
Sincos                   124.0n ± 1%   123.3n ± 0%   -0.56% (p=0.002 n=10)
Sinh                     239.1n ± 0%   234.7n ± 0%   -1.84% (p=0.000 n=10)
SqrtIndirect             2.504n ± 0%   2.504n ± 0%        ~ (p=0.303 n=10)
SqrtLatency              15.03n ± 0%   15.02n ± 0%        ~ (p=0.598 n=10)
SqrtIndirectLatency      15.02n ± 0%   15.02n ± 0%        ~ (p=0.907 n=10)
SqrtGoLatency            165.3n ± 0%   157.2n ± 0%   -4.90% (p=0.000 n=10)
SqrtPrime                3.801µ ± 0%   3.802µ ± 0%        ~ (p=1.000 n=10)
Tan                      125.2n ± 0%   125.2n ± 0%        ~ (p=0.458 n=10)
Tanh                     244.2n ± 0%   239.9n ± 0%   -1.76% (p=0.000 n=10)
Trunc                    25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Y0                       550.2n ± 0%   538.1n ± 0%   -2.21% (p=0.000 n=10)
Y1                       552.8n ± 0%   540.6n ± 0%   -2.21% (p=0.000 n=10)
Yn                       1.168µ ± 0%   1.143µ ± 0%   -2.14% (p=0.000 n=10)
Float64bits              8.139n ± 0%   4.385n ± 0%  -46.13% (p=0.000 n=10)
Float64frombits          7.512n ± 0%   3.759n ± 0%  -49.96% (p=0.000 n=10)
Float32bits              8.138n ± 0%   9.393n ± 0%  +15.42% (p=0.000 n=10)
Float32frombits          7.513n ± 0%   3.757n ± 0%  -49.98% (p=0.000 n=10)
FMA                      3.756n ± 0%   3.756n ± 0%        ~ (p=0.246 n=10)
geomean                  77.43n        72.42n        -6.47%

Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667
Reviewed-on: https://go-review.googlesource.com/c/go/+/599235
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-08-05 08:27:15 -07:00
Xiaolin Zhao
e071617222 cmd/compile: optimize multiplication rules on loong64
Improve multiplication strength reduction, refer to CL 626998,
add additional 3 linear combination instructions for loong64.

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI32/120     1.6010n ± 0%   0.8130n ± 0%  -49.22% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8109n ± 0%  -49.35% (p=0.000 n=10)
MulconstI32/65537   1.6275n ± 0%   0.8005n ± 0%  -50.81% (p=0.000 n=10)
MulconstI32/65538   1.6290n ± 0%   0.8004n ± 0%  -50.87% (p=0.000 n=10)
MulconstI64/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/-120    1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/65537   1.6270n ± 0%   0.8005n ± 0%  -50.80% (p=0.000 n=10)
MulconstI64/65538   1.6290n ± 0%   0.8071n ± 1%  -50.45% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8066n ± 0%  -49.62% (p=0.000 n=10)
MulconstU32/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU32/65538   1.6280n ± 0%   0.8005n ± 0%  -50.83% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU64/65538   1.6300n ± 0%   0.8067n ± 0%  -50.51% (p=0.000 n=10)
geomean              1.609n        0.8537n       -46.95%

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstI32/120     1.6020n ± 0%   0.8012n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/65537   1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI32/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/3       1.6015n ± 0%   0.8007n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/5       1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/12       1.602n ± 0%    1.202n ± 0%  -25.00% (p=0.000 n=10)
MulconstI64/120     1.6030n ± 0%   0.8011n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/-120    1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65538   1.6020n ± 0%   0.8009n ± 0%  -50.01% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8007n ± 0%  -49.98% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstU64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
geomean              1.601n        0.8523n       -46.77%

Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-08-01 08:42:40 -07:00
Keith Randall
eb7f515c4d cmd/compile: use generated loops instead of DUFFZERO on amd64
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
                        │     base      │                 exp                 │
                        │    sec/op     │   sec/op     vs base                │
MemclrKnownSize112-20      1.270n ± 14%   1.006n ± 0%  -20.72% (p=0.000 n=10)
MemclrKnownSize128-20      1.266n ±  0%   1.005n ± 0%  -20.58% (p=0.000 n=10)
MemclrKnownSize192-20      1.771n ±  0%   1.579n ± 1%  -10.84% (p=0.000 n=10)
MemclrKnownSize248-20      4.034n ±  0%   3.520n ± 0%  -12.75% (p=0.000 n=10)
MemclrKnownSize256-20      2.269n ±  0%   2.014n ± 0%  -11.26% (p=0.000 n=10)
MemclrKnownSize512-20      4.280n ±  0%   4.030n ± 0%   -5.84% (p=0.000 n=10)
MemclrKnownSize1024-20     8.309n ±  1%   8.057n ± 0%   -3.03% (p=0.000 n=10)

Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765
Reviewed-on: https://go-review.googlesource.com/c/go/+/678937
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-31 17:12:39 -07:00