all: REVERSE MERGE dev.simd (7d65463) into master

This commit is a REVERSE MERGE.
It merges dev.simd back into its parent branch, master.
Development of simd will continue only on dev.simd,
and it will be merged into master when necessary.

Merge List:

+ 2025-11-24 7d65463a54 [dev.simd] all: merge master (e704b09) into dev.simd
+ 2025-11-24 afd1721fc5 [dev.simd] all: merge master (02d1f3a) into dev.simd
+ 2025-11-24 a9914886da [dev.simd] internal/buildcfg: don't enable SIMD experiment by default
+ 2025-11-24 61a5a6b016 [dev.simd] simd: add goexperiment tag to generate.go
+ 2025-11-24 f045ed4110 [dev.simd] go/doc/comment: don't include experimental packages in std list
+ 2025-11-24 220d73cc44 [dev.simd] all: merge master (8dd5b13) into dev.simd
+ 2025-11-24 0c69e77343 Revert "[dev.simd] internal/runtime/gc: add simd package based greentea kernels"
+ 2025-11-21 da92168ec8 [dev.simd] internal/runtime/gc: add simd package based greentea kernels
+ 2025-11-21 3fdd183aef [dev.simd] cmd/compile, simd: update conversion API names
+ 2025-11-21 d3a0321dba [dev.simd] cmd/compile: fix incorrect mapping of SHA256MSG2128
+ 2025-11-20 74ebdd28d1 [dev.simd] simd, cmd/compile: add more element types for Select128FromPair
+ 2025-11-20 4d26d66a49 [dev.simd] simd: fix signatures for PermuteConstant* methods
+ 2025-11-20 e3d4645693 [dev.simd] all: merge master (ca37d24) into dev.simd
+ 2025-11-20 95b4ad525f [dev.simd] simd: reorganize internal tests so that simd does not import testing
+ 2025-11-18 3fe246ae0f [dev.simd] simd: make 'go generate' generate everything
+ 2025-11-18 cf45adf140 [dev.simd] simd: move template code generator into _gen
+ 2025-11-18 19b4a30899 [dev.simd] simd/_gen/simdgen: remove outdated asm.yaml.toy
+ 2025-11-18 9461db5c59 [dev.simd] simd: fix comment in file generator
+ 2025-11-18 4004ff3523 [dev.simd] simd: remove FlattenedTranspose from exports
+ 2025-11-18 896f293a25 [dev.simd] cmd/compile, simd: change DotProductQuadruple and add peepholes
+ 2025-11-18 be9c50c6a0 [dev.simd] cmd/compile, simd: change SHA ops names and types
+ 2025-11-17 0978935a99 [dev.simd] cmd/compile, simd: change AES op names and add missing size
+ 2025-11-17 95871e4a00 [dev.simd] cmd/compile, simd: add VPALIGNR
+ 2025-11-17 934dbcea1a [dev.simd] simd: update CPU feature APIs
+ 2025-11-17 e4d9484220 [dev.simd] cmd/compile: fix unstable output
+ 2025-11-13 d7a0c45642 [dev.simd] all: merge master (57362e9) into dev.simd
+ 2025-11-11 86b4fe31d9 [dev.simd] cmd/compile: add masked merging ops and optimizations
+ 2025-11-10 771a1dc216 [dev.simd] cmd/compile: add peepholes for all masked ops and bug fixes
+ 2025-11-10 972732b245 [dev.simd] simd, cmd/compile: remove move from API
+ 2025-11-10 bf77323efa [dev.simd] simd: put unexported methods to another file
+ 2025-11-04 fe040658b2 [dev.simd] simd/_gen: fix sorting ops slices
+ 2025-10-29 e452f4ac7d [dev.simd] cmd/compile: enhance inlining for closure-of-SIMD
+ 2025-10-27 ca1264ac50 [dev.simd] test: add some trickier cases to ternary-boolean simd test
+ 2025-10-24 f6b4711095 [dev.simd] cmd/compile, simd: add rewrite to convert logical expression trees into TERNLOG instructions
+ 2025-10-24 cf7c1a4cbb [dev.simd] cmd/compile, simd: add SHA features
+ 2025-10-24 2b8eded4f4 [dev.simd] simd/_gen: parse SHA features from XED
+ 2025-10-24 c75965b666 [dev.simd] simd: added String() method to SIMD vectors.
+ 2025-10-22 d03634f807 [dev.simd] cmd/compile, simd: add definitions for VPTERNLOG[DQ]
+ 2025-10-20 20b3339542 [dev.simd] simd: add AES feature check
+ 2025-10-14 fc3bc49337 [dev.simd] simd: clean up mask load comments
+ 2025-10-14 416332dba2 [dev.simd] cmd/compile, simd: update DotProd to DotProduct
+ 2025-10-14 647c790143 [dev.simd] cmd/compile: peephole simd mask load/stores from bits
+ 2025-10-14 2e71cf1a2a [dev.simd] cmd/compile, simd: remove mask load and stores
+ 2025-10-13 c4fbf3b4cf [dev.simd] simd/_gen: add mem peephole with feat mismatches
+ 2025-10-13 ba72ee0f30 [dev.simd] cmd/compile: more support for cpufeatures
+ 2025-10-09 be57d94c4c [dev.simd] simd: add emulated Not method
+ 2025-10-07 d2270bccbd [dev.simd] cmd/compile: track which CPU features are in scope
+ 2025-10-03 48756abd3a [dev.simd] cmd/compile: inliner tweaks to favor simd-handling functions
+ 2025-10-03 fb1749a3fe [dev.simd] all: merge master (adce7f1) into dev.simd
+ 2025-09-30 703a5fbaad [dev.simd] cmd/compile, simd: add AES instructions
+ 2025-09-29 1c961c2fb2 [dev.simd] simd: use new data movement instructions to do "fast" transposes
+ 2025-09-26 fe4af1c067 [dev.simd] simd: repair broken comments in generated ops_amd64.go
+ 2025-09-26 ea3b2ecd28 [dev.simd] cmd/compile, simd: add 64-bit select-from-pair methods
+ 2025-09-26 25c36b95d1 [dev.simd] simd, cmd/compile: add 128 bit select-from-pair
+ 2025-09-26 f0e281e693 [dev.simd] cmd/compile: don't require single use for SIMD load/store folding
+ 2025-09-26 b4d1e018a8 [dev.simd] cmd/compile: remove unnecessary code from early simd prototype
+ 2025-09-26 578777bf7c [dev.simd] cmd/compile: make condtion of CanSSA smarter for SIMD fields
+ 2025-09-26 c28b2a0ca1 [dev.simd] simd: generalize select-float32-from-pair
+ 2025-09-25 a693ae1e9a [dev.simd] all: merge master (d70ad4e) into dev.simd
+ 2025-09-25 5a78e1a4a1 [dev.simd] simd, cmd/compile: mark simd vectors uncomparable
+ 2025-09-23 bf00f5dfd6 [dev.simd] simd, cmd/compile: added simd methods for VSHUFP[DS]
+ 2025-09-23 8e60feeb41 [dev.simd] cmd/compile: improve slicemask removal
+ 2025-09-23 2b50ffe172 [dev.simd] cmd/compile: remove stores to unread parameters
+ 2025-09-23 2d8cb80d7c [dev.simd] all: merge master (9b2d39b) into dev.simd
+ 2025-09-22 63a09d6d3d [dev.simd] cmd/compile: fix SIMD const rematerialization condition
+ 2025-09-20 2ca96d218d [dev.simd] cmd/compile: enhance prove to infer bounds in slice len/cap calculations
+ 2025-09-19 c0f031fcc3 [dev.simd] cmd/compile: spill the correct SIMD register for morestack
+ 2025-09-19 58fa1d023e [dev.simd] cmd/compile: enhance the chunked indexing case to include reslicing
+ 2025-09-18 7ae0eb2e80 [dev.simd] cmd/compile: remove Add32x4 generic op
+ 2025-09-18 31b664d40b [dev.simd] cmd/compile: widen index for simd intrinsics jumptable
+ 2025-09-18 e34ad6de42 [dev.simd] cmd/compile: optimize VPTEST for 2-operand cases
+ 2025-09-18 f1e3651c33 [dev.simd] cmd/compile, simd: add VPTEST
+ 2025-09-18 d9751166a6 [dev.simd] cmd/compile: handle rematerialized op for incompatible reg constraint
+ 2025-09-18 4eb5c6e07b [dev.simd] cmd/compile, simd/_gen: add rewrite for const load ops
+ 2025-09-18 443b7aeddb [dev.simd] cmd/compile, simd/_gen: make rewrite rules consistent on CPU Features
+ 2025-09-16 bdd30e25ca [dev.simd] all: merge master (ca0e035) into dev.simd
+ 2025-09-16 0e590a505d [dev.simd] cmd/compile: use the right type for spill slot
+ 2025-09-15 dabe2bb4fb [dev.simd] cmd/compile: fix holes in mask peepholes
+ 2025-09-12 3ec0b25ab7 [dev.simd] cmd/compile, simd/_gen/simdgen: add const load mops
+ 2025-09-12 1e5631d4e0 [dev.simd] cmd/compile: peephole simd load
+ 2025-09-11 48f366d826 [dev.simd] cmd/compile: add memop peephole rules
+ 2025-09-11 9a349f8e72 [dev.simd] all: merge master (cf5e993) into dev.simd
+ 2025-09-11 5a0446d449 [dev.simd] simd/_gen/simdgen, cmd/compile: add memory op machine ops
+ 2025-09-08 c39b2fdd1e [dev.simd] cmd/compile, simd: add VPLZCNT[DQ]
+ 2025-09-07 832c1f76dc [dev.simd] cmd/compile: enhance prove to deal with double-offset IsInBounds checks
+ 2025-09-06 0b323350a5 [dev.simd] simd/_gen/simdgen: merge memory ops
+ 2025-09-06 f42c9261d3 [dev.simd] simd/_gen/simdgen: parse memory operands
+ 2025-09-05 356c48d8e9 [dev.simd] cmd/compile, simd: add ClearAVXUpperBits
+ 2025-09-03 7c8b9115bc [dev.simd] all: merge master (4c4cefc) into dev.simd
+ 2025-09-02 9125351583 [dev.simd] internal/cpu: report AVX1 and 2 as supported on macOS 15 Rosetta 2
+ 2025-09-02 b509516b2e [dev.simd] simd, cmd/compile: add Interleave{Hi,Lo} (VPUNPCK*)
+ 2025-09-02 6890aa2e20 [dev.simd] cmd/compile: add instructions and rewrites for scalar-> vector moves
+ 2025-08-24 5ebe2d05d5 [dev.simd] simd: correct SumAbsDiff documentation
+ 2025-08-22 a5137ec92a [dev.simd] cmd/compile: sample peephole optimization for SIMD broadcast
+ 2025-08-22 83714616aa [dev.simd] cmd/compile: remove VPADDD4
+ 2025-08-22 4a3ea146ae [dev.simd] cmd/compile: correct register mask of some AVX512 ops
+ 2025-08-22 8d874834f1 [dev.simd] cmd/compile: use X15 for zero value in AVX context
+ 2025-08-22 4c311aa38f [dev.simd] cmd/compile: ensure the whole X15 register is zeroed
+ 2025-08-22 baea0c700b [dev.simd] cmd/compile, simd: complete AVX2? u?int shuffles
+ 2025-08-22 fa1e78c9ad [dev.simd] cmd/compile, simd: make Permute 128-bit use AVX VPSHUFB
+ 2025-08-22 bc217d4170 [dev.simd] cmd/compile, simd: add packed saturated u?int conversions
+ 2025-08-22 4fa23b0d29 [dev.simd] cmd/compile, simd: add saturated u?int conversions
+ 2025-08-21 3f6bab5791 [dev.simd] simd: move tests to a subdirectory to declutter "simd"
+ 2025-08-21 aea0a5e8d7 [dev.simd] simd/_gen/unify: improve envSet doc comment
+ 2025-08-21 7fdb1da6b0 [dev.simd] cmd/compile, simd: complete truncating u?int conversions.
+ 2025-08-21 f4c41d9922 [dev.simd] cmd/compile, simd: complete u?int widening conversions
+ 2025-08-21 6af8881adb [dev.simd] simd: reorganize cvt rules
+ 2025-08-21 58cfc2a5f6 [dev.simd] cmd/compile, simd: add VPSADBW
+ 2025-08-21 f7c6fa709e [dev.simd] simd/_gen/unify: fix some missing environments
+ 2025-08-20 7c84e984e6 [dev.simd] cmd/compile: rewrite to elide Slicemask from len==c>0 slicing
+ 2025-08-20 cf31b15635 [dev.simd] simd, cmd/compile: added .Masked() peephole opt for many operations.
+ 2025-08-20 1334285862 [dev.simd] simd: template field name cleanup in genfiles
+ 2025-08-20 af6475df73 [dev.simd] simd: add testing hooks for size-changing conversions
+ 2025-08-20 ede64cf0d8 [dev.simd] simd, cmd/compile: sample peephole optimization for .Masked()
+ 2025-08-20 103b6e39ca [dev.simd] all: merge master (9de69f6) into dev.simd
+ 2025-08-20 728ac3e050 [dev.simd] simd: tweaks to improve test disassembly
+ 2025-08-20 4fce49b86c [dev.simd] simd, cmd/compile: add widening unsigned converts 8->16->32
+ 2025-08-19 0f660d675f [dev.simd] simd: make OpMasked machine ops only
+ 2025-08-19 a034826e26 [dev.simd] simd, cmd/compile: implement ToMask, unexport asMask.
+ 2025-08-18 8ccd6c2034 [dev.simd] simd, cmd/compile: mark BLEND instructions as not-zero-mask
+ 2025-08-18 9a934d5080 [dev.simd] cmd/compile, simd: added methods for "float" GetElem
+ 2025-08-15 7380213a4e [dev.simd] cmd/compile: make move/load/store dependent only on reg and width
+ 2025-08-15 908e3e8166 [dev.simd] cmd/compile: make (most) move/load/store lowering use reg and width only
+ 2025-08-14 9783f86bc8 [dev.simd] cmd/compile: accounts rematerialize ops's output reginfo
+ 2025-08-14 a4ad41708d [dev.simd] all: merge master (924fe98) into dev.simd
+ 2025-08-13 8b90d48d8c [dev.simd] simd/_gen/simdgen: rewrite etetest.sh
+ 2025-08-13 b7c8698549 [dev.simd] simd/_gen: migrate simdgen from x/arch
+ 2025-08-13 257c1356ec [dev.simd] go/types: exclude simd/_gen module from TestStdlib
+ 2025-08-13 858a8d2276 [dev.simd] simd: reorganize/rename generated emulation files
+ 2025-08-13 2080415aa2 [dev.simd] simd: add emulations for missing AVX2 comparisons
+ 2025-08-13 ddb689c7bb [dev.simd] simd, cmd/compile: generated code for Broadcast
+ 2025-08-13 e001300cf2 [dev.simd] cmd/compile: fix LoadReg so it is aware of register target
+ 2025-08-13 d5dea86993 [dev.simd] cmd/compile: fix isIntrinsic for methods; fix fp <-> gp moves
+ 2025-08-13 08ab8e24a3 [dev.simd] cmd/compile: generated code from 'fix generated rules for shifts'
+ 2025-08-11 702ee2d51e [dev.simd] cmd/compile, simd: update generated files
+ 2025-08-11 e33eb1a7a5 [dev.simd] cmd/compile, simd: update generated files
+ 2025-08-11 667add4f1c [dev.simd] cmd/compile, simd: update generated files
+ 2025-08-11 1755c2909d [dev.simd] cmd/compile, simd: update generated files
+ 2025-08-11 2fd49d8f30 [dev.simd] simd: imm doc improve
+ 2025-08-11 ce0e803ab9 [dev.simd] cmd/compile: keep track of multiple rule file names in ssa/_gen
+ 2025-08-11 38b76bf2a3 [dev.simd] cmd/compile, simd: jump table for imm ops
+ 2025-08-08 94d72355f6 [dev.simd] simd: add emulations for bitwise ops and for mask/merge methods
+ 2025-08-07 8eb5f6020e [dev.simd] cmd/compile, simd: API interface fixes
+ 2025-08-07 b226bcc4a9 [dev.simd] cmd/compile, simd: add value conversion ToBits for mask
+ 2025-08-06 5b0ef7fcdc [dev.simd] cmd/compile, simd: add Expand
+ 2025-08-06 d3cf582f8a [dev.simd] cmd/compile, simd: (Set|Get)(Lo|Hi)
+ 2025-08-05 7ca34599ec [dev.simd] simd, cmd/compile: generated files to add 'blend' and 'blendMasked'
+ 2025-08-05 82d056ddd7 [dev.simd] cmd/compile: add ShiftAll immediate variant
+ 2025-08-04 775fb52745 [dev.simd] all: merge master (7a1679d) into dev.simd
+ 2025-08-04 6b9b59e144 [dev.simd] simd, cmd/compile: rename some methods
+ 2025-08-04 d375b95357 [dev.simd] simd: move lots of slice functions and methods to generated code
+ 2025-08-04 3f92aa1eca [dev.simd] cmd/compile, simd: make bitwise logic ops available to all u?int vectors
+ 2025-08-04 c2d775d401 [dev.simd] cmd/compile, simd: change PairDotProdAccumulate to AddDotProd
+ 2025-08-04 2c25f3e846 [dev.simd] cmd/compile, simd: change Shift*AndFillUpperFrom to Shift*Concat
+ 2025-08-01 c25e5c86b2 [dev.simd] cmd/compile: generated code for K-mask-register slice load/stores
+ 2025-08-01 1ac5f3533f [dev.simd] cmd/compile: opcodes and rules and code generation to enable AVX512 masked loads/stores
+ 2025-08-01 f39711a03d [dev.simd] cmd/compile: test for int-to-mask conversion
+ 2025-08-01 08bec02907 [dev.simd] cmd/compile: add register-to-mask moves, other simd glue
+ 2025-08-01 09ff25e350 [dev.simd] simd: add tests for simd conversions to Int32/Uint32.
+ 2025-08-01 a24ffe3379 [dev.simd] simd: modify test generation to make it more flexible
+ 2025-08-01 ec5c20ba5a [dev.simd] cmd/compile: generated simd code to add some conversions
+ 2025-08-01 e62e377ed6 [dev.simd] cmd/compile, simd: generated code from repaired simdgen sort
+ 2025-08-01 761894d4a5 [dev.simd] simd: add partial slice load/store for 32/64-bits on AVX2
+ 2025-08-01 acc1492b7d [dev.simd] cmd/compile: Generated code for AVX2 SIMD masked load/store
+ 2025-08-01 a0b87a7478 [dev.simd] cmd/compile: changes for AVX2 SIMD masked load/store
+ 2025-08-01 88568519b4 [dev.simd] simd: move test generation into Go repo
+ 2025-07-31 6f7a1164e7 [dev.simd] cmd/compile, simd: support store to bits for mask
+ 2025-07-21 41054cdb1c [dev.simd] simd, internal/cpu: support more AVX CPU Feature checks
+ 2025-07-21 957f06c410 [dev.simd] cmd/compile, simd: support load from bits for mask
+ 2025-07-21 f0e9dc0975 [dev.simd] cmd/compile: fix opLen(2|3)Imm8_2I intrinsic function
+ 2025-07-17 03a3887f31 [dev.simd] simd: clean up masked op doc
+ 2025-07-17 c61743e4f0 [dev.simd] cmd/compile, simd: reorder PairDotProdAccumulate
+ 2025-07-15 ef5f6cc921 [dev.simd] cmd/compile: adjust param order for AndNot
+ 2025-07-15 6d10680141 [dev.simd] cmd/compile, simd: add Compress
+ 2025-07-15 17baae72db [dev.simd] simd: default mask param's name to mask
+ 2025-07-15 01f7f57025 [dev.simd] cmd/compile, simd: add variable Permute
+ 2025-07-14 f5f42753ab [dev.simd] cmd/compile, simd: add VDPPS
+ 2025-07-14 08ffd66ab2 [dev.simd] simd: updates CPU Feature in doc
+ 2025-07-14 3f789721d6 [dev.simd] cmd/compile: mark SIMD types non-fat
+ 2025-07-11 b69622b83e [dev.simd] cmd/compile, simd: adjust Shift.* operations
+ 2025-07-11 4993a91ae1 [dev.simd] simd: change imm param name to constant
+ 2025-07-11 bbb6dccd84 [dev.simd] simd: fix documentations
+ 2025-07-11 1440ff7036 [dev.simd] cmd/compile: exclude simd vars from merge local
+ 2025-07-11 ccb43dcec7 [dev.simd] cmd/compile: add VZEROUPPER and VZEROALL inst
+ 2025-07-11 21596f2f75 [dev.simd] all: merge master (88cf0c5) into dev.simd
+ 2025-07-10 ab7f839280 [dev.simd] cmd/compile: fix maskreg/simdreg chaos
+ 2025-07-09 47b07a87a6 [dev.simd] cmd/compile, simd: fix Int64x2 Greater output type to mask
+ 2025-07-09 08cd62e9f5 [dev.simd] cmd/compile: remove X15 from register mask
+ 2025-07-09 9ea33ed538 [dev.simd] cmd/compile: output of simd generator, more ... rewrite rules
+ 2025-07-09 aab8b173a9 [dev.simd] cmd/compile, simd: Int64x2 Greater and Uint* Equal
+ 2025-07-09 8db7f41674 [dev.simd] cmd/compile: use upper registers for AVX512 simd ops
+ 2025-07-09 574854fd86 [dev.simd] runtime: save Z16-Z31 registers in async preempt
+ 2025-07-09 5429328b0c [dev.simd] cmd/compile: change register mask names for simd ops
+ 2025-07-09 029d7ec3e9 [dev.simd] cmd/compile, simd: rename Masked$OP to $(OP)Masked.
+ 2025-07-09 983e81ce57 [dev.simd] simd: rename stubs_amd64.go to ops_amd64.go
+ 2025-07-08 56ca67682b [dev.simd] cmd/compile, simd: remove FP bitwise logic operations.
+ 2025-07-08 0870ed04a3 [dev.simd] cmd/compile: make compares between NaNs all false.
+ 2025-07-08 24f2b8ae2e [dev.simd] simd: {Int,Uint}{8x{16,32},16x{8,16}} subvector loads/stores from slices.
+ 2025-07-08 2bb45cb8a5 [dev.simd] cmd/compile: minor tweak for race detector
+ 2025-07-07 43a61aef56 [dev.simd] cmd/compile: add EXTRACT[IF]128 instructions
+ 2025-07-07 292db9b676 [dev.simd] cmd/compile: add INSERT[IF]128 instructions
+ 2025-07-07 d8fa853b37 [dev.simd] cmd/compile: make regalloc simd aware on copy
+ 2025-07-07 dfd75f82d4 [dev.simd] cmd/compile: output of simdgen with invariant type order
+ 2025-07-04 72c39ef834 [dev.simd] cmd/compile: fix the "always panic" code to actually panic
+ 2025-07-01 1ee72a15a3 [dev.simd] internal/cpu: add GFNI feature check
+ 2025-06-30 0710cce6eb [dev.simd] runtime: remove write barrier in xRegRestore
+ 2025-06-30 59846af331 [dev.simd] cmd/compile, simd: cleanup operations and documentations
+ 2025-06-30 f849225b3b [dev.simd] all: merge master (740857f) into dev.simd
+ 2025-06-30 9eeb1e7a9a [dev.simd] runtime: save AVX2 and AVX-512 state on asynchronous preemption
+ 2025-06-30 426cf36b4d [dev.simd] runtime: save scalar registers off stack in amd64 async preemption
+ 2025-06-30 ead249a2e2 [dev.simd] cmd/compile: reorder operands for some simd operations
+ 2025-06-30 55665e1e37 [dev.simd] cmd/compile: undoes reorder transform in prior commit, changes names
+ 2025-06-26 10c9621936 [dev.simd] cmd/compile, simd: add galois field operations
+ 2025-06-26 e61ebfce56 [dev.simd] cmd/compile, simd: add shift operations
+ 2025-06-26 35b8cf7fed [dev.simd] cmd/compile: tweak sort order in generator
+ 2025-06-26 7fadfa9638 [dev.simd] cmd/compile: add simd VPEXTRA*
+ 2025-06-26 0d8cb89f5c [dev.simd] cmd/compile: support simd(imm,fp) returns gp
+ 2025-06-25 f4a7c124cc [dev.simd] all: merge master (f8ccda2) into dev.simd
+ 2025-06-25 4fda27c0cc [dev.simd] cmd/compile: glue codes for Shift and Rotate
+ 2025-06-24 61c1183342 [dev.simd] simd: add test wrappers
+ 2025-06-23 e32488003d [dev.simd] cmd/compile: make simd regmask naming more like existing conventions
+ 2025-06-23 1fa4bcfcda [dev.simd] simd, cmd/compile: generated code for VPINSR[BWDQ], and test
+ 2025-06-23 dd63b7aa0e [dev.simd] simd: add AVX512 aggregated check
+ 2025-06-23 0cdb2697d1 [dev.simd] simd: add tests for intrinsic used as a func value and via reflection
+ 2025-06-23 88c013d6ff [dev.simd] cmd/compile: generate function body for bodyless intrinsics
+ 2025-06-20 a8669c78f5 [dev.simd] sync: correct the type of runtime_StoreReluintptr
+ 2025-06-20 7c6ac35275 [dev.simd] cmd/compile: add simdFp1gp1fp1Imm8 helper to amd64 code generation
+ 2025-06-20 4150372a5d [dev.simd] cmd/compile: don't treat devel compiler as a released compiler
+ 2025-06-18 1b87d52549 [dev.simd] cmd/compile: add fp1gp1fp1 register mask for AMD64
+ 2025-06-18 1313521f75 [dev.simd] cmd/compile: remove fused mul/add/sub shapes.
+ 2025-06-17 1be5eb2686 [dev.simd] cmd/compile: fix signature error of PairDotProdAccumulate.
+ 2025-06-17 3a4d10bfca [dev.simd] cmd/compile: removed a map iteration from generator; tweaked type order
+ 2025-06-17 21d6573154 [dev.simd] cmd/compile: alphabetize SIMD intrinsics
+ 2025-06-16 ee1d9f3f85 [dev.simd] cmd/compile: reorder stubs
+ 2025-06-13 6c50c8b892 [dev.simd] cmd/compile: move simd helpers into compiler, out of generated code
+ 2025-06-13 7392dfd43e [dev.simd] cmd/compile: generated simd*ops files weren't up to date
+ 2025-06-13 00a8dacbe4 [dev.simd] cmd/compile: remove unused simd intrinsics "helpers"
+ 2025-06-13 b9a548775f cmd/compile: add up-to-date test for generated files
+ 2025-06-13 ca01eab9c7 [dev.simd] cmd/compile: add fused mul add sub ops
+ 2025-06-13 ded6e0ac71 [dev.simd] cmd/compile: add more dot products
+ 2025-06-13 3df41c856e [dev.simd] simd: update documentations
+ 2025-06-13 9ba7db36b5 [dev.simd] cmd/compile: add dot product ops
+ 2025-06-13 34a9cdef87 [dev.simd] cmd/compile: add round simd ops
+ 2025-06-13 5289e0f24e [dev.simd] cmd/compile: updates simd ordering and docs
+ 2025-06-13 c81cb05e3e [dev.simd] cmd/compile: add simdGen prog writer
+ 2025-06-13 9b9af3d638 [dev.simd] internal/cpu: add AVX-512-CD and DQ, and derived "basic AVX-512"
+ 2025-06-13 dfa6c74263 [dev.simd] runtime: eliminate global state in mkpreempt.go
+ 2025-06-10 b2e8ddba3c [dev.simd] all: merge master (773701a) into dev.simd
+ 2025-06-09 884f646966 [dev.simd] cmd/compile: add fp3m1fp1 shape to regalloc
+ 2025-06-09 6bc3505773 [dev.simd] cmd/compile: add fp3fp1 regsiter shape
+ 2025-06-05 2eaa5a0703 [dev.simd] simd: add functions+methods to load-from/store-to slices
+ 2025-06-05 8ecbd59ebb [dev.simd] cmd/compile: generated codes for amd64 SIMD
+ 2025-06-02 baa72c25f1 [dev.simd] all: merge master (711ff94) into dev.simd
+ 2025-05-30 0ff18a9cca [dev.simd] cmd/compile: disable intrinsics test for new simd stuff
+ 2025-05-30 7800f3813c [dev.simd] cmd/compile: flip sense of intrinsics test for SIMD
+ 2025-05-29 eba2430c16 [dev.simd] simd, cmd/compile, go build, go/doc: test tweaks
+ 2025-05-29 71c0e550cd [dev.simd] cmd/dist: disable API check on dev branch
+ 2025-05-29 62e1fccfb9 [dev.simd] internal: delete unused internal/simd directory
+ 2025-05-29 1161228bf1 [dev.simd] cmd/compile: add a fp1m1fp1 register shape to amd64
+ 2025-05-28 fdb067d946 [dev.simd] simd: initialize directory to make it suitable for testing SIMD
+ 2025-05-28 11d2b28bff [dev.simd] cmd/compile: add and fix k register supports
+ 2025-05-28 04b1030ae4 [dev.simd] cmd/compile: adapters for simd
+ 2025-05-27 2ef7106881 [dev.simd] internal/buildcfg: enable SIMD GOEXPERIMENT for amd64
+ 2025-05-22 4d2c71ebf9 [dev.simd] internal/goexperiment: add SIMD goexperiment
+ 2025-05-22 3ac5f2f962 [dev.simd] codereview.cfg: set up dev.simd branch

Change-Id: I60f2cd2ea055384a3788097738c6989630207871
Author: Cherry Mui
Date:   2025-11-24 16:02:01 -05:00
Commit: d4f5650cc5
186 changed files with 146299 additions and 835 deletions

View file

@ -150,12 +150,12 @@ func appendParamTypes(rts []*types.Type, t *types.Type) []*types.Type {
if w == 0 {
return rts
}
if t.IsScalar() || t.IsPtrShaped() {
if t.IsScalar() || t.IsPtrShaped() || t.IsSIMD() {
if t.IsComplex() {
c := types.FloatForComplex(t)
return append(rts, c, c)
} else {
if int(t.Size()) <= types.RegSize {
if int(t.Size()) <= types.RegSize || t.IsSIMD() {
return append(rts, t)
}
// assume 64bit int on 32-bit machine
@ -199,6 +199,9 @@ func appendParamOffsets(offsets []int64, at int64, t *types.Type) ([]int64, int6
if w == 0 {
return offsets, at
}
if t.IsSIMD() {
return append(offsets, at), at + w
}
if t.IsScalar() || t.IsPtrShaped() {
if t.IsComplex() || int(t.Size()) > types.RegSize { // complex and *int64 on 32-bit
s := w / 2
@ -521,11 +524,11 @@ func (state *assignState) allocateRegs(regs []RegIndex, t *types.Type) []RegInde
}
ri := state.rUsed.intRegs
rf := state.rUsed.floatRegs
if t.IsScalar() || t.IsPtrShaped() {
if t.IsScalar() || t.IsPtrShaped() || t.IsSIMD() {
if t.IsComplex() {
regs = append(regs, RegIndex(rf+state.rTotal.intRegs), RegIndex(rf+1+state.rTotal.intRegs))
rf += 2
} else if t.IsFloat() {
} else if t.IsFloat() || t.IsSIMD() {
regs = append(regs, RegIndex(rf+state.rTotal.intRegs))
rf += 1
} else {
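The hunks above make the amd64 ABI treat a SIMD vector as a single register-assigned unit instead of decomposing it into scalar fields. A minimal standalone sketch of that layout rule (not part of the diff; fakeType is an illustrative stand-in for the real *types.Type API):

package main

import "fmt"

// fakeType models only the two properties the sketch needs.
type fakeType struct {
	isSIMD bool
	size   int64
}

// appendParamOffsets mirrors the shape of the change above: a SIMD value
// contributes exactly one offset entry and advances by its full width,
// while other values are (crudely, for illustration) split into
// register-sized chunks.
func appendParamOffsets(offsets []int64, at int64, t fakeType) ([]int64, int64) {
	if t.isSIMD {
		return append(offsets, at), at + t.size
	}
	for off := int64(0); off < t.size; off += 8 {
		offsets = append(offsets, at+off)
	}
	return offsets, at + t.size
}

func main() {
	offs, at := appendParamOffsets(nil, 0, fakeType{isSIMD: true, size: 64}) // a 512-bit vector: one slot
	fmt.Println(offs, at) // [0] 64
	offs, at = appendParamOffsets(nil, 0, fakeType{size: 16}) // a 16-byte non-SIMD value: two slots
	fmt.Println(offs, at) // [0 8] 16
}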

File diff suppressed because it is too large.

View file

@ -18,6 +18,7 @@ import (
"cmd/internal/obj"
"cmd/internal/obj/x86"
"internal/abi"
"internal/buildcfg"
)
// ssaMarkMoves marks any MOVXconst ops that need to avoid clobbering flags.
@ -43,11 +44,23 @@ func ssaMarkMoves(s *ssagen.State, b *ssa.Block) {
}
}
// loadByType returns the load instruction of the given type.
func loadByType(t *types.Type) obj.As {
// Avoid partial register write
if !t.IsFloat() {
switch t.Size() {
func isFPReg(r int16) bool {
return x86.REG_X0 <= r && r <= x86.REG_Z31
}
func isKReg(r int16) bool {
return x86.REG_K0 <= r && r <= x86.REG_K7
}
func isLowFPReg(r int16) bool {
return x86.REG_X0 <= r && r <= x86.REG_X15
}
// loadByRegWidth returns the load instruction of the given register of a given width.
func loadByRegWidth(r int16, width int64) obj.As {
// Avoid partial register write for GPR
if !isFPReg(r) && !isKReg(r) {
switch width {
case 1:
return x86.AMOVBLZX
case 2:
@ -55,20 +68,35 @@ func loadByType(t *types.Type) obj.As {
}
}
// Otherwise, there's no difference between load and store opcodes.
return storeByType(t)
return storeByRegWidth(r, width)
}
// storeByType returns the store instruction of the given type.
func storeByType(t *types.Type) obj.As {
width := t.Size()
if t.IsFloat() {
// storeByRegWidth returns the store instruction of the given register of a given width.
// It's also used for loading const to a reg.
func storeByRegWidth(r int16, width int64) obj.As {
if isFPReg(r) {
switch width {
case 4:
return x86.AMOVSS
case 8:
return x86.AMOVSD
}
case 16:
// int128s are in SSE registers
if isLowFPReg(r) {
return x86.AMOVUPS
} else {
return x86.AVMOVDQU
}
case 32:
return x86.AVMOVDQU
case 64:
return x86.AVMOVDQU64
}
}
if isKReg(r) {
return x86.AKMOVQ
}
// gp
switch width {
case 1:
return x86.AMOVB
@ -78,23 +106,35 @@ func storeByType(t *types.Type) obj.As {
return x86.AMOVL
case 8:
return x86.AMOVQ
case 16:
return x86.AMOVUPS
}
}
panic(fmt.Sprintf("bad store type %v", t))
panic(fmt.Sprintf("bad store reg=%v, width=%d", r, width))
}
// moveByType returns the reg->reg move instruction of the given type.
func moveByType(t *types.Type) obj.As {
if t.IsFloat() {
// moveByRegsWidth returns the reg->reg move instruction of the given dest/src registers of a given width.
func moveByRegsWidth(dest, src int16, width int64) obj.As {
// fp -> fp
if isFPReg(dest) && isFPReg(src) {
// Moving the whole sse2 register is faster
// than moving just the correct low portion of it.
// There is no xmm->xmm move with 1 byte opcode,
// so use movups, which has 2 byte opcode.
if isLowFPReg(dest) && isLowFPReg(src) && width <= 16 {
return x86.AMOVUPS
} else {
switch t.Size() {
}
if width <= 32 {
return x86.AVMOVDQU
}
return x86.AVMOVDQU64
}
// k -> gp, gp -> k, k -> k
if isKReg(dest) || isKReg(src) {
if isFPReg(dest) || isFPReg(src) {
panic(fmt.Sprintf("bad move, src=%v, dest=%v, width=%d", src, dest, width))
}
return x86.AKMOVQ
}
// gp -> fp, fp -> gp, gp -> gp
switch width {
case 1:
// Avoids partial register write
return x86.AMOVL
@ -105,11 +145,18 @@ func moveByType(t *types.Type) obj.As {
case 8:
return x86.AMOVQ
case 16:
return x86.AMOVUPS // int128s are in SSE registers
default:
panic(fmt.Sprintf("bad int register width %d:%v", t.Size(), t))
if isLowFPReg(dest) && isLowFPReg(src) {
// int128s are in SSE registers
return x86.AMOVUPS
} else {
return x86.AVMOVDQU
}
case 32:
return x86.AVMOVDQU
case 64:
return x86.AVMOVDQU64
}
panic(fmt.Sprintf("bad move, src=%v, dest=%v, width=%d", src, dest, width))
}
// opregreg emits instructions for
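To make the new (register class, width) dispatch easier to eyeball, here is a standalone mirror of storeByRegWidth that returns mnemonic strings instead of obj.As opcodes (not part of the diff; the regClass values are illustrative stand-ins for the isFPReg/isKReg/isLowFPReg checks):

package main

import "fmt"

type regClass int

const (
	gpReg    regClass = iota // general-purpose register
	lowFPReg                 // X0..X15, reachable with legacy SSE encodings
	hiFPReg                  // X16..Z31, EVEX-only
	kReg                     // K0..K7 mask registers
)

// storeMnemonic mirrors the decision table of storeByRegWidth above;
// combinations not listed panic in the real compiler code.
func storeMnemonic(c regClass, width int64) string {
	switch c {
	case lowFPReg, hiFPReg:
		switch width {
		case 4:
			return "MOVSS"
		case 8:
			return "MOVSD"
		case 16:
			if c == lowFPReg {
				return "MOVUPS"
			}
			return "VMOVDQU"
		case 32:
			return "VMOVDQU"
		case 64:
			return "VMOVDQU64"
		}
	case kReg:
		return "KMOVQ"
	case gpReg:
		switch width {
		case 1:
			return "MOVB"
		case 2:
			return "MOVW"
		case 4:
			return "MOVL"
		case 8:
			return "MOVQ"
		}
	}
	return "bad store"
}

func main() {
	fmt.Println(storeMnemonic(gpReg, 8))     // MOVQ
	fmt.Println(storeMnemonic(lowFPReg, 16)) // MOVUPS
	fmt.Println(storeMnemonic(hiFPReg, 16))  // VMOVDQU
	fmt.Println(storeMnemonic(hiFPReg, 64))  // VMOVDQU64
	fmt.Println(storeMnemonic(kReg, 8))      // KMOVQ
}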
@ -605,7 +652,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
// But this requires a way for regalloc to know that SRC might be
// clobbered by this instruction.
t := v.RegTmp()
opregreg(s, moveByType(v.Type), t, v.Args[1].Reg())
opregreg(s, moveByRegsWidth(t, v.Args[1].Reg(), v.Type.Size()), t, v.Args[1].Reg())
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
@ -777,9 +824,14 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
p.From.Offset = v.AuxInt
p.To.Type = obj.TYPE_REG
p.To.Reg = x
case ssa.OpAMD64MOVSSconst, ssa.OpAMD64MOVSDconst:
x := v.Reg()
p := s.Prog(v.Op.Asm())
if !isFPReg(x) && v.AuxInt == 0 && v.Aux == nil {
opregreg(s, x86.AXORL, x, x)
break
}
p := s.Prog(storeByRegWidth(x, v.Type.Size()))
p.From.Type = obj.TYPE_FCONST
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
p.To.Type = obj.TYPE_REG
@ -1176,27 +1228,39 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
}
x := v.Args[0].Reg()
y := v.Reg()
if v.Type.IsSIMD() {
x = simdOrMaskReg(v.Args[0])
y = simdOrMaskReg(v)
}
if x != y {
opregreg(s, moveByType(v.Type), y, x)
opregreg(s, moveByRegsWidth(y, x, v.Type.Size()), y, x)
}
case ssa.OpLoadReg:
if v.Type.IsFlags() {
v.Fatalf("load flags not implemented: %v", v.LongString())
return
}
p := s.Prog(loadByType(v.Type))
r := v.Reg()
p := s.Prog(loadByRegWidth(r, v.Type.Size()))
ssagen.AddrAuto(&p.From, v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
if v.Type.IsSIMD() {
r = simdOrMaskReg(v)
}
p.To.Reg = r
case ssa.OpStoreReg:
if v.Type.IsFlags() {
v.Fatalf("store flags not implemented: %v", v.LongString())
return
}
p := s.Prog(storeByType(v.Type))
r := v.Args[0].Reg()
if v.Type.IsSIMD() {
r = simdOrMaskReg(v.Args[0])
}
p := s.Prog(storeByRegWidth(r, v.Type.Size()))
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.From.Reg = r
ssagen.AddrAuto(&p.To, v)
case ssa.OpAMD64LoweredHasCPUFeature:
p := s.Prog(x86.AMOVBLZX)
@ -1210,8 +1274,14 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
for _, ap := range v.Block.Func.RegArgs {
// Pass the spill/unspill information along to the assembler, offset by size of return PC pushed on stack.
addr := ssagen.SpillSlotAddr(ap, x86.REG_SP, v.Block.Func.Config.PtrSize)
reg := ap.Reg
t := ap.Type
sz := t.Size()
if t.IsSIMD() {
reg = simdRegBySize(reg, sz)
}
s.FuncInfo().AddSpill(
obj.RegSpill{Reg: ap.Reg, Addr: addr, Unspill: loadByType(ap.Type), Spill: storeByType(ap.Type)})
obj.RegSpill{Reg: reg, Addr: addr, Unspill: loadByRegWidth(reg, sz), Spill: storeByRegWidth(reg, sz)})
}
v.Block.Func.RegArgs = nil
ssagen.CheckArgReg(v)
@ -1227,7 +1297,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
case ssa.OpAMD64CALLstatic, ssa.OpAMD64CALLtail:
if s.ABI == obj.ABI0 && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABIInternal {
// zeroing X15 when entering ABIInternal from ABI0
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
zeroX15(s)
// set G register from TLS
getgFromTLS(s, x86.REG_R14)
}
@ -1238,7 +1308,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
s.Call(v)
if s.ABI == obj.ABIInternal && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABI0 {
// zeroing X15 when entering ABIInternal from ABI0
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
zeroX15(s)
// set G register from TLS
getgFromTLS(s, x86.REG_R14)
}
@ -1643,10 +1713,683 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
p.From.Offset = int64(x)
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
// SIMD ops
case ssa.OpAMD64VZEROUPPER, ssa.OpAMD64VZEROALL:
s.Prog(v.Op.Asm())
case ssa.OpAMD64Zero128, ssa.OpAMD64Zero256, ssa.OpAMD64Zero512: // no code emitted
case ssa.OpAMD64VMOVSSf2v, ssa.OpAMD64VMOVSDf2v:
// These are for initializing the low 32/64 bits of a SIMD register from a "float".
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.AddRestSourceReg(x86.REG_X15)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
case ssa.OpAMD64VMOVQload, ssa.OpAMD64VMOVDload,
ssa.OpAMD64VMOVSSload, ssa.OpAMD64VMOVSDload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
case ssa.OpAMD64VMOVSSconst, ssa.OpAMD64VMOVSDconst:
// for loading constants directly into SIMD registers
x := simdReg(v)
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_FCONST
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
p.To.Type = obj.TYPE_REG
p.To.Reg = x
case ssa.OpAMD64VMOVD, ssa.OpAMD64VMOVQ:
// These are for initializing the low 32/64 bits of a SIMD register from an "int".
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
case ssa.OpAMD64VMOVDQUload128, ssa.OpAMD64VMOVDQUload256, ssa.OpAMD64VMOVDQUload512,
ssa.OpAMD64KMOVBload, ssa.OpAMD64KMOVWload, ssa.OpAMD64KMOVDload, ssa.OpAMD64KMOVQload:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdOrMaskReg(v)
case ssa.OpAMD64VMOVDQUstore128, ssa.OpAMD64VMOVDQUstore256, ssa.OpAMD64VMOVDQUstore512,
ssa.OpAMD64KMOVBstore, ssa.OpAMD64KMOVWstore, ssa.OpAMD64KMOVDstore, ssa.OpAMD64KMOVQstore:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdOrMaskReg(v.Args[1])
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.To, v)
case ssa.OpAMD64VPMASK32load128, ssa.OpAMD64VPMASK64load128, ssa.OpAMD64VPMASK32load256, ssa.OpAMD64VPMASK64load256:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
p.AddRestSourceReg(simdReg(v.Args[1])) // masking simd reg
case ssa.OpAMD64VPMASK32store128, ssa.OpAMD64VPMASK64store128, ssa.OpAMD64VPMASK32store256, ssa.OpAMD64VPMASK64store256:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.To, v)
p.AddRestSourceReg(simdReg(v.Args[1])) // masking simd reg
case ssa.OpAMD64VPMASK64load512, ssa.OpAMD64VPMASK32load512, ssa.OpAMD64VPMASK16load512, ssa.OpAMD64VPMASK8load512:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
p.AddRestSourceReg(v.Args[1].Reg()) // simd mask reg
x86.ParseSuffix(p, "Z") // must be zero if not in mask
case ssa.OpAMD64VPMASK64store512, ssa.OpAMD64VPMASK32store512, ssa.OpAMD64VPMASK16store512, ssa.OpAMD64VPMASK8store512:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.To.Type = obj.TYPE_MEM
p.To.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.To, v)
p.AddRestSourceReg(v.Args[1].Reg()) // simd mask reg
case ssa.OpAMD64VPMOVMToVec8x16,
ssa.OpAMD64VPMOVMToVec8x32,
ssa.OpAMD64VPMOVMToVec8x64,
ssa.OpAMD64VPMOVMToVec16x8,
ssa.OpAMD64VPMOVMToVec16x16,
ssa.OpAMD64VPMOVMToVec16x32,
ssa.OpAMD64VPMOVMToVec32x4,
ssa.OpAMD64VPMOVMToVec32x8,
ssa.OpAMD64VPMOVMToVec32x16,
ssa.OpAMD64VPMOVMToVec64x2,
ssa.OpAMD64VPMOVMToVec64x4,
ssa.OpAMD64VPMOVMToVec64x8:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
case ssa.OpAMD64VPMOVVec8x16ToM,
ssa.OpAMD64VPMOVVec8x32ToM,
ssa.OpAMD64VPMOVVec8x64ToM,
ssa.OpAMD64VPMOVVec16x8ToM,
ssa.OpAMD64VPMOVVec16x16ToM,
ssa.OpAMD64VPMOVVec16x32ToM,
ssa.OpAMD64VPMOVVec32x4ToM,
ssa.OpAMD64VPMOVVec32x8ToM,
ssa.OpAMD64VPMOVVec32x16ToM,
ssa.OpAMD64VPMOVVec64x2ToM,
ssa.OpAMD64VPMOVVec64x4ToM,
ssa.OpAMD64VPMOVVec64x8ToM:
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpAMD64KMOVQk, ssa.OpAMD64KMOVDk, ssa.OpAMD64KMOVWk, ssa.OpAMD64KMOVBk,
ssa.OpAMD64KMOVQi, ssa.OpAMD64KMOVDi, ssa.OpAMD64KMOVWi, ssa.OpAMD64KMOVBi:
// See also ssa.OpAMD64KMOVQload
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[0].Reg()
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
case ssa.OpAMD64VPTEST:
// Some instructions setting flags put their second operand into the destination reg.
// See also CMP[BWDQ].
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v.Args[1])
default:
if !ssaGenSIMDValue(s, v) {
v.Fatalf("genValue not implemented: %s", v.LongString())
}
}
}
// zeroX15 zeroes the X15 register.
func zeroX15(s *ssagen.State) {
vxorps := func(s *ssagen.State) {
p := s.Prog(x86.AVXORPS)
p.From.Type = obj.TYPE_REG
p.From.Reg = x86.REG_X15
p.AddRestSourceReg(x86.REG_X15)
p.To.Type = obj.TYPE_REG
p.To.Reg = x86.REG_X15
}
if buildcfg.GOAMD64 >= 3 {
vxorps(s)
return
}
// AVX may not be available, check before zeroing the high bits.
p := s.Prog(x86.ACMPB)
p.From.Type = obj.TYPE_MEM
p.From.Name = obj.NAME_EXTERN
p.From.Sym = ir.Syms.X86HasAVX
p.To.Type = obj.TYPE_CONST
p.To.Offset = 1
jmp := s.Prog(x86.AJNE)
jmp.To.Type = obj.TYPE_BRANCH
vxorps(s)
sse := opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
jmp.To.SetTarget(sse)
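// Note: when the AVX path is taken, execution falls through into the SSE
// XORPS below as well; that is harmless, since the legacy XORPS only
// rewrites the already-zeroed low 128 bits of X15.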
}
// Example instruction: VRSQRTPS X1, X1
func simdV11(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[0])
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPSUBD X1, X2, X3
func simdV21(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
// Vector register operands follow a right-to-left order.
// e.g. VPSUBD X1, X2, X3 means X3 = X2 - X1.
p.From.Reg = simdReg(v.Args[1])
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// This function accommodates the shifts: the 2nd arg is an XMM register,
// so its register is emitted via v.Args[1].Reg() rather than simdReg.
// Example instruction: VPSLLQ Z1, X1, Z2
func simdVfpv(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
// Vector register operands follow a right-to-left order.
// e.g. VPSUBD X1, X2, X3 means X3 = X2 - X1.
p.From.Reg = v.Args[1].Reg()
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPCMPEQW Z26, Z30, K4
func simdV2k(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[1])
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VPMINUQ X21, X3, K3, X31
func simdV2kv(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[1])
p.AddRestSourceReg(simdReg(v.Args[0]))
// These "simd*" functions assume that any "K" register serving as the
// write-mask or "predicate" of a predicated AVX512 instruction
// sits right at the end of the operand list.
// TODO: verify this assumption.
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPABSB X1, X2, K3 (masking merging)
func simdV2kvResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[1])
// These "simd*" functions assume that any "K" register serving as the
// write-mask or "predicate" of a predicated AVX512 instruction
// sits right at the end of the operand list.
// TODO: verify this assumption.
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// This function accommodates the shifts: the 2nd arg is an XMM register,
// so its register is emitted via v.Args[1].Reg() rather than simdReg.
// Example instruction: VPSLLQ Z1, X1, K1, Z2
func simdVfpkv(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = v.Args[1].Reg()
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPCMPEQW Z26, Z30, K1, K4
func simdV2kk(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[1])
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VPOPCNTB X14, K4, X16
func simdVkv(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[0])
p.AddRestSourceReg(maskReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VROUNDPD $7, X2, X2
func simdV11Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VREDUCEPD $126, X1, K3, X31
func simdVkvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VCMPPS $7, X2, X9, X2
func simdV21Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPINSRB $3, DX, X0, X0
func simdVgpvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(v.Args[1].Reg())
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPCMPD $1, Z1, Z2, K1
func simdV2kImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VPCMPD $1, Z1, Z2, K2, K1
func simdV2kkImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
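// Like simdV2kkImm8 above, but the result goes to a vector register rather than a mask register.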
func simdV2kvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VFMADD213PD Z2, Z1, Z0
func simdV31ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.AddRestSourceReg(simdReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
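// Like simdV31ResultInArg0, but with an imm8 operand (used by ops such as the
// VPTERNLOG[DQ] family; compare the load variant below).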
func simdV31ResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[2]))
p.AddRestSourceReg(simdReg(v.Args[1]))
// p.AddRestSourceReg(x86.REG_K0)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// simdV31loadResultInArg0Imm8
// Example op (an SSA value rather than a single instruction):
// (VPTERNLOGD128load {sym} [makeValAndOff(int32(int8(c)),off)] x y ptr mem)
func simdV31loadResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[2].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(simdReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VFMADD213PD Z2, Z1, K1, Z0
func simdV3kvResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(maskReg(v.Args[3]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
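// Like simdV11Imm8, but the result lands in a general-purpose register
// (e.g. the VPEXTR* element extracts).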
func simdVgpImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = v.Reg()
return p
}
// Currently unused
func simdV31(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Currently unused
func simdV3kv(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[2])
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[3]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VRCP14PS (DI), K6, X22
func simdVkvload(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(maskReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPSLLVD (DX), X7, X18
func simdV21load(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[1].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPDPWSSD (SI), X24, X18
func simdV31loadResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[2].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(simdReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPDPWSSD (SI), X24, K1, X18
func simdV3kvloadResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[2].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(simdReg(v.Args[1]))
p.AddRestSourceReg(maskReg(v.Args[3]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPSLLVD (SI), X1, K1, X2
func simdV2kvload(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[1].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPCMPEQD (SI), X1, K1
func simdV2kload(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[1].Reg()
ssagen.AddAux(&p.From, v)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VCVTTPS2DQ (BX), X2
func simdV11load(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_MEM
p.From.Reg = v.Args[0].Reg()
ssagen.AddAux(&p.From, v)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPSHUFD $7, (BX), X11
func simdV11loadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[0].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPRORD $81, -15(R14), K7, Y1
func simdVkvloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[0].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(maskReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VPSHLDD $82, 7(SI), Y21, Y3
func simdV21loadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: VCMPPS $81, -7(DI), Y16, K3
func simdV2kloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VCMPPS $81, -7(DI), Y16, K1, K3
func simdV2kkloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = maskReg(v)
return p
}
// Example instruction: VGF2P8AFFINEINVQB $64, -17(BP), X31, K3, X26
func simdV2kvloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
sc := v.AuxValAndOff()
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_CONST
p.From.Offset = sc.Val64()
m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
ssagen.AddAux2(&m, v, sc.Off64())
p.AddRestSource(m)
p.AddRestSourceReg(simdReg(v.Args[0]))
p.AddRestSourceReg(maskReg(v.Args[2]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: SHA1NEXTE X2, X2
func simdV21ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Type = obj.TYPE_REG
p.From.Reg = simdReg(v.Args[1])
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: SHA1RNDS4 $1, X2, X2
func simdV21ResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
p := s.Prog(v.Op.Asm())
p.From.Offset = int64(v.AuxUInt8())
p.From.Type = obj.TYPE_CONST
p.AddRestSourceReg(simdReg(v.Args[1]))
p.To.Type = obj.TYPE_REG
p.To.Reg = simdReg(v)
return p
}
// Example instruction: SHA256RNDS2 X0, X11, X2
func simdV31x0AtIn2ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
return simdV31ResultInArg0(s, v)
}
var blockJump = [...]struct {
asm, invasm obj.As
@ -1732,7 +2475,7 @@ func ssaGenBlock(s *ssagen.State, b, next *ssa.Block) {
}
func loadRegResult(s *ssagen.State, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
p := s.Prog(loadByType(t))
p := s.Prog(loadByRegWidth(reg, t.Size()))
p.From.Type = obj.TYPE_MEM
p.From.Name = obj.NAME_AUTO
p.From.Sym = n.Linksym()
@ -1743,7 +2486,7 @@ func loadRegResult(s *ssagen.State, f *ssa.Func, t *types.Type, reg int16, n *ir
}
func spillArgReg(pp *objw.Progs, p *obj.Prog, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
p = pp.Append(p, storeByType(t), obj.TYPE_REG, reg, 0, obj.TYPE_MEM, 0, n.FrameOffset()+off)
p = pp.Append(p, storeByRegWidth(reg, t.Size()), obj.TYPE_REG, reg, 0, obj.TYPE_MEM, 0, n.FrameOffset()+off)
p.To.Name = obj.NAME_PARAM
p.To.Sym = n.Linksym()
p.Pos = p.Pos.WithNotStmt()
@ -1778,3 +2521,58 @@ func move16(s *ssagen.State, src, dst, tmp int16, off int64) {
p.To.Reg = dst
p.To.Offset = off
}
// XXX maybe make this part of v.Reg?
// On the other hand, it is architecture-specific.
func simdReg(v *ssa.Value) int16 {
t := v.Type
if !t.IsSIMD() {
base.Fatalf("simdReg: not a simd type; v=%s, b=b%d, f=%s", v.LongString(), v.Block.ID, v.Block.Func.Name)
}
return simdRegBySize(v.Reg(), t.Size())
}
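// simdRegBySize maps an allocated X register to its Y or Z alias for the given
// size in bytes, e.g. X3 stays X3 at 16 bytes, becomes Y3 at 32 and Z3 at 64.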
func simdRegBySize(reg int16, size int64) int16 {
switch size {
case 16:
return reg
case 32:
return reg + (x86.REG_Y0 - x86.REG_X0)
case 64:
return reg + (x86.REG_Z0 - x86.REG_X0)
}
panic("simdRegBySize: bad size")
}
// XXX k mask
func maskReg(v *ssa.Value) int16 {
t := v.Type
if !t.IsSIMD() {
base.Fatalf("maskReg: not a simd type; v=%s, b=b%d, f=%s", v.LongString(), v.Block.ID, v.Block.Func.Name)
}
switch t.Size() {
case 8:
return v.Reg()
}
panic("unreachable")
}
// XXX k mask + vec
func simdOrMaskReg(v *ssa.Value) int16 {
t := v.Type
if t.Size() <= 8 {
return maskReg(v)
}
return simdReg(v)
}
// XXX this is used for shift operations only.
// regalloc will issue OpCopy with incorrect type, but the assigned
// register should be correct, and this function is merely checking
// the sanity of this part.
func simdCheckRegOnly(v *ssa.Value, regStart, regEnd int16) int16 {
if v.Reg() > regEnd || v.Reg() < regStart {
panic("simdCheckRegOnly: not the desired register")
}
return v.Reg()
}

View file

@ -29,7 +29,7 @@ var (
compilequeue []*ir.Func // functions waiting to be compiled
)
func enqueueFunc(fn *ir.Func) {
func enqueueFunc(fn *ir.Func, symABIs *ssagen.SymABIs) {
if ir.CurFunc != nil {
base.FatalfAt(fn.Pos(), "enqueueFunc %v inside %v", fn, ir.CurFunc)
}
@ -49,6 +49,13 @@ func enqueueFunc(fn *ir.Func) {
}
if len(fn.Body) == 0 {
if ir.IsIntrinsicSym(fn.Sym()) && fn.Sym().Linkname == "" && !symABIs.HasDef(fn.Sym()) {
// Generate the function body for a bodyless intrinsic, in case it
// is used in a non-call context (e.g. as a function pointer).
// We skip functions defined in assembly or having a linkname (which
// could be defined in another package).
ssagen.GenIntrinsicBody(fn)
} else {
// Initialize ABI wrappers if necessary.
ir.InitLSym(fn, false)
types.CalcSize(fn.Type())
@ -66,6 +73,7 @@ func enqueueFunc(fn *ir.Func) {
}
return
}
}
errorsBefore := base.Errors()
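The GenIntrinsicBody path above exists because an intrinsic can be used in a non-call context, where the inlined intrinsic form is not available. A user-level sketch of that situation (not part of the diff), using math/bits.OnesCount64, an existing compiler intrinsic; unlike the bodyless simd intrinsics it already has a written Go body, but the func-value form has to go through a real body either way:

package main

import (
	"fmt"
	"math/bits"
)

func main() {
	// Direct call: the compiler replaces this with the intrinsic instruction sequence.
	fmt.Println(bits.OnesCount64(0b1011)) // 3

	// Non-call context: taking the function as a value needs an actual body to
	// point at, which is what GenIntrinsicBody materializes for bodyless intrinsics.
	f := bits.OnesCount64
	fmt.Println(f(0b1011)) // 3
}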

View file

@ -188,9 +188,9 @@ func Main(archInit func(*ssagen.ArchInfo)) {
ir.EscFmt = escape.Fmt
ir.IsIntrinsicCall = ssagen.IsIntrinsicCall
ir.IsIntrinsicSym = ssagen.IsIntrinsicSym
inline.SSADumpInline = ssagen.DumpInline
ssagen.InitEnv()
ssagen.InitTables()
types.PtrSize = ssagen.Arch.LinkArch.PtrSize
types.RegSize = ssagen.Arch.LinkArch.RegSize
@ -204,6 +204,11 @@ func Main(archInit func(*ssagen.ArchInfo)) {
typecheck.InitRuntime()
rttype.Init()
// Some intrinsics (notably, the simd intrinsics) mention
// types "eagerly", thus ssagen must be initialized AFTER
// the type system is ready.
ssagen.InitTables()
// Parse and typecheck input.
noder.LoadPackage(flag.Args())
@ -309,7 +314,7 @@ func Main(archInit func(*ssagen.ArchInfo)) {
}
if nextFunc < len(typecheck.Target.Funcs) {
enqueueFunc(typecheck.Target.Funcs[nextFunc])
enqueueFunc(typecheck.Target.Funcs[nextFunc], symABIs)
nextFunc++
continue
}

View file

@ -179,6 +179,25 @@ func CanInlineFuncs(funcs []*ir.Func, profile *pgoir.Profile) {
})
}
func simdCreditMultiplier(fn *ir.Func) int32 {
for _, field := range fn.Type().RecvParamsResults() {
if field.Type.IsSIMD() {
return 3
}
}
// Sometimes code uses closures that do not take simd
// parameters to perform repetitive SIMD operations.
// These really need to be inlined, or the anticipated
// awesome SIMD performance will be missed.
for _, v := range fn.ClosureVars {
if v.Type().IsSIMD() {
return 11 // 11 ought to be enough.
}
}
return 1
}
// inlineBudget determines the max budget for function 'fn' prior to
// analyzing the hairiness of the body of 'fn'. We pass in the pgo
// profile if available (which can change the budget), also a
@ -186,9 +205,14 @@ func CanInlineFuncs(funcs []*ir.Func, profile *pgoir.Profile) {
// possibility that a call to the function might have its score
// adjusted downwards. If 'verbose' is set, then print a remark where
// we boost the budget due to PGO.
// Note that inlineCostOk has the final say on whether an inline will
// happen; changes here merely make inlines possible.
func inlineBudget(fn *ir.Func, profile *pgoir.Profile, relaxed bool, verbose bool) int32 {
// Update the budget for profile-guided inlining.
budget := int32(inlineMaxBudget)
budget *= simdCreditMultiplier(fn)
if IsPgoHotFunc(fn, profile) {
budget = inlineHotMaxBudget
if verbose {
@ -202,6 +226,7 @@ func inlineBudget(fn *ir.Func, profile *pgoir.Profile, relaxed bool, verbose boo
// be very liberal here, if the closure is only called once, the budget is large
budget = max(budget, inlineClosureCalledOnceCost)
}
return budget
}
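For a sense of scale, a standalone sketch of the resulting budgets (not part of the diff), assuming the usual inlineMaxBudget of 80; that constant lives elsewhere in this package and its value here is an assumption:

package main

import "fmt"

func main() {
	const inlineMaxBudget = 80 // assumed value of the package-level constant
	fmt.Println("no SIMD involvement:   ", inlineMaxBudget*1)  // 80
	fmt.Println("SIMD params or results:", inlineMaxBudget*3)  // 240
	fmt.Println("closure capturing SIMD:", inlineMaxBudget*11) // 880
}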
@ -263,6 +288,7 @@ func CanInline(fn *ir.Func, profile *pgoir.Profile) {
visitor := hairyVisitor{
curFunc: fn,
debug: isDebugFn(fn),
isBigFunc: IsBigFunc(fn),
budget: budget,
maxBudget: budget,
@ -407,6 +433,7 @@ type hairyVisitor struct {
// This is needed to access the current caller in the doNode function.
curFunc *ir.Func
isBigFunc bool
debug bool
budget int32
maxBudget int32
reason string
@ -416,6 +443,16 @@ type hairyVisitor struct {
profile *pgoir.Profile
}
func isDebugFn(fn *ir.Func) bool {
// if n := fn.Nname; n != nil {
// if n.Sym().Name == "Int32x8.Transpose8" && n.Sym().Pkg.Path == "simd" {
// fmt.Printf("isDebugFn '%s' DOT '%s'\n", n.Sym().Pkg.Path, n.Sym().Name)
// return true
// }
// }
return false
}
func (v *hairyVisitor) tooHairy(fn *ir.Func) bool {
v.do = v.doNode // cache closure
if ir.DoChildren(fn, v.do) {
@ -434,6 +471,9 @@ func (v *hairyVisitor) doNode(n ir.Node) bool {
if n == nil {
return false
}
if v.debug {
fmt.Printf("%v: doNode %v budget is %d\n", ir.Line(n), n.Op(), v.budget)
}
opSwitch:
switch n.Op() {
// Call is okay if inlinable and we have the budget for the body.
@ -551,12 +591,19 @@ opSwitch:
}
if cheap {
if v.debug {
if ir.IsIntrinsicCall(n) {
fmt.Printf("%v: cheap call is also intrinsic, %v\n", ir.Line(n), n)
}
}
break // treat like any other node, that is, cost of 1
}
if ir.IsIntrinsicCall(n) {
// Treat like any other node.
break
if v.debug {
fmt.Printf("%v: intrinsic call, %v\n", ir.Line(n), n)
}
break // Treat like any other node.
}
if callee := inlCallee(v.curFunc, n.Fun, v.profile, false); callee != nil && typecheck.HaveInlineBody(callee) {
@ -583,6 +630,10 @@ opSwitch:
}
}
if v.debug {
fmt.Printf("%v: costly OCALLFUNC %v\n", ir.Line(n), n)
}
// Call cost for non-leaf inlining.
v.budget -= extraCost
@ -592,6 +643,9 @@ opSwitch:
// Things that are too hairy, irrespective of the budget
case ir.OCALL, ir.OCALLINTER:
// Call cost for non-leaf inlining.
if v.debug {
fmt.Printf("%v: costly OCALL %v\n", ir.Line(n), n)
}
v.budget -= v.extraCallCost
case ir.OPANIC:
@ -754,7 +808,7 @@ opSwitch:
v.budget--
// When debugging, don't stop early, to get full cost of inlining this function
if v.budget < 0 && base.Flag.LowerM < 2 && !logopt.Enabled() {
if v.budget < 0 && base.Flag.LowerM < 2 && !logopt.Enabled() && !v.debug {
v.reason = "too expensive"
return true
}
@ -914,6 +968,8 @@ func inlineCostOK(n *ir.CallExpr, caller, callee *ir.Func, bigCaller, closureCal
maxCost = inlineBigFunctionMaxCost
}
simdMaxCost := simdCreditMultiplier(callee) * maxCost
if callee.ClosureParent != nil {
maxCost *= 2 // favor inlining closures
if closureCalledOnce { // really favor inlining the one call to this closure
@ -921,6 +977,8 @@ func inlineCostOK(n *ir.CallExpr, caller, callee *ir.Func, bigCaller, closureCal
}
}
maxCost = max(maxCost, simdMaxCost)
metric := callee.Inl.Cost
if inlheur.Enabled() {
score, ok := inlheur.GetCallSiteScore(caller, n)


@ -1031,6 +1031,9 @@ func StaticCalleeName(n Node) *Name {
// IsIntrinsicCall reports whether the compiler back end will treat the call as an intrinsic operation.
var IsIntrinsicCall = func(*CallExpr) bool { return false }
// IsIntrinsicSym reports whether the compiler back end will treat a call to this symbol as an intrinsic operation.
var IsIntrinsicSym = func(*types.Sym) bool { return false }
// SameSafeExpr checks whether it is safe to reuse one of l and r
// instead of computing both. SameSafeExpr assumes that l and r are
// used in the same statement or expression. In order for it to be
@ -1149,6 +1152,14 @@ func ParamNames(ft *types.Type) []Node {
return args
}
func RecvParamNames(ft *types.Type) []Node {
args := make([]Node, ft.NumRecvs()+ft.NumParams())
for i, f := range ft.RecvParams() {
args[i] = f.Nname.(*Name)
}
return args
}
// MethodSym returns the method symbol representing a method name
// associated with a specific receiver type.
//


@ -53,6 +53,7 @@ type symsStruct struct {
PanicdottypeI *obj.LSym
Panicnildottype *obj.LSym
Panicoverflow *obj.LSym
PanicSimdImm *obj.LSym
Racefuncenter *obj.LSym
Racefuncexit *obj.LSym
Raceread *obj.LSym
@ -76,6 +77,7 @@ type symsStruct struct {
Loong64HasLAM_BH *obj.LSym
Loong64HasLSX *obj.LSym
RISCV64HasZbb *obj.LSym
X86HasAVX *obj.LSym
X86HasFMA *obj.LSym
X86HasPOPCNT *obj.LSym
X86HasSSE41 *obj.LSym


@ -1534,6 +1534,9 @@ func isfat(t *types.Type) bool {
}
return true
case types.TSTRUCT:
if t.IsSIMD() {
return false
}
// Struct with 1 field, check if field is fat
if t.NumFields() == 1 {
return isfat(t.Field(0).Type)


@ -1657,3 +1657,171 @@
// If we don't use the flags any more, just use the standard op.
(Select0 a:(ADD(Q|L)constflags [c] x)) && a.Uses == 1 => (ADD(Q|L)const [c] x)
// SIMD lowering rules
// Mask conversions
// integers to masks
(Cvt16toMask8x16 <t> x) => (VPMOVMToVec8x16 <types.TypeVec128> (KMOVWk <t> x))
(Cvt32toMask8x32 <t> x) => (VPMOVMToVec8x32 <types.TypeVec256> (KMOVDk <t> x))
(Cvt64toMask8x64 <t> x) => (VPMOVMToVec8x64 <types.TypeVec512> (KMOVQk <t> x))
(Cvt8toMask16x8 <t> x) => (VPMOVMToVec16x8 <types.TypeVec128> (KMOVBk <t> x))
(Cvt16toMask16x16 <t> x) => (VPMOVMToVec16x16 <types.TypeVec256> (KMOVWk <t> x))
(Cvt32toMask16x32 <t> x) => (VPMOVMToVec16x32 <types.TypeVec512> (KMOVDk <t> x))
(Cvt8toMask32x4 <t> x) => (VPMOVMToVec32x4 <types.TypeVec128> (KMOVBk <t> x))
(Cvt8toMask32x8 <t> x) => (VPMOVMToVec32x8 <types.TypeVec256> (KMOVBk <t> x))
(Cvt16toMask32x16 <t> x) => (VPMOVMToVec32x16 <types.TypeVec512> (KMOVWk <t> x))
(Cvt8toMask64x2 <t> x) => (VPMOVMToVec64x2 <types.TypeVec128> (KMOVBk <t> x))
(Cvt8toMask64x4 <t> x) => (VPMOVMToVec64x4 <types.TypeVec256> (KMOVBk <t> x))
(Cvt8toMask64x8 <t> x) => (VPMOVMToVec64x8 <types.TypeVec512> (KMOVBk <t> x))
// masks to integers
(CvtMask8x16to16 <t> x) => (KMOVWi <t> (VPMOVVec8x16ToM <types.TypeMask> x))
(CvtMask8x32to32 <t> x) => (KMOVDi <t> (VPMOVVec8x32ToM <types.TypeMask> x))
(CvtMask8x64to64 <t> x) => (KMOVQi <t> (VPMOVVec8x64ToM <types.TypeMask> x))
(CvtMask16x8to8 <t> x) => (KMOVBi <t> (VPMOVVec16x8ToM <types.TypeMask> x))
(CvtMask16x16to16 <t> x) => (KMOVWi <t> (VPMOVVec16x16ToM <types.TypeMask> x))
(CvtMask16x32to32 <t> x) => (KMOVDi <t> (VPMOVVec16x32ToM <types.TypeMask> x))
(CvtMask32x4to8 <t> x) => (KMOVBi <t> (VPMOVVec32x4ToM <types.TypeMask> x))
(CvtMask32x8to8 <t> x) => (KMOVBi <t> (VPMOVVec32x8ToM <types.TypeMask> x))
(CvtMask32x16to16 <t> x) => (KMOVWi <t> (VPMOVVec32x16ToM <types.TypeMask> x))
(CvtMask64x2to8 <t> x) => (KMOVBi <t> (VPMOVVec64x2ToM <types.TypeMask> x))
(CvtMask64x4to8 <t> x) => (KMOVBi <t> (VPMOVVec64x4ToM <types.TypeMask> x))
(CvtMask64x8to8 <t> x) => (KMOVBi <t> (VPMOVVec64x8ToM <types.TypeMask> x))
// optimizations
(MOVBstore [off] {sym} ptr (KMOVBi mask) mem) => (KMOVBstore [off] {sym} ptr mask mem)
(MOVWstore [off] {sym} ptr (KMOVWi mask) mem) => (KMOVWstore [off] {sym} ptr mask mem)
(MOVLstore [off] {sym} ptr (KMOVDi mask) mem) => (KMOVDstore [off] {sym} ptr mask mem)
(MOVQstore [off] {sym} ptr (KMOVQi mask) mem) => (KMOVQstore [off] {sym} ptr mask mem)
(KMOVBk l:(MOVBload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVBload [off] {sym} ptr mem)
(KMOVWk l:(MOVWload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVWload [off] {sym} ptr mem)
(KMOVDk l:(MOVLload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVDload [off] {sym} ptr mem)
(KMOVQk l:(MOVQload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVQload [off] {sym} ptr mem)
// SIMD vector loads and stores
(Load <t> ptr mem) && t.Size() == 16 => (VMOVDQUload128 ptr mem)
(Store {t} ptr val mem) && t.Size() == 16 => (VMOVDQUstore128 ptr val mem)
(Load <t> ptr mem) && t.Size() == 32 => (VMOVDQUload256 ptr mem)
(Store {t} ptr val mem) && t.Size() == 32 => (VMOVDQUstore256 ptr val mem)
(Load <t> ptr mem) && t.Size() == 64 => (VMOVDQUload512 ptr mem)
(Store {t} ptr val mem) && t.Size() == 64 => (VMOVDQUstore512 ptr val mem)
// SIMD vector integer-vector-masked loads and stores.
(LoadMasked32 <t> ptr mask mem) && t.Size() == 16 => (VPMASK32load128 ptr mask mem)
(LoadMasked32 <t> ptr mask mem) && t.Size() == 32 => (VPMASK32load256 ptr mask mem)
(LoadMasked64 <t> ptr mask mem) && t.Size() == 16 => (VPMASK64load128 ptr mask mem)
(LoadMasked64 <t> ptr mask mem) && t.Size() == 32 => (VPMASK64load256 ptr mask mem)
(StoreMasked32 {t} ptr mask val mem) && t.Size() == 16 => (VPMASK32store128 ptr mask val mem)
(StoreMasked32 {t} ptr mask val mem) && t.Size() == 32 => (VPMASK32store256 ptr mask val mem)
(StoreMasked64 {t} ptr mask val mem) && t.Size() == 16 => (VPMASK64store128 ptr mask val mem)
(StoreMasked64 {t} ptr mask val mem) && t.Size() == 32 => (VPMASK64store256 ptr mask val mem)
// Misc
(IsZeroVec x) => (SETEQ (VPTEST x x))
// SIMD vector K-masked loads and stores
(LoadMasked64 <t> ptr mask mem) && t.Size() == 64 => (VPMASK64load512 ptr (VPMOVVec64x8ToM <types.TypeMask> mask) mem)
(LoadMasked32 <t> ptr mask mem) && t.Size() == 64 => (VPMASK32load512 ptr (VPMOVVec32x16ToM <types.TypeMask> mask) mem)
(LoadMasked16 <t> ptr mask mem) && t.Size() == 64 => (VPMASK16load512 ptr (VPMOVVec16x32ToM <types.TypeMask> mask) mem)
(LoadMasked8 <t> ptr mask mem) && t.Size() == 64 => (VPMASK8load512 ptr (VPMOVVec8x64ToM <types.TypeMask> mask) mem)
(StoreMasked64 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK64store512 ptr (VPMOVVec64x8ToM <types.TypeMask> mask) val mem)
(StoreMasked32 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK32store512 ptr (VPMOVVec32x16ToM <types.TypeMask> mask) val mem)
(StoreMasked16 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK16store512 ptr (VPMOVVec16x32ToM <types.TypeMask> mask) val mem)
(StoreMasked8 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK8store512 ptr (VPMOVVec8x64ToM <types.TypeMask> mask) val mem)
(ZeroSIMD <t>) && t.Size() == 16 => (Zero128 <t>)
(ZeroSIMD <t>) && t.Size() == 32 => (Zero256 <t>)
(ZeroSIMD <t>) && t.Size() == 64 => (Zero512 <t>)
(VPMOVVec8x16ToM (VPMOVMToVec8x16 x)) => x
(VPMOVVec8x32ToM (VPMOVMToVec8x32 x)) => x
(VPMOVVec8x64ToM (VPMOVMToVec8x64 x)) => x
(VPMOVVec16x8ToM (VPMOVMToVec16x8 x)) => x
(VPMOVVec16x16ToM (VPMOVMToVec16x16 x)) => x
(VPMOVVec16x32ToM (VPMOVMToVec16x32 x)) => x
(VPMOVVec32x4ToM (VPMOVMToVec32x4 x)) => x
(VPMOVVec32x8ToM (VPMOVMToVec32x8 x)) => x
(VPMOVVec32x16ToM (VPMOVMToVec32x16 x)) => x
(VPMOVVec64x2ToM (VPMOVMToVec64x2 x)) => x
(VPMOVVec64x4ToM (VPMOVMToVec64x4 x)) => x
(VPMOVVec64x8ToM (VPMOVMToVec64x8 x)) => x
(VPANDQ512 x (VPMOVMToVec64x8 k)) => (VMOVDQU64Masked512 x k)
(VPANDQ512 x (VPMOVMToVec32x16 k)) => (VMOVDQU32Masked512 x k)
(VPANDQ512 x (VPMOVMToVec16x32 k)) => (VMOVDQU16Masked512 x k)
(VPANDQ512 x (VPMOVMToVec8x64 k)) => (VMOVDQU8Masked512 x k)
(VPANDD512 x (VPMOVMToVec64x8 k)) => (VMOVDQU64Masked512 x k)
(VPANDD512 x (VPMOVMToVec32x16 k)) => (VMOVDQU32Masked512 x k)
(VPANDD512 x (VPMOVMToVec16x32 k)) => (VMOVDQU16Masked512 x k)
(VPANDD512 x (VPMOVMToVec8x64 k)) => (VMOVDQU8Masked512 x k)
(VPAND128 x (VPMOVMToVec8x16 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU8Masked128 x k)
(VPAND128 x (VPMOVMToVec16x8 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU16Masked128 x k)
(VPAND128 x (VPMOVMToVec32x4 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU32Masked128 x k)
(VPAND128 x (VPMOVMToVec64x2 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU64Masked128 x k)
(VPAND256 x (VPMOVMToVec8x32 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU8Masked256 x k)
(VPAND256 x (VPMOVMToVec16x16 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU16Masked256 x k)
(VPAND256 x (VPMOVMToVec32x8 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU32Masked256 x k)
(VPAND256 x (VPMOVMToVec64x4 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU64Masked256 x k)
// Inserting a 32/64-bit float or int into lane 0 of a zero vector is just MOVS[SD] (floats) or MOV[DQ] (ints)
(VPINSRQ128 [0] (Zero128 <t>) y) && y.Type.IsFloat() => (VMOVSDf2v <types.TypeVec128> y)
(VPINSRD128 [0] (Zero128 <t>) y) && y.Type.IsFloat() => (VMOVSSf2v <types.TypeVec128> y)
(VPINSRQ128 [0] (Zero128 <t>) y) && !y.Type.IsFloat() => (VMOVQ <types.TypeVec128> y)
(VPINSRD128 [0] (Zero128 <t>) y) && !y.Type.IsFloat() => (VMOVD <types.TypeVec128> y)
// These rewrites can skip zero-extending the 8/16-bit inputs because they are
// only used as the input to a broadcast; the potentially "bad" bits are ignored
(VPBROADCASTB(128|256|512) x:(VPINSRB128 [0] (Zero128 <t>) y)) && x.Uses == 1 =>
(VPBROADCASTB(128|256|512) (VMOVQ <types.TypeVec128> y))
(VPBROADCASTW(128|256|512) x:(VPINSRW128 [0] (Zero128 <t>) y)) && x.Uses == 1 =>
(VPBROADCASTW(128|256|512) (VMOVQ <types.TypeVec128> y))
(VMOVQ x:(MOVQload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVQload <v.Type> [off] {sym} ptr mem)
(VMOVD x:(MOVLload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVDload <v.Type> [off] {sym} ptr mem)
(VMOVSDf2v x:(MOVSDload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVSDload <v.Type> [off] {sym} ptr mem)
(VMOVSSf2v x:(MOVSSload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVSSload <v.Type> [off] {sym} ptr mem)
(VMOVSDf2v x:(MOVSDconst [c] )) => (VMOVSDconst [c] )
(VMOVSSf2v x:(MOVSSconst [c] )) => (VMOVSSconst [c] )
(VMOVDQUload(128|256|512) [off1] {sym} x:(ADDQconst [off2] ptr) mem) && is32Bit(int64(off1)+int64(off2)) => (VMOVDQUload(128|256|512) [off1+off2] {sym} ptr mem)
(VMOVDQUstore(128|256|512) [off1] {sym} x:(ADDQconst [off2] ptr) val mem) && is32Bit(int64(off1)+int64(off2)) => (VMOVDQUstore(128|256|512) [off1+off2] {sym} ptr val mem)
(VMOVDQUload(128|256|512) [off1] {sym1} x:(LEAQ [off2] {sym2} base) mem) && is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) => (VMOVDQUload(128|256|512) [off1+off2] {mergeSym(sym1, sym2)} base mem)
(VMOVDQUstore(128|256|512) [off1] {sym1} x:(LEAQ [off2] {sym2} base) val mem) && is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) => (VMOVDQUstore(128|256|512) [off1+off2] {mergeSym(sym1, sym2)} base val mem)
// 2-op VPTEST optimizations
(SETEQ (VPTEST x:(VPAND(128|256) j k) y)) && x == y && x.Uses == 2 => (SETEQ (VPTEST j k))
(SETEQ (VPTEST x:(VPAND(D|Q)512 j k) y)) && x == y && x.Uses == 2 => (SETEQ (VPTEST j k))
(SETEQ (VPTEST x:(VPANDN(128|256) j k) y)) && x == y && x.Uses == 2 => (SETB (VPTEST k j)) // AndNot has swapped its operand order
(SETEQ (VPTEST x:(VPANDN(D|Q)512 j k) y)) && x == y && x.Uses == 2 => (SETB (VPTEST k j)) // AndNot has swapped its operand order
(EQ (VPTEST x:(VPAND(128|256) j k) y) yes no) && x == y && x.Uses == 2 => (EQ (VPTEST j k) yes no)
(EQ (VPTEST x:(VPAND(D|Q)512 j k) y) yes no) && x == y && x.Uses == 2 => (EQ (VPTEST j k) yes no)
(EQ (VPTEST x:(VPANDN(128|256) j k) y) yes no) && x == y && x.Uses == 2 => (ULT (VPTEST k j) yes no) // AndNot has swapped its operand order
(EQ (VPTEST x:(VPANDN(D|Q)512 j k) y) yes no) && x == y && x.Uses == 2 => (ULT (VPTEST k j) yes no) // AndNot has swapped its operand order
// DotProductQuadruple optimizations
(VPADDD128 (VPDPBUSD128 (Zero128 <t>) x y) z) => (VPDPBUSD128 <t> z x y)
(VPADDD256 (VPDPBUSD256 (Zero256 <t>) x y) z) => (VPDPBUSD256 <t> z x y)
(VPADDD512 (VPDPBUSD512 (Zero512 <t>) x y) z) => (VPDPBUSD512 <t> z x y)
(VPADDD128 (VPDPBUSDS128 (Zero128 <t>) x y) z) => (VPDPBUSDS128 <t> z x y)
(VPADDD256 (VPDPBUSDS256 (Zero256 <t>) x y) z) => (VPDPBUSDS256 <t> z x y)
(VPADDD512 (VPDPBUSDS512 (Zero512 <t>) x y) z) => (VPDPBUSDS512 <t> z x y)
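
The DotProductQuadruple reassociations above rely on VPDPBUSD[S] being an accumulate operation, so a dot product over a zero accumulator followed by a vector add is the same as using that addend as the accumulator. A scalar model of one 32-bit lane, for illustration only (saturation in the S variant is ignored):

	// One lane of VPDPBUSD: acc += sum of 4 unsigned×signed byte products.
	func dpbusdLane(acc int32, a [4]uint8, b [4]int8) int32 {
		for i := 0; i < 4; i++ {
			acc += int32(a[i]) * int32(b[i])
		}
		return acc
	}

	// dpbusdLane(0, a, b) + z == dpbusdLane(z, a, b), which is the rewrite above.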


@ -62,7 +62,33 @@ var regNamesAMD64 = []string{
"X13",
"X14",
"X15", // constant 0 in ABIInternal
"X16",
"X17",
"X18",
"X19",
"X20",
"X21",
"X22",
"X23",
"X24",
"X25",
"X26",
"X27",
"X28",
"X29",
"X30",
"X31",
// TODO: update asyncPreempt for K registers.
// asyncPreempt also needs to store Z0-Z15 properly.
"K0",
"K1",
"K2",
"K3",
"K4",
"K5",
"K6",
"K7",
// If you add registers, update asyncPreempt in runtime
// pseudo-registers
@ -98,16 +124,28 @@ func init() {
gp = buildReg("AX CX DX BX BP SI DI R8 R9 R10 R11 R12 R13 R15")
g = buildReg("g")
fp = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14")
v = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14")
w = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31")
x15 = buildReg("X15")
mask = buildReg("K1 K2 K3 K4 K5 K6 K7")
gpsp = gp | buildReg("SP")
gpspsb = gpsp | buildReg("SB")
gpspsbg = gpspsb | g
callerSave = gp | fp | g // runtime.setg (and anything calling it) may clobber g
vz = v | x15
wz = w | x15
x0 = buildReg("X0")
)
// Common slices of register masks
var (
gponly = []regMask{gp}
fponly = []regMask{fp}
vonly = []regMask{v}
wonly = []regMask{w}
maskonly = []regMask{mask}
vzonly = []regMask{vz}
wzonly = []regMask{wz}
)
// Common regInfo
@ -170,6 +208,67 @@ func init() {
fpstore = regInfo{inputs: []regMask{gpspsb, fp, 0}}
fpstoreidx = regInfo{inputs: []regMask{gpspsb, gpsp, fp, 0}}
// masked loads/stores, vector register or mask register
vloadv = regInfo{inputs: []regMask{gpspsb, v, 0}, outputs: vonly}
vstorev = regInfo{inputs: []regMask{gpspsb, v, v, 0}}
vloadk = regInfo{inputs: []regMask{gpspsb, mask, 0}, outputs: vonly}
vstorek = regInfo{inputs: []regMask{gpspsb, mask, v, 0}}
v11 = regInfo{inputs: vonly, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
v21 = regInfo{inputs: []regMask{v, vz}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
vk = regInfo{inputs: vzonly, outputs: maskonly}
kv = regInfo{inputs: maskonly, outputs: vonly}
v2k = regInfo{inputs: []regMask{vz, vz}, outputs: maskonly}
vkv = regInfo{inputs: []regMask{vz, mask}, outputs: vonly}
v2kv = regInfo{inputs: []regMask{vz, vz, mask}, outputs: vonly}
v2kk = regInfo{inputs: []regMask{vz, vz, mask}, outputs: maskonly}
v31 = regInfo{inputs: []regMask{v, vz, vz}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
v3kv = regInfo{inputs: []regMask{v, vz, vz, mask}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
vgpv = regInfo{inputs: []regMask{vz, gp}, outputs: vonly}
vgp = regInfo{inputs: vonly, outputs: gponly}
vfpv = regInfo{inputs: []regMask{vz, fp}, outputs: vonly}
vfpkv = regInfo{inputs: []regMask{vz, fp, mask}, outputs: vonly}
fpv = regInfo{inputs: []regMask{fp}, outputs: vonly}
gpv = regInfo{inputs: []regMask{gp}, outputs: vonly}
v2flags = regInfo{inputs: []regMask{vz, vz}}
w11 = regInfo{inputs: wonly, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
w21 = regInfo{inputs: []regMask{wz, wz}, outputs: wonly}
wk = regInfo{inputs: wzonly, outputs: maskonly}
kw = regInfo{inputs: maskonly, outputs: wonly}
w2k = regInfo{inputs: []regMask{wz, wz}, outputs: maskonly}
wkw = regInfo{inputs: []regMask{wz, mask}, outputs: wonly}
w2kw = regInfo{inputs: []regMask{w, wz, mask}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
w2kk = regInfo{inputs: []regMask{wz, wz, mask}, outputs: maskonly}
w31 = regInfo{inputs: []regMask{w, wz, wz}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
w3kw = regInfo{inputs: []regMask{w, wz, wz, mask}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
wgpw = regInfo{inputs: []regMask{wz, gp}, outputs: wonly}
wgp = regInfo{inputs: wzonly, outputs: gponly}
wfpw = regInfo{inputs: []regMask{wz, fp}, outputs: wonly}
wfpkw = regInfo{inputs: []regMask{wz, fp, mask}, outputs: wonly}
// These register masks are used only by SIMD; they follow the pattern:
// mem last, k mask second to last (if any), and the address right before the mem and k mask.
wkwload = regInfo{inputs: []regMask{gpspsb, mask, 0}, outputs: wonly}
v21load = regInfo{inputs: []regMask{v, gpspsb, 0}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
v31load = regInfo{inputs: []regMask{v, vz, gpspsb, 0}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
v11load = regInfo{inputs: []regMask{gpspsb, 0}, outputs: vonly}
w21load = regInfo{inputs: []regMask{wz, gpspsb, 0}, outputs: wonly}
w31load = regInfo{inputs: []regMask{w, wz, gpspsb, 0}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
w2kload = regInfo{inputs: []regMask{wz, gpspsb, 0}, outputs: maskonly}
w2kwload = regInfo{inputs: []regMask{wz, gpspsb, mask, 0}, outputs: wonly}
w11load = regInfo{inputs: []regMask{gpspsb, 0}, outputs: wonly}
w3kwload = regInfo{inputs: []regMask{w, wz, gpspsb, mask, 0}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
w2kkload = regInfo{inputs: []regMask{wz, gpspsb, mask, 0}, outputs: maskonly}
v31x0AtIn2 = regInfo{inputs: []regMask{v, vz, x0}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
kload = regInfo{inputs: []regMask{gpspsb, 0}, outputs: maskonly}
kstore = regInfo{inputs: []regMask{gpspsb, mask, 0}}
gpk = regInfo{inputs: gponly, outputs: maskonly}
kgp = regInfo{inputs: maskonly, outputs: gponly}
x15only = regInfo{inputs: nil, outputs: []regMask{x15}}
prefreg = regInfo{inputs: []regMask{gpspsbg}}
)
@ -1235,6 +1334,118 @@ func init() {
//
// output[i] = (input[i] >> 7) & 1
{name: "PMOVMSKB", argLength: 1, reg: fpgp, asm: "PMOVMSKB"},
// SIMD ops
{name: "VMOVDQUload128", argLength: 2, reg: fpload, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1 = mem
{name: "VMOVDQUstore128", argLength: 3, reg: fpstore, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
{name: "VMOVDQUload256", argLength: 2, reg: fpload, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1 = mem
{name: "VMOVDQUstore256", argLength: 3, reg: fpstore, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
{name: "VMOVDQUload512", argLength: 2, reg: fpload, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1 = mem
{name: "VMOVDQUstore512", argLength: 3, reg: fpstore, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
// AVX2 32 and 64-bit element int-vector masked moves.
{name: "VPMASK32load128", argLength: 3, reg: vloadv, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
{name: "VPMASK32store128", argLength: 4, reg: vstorev, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
{name: "VPMASK64load128", argLength: 3, reg: vloadv, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
{name: "VPMASK64store128", argLength: 4, reg: vstorev, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
{name: "VPMASK32load256", argLength: 3, reg: vloadv, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
{name: "VPMASK32store256", argLength: 4, reg: vstorev, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
{name: "VPMASK64load256", argLength: 3, reg: vloadv, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
{name: "VPMASK64store256", argLength: 4, reg: vstorev, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
// AVX512 8-64-bit element mask-register masked moves
{name: "VPMASK8load512", argLength: 3, reg: vloadk, asm: "VMOVDQU8", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
{name: "VPMASK8store512", argLength: 4, reg: vstorek, asm: "VMOVDQU8", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
{name: "VPMASK16load512", argLength: 3, reg: vloadk, asm: "VMOVDQU16", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
{name: "VPMASK16store512", argLength: 4, reg: vstorek, asm: "VMOVDQU16", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
{name: "VPMASK32load512", argLength: 3, reg: vloadk, asm: "VMOVDQU32", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
{name: "VPMASK32store512", argLength: 4, reg: vstorek, asm: "VMOVDQU32", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
{name: "VPMASK64load512", argLength: 3, reg: vloadk, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"}, // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
{name: "VPMASK64store512", argLength: 4, reg: vstorek, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
{name: "VPMOVMToVec8x16", argLength: 1, reg: kv, asm: "VPMOVM2B"},
{name: "VPMOVMToVec8x32", argLength: 1, reg: kv, asm: "VPMOVM2B"},
{name: "VPMOVMToVec8x64", argLength: 1, reg: kw, asm: "VPMOVM2B"},
{name: "VPMOVMToVec16x8", argLength: 1, reg: kv, asm: "VPMOVM2W"},
{name: "VPMOVMToVec16x16", argLength: 1, reg: kv, asm: "VPMOVM2W"},
{name: "VPMOVMToVec16x32", argLength: 1, reg: kw, asm: "VPMOVM2W"},
{name: "VPMOVMToVec32x4", argLength: 1, reg: kv, asm: "VPMOVM2D"},
{name: "VPMOVMToVec32x8", argLength: 1, reg: kv, asm: "VPMOVM2D"},
{name: "VPMOVMToVec32x16", argLength: 1, reg: kw, asm: "VPMOVM2D"},
{name: "VPMOVMToVec64x2", argLength: 1, reg: kv, asm: "VPMOVM2Q"},
{name: "VPMOVMToVec64x4", argLength: 1, reg: kv, asm: "VPMOVM2Q"},
{name: "VPMOVMToVec64x8", argLength: 1, reg: kw, asm: "VPMOVM2Q"},
{name: "VPMOVVec8x16ToM", argLength: 1, reg: vk, asm: "VPMOVB2M"},
{name: "VPMOVVec8x32ToM", argLength: 1, reg: vk, asm: "VPMOVB2M"},
{name: "VPMOVVec8x64ToM", argLength: 1, reg: wk, asm: "VPMOVB2M"},
{name: "VPMOVVec16x8ToM", argLength: 1, reg: vk, asm: "VPMOVW2M"},
{name: "VPMOVVec16x16ToM", argLength: 1, reg: vk, asm: "VPMOVW2M"},
{name: "VPMOVVec16x32ToM", argLength: 1, reg: wk, asm: "VPMOVW2M"},
{name: "VPMOVVec32x4ToM", argLength: 1, reg: vk, asm: "VPMOVD2M"},
{name: "VPMOVVec32x8ToM", argLength: 1, reg: vk, asm: "VPMOVD2M"},
{name: "VPMOVVec32x16ToM", argLength: 1, reg: wk, asm: "VPMOVD2M"},
{name: "VPMOVVec64x2ToM", argLength: 1, reg: vk, asm: "VPMOVQ2M"},
{name: "VPMOVVec64x4ToM", argLength: 1, reg: vk, asm: "VPMOVQ2M"},
{name: "VPMOVVec64x8ToM", argLength: 1, reg: wk, asm: "VPMOVQ2M"},
{name: "Zero128", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
{name: "Zero256", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
{name: "Zero512", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
{name: "VMOVSDf2v", argLength: 1, reg: fpv, asm: "VMOVSD"},
{name: "VMOVSSf2v", argLength: 1, reg: fpv, asm: "VMOVSS"},
{name: "VMOVQ", argLength: 1, reg: gpv, asm: "VMOVQ"},
{name: "VMOVD", argLength: 1, reg: gpv, asm: "VMOVD"},
{name: "VMOVQload", argLength: 2, reg: fpload, asm: "VMOVQ", aux: "SymOff", typ: "UInt64", faultOnNilArg0: true, symEffect: "Read"},
{name: "VMOVDload", argLength: 2, reg: fpload, asm: "VMOVD", aux: "SymOff", typ: "UInt32", faultOnNilArg0: true, symEffect: "Read"},
{name: "VMOVSSload", argLength: 2, reg: fpload, asm: "VMOVSS", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
{name: "VMOVSDload", argLength: 2, reg: fpload, asm: "VMOVSD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
{name: "VMOVSSconst", reg: fp01, asm: "VMOVSS", aux: "Float32", rematerializeable: true},
{name: "VMOVSDconst", reg: fp01, asm: "VMOVSD", aux: "Float64", rematerializeable: true},
{name: "VZEROUPPER", argLength: 1, reg: regInfo{clobbers: v}, asm: "VZEROUPPER"}, // arg=mem, returns mem
{name: "VZEROALL", argLength: 1, reg: regInfo{clobbers: v}, asm: "VZEROALL"}, // arg=mem, returns mem
// KMOVxload: loads masks
// Load (Q=8,D=4,W=2,B=1) bytes from (arg0+auxint+aux), arg1=mem.
// "+auxint+aux" == add auxint and the offset of the symbol in aux (if any) to the effective address
{name: "KMOVBload", argLength: 2, reg: kload, asm: "KMOVB", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
{name: "KMOVWload", argLength: 2, reg: kload, asm: "KMOVW", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
{name: "KMOVDload", argLength: 2, reg: kload, asm: "KMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
{name: "KMOVQload", argLength: 2, reg: kload, asm: "KMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
// KMOVxstore: stores masks
// Store (Q=8,D=4,W=2,B=1) low bytes of arg1.
// Does *(arg0+auxint+aux) = arg1, arg2=mem.
{name: "KMOVBstore", argLength: 3, reg: kstore, asm: "KMOVB", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
{name: "KMOVWstore", argLength: 3, reg: kstore, asm: "KMOVW", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
{name: "KMOVDstore", argLength: 3, reg: kstore, asm: "KMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
{name: "KMOVQstore", argLength: 3, reg: kstore, asm: "KMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
// Move GP directly to mask register
{name: "KMOVQk", argLength: 1, reg: gpk, asm: "KMOVQ"},
{name: "KMOVDk", argLength: 1, reg: gpk, asm: "KMOVD"},
{name: "KMOVWk", argLength: 1, reg: gpk, asm: "KMOVW"},
{name: "KMOVBk", argLength: 1, reg: gpk, asm: "KMOVB"},
{name: "KMOVQi", argLength: 1, reg: kgp, asm: "KMOVQ"},
{name: "KMOVDi", argLength: 1, reg: kgp, asm: "KMOVD"},
{name: "KMOVWi", argLength: 1, reg: kgp, asm: "KMOVW"},
{name: "KMOVBi", argLength: 1, reg: kgp, asm: "KMOVB"},
// VPTEST
{name: "VPTEST", asm: "VPTEST", argLength: 2, reg: v2flags, clobberFlags: true, typ: "Flags"},
}
var AMD64blocks = []blockData{
@ -1266,14 +1477,17 @@ func init() {
name: "AMD64",
pkg: "cmd/internal/obj/x86",
genfile: "../../amd64/ssa.go",
ops: AMD64ops,
genSIMDfile: "../../amd64/simdssa.go",
ops: append(AMD64ops, simdAMD64Ops(v11, v21, v2k, vkv, v2kv, v2kk, v31, v3kv, vgpv, vgp, vfpv, vfpkv,
w11, w21, w2k, wkw, w2kw, w2kk, w31, w3kw, wgpw, wgp, wfpw, wfpkw, wkwload, v21load, v31load, v11load,
w21load, w31load, w2kload, w2kwload, w11load, w3kwload, w2kkload, v31x0AtIn2)...), // AMD64ops,
blocks: AMD64blocks,
regnames: regNamesAMD64,
ParamIntRegNames: "AX BX CX DI SI R8 R9 R10 R11",
ParamFloatRegNames: "X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14",
gpregmask: gp,
fpregmask: fp,
specialregmask: x15,
specialregmask: mask,
framepointerreg: int8(num["BP"]),
linkreg: -1, // not used
})


@ -941,7 +941,7 @@
// struct operations
(StructSelect [i] x:(StructMake ___)) => x.Args[i]
(Load <t> _ _) && t.IsStruct() && CanSSA(t) => rewriteStructLoad(v)
(Load <t> _ _) && t.IsStruct() && CanSSA(t) && !t.IsSIMD() => rewriteStructLoad(v)
(Store _ (StructMake ___) _) => rewriteStructStore(v)
(StructSelect [i] x:(Load <t> ptr mem)) && !CanSSA(t) =>


@ -375,6 +375,18 @@ var genericOps = []opData{
{name: "Load", argLength: 2}, // Load from arg0. arg1=memory
{name: "Dereference", argLength: 2}, // Load from arg0. arg1=memory. Helper op for arg/result passing, result is an otherwise not-SSA-able "value".
{name: "Store", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.
// masked memory operations.
// TODO add 16 and 8
{name: "LoadMasked8", argLength: 3}, // Load from arg0, arg1 = mask of 8-bits, arg2 = memory
{name: "LoadMasked16", argLength: 3}, // Load from arg0, arg1 = mask of 16-bits, arg2 = memory
{name: "LoadMasked32", argLength: 3}, // Load from arg0, arg1 = mask of 32-bits, arg2 = memory
{name: "LoadMasked64", argLength: 3}, // Load from arg0, arg1 = mask of 64-bits, arg2 = memory
{name: "StoreMasked8", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 8-bits, arg3 = memory
{name: "StoreMasked16", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 16-bits, arg3 = memory
{name: "StoreMasked32", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 32-bits, arg3 = memory
{name: "StoreMasked64", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 64-bits, arg3 = memory
// Normally we require that the source and destination of Move do not overlap.
// There is an exception when we know all the loads will happen before all
// the stores. In that case, overlap is ok. See
@ -666,6 +678,40 @@ var genericOps = []opData{
// Prefetch instruction
{name: "PrefetchCache", argLength: 2, hasSideEffects: true}, // Do prefetch arg0 to cache. arg0=addr, arg1=memory.
{name: "PrefetchCacheStreamed", argLength: 2, hasSideEffects: true}, // Do non-temporal or streamed prefetch arg0 to cache. arg0=addr, arg1=memory.
// SIMD
{name: "ZeroSIMD", argLength: 0}, // zero value of a vector
// Convert integers to masks
{name: "Cvt16toMask8x16", argLength: 1}, // arg0 = integer mask value
{name: "Cvt32toMask8x32", argLength: 1}, // arg0 = integer mask value
{name: "Cvt64toMask8x64", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask16x8", argLength: 1}, // arg0 = integer mask value
{name: "Cvt16toMask16x16", argLength: 1}, // arg0 = integer mask value
{name: "Cvt32toMask16x32", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask32x4", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask32x8", argLength: 1}, // arg0 = integer mask value
{name: "Cvt16toMask32x16", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask64x2", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask64x4", argLength: 1}, // arg0 = integer mask value
{name: "Cvt8toMask64x8", argLength: 1}, // arg0 = integer mask value
// Convert masks to integers
{name: "CvtMask8x16to16", argLength: 1}, // arg0 = mask
{name: "CvtMask8x32to32", argLength: 1}, // arg0 = mask
{name: "CvtMask8x64to64", argLength: 1}, // arg0 = mask
{name: "CvtMask16x8to8", argLength: 1}, // arg0 = mask
{name: "CvtMask16x16to16", argLength: 1}, // arg0 = mask
{name: "CvtMask16x32to32", argLength: 1}, // arg0 = mask
{name: "CvtMask32x4to8", argLength: 1}, // arg0 = mask
{name: "CvtMask32x8to8", argLength: 1}, // arg0 = mask
{name: "CvtMask32x16to16", argLength: 1}, // arg0 = mask
{name: "CvtMask64x2to8", argLength: 1}, // arg0 = mask
{name: "CvtMask64x4to8", argLength: 1}, // arg0 = mask
{name: "CvtMask64x8to8", argLength: 1}, // arg0 = mask
// Returns true if arg0 is all zero.
{name: "IsZeroVec", argLength: 1},
}
// kind controls successors implicit exit
@ -693,6 +739,7 @@ var genericBlocks = []blockData{
}
func init() {
genericOps = append(genericOps, simdGenericOps()...)
archs = append(archs, arch{
name: "generic",
ops: genericOps,


@ -32,6 +32,7 @@ type arch struct {
name string
pkg string // obj package to import for this arch.
genfile string // source file containing opcode code generation.
genSIMDfile string // source file containing opcode code generation for SIMD.
ops []opData
blocks []blockData
regnames []string
@ -547,6 +548,15 @@ func genOp() {
if err != nil {
log.Fatalf("can't read %s: %v", a.genfile, err)
}
// Append the file of simd operations, too
if a.genSIMDfile != "" {
simdSrc, err := os.ReadFile(a.genSIMDfile)
if err != nil {
log.Fatalf("can't read %s: %v", a.genSIMDfile, err)
}
src = append(src, simdSrc...)
}
seen := make(map[string]bool, len(a.ops))
for _, m := range rxOp.FindAllSubmatch(src, -1) {
seen[string(m[1])] = true


@ -0,0 +1,117 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bufio"
"io"
)
// NamedScanner is a simple struct to pair a name with a Scanner.
type NamedScanner struct {
Name string
Scanner *bufio.Scanner
}
// NamedReader is a simple struct to pair a name with a Reader,
// which will be converted to a Scanner using bufio.NewScanner.
type NamedReader struct {
Name string
Reader io.Reader
}
// MultiScanner scans over multiple bufio.Scanners as if they were a single stream.
// It also keeps track of the name of the current scanner and the line number.
type MultiScanner struct {
scanners []NamedScanner
scannerIdx int
line int
totalLine int
err error
}
// NewMultiScanner creates a new MultiScanner from a slice of NamedScanners.
func NewMultiScanner(scanners []NamedScanner) *MultiScanner {
return &MultiScanner{
scanners: scanners,
scannerIdx: -1, // Start before the first scanner
}
}
// MultiScannerFromReaders creates a new MultiScanner from a slice of NamedReaders.
func MultiScannerFromReaders(readers []NamedReader) *MultiScanner {
var scanners []NamedScanner
for _, r := range readers {
scanners = append(scanners, NamedScanner{
Name: r.Name,
Scanner: bufio.NewScanner(r.Reader),
})
}
return NewMultiScanner(scanners)
}
// Scan advances the scanner to the next token, which will then be
// available through the Text method. It returns false when the scan stops,
// either by reaching the end of the input or an error.
// After Scan returns false, the Err method will return any error that
// occurred during scanning, except that if it was io.EOF, Err
// will return nil.
func (ms *MultiScanner) Scan() bool {
if ms.scannerIdx == -1 {
ms.scannerIdx = 0
}
for ms.scannerIdx < len(ms.scanners) {
current := ms.scanners[ms.scannerIdx]
if current.Scanner.Scan() {
ms.line++
ms.totalLine++
return true
}
if err := current.Scanner.Err(); err != nil {
ms.err = err
return false
}
// Move to the next scanner
ms.scannerIdx++
ms.line = 0
}
return false
}
// Text returns the most recent token generated by a call to Scan.
func (ms *MultiScanner) Text() string {
if ms.scannerIdx < 0 || ms.scannerIdx >= len(ms.scanners) {
return ""
}
return ms.scanners[ms.scannerIdx].Scanner.Text()
}
// Err returns the first non-EOF error that was encountered by the MultiScanner.
func (ms *MultiScanner) Err() error {
return ms.err
}
// Name returns the name of the current scanner.
func (ms *MultiScanner) Name() string {
if ms.scannerIdx < 0 {
return "<before first>"
}
if ms.scannerIdx >= len(ms.scanners) {
return "<after last>"
}
return ms.scanners[ms.scannerIdx].Name
}
// Line returns the current line number within the current scanner.
func (ms *MultiScanner) Line() int {
return ms.line
}
// TotalLine returns the total number of lines scanned across all scanners.
func (ms *MultiScanner) TotalLine() int {
return ms.totalLine
}
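
A minimal usage sketch of the new type (not part of the CL; it assumes it sits next to this file in the _gen package with fmt and strings imported):

	func exampleMultiScanner() {
		ms := MultiScannerFromReaders([]NamedReader{
			{Name: "AMD64.rules", Reader: strings.NewReader("rule one\nrule two\n")},
			{Name: "simdAMD64.rules", Reader: strings.NewReader("simd rule\n")},
		})
		for ms.Scan() {
			// Prints AMD64.rules:1, AMD64.rules:2, then simdAMD64.rules:1.
			fmt.Printf("%s:%d: %s\n", ms.Name(), ms.Line(), ms.Text())
		}
		if err := ms.Err(); err != nil {
			fmt.Println("scan error:", err)
		}
	}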


@ -94,8 +94,11 @@ func genSplitLoadRules(arch arch) { genRulesSuffix(arch, "splitload") }
func genLateLowerRules(arch arch) { genRulesSuffix(arch, "latelower") }
func genRulesSuffix(arch arch, suff string) {
var readers []NamedReader
// Open input file.
text, err := os.Open(arch.name + suff + ".rules")
var text io.Reader
name := arch.name + suff + ".rules"
text, err := os.Open(name)
if err != nil {
if suff == "" {
// All architectures must have a plain rules file.
@ -104,18 +107,28 @@ func genRulesSuffix(arch arch, suff string) {
// Some architectures have bonus rules files that others don't share. That's fine.
return
}
readers = append(readers, NamedReader{name, text})
// Check for file of SIMD rules to add
if suff == "" {
simdname := "simd" + arch.name + ".rules"
simdtext, err := os.Open(simdname)
if err == nil {
readers = append(readers, NamedReader{simdname, simdtext})
}
}
// oprules contains a list of rules for each block and opcode
blockrules := map[string][]Rule{}
oprules := map[string][]Rule{}
// read rule file
scanner := bufio.NewScanner(text)
scanner := MultiScannerFromReaders(readers)
rule := ""
var lineno int
var ruleLineno int // line number of "=>"
for scanner.Scan() {
lineno++
lineno = scanner.Line()
line := scanner.Text()
if i := strings.Index(line, "//"); i >= 0 {
// Remove comments. Note that this isn't string safe, so
@ -142,7 +155,7 @@ func genRulesSuffix(arch arch, suff string) {
break // continuing the line can't help, and it will only make errors worse
}
loc := fmt.Sprintf("%s%s.rules:%d", arch.name, suff, ruleLineno)
loc := fmt.Sprintf("%s:%d", scanner.Name(), ruleLineno)
for _, rule2 := range expandOr(rule) {
r := Rule{Rule: rule2, Loc: loc}
if rawop := strings.Split(rule2, " ")[0][1:]; isBlock(rawop, arch) {
@ -162,7 +175,7 @@ func genRulesSuffix(arch arch, suff string) {
log.Fatalf("scanner failed: %v\n", err)
}
if balance(rule) != 0 {
log.Fatalf("%s.rules:%d: unbalanced rule: %v\n", arch.name, lineno, rule)
log.Fatalf("%s:%d: unbalanced rule: %v\n", scanner.Name(), lineno, rule)
}
// Order all the ops.
@ -862,7 +875,7 @@ func declReserved(name, value string) *Declare {
if !reservedNames[name] {
panic(fmt.Sprintf("declReserved call does not use a reserved name: %q", name))
}
return &Declare{name, exprf(value)}
return &Declare{name, exprf("%s", value)}
}
// breakf constructs a simple "if cond { break }" statement, using exprf for its
@ -889,7 +902,7 @@ func genBlockRewrite(rule Rule, arch arch, data blockData) *RuleRewrite {
if vname == "" {
vname = fmt.Sprintf("v_%v", i)
}
rr.add(declf(rr.Loc, vname, cname))
rr.add(declf(rr.Loc, vname, "%s", cname))
p, op := genMatch0(rr, arch, expr, vname, nil, false) // TODO: pass non-nil cnt?
if op != "" {
check := fmt.Sprintf("%s.Op == %s", cname, op)
@ -904,7 +917,7 @@ func genBlockRewrite(rule Rule, arch arch, data blockData) *RuleRewrite {
}
pos[i] = p
} else {
rr.add(declf(rr.Loc, arg, cname))
rr.add(declf(rr.Loc, arg, "%s", cname))
pos[i] = arg + ".Pos"
}
}
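
The exprf/declf calls above now route pre-built strings through a "%s" verb. Presumably this keeps vet's printf checking happy and, more importantly, prevents any literal '%' inside a generated expression from being misread as a formatting directive; illustratively:

	exprf(cname)       // old: cname is (mis)used as a format string
	exprf("%s", cname) // new: cname is emitted verbatim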

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -18,6 +18,9 @@ type Block struct {
// Source position for block's control operation
Pos src.XPos
// What cpu features (AVXnnn, SVEyyy) are implied to reach/execute this block?
CPUfeatures CPUfeatures
// The kind of block this is.
Kind BlockKind
@ -449,3 +452,57 @@ const (
HotPgoInitial = HotPgo | HotInitial // special case; single block loop, initial block is header block has a flow-in entry, but PGO says it is hot
HotPgoInitialNotFLowIn = HotPgo | HotInitial | HotNotFlowIn // PGO says it is hot, and the loop is rotated so flow enters loop with a branch
)
type CPUfeatures uint32
const (
CPUNone CPUfeatures = 0
CPUAll CPUfeatures = ^CPUfeatures(0)
CPUavx CPUfeatures = 1 << iota
CPUavx2
CPUavxvnni
CPUavx512
CPUbitalg
CPUgfni
CPUvbmi
CPUvbmi2
CPUvpopcntdq
CPUavx512vnni
CPUneon
CPUsve2
)
func (f CPUfeatures) hasFeature(x CPUfeatures) bool {
return f&x == x
}
func (f CPUfeatures) String() string {
if f == CPUNone {
return "none"
}
if f == CPUAll {
return "all"
}
s := ""
foo := func(what string, feat CPUfeatures) {
if feat&f != 0 {
if s != "" {
s += "+"
}
s += what
}
}
foo("avx", CPUavx)
foo("avx2", CPUavx2)
foo("avx512", CPUavx512)
foo("avxvnni", CPUavxvnni)
foo("bitalg", CPUbitalg)
foo("gfni", CPUgfni)
foo("vbmi", CPUvbmi)
foo("vbmi2", CPUvbmi2)
foo("popcntdq", CPUvpopcntdq)
foo("avx512vnni", CPUavx512vnni)
return s
}
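
A small illustration of the bit-set semantics (a sketch inside this package, assuming fmt is imported; not part of the CL):

	func exampleFeatures() {
		f := CPUavx | CPUavx2 | CPUavx512 | CPUavx512vnni
		fmt.Println(f)                                 // avx+avx2+avx512+avx512vnni
		fmt.Println(f.hasFeature(CPUavx2))             // true
		fmt.Println(f.hasFeature(CPUavx512 | CPUgfni)) // false: every requested bit must be set
	}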


@ -150,8 +150,9 @@ func checkFunc(f *Func) {
case auxInt128:
// AuxInt must be zero, so leave canHaveAuxInt set to false.
case auxUInt8:
if v.AuxInt != int64(uint8(v.AuxInt)) {
f.Fatalf("bad uint8 AuxInt value for %v", v)
// Cast to int8 due to the sign-extension requirement on AuxInt; see its comment for details.
if v.AuxInt != int64(int8(v.AuxInt)) {
f.Fatalf("bad uint8 AuxInt value for %v, saw %d but need %d", v, v.AuxInt, int64(int8(v.AuxInt)))
}
canHaveAuxInt = true
case auxFloat32:


@ -488,6 +488,8 @@ var passes = [...]pass{
{name: "writebarrier", fn: writebarrier, required: true}, // expand write barrier ops
{name: "insert resched checks", fn: insertLoopReschedChecks,
disabled: !buildcfg.Experiment.PreemptibleLoops}, // insert resched checks in loops.
{name: "cpufeatures", fn: cpufeatures, required: buildcfg.Experiment.SIMD, disabled: !buildcfg.Experiment.SIMD},
{name: "rewrite tern", fn: rewriteTern, required: false, disabled: !buildcfg.Experiment.SIMD},
{name: "lower", fn: lower, required: true},
{name: "addressing modes", fn: addressingModes, required: false},
{name: "late lower", fn: lateLower, required: true},
@ -596,6 +598,8 @@ var passOrder = [...]constraint{
{"branchelim", "late opt"},
// branchelim is an arch-independent pass.
{"branchelim", "lower"},
// lower needs cpu feature information (for SIMD)
{"cpufeatures", "lower"},
}
func init() {


@ -88,6 +88,10 @@ type Types struct {
Float32Ptr *types.Type
Float64Ptr *types.Type
BytePtrPtr *types.Type
Vec128 *types.Type
Vec256 *types.Type
Vec512 *types.Type
Mask *types.Type
}
// NewTypes creates and populates a Types.
@ -122,6 +126,10 @@ func (t *Types) SetTypPtrs() {
t.Float32Ptr = types.NewPtr(types.Types[types.TFLOAT32])
t.Float64Ptr = types.NewPtr(types.Types[types.TFLOAT64])
t.BytePtrPtr = types.NewPtr(types.NewPtr(types.Types[types.TUINT8]))
t.Vec128 = types.TypeVec128
t.Vec256 = types.TypeVec256
t.Vec512 = types.TypeVec512
t.Mask = types.TypeMask
}
type Logger interface {


@ -0,0 +1,262 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package ssa
import (
"cmd/compile/internal/types"
"cmd/internal/obj"
"fmt"
"internal/goarch"
)
type localEffect struct {
start CPUfeatures // features present at beginning of block
internal CPUfeatures // features implied by execution of block
end [2]CPUfeatures // for BlockIf, features present on outgoing edges
visited bool // On the first iteration this will be false for backedges.
}
func (e localEffect) String() string {
return fmt.Sprintf("visited=%v, start=%v, internal=%v, end[0]=%v, end[1]=%v", e.visited, e.start, e.internal, e.end[0], e.end[1])
}
// ifEffect pattern matches for a BlockIf conditional on a load
// of a field from internal/cpu.X86 and returns the corresponding
// effect.
func ifEffect(b *Block) (features CPUfeatures, taken int) {
// TODO generalize for other architectures.
if b.Kind != BlockIf {
return
}
c := b.Controls[0]
if c.Op == OpNot {
taken = 1
c = c.Args[0]
}
if c.Op != OpLoad {
return
}
offPtr := c.Args[0]
if offPtr.Op != OpOffPtr {
return
}
addr := offPtr.Args[0]
if addr.Op != OpAddr || addr.Args[0].Op != OpSB {
return
}
sym := addr.Aux.(*obj.LSym)
if sym.Name != "internal/cpu.X86" {
return
}
o := offPtr.AuxInt
t := addr.Type
if !t.IsPtr() {
b.Func.Fatalf("The symbol %s is not a pointer, found %v instead", sym.Name, t)
}
t = t.Elem()
if !t.IsStruct() {
b.Func.Fatalf("The referent of symbol %s is not a struct, found %v instead", sym.Name, t)
}
match := ""
for _, f := range t.Fields() {
if o == f.Offset && f.Sym != nil {
match = f.Sym.Name
break
}
}
switch match {
case "HasAVX":
features = CPUavx
case "HasAVXVNNI":
features = CPUavx | CPUavxvnni
case "HasAVX2":
features = CPUavx2 | CPUavx
// Compiler currently treats these all alike.
case "HasAVX512", "HasAVX512F", "HasAVX512CD", "HasAVX512BW",
"HasAVX512DQ", "HasAVX512VL", "HasAVX512VPCLMULQDQ":
features = CPUavx512 | CPUavx2 | CPUavx
case "HasAVX512GFNI":
features = CPUavx512 | CPUgfni | CPUavx2 | CPUavx
case "HasAVX512VNNI":
features = CPUavx512 | CPUavx512vnni | CPUavx2 | CPUavx
case "HasAVX512VBMI":
features = CPUavx512 | CPUvbmi | CPUavx2 | CPUavx
case "HasAVX512VBMI2":
features = CPUavx512 | CPUvbmi2 | CPUavx2 | CPUavx
case "HasAVX512BITALG":
features = CPUavx512 | CPUbitalg | CPUavx2 | CPUavx
case "HasAVX512VPOPCNTDQ":
features = CPUavx512 | CPUvpopcntdq | CPUavx2 | CPUavx
case "HasBMI1":
features = CPUvbmi
case "HasBMI2":
features = CPUvbmi2
// Features that are not currently interesting to the compiler.
case "HasAES", "HasADX", "HasERMS", "HasFSRM", "HasFMA", "HasGFNI", "HasOSXSAVE",
"HasPCLMULQDQ", "HasPOPCNT", "HasRDTSCP", "HasSHA",
"HasSSE3", "HasSSSE3", "HasSSE41", "HasSSE42":
}
if b.Func.pass.debug > 2 {
b.Func.Warnl(b.Pos, "%s, block b%v has features offset %d, match is %s, features is %v", b.Func.Name, b.ID, o, match, features)
}
return
}
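
The SSA shape matched above corresponds to source of roughly the following form. Only code that may import internal/cpu (the runtime, or the simd package's generated feature checks) produces it directly, and the helper names below are placeholders rather than real API:

	import "internal/cpu"

	func add(dst, a, b []int32) {
		if cpu.X86.HasAVX512 {
			// Blocks reachable only through this edge are tagged
			// CPUavx512|CPUavx2|CPUavx by the cpufeatures pass below.
			addAVX512(dst, a, b) // placeholder
			return
		}
		addScalar(dst, a, b) // placeholder; no extra features are assumed here
	}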
func cpufeatures(f *Func) {
arch := f.Config.Ctxt().Arch.Family
// TODO there are other SIMD architectures
if arch != goarch.AMD64 {
return
}
po := f.Postorder()
effects := make([]localEffect, 1+f.NumBlocks(), 1+f.NumBlocks())
features := func(t *types.Type) CPUfeatures {
if t.IsSIMD() {
switch t.Size() {
case 16, 32:
return CPUavx
case 64:
return CPUavx512 | CPUavx2 | CPUavx
}
}
return CPUNone
}
// visit blocks in reverse post order
// when b is visited, all of its predecessors (except for loop back edges)
// will have been visited
for i := len(po) - 1; i >= 0; i-- {
b := po[i]
var feat CPUfeatures
if b == f.Entry {
// Check the types of inputs and outputs, as well as annotations.
// Start with none and union all that is implied by all the types seen.
if f.Type != nil { // a problem for SSA tests
for _, field := range f.Type.RecvParamsResults() {
feat |= features(field.Type)
}
}
} else {
// Start with all and intersect over predecessors
feat = CPUAll
for _, p := range b.Preds {
pb := p.Block()
if !effects[pb.ID].visited {
continue
}
pi := p.Index()
if pb.Kind != BlockIf {
pi = 0
}
feat &= effects[pb.ID].end[pi]
}
}
e := localEffect{start: feat, visited: true}
// Separately capture the internal effects of this block
var internal CPUfeatures
for _, v := range b.Values {
// The rule applied here: if the block contains any
// instruction that would fault when the feature (avx, avx512)
// is not present, then assume that the feature is present
// for all the instructions in the block; a fault is a fault.
t := v.Type
if t.IsResults() {
for i := 0; i < t.NumFields(); i++ {
feat |= features(t.FieldType(i))
}
} else {
internal |= features(v.Type)
}
}
e.internal = internal
feat |= internal
branchEffect, taken := ifEffect(b)
e.end = [2]CPUfeatures{feat, feat}
e.end[taken] |= branchEffect
effects[b.ID] = e
if f.pass.debug > 1 && feat != CPUNone {
f.Warnl(b.Pos, "%s, block b%v has features %v", b.Func.Name, b.ID, feat)
}
b.CPUfeatures = feat
f.maxCPUFeatures |= feat // not necessary to refine this estimate below
}
// If the flow graph is irreducible, things can still change on backedges.
change := true
for change {
change = false
for i := len(po) - 1; i >= 0; i-- {
b := po[i]
if b == f.Entry {
continue // cannot change
}
feat := CPUAll
for _, p := range b.Preds {
pb := p.Block()
pi := p.Index()
if pb.Kind != BlockIf {
pi = 0
}
feat &= effects[pb.ID].end[pi]
}
e := effects[b.ID]
if feat == e.start {
continue
}
e.start = feat
effects[b.ID] = e
// uh-oh, something changed
if f.pass.debug > 1 {
f.Warnl(b.Pos, "%s, block b%v saw predecessor feature change", b.Func.Name, b.ID)
}
feat |= e.internal
if feat == e.end[0]&e.end[1] {
continue
}
branchEffect, taken := ifEffect(b)
e.end = [2]CPUfeatures{feat, feat}
e.end[taken] |= branchEffect
effects[b.ID] = e
b.CPUfeatures = feat
if f.pass.debug > 1 {
f.Warnl(b.Pos, "%s, block b%v has new features %v", b.Func.Name, b.ID, feat)
}
change = true
}
}
if f.pass.debug > 0 {
for _, b := range f.Blocks {
if b.CPUfeatures != CPUNone {
f.Warnl(b.Pos, "%s, block b%v has features %v", b.Func.Name, b.ID, b.CPUfeatures)
}
}
}
}
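
The per-block meet over predecessors is a plain bitwise AND of edge feature sets; for example (values chosen for illustration):

	func exampleMeet() CPUfeatures {
		guarded := CPUavx | CPUavx2 | CPUavx512 // predecessor inside a HasAVX512 branch
		fallback := CPUavx                      // predecessor from an AVX-only path
		return guarded & fallback               // CPUavx: only what every path guarantees
	}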


@ -100,7 +100,7 @@ func decomposeBuiltin(f *Func) {
}
case t.IsFloat():
// floats are never decomposed, even ones bigger than RegSize
case t.Size() > f.Config.RegSize:
case t.Size() > f.Config.RegSize && !t.IsSIMD():
f.Fatalf("undecomposed named type %s %v", name, t)
}
}
@ -135,7 +135,7 @@ func decomposeBuiltinPhi(v *Value) {
decomposeInterfacePhi(v)
case v.Type.IsFloat():
// floats are never decomposed, even ones bigger than RegSize
case v.Type.Size() > v.Block.Func.Config.RegSize:
case v.Type.Size() > v.Block.Func.Config.RegSize && !v.Type.IsSIMD():
v.Fatalf("%v undecomposed type %v", v, v.Type)
}
}
@ -248,7 +248,7 @@ func decomposeUser(f *Func) {
for _, name := range f.Names {
t := name.Type
switch {
case t.IsStruct():
case isStructNotSIMD(t):
newNames = decomposeUserStructInto(f, name, newNames)
case t.IsArray():
newNames = decomposeUserArrayInto(f, name, newNames)
@ -293,7 +293,7 @@ func decomposeUserArrayInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*Loc
if t.Elem().IsArray() {
return decomposeUserArrayInto(f, elemName, slots)
} else if t.Elem().IsStruct() {
} else if isStructNotSIMD(t.Elem()) {
return decomposeUserStructInto(f, elemName, slots)
}
@ -313,7 +313,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*Lo
fnames = append(fnames, fs)
// arrays and structs will be decomposed further, so
// there's no need to record a name
if !fs.Type.IsArray() && !fs.Type.IsStruct() {
if !fs.Type.IsArray() && !isStructNotSIMD(fs.Type) {
slots = maybeAppend(f, slots, fs)
}
}
@ -339,7 +339,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*Lo
// now that this f.NamedValues contains values for the struct
// fields, recurse into nested structs
for i := 0; i < n; i++ {
if name.Type.FieldType(i).IsStruct() {
if isStructNotSIMD(name.Type.FieldType(i)) {
slots = decomposeUserStructInto(f, fnames[i], slots)
delete(f.NamedValues, *fnames[i])
} else if name.Type.FieldType(i).IsArray() {
@ -351,7 +351,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*Lo
}
func decomposeUserPhi(v *Value) {
switch {
case v.Type.IsStruct():
case isStructNotSIMD(v.Type):
decomposeStructPhi(v)
case v.Type.IsArray():
decomposeArrayPhi(v)
@ -458,3 +458,7 @@ func deleteNamedVals(f *Func, toDelete []namedVal) {
}
f.Names = f.Names[:end]
}
func isStructNotSIMD(t *types.Type) bool {
return t.IsStruct() && !t.IsSIMD()
}


@ -396,6 +396,9 @@ func (x *expandState) decomposeAsNecessary(pos src.XPos, b *Block, a, m0 *Value,
return mem
case types.TSTRUCT:
if at.IsSIMD() {
break // XXX
}
for i := 0; i < at.NumFields(); i++ {
et := at.Field(i).Type // might need to read offsets from the fields
e := b.NewValue1I(pos, OpStructSelect, et, int64(i), a)
@ -551,6 +554,9 @@ func (x *expandState) rewriteSelectOrArg(pos src.XPos, b *Block, container, a, m
case types.TSTRUCT:
// Assume ssagen/ssa.go (in buildssa) spills large aggregates so they won't appear here.
if at.IsSIMD() {
break // XXX
}
for i := 0; i < at.NumFields(); i++ {
et := at.Field(i).Type
e := x.rewriteSelectOrArg(pos, b, container, nil, m0, et, rc.next(et))
@ -717,6 +723,9 @@ func (x *expandState) rewriteWideSelectToStores(pos src.XPos, b *Block, containe
case types.TSTRUCT:
// Assume ssagen/ssa.go (in buildssa) spills large aggregates so they won't appear here.
if at.IsSIMD() {
break // XXX
}
for i := 0; i < at.NumFields(); i++ {
et := at.Field(i).Type
m0 = x.rewriteWideSelectToStores(pos, b, container, m0, et, rc.next(et))


@ -41,6 +41,8 @@ type Func struct {
ABISelf *abi.ABIConfig // ABI for function being compiled
ABIDefault *abi.ABIConfig // ABI for rtcall and other no-parsed-signature/pragma functions.
maxCPUFeatures CPUfeatures // union of all the CPU features in all the blocks.
scheduled bool // Values in Blocks are in final order
laidout bool // Blocks are ordered
NoSplit bool // true if function is marked as nosplit. Used by schedule check pass.
@ -632,6 +634,19 @@ func (b *Block) NewValue4(pos src.XPos, op Op, t *types.Type, arg0, arg1, arg2,
return v
}
// NewValue4A returns a new value in the block with four arguments and an aux value.
func (b *Block) NewValue4A(pos src.XPos, op Op, t *types.Type, aux Aux, arg0, arg1, arg2, arg3 *Value) *Value {
v := b.Func.newValue(op, t, b, pos)
v.AuxInt = 0
v.Aux = aux
v.Args = []*Value{arg0, arg1, arg2, arg3}
arg0.Uses++
arg1.Uses++
arg2.Uses++
arg3.Uses++
return v
}
// NewValue4I returns a new value in the block with four arguments and auxint value.
func (b *Block) NewValue4I(pos src.XPos, op Op, t *types.Type, auxint int64, arg0, arg1, arg2, arg3 *Value) *Value {
v := b.Func.newValue(op, t, b, pos)

File diff suppressed because it is too large


@ -931,6 +931,14 @@ func (s *regAllocState) compatRegs(t *types.Type) regMask {
if t.IsTuple() || t.IsFlags() {
return 0
}
if t.IsSIMD() {
if t.Size() > 8 {
return s.f.Config.fpRegMask & s.allocatable
} else {
// K mask
return s.f.Config.gpRegMask & s.allocatable
}
}
if t.IsFloat() || t == types.TypeInt128 {
if t.Kind() == types.TFLOAT32 && s.f.Config.fp32RegMask != 0 {
m = s.f.Config.fp32RegMask
@ -1439,6 +1447,13 @@ func (s *regAllocState) regalloc(f *Func) {
s.sb = v.ID
case OpARM64ZERO, OpLOONG64ZERO, OpMIPS64ZERO:
s.assignReg(s.ZeroIntReg, v, v)
case OpAMD64Zero128, OpAMD64Zero256, OpAMD64Zero512:
regspec := s.regspec(v)
m := regspec.outputs[0].regs
if countRegs(m) != 1 {
f.Fatalf("bad fixed-register op %s", v)
}
s.assignReg(pickReg(m), v, v)
default:
f.Fatalf("unknown fixed-register op %s", v)
}

File diff suppressed because it is too large


@ -12416,11 +12416,11 @@ func rewriteValuegeneric_OpLoad(v *Value) bool {
return true
}
// match: (Load <t> _ _)
// cond: t.IsStruct() && CanSSA(t)
// cond: t.IsStruct() && CanSSA(t) && !t.IsSIMD()
// result: rewriteStructLoad(v)
for {
t := v.Type
if !(t.IsStruct() && CanSSA(t)) {
if !(t.IsStruct() && CanSSA(t) && !t.IsSIMD()) {
break
}
v.copyOf(rewriteStructLoad(v))


@ -0,0 +1,292 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package ssa
import (
"fmt"
"internal/goarch"
"slices"
)
var truthTableValues [3]uint8 = [3]uint8{0b1111_0000, 0b1100_1100, 0b1010_1010}
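
As a standalone sanity check on these constants (a worked example, not part of the CL): treating the three entries as the truth tables of the bare inputs x, y and z, the TERNLOG immediate for any boolean expression is that expression evaluated bitwise over them.

	package main

	import "fmt"

	func main() {
		x, y, z := uint8(0xF0), uint8(0xCC), uint8(0xAA) // truthTableValues
		imm := x & (y | ^z)                              // and(x, or(y, not(z)))
		fmt.Printf("%#x\n", imm)                         // 0xd0, matching computeTT's example below
	}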
func (slop SIMDLogicalOP) String() string {
if slop == sloInterior {
return "leaf"
}
interior := ""
if slop&sloInterior != 0 {
interior = "+interior"
}
switch slop &^ sloInterior {
case sloAnd:
return "and" + interior
case sloXor:
return "xor" + interior
case sloOr:
return "or" + interior
case sloAndNot:
return "andNot" + interior
case sloNot:
return "not" + interior
}
return "wrong"
}
func rewriteTern(f *Func) {
if f.maxCPUFeatures == CPUNone {
return
}
arch := f.Config.Ctxt().Arch.Family
// TODO there are other SIMD architectures
if arch != goarch.AMD64 {
return
}
boolExprTrees := make(map[*Value]SIMDLogicalOP)
// Find logical expression trees, including leaves.
// interior nodes will be marked sloInterior,
// root nodes will not be marked sloInterior,
// leaf nodes are only marked sloInterior.
for _, b := range f.Blocks {
for _, v := range b.Values {
slo := classifyBooleanSIMD(v)
switch slo {
case sloOr,
sloAndNot,
sloXor,
sloAnd:
boolExprTrees[v.Args[1]] |= sloInterior
fallthrough
case sloNot:
boolExprTrees[v.Args[0]] |= sloInterior
boolExprTrees[v] |= slo
}
}
}
// get a canonical sorted set of roots
var roots []*Value
for v, slo := range boolExprTrees {
if f.pass.debug > 1 {
f.Warnl(v.Pos, "%s has SLO %v", v.LongString(), slo)
}
if slo&sloInterior == 0 && v.Block.CPUfeatures.hasFeature(CPUavx512) {
roots = append(roots, v)
}
}
slices.SortFunc(roots, func(u, v *Value) int { return int(u.ID - v.ID) }) // IDs are small enough to not care about overflow.
// This rewrite works by iterating over the root set.
// For each boolean expression, it walks the expression
// bottom up accumulating sets of variables mentioned in
// subexpressions, lazy-greedily finding the largest subexpressions
// of 3 inputs that can be rewritten to use ternary-truth-table instructions.
// rewrite recursively attempts to replace v and v's subexpressions with
// ternary-logic truth-table operations, returning a set of not more than 3
// subexpressions within v that may be combined into a parent's replacement.
// V need not have the CPU features that allow a ternary-logic operation;
// in that case, v will not be rewritten. Replacements also require
// exactly 3 different variable inputs to a boolean expression.
//
// Given the CPU feature and 3 inputs, v is replaced in the following
// cases:
//
// 1) v is a root
// 2) u = NOT(v) and u lacks the CPU feature
// 3) u = OP(v, w) and u lacks the CPU feature
// 4) u = OP(v, w) and u has more than 3 variable inputs.
var rewrite func(v *Value) [3]*Value
// computeTT returns the truth table for a boolean expression
// over the variables in vars, where vars[0] varies slowest in
// the truth table and vars[2] varies fastest.
// e.g. computeTT( "and(x, or(y, not(z)))", {x,y,z} ) returns
// (bit 0 first) 0 0 0 0 1 0 1 1 = (reversed) 1101_0000 = 0xD0
// x: 0 0 0 0 1 1 1 1
// y: 0 0 1 1 0 0 1 1
// z: 0 1 0 1 0 1 0 1
var computeTT func(v *Value, vars [3]*Value) uint8
// combine merges two sets of variables into one, reporting ok when the
// merged set contains 3 or fewer elements. Combine
// ensures that the sets of Values never contain duplicates.
// (Duplicates would create less-efficient code, not incorrect code.)
combine := func(a, b [3]*Value) ([3]*Value, bool) {
var c [3]*Value
i := 0
for _, v := range a {
if v == nil {
break
}
c[i] = v
i++
}
bloop:
for _, v := range b {
if v == nil {
break
}
for _, u := range a {
if v == u {
continue bloop
}
}
if i == 3 {
return [3]*Value{}, false
}
c[i] = v
i++
}
return c, true
}
computeTT = func(v *Value, vars [3]*Value) uint8 {
i := 0
for ; i < len(vars); i++ {
if vars[i] == v {
return truthTableValues[i]
}
}
slo := boolExprTrees[v] &^ sloInterior
a := computeTT(v.Args[0], vars)
switch slo {
case sloNot:
return ^a
case sloAnd:
return a & computeTT(v.Args[1], vars)
case sloXor:
return a ^ computeTT(v.Args[1], vars)
case sloOr:
return a | computeTT(v.Args[1], vars)
case sloAndNot:
return a & ^computeTT(v.Args[1], vars)
}
panic("switch should have covered all cases, or unknown var in logical expression")
}
replace := func(a0 *Value, vars0 [3]*Value) {
imm := computeTT(a0, vars0)
op := ternOpForLogical(a0.Op)
if op == a0.Op {
panic(fmt.Errorf("should have mapped away from input op, a0 is %s", a0.LongString()))
}
if f.pass.debug > 0 {
f.Warnl(a0.Pos, "Rewriting %s into %v of 0b%b %v %v %v", a0.LongString(), op, imm,
vars0[0], vars0[1], vars0[2])
}
a0.reset(op)
a0.SetArgs3(vars0[0], vars0[1], vars0[2])
a0.AuxInt = int64(int8(imm))
}
// addOne adds a single value to a set that is not full, without
// introducing duplicates. It seems possible that a shared
// subexpression, in tricky combination with blocks lacking the
// AVX512 feature, could lead to such a duplicate insertion.
addOne := func(vars [3]*Value, v *Value) [3]*Value {
if vars[2] != nil {
panic("rewriteTern.addOne, vars[2] should be nil")
}
if v == vars[0] || v == vars[1] {
return vars
}
if vars[1] == nil {
vars[1] = v
} else {
vars[2] = v
}
return vars
}
rewrite = func(v *Value) [3]*Value {
slo := boolExprTrees[v]
if slo == sloInterior { // leaf node, i.e., a "variable"
return [3]*Value{v, nil, nil}
}
var vars [3]*Value
hasFeature := v.Block.CPUfeatures.hasFeature(CPUavx512)
if slo&sloNot == sloNot {
vars = rewrite(v.Args[0])
if !hasFeature {
if vars[2] != nil {
replace(v.Args[0], vars)
return [3]*Value{v, nil, nil}
}
return vars
}
} else {
var ok bool
a0, a1 := v.Args[0], v.Args[1]
vars0 := rewrite(a0)
vars1 := rewrite(a1)
vars, ok = combine(vars0, vars1)
if f.pass.debug > 1 {
f.Warnl(a0.Pos, "combine(%v, %v) -> %v, %v", vars0, vars1, vars, ok)
}
if !(ok && v.Block.CPUfeatures.hasFeature(CPUavx512)) {
// too many variables, or cannot rewrite current values.
// rewrite one or both subtrees if possible
if vars0[2] != nil && a0.Block.CPUfeatures.hasFeature(CPUavx512) {
replace(a0, vars0)
}
if vars1[2] != nil && a1.Block.CPUfeatures.hasFeature(CPUavx512) {
replace(a1, vars1)
}
// 3-element var arrays are either rewritten, or unable to be rewritten
// because of the features in effect in their block. Either way, they
// are treated as a "new var" if 3 elements are present.
if vars0[2] == nil {
if vars1[2] == nil {
// both subtrees are 2-element and were not rewritten.
//
// TODO a clever person would look at subtrees of inputs,
// e.g. rewrite
// ((a AND b) XOR b) XOR (d XOR (c AND d))
// to (((a AND b) XOR b) XOR d) XOR (c AND d)
// to v = TERNLOG(truthtable, a, b, d) XOR (c AND d)
// and return the variable set {v, c, d}
//
// But for now, just restart with a0 and a1.
return [3]*Value{a0, a1, nil}
} else {
// a1 (maybe) rewrote, a0 has room for another var
vars = addOne(vars0, a1)
}
} else if vars1[2] == nil {
// a0 (maybe) rewrote, a1 has room for another var
vars = addOne(vars1, a0)
} else if !ok {
// both (maybe) rewrote
// a0 and a1 are different because otherwise their variable
// sets would have combined "ok".
return [3]*Value{a0, a1, nil}
}
// continue with either the vars from "ok" or the updated set of vars.
}
}
// if root and 3 vars and hasFeature, rewrite.
if slo&sloInterior == 0 && vars[2] != nil && hasFeature {
replace(v, vars)
return [3]*Value{v, nil, nil}
}
return vars
}
for _, v := range roots {
if f.pass.debug > 1 {
f.Warnl(v.Pos, "SLO root %s", v.LongString())
}
rewrite(v)
}
}
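As a reading aid, and not part of the change above: the immediate that replace passes to the TERNLOG op is plain 8-bit truth-table arithmetic over the three canonical tables in truthTableValues. A minimal standalone sketch of that arithmetic:

package main

import "fmt"

func main() {
	// The three canonical truth tables (truthTableValues): vars[0] varies
	// slowest, vars[2] fastest.
	x, y, z := uint8(0xF0), uint8(0xCC), uint8(0xAA)
	// What computeTT would produce for the expression (x AND y) OR (NOT z).
	imm := (x & y) | ^z
	fmt.Printf("0b%08b\n", imm) // 0b11010101 = 0xD5, the VPTERNLOG immediate
}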


@ -21,7 +21,7 @@ func TestSizeof(t *testing.T) {
_64bit uintptr // size on 64bit platforms
}{
{Value{}, 72, 112},
{Block{}, 164, 304},
{Block{}, 168, 312},
{LocalSlot{}, 28, 40},
{valState{}, 28, 40},
}


@ -0,0 +1,160 @@
// Code generated by 'go run genfiles.go'; DO NOT EDIT.
package ssa
type SIMDLogicalOP uint8
const (
// boolean simd operations, for reducing expressions to VPTERNLOG* instructions
// sloInterior is set for non-root nodes in logical-op expression trees.
// the operations are even-numbered.
sloInterior SIMDLogicalOP = 1
sloNone SIMDLogicalOP = 2 * iota
sloAnd
sloOr
sloAndNot
sloXor
sloNot
)
func classifyBooleanSIMD(v *Value) SIMDLogicalOP {
switch v.Op {
case OpAndInt8x16, OpAndInt16x8, OpAndInt32x4, OpAndInt64x2, OpAndInt8x32, OpAndInt16x16, OpAndInt32x8, OpAndInt64x4, OpAndInt8x64, OpAndInt16x32, OpAndInt32x16, OpAndInt64x8:
return sloAnd
case OpOrInt8x16, OpOrInt16x8, OpOrInt32x4, OpOrInt64x2, OpOrInt8x32, OpOrInt16x16, OpOrInt32x8, OpOrInt64x4, OpOrInt8x64, OpOrInt16x32, OpOrInt32x16, OpOrInt64x8:
return sloOr
case OpAndNotInt8x16, OpAndNotInt16x8, OpAndNotInt32x4, OpAndNotInt64x2, OpAndNotInt8x32, OpAndNotInt16x16, OpAndNotInt32x8, OpAndNotInt64x4, OpAndNotInt8x64, OpAndNotInt16x32, OpAndNotInt32x16, OpAndNotInt64x8:
return sloAndNot
case OpXorInt8x16:
if y := v.Args[1]; y.Op == OpEqualInt8x16 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt16x8:
if y := v.Args[1]; y.Op == OpEqualInt16x8 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt32x4:
if y := v.Args[1]; y.Op == OpEqualInt32x4 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt64x2:
if y := v.Args[1]; y.Op == OpEqualInt64x2 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt8x32:
if y := v.Args[1]; y.Op == OpEqualInt8x32 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt16x16:
if y := v.Args[1]; y.Op == OpEqualInt16x16 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt32x8:
if y := v.Args[1]; y.Op == OpEqualInt32x8 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt64x4:
if y := v.Args[1]; y.Op == OpEqualInt64x4 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt8x64:
if y := v.Args[1]; y.Op == OpEqualInt8x64 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt16x32:
if y := v.Args[1]; y.Op == OpEqualInt16x32 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt32x16:
if y := v.Args[1]; y.Op == OpEqualInt32x16 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
case OpXorInt64x8:
if y := v.Args[1]; y.Op == OpEqualInt64x8 &&
y.Args[0] == y.Args[1] {
return sloNot
}
return sloXor
}
return sloNone
}
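An editor's note on the Xor special case above, not part of the generated file: classifying Xor(x, Equal(y, y)) as a NOT relies on an integer lane compared equal to itself yielding all ones (PCMPEQ-style), and on x XOR all-ones being the bitwise complement of x. Per 8-bit lane, in scalar form:

// notViaXorEqual shows, for one 8-bit lane, why Xor(x, Equal(y, y)) is a NOT.
func notViaXorEqual(x uint8) uint8 {
	allOnes := uint8(0xFF) // what a lane compared equal to itself produces
	return x ^ allOnes     // identical to ^x
}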
func ternOpForLogical(op Op) Op {
switch op {
case OpAndInt8x16, OpOrInt8x16, OpXorInt8x16, OpAndNotInt8x16:
return OpternInt32x4
case OpAndUint8x16, OpOrUint8x16, OpXorUint8x16, OpAndNotUint8x16:
return OpternUint32x4
case OpAndInt16x8, OpOrInt16x8, OpXorInt16x8, OpAndNotInt16x8:
return OpternInt32x4
case OpAndUint16x8, OpOrUint16x8, OpXorUint16x8, OpAndNotUint16x8:
return OpternUint32x4
case OpAndInt32x4, OpOrInt32x4, OpXorInt32x4, OpAndNotInt32x4:
return OpternInt32x4
case OpAndUint32x4, OpOrUint32x4, OpXorUint32x4, OpAndNotUint32x4:
return OpternUint32x4
case OpAndInt64x2, OpOrInt64x2, OpXorInt64x2, OpAndNotInt64x2:
return OpternInt64x2
case OpAndUint64x2, OpOrUint64x2, OpXorUint64x2, OpAndNotUint64x2:
return OpternUint64x2
case OpAndInt8x32, OpOrInt8x32, OpXorInt8x32, OpAndNotInt8x32:
return OpternInt32x8
case OpAndUint8x32, OpOrUint8x32, OpXorUint8x32, OpAndNotUint8x32:
return OpternUint32x8
case OpAndInt16x16, OpOrInt16x16, OpXorInt16x16, OpAndNotInt16x16:
return OpternInt32x8
case OpAndUint16x16, OpOrUint16x16, OpXorUint16x16, OpAndNotUint16x16:
return OpternUint32x8
case OpAndInt32x8, OpOrInt32x8, OpXorInt32x8, OpAndNotInt32x8:
return OpternInt32x8
case OpAndUint32x8, OpOrUint32x8, OpXorUint32x8, OpAndNotUint32x8:
return OpternUint32x8
case OpAndInt64x4, OpOrInt64x4, OpXorInt64x4, OpAndNotInt64x4:
return OpternInt64x4
case OpAndUint64x4, OpOrUint64x4, OpXorUint64x4, OpAndNotUint64x4:
return OpternUint64x4
case OpAndInt8x64, OpOrInt8x64, OpXorInt8x64, OpAndNotInt8x64:
return OpternInt32x16
case OpAndUint8x64, OpOrUint8x64, OpXorUint8x64, OpAndNotUint8x64:
return OpternUint32x16
case OpAndInt16x32, OpOrInt16x32, OpXorInt16x32, OpAndNotInt16x32:
return OpternInt32x16
case OpAndUint16x32, OpOrUint16x32, OpXorUint16x32, OpAndNotUint16x32:
return OpternUint32x16
case OpAndInt32x16, OpOrInt32x16, OpXorInt32x16, OpAndNotInt32x16:
return OpternInt32x16
case OpAndUint32x16, OpOrUint32x16, OpXorUint32x16, OpAndNotUint32x16:
return OpternUint32x16
case OpAndInt64x8, OpOrInt64x8, OpXorInt64x8, OpAndNotInt64x8:
return OpternInt64x8
case OpAndUint64x8, OpOrUint64x8, OpXorUint64x8, OpAndNotUint64x8:
return OpternUint64x8
}
return op
}
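Why every element width collapses onto 32- or 64-bit lanes in the table above (illustration only, not part of the generated file): And, Or, Xor and AndNot are pure bitwise operations, so reinterpreting the lane width does not change the result; only the overall vector width has to match. A scalar sketch of that width-independence:

// word assembles four bytes into a little-endian uint32.
func word(b [4]byte) uint32 {
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// andBytesEqualsAndWords reports whether AND-ing four byte lanes and AND-ing
// the same bits viewed as a single 32-bit lane give identical results
// (they always do).
func andBytesEqualsAndWords(a, b [4]byte) bool {
	var c [4]byte
	for i := range c {
		c[i] = a[i] & b[i]
	}
	return word(c) == word(a)&word(b)
}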


@ -9,6 +9,7 @@ import (
"cmd/compile/internal/types"
"cmd/internal/src"
"fmt"
"internal/buildcfg"
"math"
"sort"
"strings"
@ -612,12 +613,18 @@ func AutoVar(v *Value) (*ir.Name, int64) {
// CanSSA reports whether values of type t can be represented as a Value.
func CanSSA(t *types.Type) bool {
types.CalcSize(t)
if t.Size() > int64(4*types.PtrSize) {
if t.IsSIMD() {
return true
}
sizeLimit := int64(MaxStruct * types.PtrSize)
if t.Size() > sizeLimit {
// 4*Widthptr is an arbitrary constant. We want it
// to be at least 3*Widthptr so slices can be registerized.
// Too big and we'll introduce too much register pressure.
if !buildcfg.Experiment.SIMD {
return false
}
}
switch t.Kind() {
case types.TARRAY:
// We can't do larger arrays because dynamic indexing is
@ -636,7 +643,17 @@ func CanSSA(t *types.Type) bool {
return false
}
}
// Special check for SIMD. If the composite type
// contains SIMD vectors, we can return true
// if it passes the checks below.
if !buildcfg.Experiment.SIMD {
return true
}
if t.Size() <= sizeLimit {
return true
}
i, f := t.Registers()
return i+f <= MaxStruct
default:
return true
}


@ -99,6 +99,18 @@ func (s *SymABIs) ReadSymABIs(file string) {
}
}
// HasDef returns whether the given symbol has an assembly definition.
func (s *SymABIs) HasDef(sym *types.Sym) bool {
symName := sym.Linkname
if symName == "" {
symName = sym.Pkg.Prefix + "." + sym.Name
}
symName = s.canonicalize(symName)
_, hasDefABI := s.defs[symName]
return hasDefABI
}
// GenABIWrappers applies ABI information to Funcs and generates ABI
// wrapper functions where necessary.
func (s *SymABIs) GenABIWrappers() {


@ -12,6 +12,7 @@ import (
"cmd/compile/internal/base"
"cmd/compile/internal/ir"
"cmd/compile/internal/ssa"
"cmd/compile/internal/typecheck"
"cmd/compile/internal/types"
"cmd/internal/sys"
)
@ -1632,6 +1633,495 @@ func initIntrinsics(cfg *intrinsicBuildConfig) {
return s.newValue1(ssa.OpCvtBoolToUint8, types.Types[types.TUINT8], args[0])
},
all...)
if buildcfg.Experiment.SIMD {
// Only enable these intrinsics if the SIMD experiment is on.
simdIntrinsics(addF)
addF("simd", "ClearAVXUpperBits",
func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
s.vars[memVar] = s.newValue1(ssa.OpAMD64VZEROUPPER, types.TypeMem, s.mem())
return nil
},
sys.AMD64)
addF(simdPackage, "Int8x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int16x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int32x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int64x2.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint8x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint16x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint32x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint64x2.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int8x32.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int16x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int32x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Int64x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint8x32.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint16x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint32x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
addF(simdPackage, "Uint64x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
sfp4 := func(method string, hwop ssa.Op, vectype *types.Type) {
addF("simd", method,
func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
x, a, b, c, d, y := args[0], args[1], args[2], args[3], args[4], args[5]
if a.Op == ssa.OpConst8 && b.Op == ssa.OpConst8 && c.Op == ssa.OpConst8 && d.Op == ssa.OpConst8 {
return select4FromPair(x, a, b, c, d, y, s, hwop, vectype)
} else {
return s.callResult(n, callNormal)
}
},
sys.AMD64)
}
sfp4("Int32x4.SelectFromPair", ssa.OpconcatSelectedConstantInt32x4, types.TypeVec128)
sfp4("Uint32x4.SelectFromPair", ssa.OpconcatSelectedConstantUint32x4, types.TypeVec128)
sfp4("Float32x4.SelectFromPair", ssa.OpconcatSelectedConstantFloat32x4, types.TypeVec128)
sfp4("Int32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt32x8, types.TypeVec256)
sfp4("Uint32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint32x8, types.TypeVec256)
sfp4("Float32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat32x8, types.TypeVec256)
sfp4("Int32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt32x16, types.TypeVec512)
sfp4("Uint32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint32x16, types.TypeVec512)
sfp4("Float32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat32x16, types.TypeVec512)
sfp2 := func(method string, hwop ssa.Op, vectype *types.Type, cscimm func(i, j uint8) int64) {
addF("simd", method,
func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
x, a, b, y := args[0], args[1], args[2], args[3]
if a.Op == ssa.OpConst8 && b.Op == ssa.OpConst8 {
return select2FromPair(x, a, b, y, s, hwop, vectype, cscimm)
} else {
return s.callResult(n, callNormal)
}
},
sys.AMD64)
}
sfp2("Uint64x2.SelectFromPair", ssa.OpconcatSelectedConstantUint64x2, types.TypeVec128, cscimm2)
sfp2("Int64x2.SelectFromPair", ssa.OpconcatSelectedConstantInt64x2, types.TypeVec128, cscimm2)
sfp2("Float64x2.SelectFromPair", ssa.OpconcatSelectedConstantFloat64x2, types.TypeVec128, cscimm2)
sfp2("Uint64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint64x4, types.TypeVec256, cscimm2g2)
sfp2("Int64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt64x4, types.TypeVec256, cscimm2g2)
sfp2("Float64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat64x4, types.TypeVec256, cscimm2g2)
sfp2("Uint64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint64x8, types.TypeVec512, cscimm2g4)
sfp2("Int64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt64x8, types.TypeVec512, cscimm2g4)
sfp2("Float64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat64x8, types.TypeVec512, cscimm2g4)
}
}
func cscimm4(a, b, c, d uint8) int64 {
return se(a + b<<2 + c<<4 + d<<6)
}
func cscimm2(a, b uint8) int64 {
return se(a + b<<1)
}
func cscimm2g2(a, b uint8) int64 {
g := cscimm2(a, b)
return int64(int8(g + g<<2))
}
func cscimm2g4(a, b uint8) int64 {
g := cscimm2g2(a, b)
return int64(int8(g + g<<4))
}
const (
_LLLL = iota
_HLLL
_LHLL
_HHLL
_LLHL
_HLHL
_LHHL
_HHHL
_LLLH
_HLLH
_LHLH
_HHLH
_LLHH
_HLHH
_LHHH
_HHHH
)
const (
_LL = iota
_HL
_LH
_HH
)
func select2FromPair(x, _a, _b, y *ssa.Value, s *state, op ssa.Op, t *types.Type, csc func(a, b uint8) int64) *ssa.Value {
a, b := uint8(_a.AuxInt8()), uint8(_b.AuxInt8())
pattern := (a&2)>>1 + (b & 2)
a, b = a&1, b&1
switch pattern {
case _LL:
return s.newValue2I(op, t, csc(a, b), x, x)
case _HH:
return s.newValue2I(op, t, csc(a, b), y, y)
case _LH:
return s.newValue2I(op, t, csc(a, b), x, y)
case _HL:
return s.newValue2I(op, t, csc(a, b), y, x)
}
panic("The preceding switch should have been exhaustive")
}
func select4FromPair(x, _a, _b, _c, _d, y *ssa.Value, s *state, op ssa.Op, t *types.Type) *ssa.Value {
a, b, c, d := uint8(_a.AuxInt8()), uint8(_b.AuxInt8()), uint8(_c.AuxInt8()), uint8(_d.AuxInt8())
pattern := a>>2 + (b&4)>>1 + (c & 4) + (d&4)<<1
a, b, c, d = a&3, b&3, c&3, d&3
switch pattern {
case _LLLL:
// TODO DETECT 0,1,2,3, 0,0,0,0
return s.newValue2I(op, t, cscimm4(a, b, c, d), x, x)
case _HHHH:
// TODO DETECT 0,1,2,3, 0,0,0,0
return s.newValue2I(op, t, cscimm4(a, b, c, d), y, y)
case _LLHH:
return s.newValue2I(op, t, cscimm4(a, b, c, d), x, y)
case _HHLL:
return s.newValue2I(op, t, cscimm4(a, b, c, d), y, x)
case _HLLL:
z := s.newValue2I(op, t, cscimm4(a, a, b, b), y, x)
return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, x)
case _LHLL:
z := s.newValue2I(op, t, cscimm4(a, a, b, b), x, y)
return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, x)
case _HLHH:
z := s.newValue2I(op, t, cscimm4(a, a, b, b), y, x)
return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, y)
case _LHHH:
z := s.newValue2I(op, t, cscimm4(a, a, b, b), x, y)
return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, y)
case _LLLH:
z := s.newValue2I(op, t, cscimm4(c, c, d, d), x, y)
return s.newValue2I(op, t, cscimm4(a, b, 0, 2), x, z)
case _LLHL:
z := s.newValue2I(op, t, cscimm4(c, c, d, d), y, x)
return s.newValue2I(op, t, cscimm4(a, b, 0, 2), x, z)
case _HHLH:
z := s.newValue2I(op, t, cscimm4(c, c, d, d), x, y)
return s.newValue2I(op, t, cscimm4(a, b, 0, 2), y, z)
case _HHHL:
z := s.newValue2I(op, t, cscimm4(c, c, d, d), y, x)
return s.newValue2I(op, t, cscimm4(a, b, 0, 2), y, z)
case _LHLH:
z := s.newValue2I(op, t, cscimm4(a, c, b, d), x, y)
return s.newValue2I(op, t, se(0b11_01_10_00), z, z)
case _HLHL:
z := s.newValue2I(op, t, cscimm4(b, d, a, c), x, y)
return s.newValue2I(op, t, se(0b01_11_00_10), z, z)
case _HLLH:
z := s.newValue2I(op, t, cscimm4(b, c, a, d), x, y)
return s.newValue2I(op, t, se(0b11_01_00_10), z, z)
case _LHHL:
z := s.newValue2I(op, t, cscimm4(a, d, b, c), x, y)
return s.newValue2I(op, t, se(0b01_11_10_00), z, z)
}
panic("The preceding switch should have been exhaustive")
}
// se smears the not-really-a-sign bit of a uint8 to conform to the conventions
// for representing AuxInt in ssa.
func se(x uint8) int64 {
return int64(int8(x))
}
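A worked example of the immediate arithmetic above, as an editor's illustration; the lane numbering (0-3 from the first operand, 4-7 from the second) is inferred from the pattern constants and the intrinsic wiring earlier in this hunk. Selecting source lanes 1, 3, 4, 6 falls into the _LLHH pattern, so a single shuffle suffices:

// cscimm4Sketch repeats the packing done by cscimm4: four 2-bit lane
// selectors, with the lowest bits choosing the first result lane.
func cscimm4Sketch(a, b, c, d uint8) int64 {
	return int64(int8(a + b<<2 + c<<4 + d<<6))
}

// For source lanes (1, 3, 4, 6): pattern = _LLHH, the masked selectors are
// (1, 3, 0, 2), and the single emitted op receives
//	cscimm4Sketch(1, 3, 0, 2) == se(0b10_00_11_01) == -115
// with x as its first operand and y as its second.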
func opLen1(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue1(op, t, args[0])
}
}
func opLen2(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue2(op, t, args[0], args[1])
}
}
func opLen2_21(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue2(op, t, args[1], args[0])
}
}
func opLen3(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue3(op, t, args[0], args[1], args[2])
}
}
var ssaVecBySize = map[int64]*types.Type{
16: types.TypeVec128,
32: types.TypeVec256,
64: types.TypeVec512,
}
func opLen3_31Zero3(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if t, ok := ssaVecBySize[args[1].Type.Size()]; !ok {
panic("unknown simd vector size")
} else {
return s.newValue3(op, t, s.newValue0(ssa.OpZeroSIMD, t), args[1], args[0])
}
}
}
func opLen3_21(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue3(op, t, args[1], args[0], args[2])
}
}
func opLen3_231(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue3(op, t, args[2], args[0], args[1])
}
}
func opLen4(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue4(op, t, args[0], args[1], args[2], args[3])
}
}
func opLen4_231(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue4(op, t, args[2], args[0], args[1], args[3])
}
}
func opLen4_31(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue4(op, t, args[2], args[1], args[0], args[3])
}
}
func immJumpTable(s *state, idx *ssa.Value, intrinsicCall *ir.CallExpr, genOp func(*state, int)) *ssa.Value {
// Make blocks we'll need.
bEnd := s.f.NewBlock(ssa.BlockPlain)
if !idx.Type.IsKind(types.TUINT8) {
panic("immJumpTable expects uint8 value")
}
// We will exhaust 0-255, so no need to check the bounds.
t := types.Types[types.TUINTPTR]
idx = s.conv(nil, idx, idx.Type, t)
b := s.curBlock
b.Kind = ssa.BlockJumpTable
b.Pos = intrinsicCall.Pos()
if base.Flag.Cfg.SpectreIndex {
// Potential Spectre vulnerability hardening?
idx = s.newValue2(ssa.OpSpectreSliceIndex, t, idx, s.uintptrConstant(255))
}
b.SetControl(idx)
targets := [256]*ssa.Block{}
for i := range 256 {
t := s.f.NewBlock(ssa.BlockPlain)
targets[i] = t
b.AddEdgeTo(t)
}
s.endBlock()
for i, t := range targets {
s.startBlock(t)
genOp(s, i)
if t.Kind != ssa.BlockExit {
t.AddEdgeTo(bEnd)
}
s.endBlock()
}
s.startBlock(bEnd)
ret := s.variable(intrinsicCall, intrinsicCall.Type())
return ret
}
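For orientation, not part of the diff: immJumpTable is the fallback the opLen*Imm8 helpers below use when the immediate is not a compile-time constant; it builds a dense 256-way BlockJumpTable so that each arm still sees a constant immediate. In ordinary Go the shape is roughly the following (4 arms instead of 256, and withConst standing in for building the SSA value with a constant AuxInt):

// dispatchImm sketches the jump-table idea: a run-time immediate selects one
// of several arms, each of which uses a distinct compile-time constant.
func dispatchImm(imm uint8, withConst func(constImm uint8) uint64) uint64 {
	switch imm & 3 { // the real table covers every uint8 value 0..255
	case 0:
		return withConst(0)
	case 1:
		return withConst(1)
	case 2:
		return withConst(2)
	default:
		return withConst(3)
	}
}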
func opLen1Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 {
return s.newValue1I(op, t, args[1].AuxInt<<int64(offset), args[0])
}
return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue1I(op, t, int64(int8(idx<<offset)), args[0])
})
}
}
func opLen2Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 {
return s.newValue2I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2])
}
return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset)), args[0], args[2])
})
}
}
func opLen3Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 {
return s.newValue3I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2], args[3])
}
return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue3I(op, t, int64(int8(idx<<offset)), args[0], args[2], args[3])
})
}
}
func opLen2Imm8_2I(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[2].Op == ssa.OpConst8 {
return s.newValue2I(op, t, args[2].AuxInt<<int64(offset), args[0], args[1])
}
return immJumpTable(s, args[2], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset)), args[0], args[1])
})
}
}
// Two immediates instead of just 1. Offset is ignored, so it is a _ parameter instead.
func opLen2Imm8_II(op ssa.Op, t *types.Type, _ int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 && args[2].Op == ssa.OpConst8 && args[1].AuxInt & ^3 == 0 && args[2].AuxInt & ^3 == 0 {
i1, i2 := args[1].AuxInt, args[2].AuxInt
return s.newValue2I(op, t, int64(int8(i1+i2<<4)), args[0], args[3])
}
four := s.constInt64(types.Types[types.TUINT8], 4)
shifted := s.newValue2(ssa.OpLsh8x8, types.Types[types.TUINT8], args[2], four)
combined := s.newValue2(ssa.OpAdd8, types.Types[types.TUINT8], args[1], shifted)
return immJumpTable(s, combined, n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
// TODO for "zeroing" values, panic instead.
if idx & ^(3+3<<4) == 0 {
s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx)), args[0], args[3])
} else {
sNew.rtcall(ir.Syms.PanicSimdImm, false, nil)
}
})
}
}
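A small worked example of the two-immediate packing above (editor's illustration): the two 2-bit immediates share a single AuxInt byte, the first in the low nibble and the second shifted into the high nibble.

// packTwoImm mirrors the constant path of opLen2Imm8_II: i1 occupies bits 0-1
// and i2 occupies bits 4-5, e.g. packTwoImm(2, 3) == 0x32.
func packTwoImm(i1, i2 uint8) int8 {
	return int8(i1 + i2<<4)
}

// The guard idx & ^(3+3<<4) == 0 accepts exactly such packed values:
// 0x32 & ^0x33 == 0 takes the fast path, while a value like 0x41 fails the
// check and reaches the panicSimdImm call in the jump-table fallback.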
// The assembler requires the imm value of a SHA1RNDS4 instruction to be one of 0,1,2,3...
func opLen2Imm8_SHA1RNDS4(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 {
return s.newValue2I(op, t, (args[1].AuxInt<<int64(offset))&0b11, args[0], args[2])
}
return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset))&0b11, args[0], args[2])
})
}
}
func opLen3Imm8_2I(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[2].Op == ssa.OpConst8 {
return s.newValue3I(op, t, args[2].AuxInt<<int64(offset), args[0], args[1], args[3])
}
return immJumpTable(s, args[2], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue3I(op, t, int64(int8(idx<<offset)), args[0], args[1], args[3])
})
}
}
func opLen4Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
if args[1].Op == ssa.OpConst8 {
return s.newValue4I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2], args[3], args[4])
}
return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
// Encode as int8 due to requirement of AuxInt, check its comment for details.
s.vars[n] = sNew.newValue4I(op, t, int64(int8(idx<<offset)), args[0], args[2], args[3], args[4])
})
}
}
func simdLoad() func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue2(ssa.OpLoad, n.Type(), args[0], s.mem())
}
}
func simdStore() func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
s.store(args[0].Type, args[1], args[0])
return nil
}
}
var cvtVToMaskOpcodes = map[int]map[int]ssa.Op{
8: {16: ssa.OpCvt16toMask8x16, 32: ssa.OpCvt32toMask8x32, 64: ssa.OpCvt64toMask8x64},
16: {8: ssa.OpCvt8toMask16x8, 16: ssa.OpCvt16toMask16x16, 32: ssa.OpCvt32toMask16x32},
32: {4: ssa.OpCvt8toMask32x4, 8: ssa.OpCvt8toMask32x8, 16: ssa.OpCvt16toMask32x16},
64: {2: ssa.OpCvt8toMask64x2, 4: ssa.OpCvt8toMask64x4, 8: ssa.OpCvt8toMask64x8},
}
var cvtMaskToVOpcodes = map[int]map[int]ssa.Op{
8: {16: ssa.OpCvtMask8x16to16, 32: ssa.OpCvtMask8x32to32, 64: ssa.OpCvtMask8x64to64},
16: {8: ssa.OpCvtMask16x8to8, 16: ssa.OpCvtMask16x16to16, 32: ssa.OpCvtMask16x32to32},
32: {4: ssa.OpCvtMask32x4to8, 8: ssa.OpCvtMask32x8to8, 16: ssa.OpCvtMask32x16to16},
64: {2: ssa.OpCvtMask64x2to8, 4: ssa.OpCvtMask64x4to8, 8: ssa.OpCvtMask64x8to8},
}
func simdCvtVToMask(elemBits, lanes int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
op := cvtVToMaskOpcodes[elemBits][lanes]
if op == 0 {
panic(fmt.Sprintf("Unknown mask shape: Mask%dx%d", elemBits, lanes))
}
return s.newValue1(op, types.TypeMask, args[0])
}
}
func simdCvtMaskToV(elemBits, lanes int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
op := cvtMaskToVOpcodes[elemBits][lanes]
if op == 0 {
panic(fmt.Sprintf("Unknown mask shape: Mask%dx%d", elemBits, lanes))
}
return s.newValue1(op, n.Type(), args[0])
}
}
func simdMaskedLoad(op ssa.Op) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return s.newValue3(op, n.Type(), args[0], args[1], s.mem())
}
}
func simdMaskedStore(op ssa.Op) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
s.vars[memVar] = s.newValue4A(op, types.TypeMem, args[0].Type, args[1], args[2], args[0], s.mem())
return nil
}
}
// findIntrinsic returns a function which builds the SSA equivalent of the
@ -1657,7 +2147,8 @@ func findIntrinsic(sym *types.Sym) intrinsicBuilder {
fn := sym.Name
if ssa.IntrinsicsDisable {
if pkg == "internal/runtime/sys" && (fn == "GetCallerPC" || fn == "GrtCallerSP" || fn == "GetClosurePtr") {
if pkg == "internal/runtime/sys" && (fn == "GetCallerPC" || fn == "GrtCallerSP" || fn == "GetClosurePtr") ||
pkg == "internal/simd" || pkg == "simd" { // TODO after simd has been moved to package simd, remove internal/simd
// These runtime functions don't have definitions, must be intrinsics.
} else {
return nil
@ -1672,7 +2163,74 @@ func IsIntrinsicCall(n *ir.CallExpr) bool {
}
name, ok := n.Fun.(*ir.Name)
if !ok {
if n.Fun.Op() == ir.OMETHEXPR {
if meth := ir.MethodExprName(n.Fun); meth != nil {
if fn := meth.Func; fn != nil {
return IsIntrinsicSym(fn.Sym())
}
}
}
return false
}
return findIntrinsic(name.Sym()) != nil
return IsIntrinsicSym(name.Sym())
}
func IsIntrinsicSym(sym *types.Sym) bool {
return findIntrinsic(sym) != nil
}
// GenIntrinsicBody generates the function body for a bodyless intrinsic.
// This is used when the intrinsic is used in a non-call context, e.g.
// as a function pointer, or (for a method) being referenced from the type
// descriptor.
//
// The compiler already recognizes a call to fn as an intrinsic and can
// directly generate code for it. So we just fill in the body with a call
// to fn.
func GenIntrinsicBody(fn *ir.Func) {
if ir.CurFunc != nil {
base.FatalfAt(fn.Pos(), "enqueueFunc %v inside %v", fn, ir.CurFunc)
}
if base.Flag.LowerR != 0 {
fmt.Println("generate intrinsic for", ir.FuncName(fn))
}
pos := fn.Pos()
ft := fn.Type()
var ret ir.Node
// For a method, it usually starts with an ODOTMETH (pre-typecheck) or
// OMETHEXPR (post-typecheck) referencing the method symbol without the
// receiver type, and Walk rewrites it to a call directly to the
// type-qualified method symbol, moving the receiver to an argument.
// Here fn already has the type-qualified method symbol, and it is hard
// to get the unqualified symbol. So we just generate the post-Walk form
// and mark it typechecked and Walked.
call := ir.NewCallExpr(pos, ir.OCALLFUNC, fn.Nname, nil)
call.Args = ir.RecvParamNames(ft)
call.IsDDD = ft.IsVariadic()
typecheck.Exprs(call.Args)
call.SetTypecheck(1)
call.SetWalked(true)
ret = call
if ft.NumResults() > 0 {
if ft.NumResults() == 1 {
call.SetType(ft.Result(0).Type)
} else {
call.SetType(ft.ResultsTuple())
}
n := ir.NewReturnStmt(base.Pos, nil)
n.Results = []ir.Node{call}
ret = n
}
fn.Body.Append(ret)
if base.Flag.LowerR != 0 {
ir.DumpList("generate intrinsic body", fn.Body)
}
ir.CurFunc = fn
typecheck.Stmts(fn.Body)
ir.CurFunc = nil // we know CurFunc is nil at entry
}


@ -16,6 +16,9 @@ import (
var updateIntrinsics = flag.Bool("update", false, "Print an updated intrinsics table")
// TODO turn on after SIMD is stable. The time burned keeping this test happy during SIMD development has already well exceeded any plausible benefit.
var simd = flag.Bool("simd", false, "Also check SIMD intrinsics; for now, it is noisy and not helpful")
type testIntrinsicKey struct {
archName string
pkg string
@ -1403,13 +1406,13 @@ func TestIntrinsics(t *testing.T) {
gotIntrinsics[testIntrinsicKey{ik.arch.Name, ik.pkg, ik.fn}] = struct{}{}
}
for ik, _ := range gotIntrinsics {
if _, found := wantIntrinsics[ik]; !found {
if _, found := wantIntrinsics[ik]; !found && (ik.pkg != "simd" || *simd) {
t.Errorf("Got unwanted intrinsic %v %v.%v", ik.archName, ik.pkg, ik.fn)
}
}
for ik, _ := range wantIntrinsics {
if _, found := gotIntrinsics[ik]; !found {
if _, found := gotIntrinsics[ik]; !found && (ik.pkg != "simd" || *simd) {
t.Errorf("Want missing intrinsic %v %v.%v", ik.archName, ik.pkg, ik.fn)
}
}

File diff suppressed because it is too large.


@ -156,6 +156,7 @@ func InitConfig() {
ir.Syms.Panicnildottype = typecheck.LookupRuntimeFunc("panicnildottype")
ir.Syms.Panicoverflow = typecheck.LookupRuntimeFunc("panicoverflow")
ir.Syms.Panicshift = typecheck.LookupRuntimeFunc("panicshift")
ir.Syms.PanicSimdImm = typecheck.LookupRuntimeFunc("panicSimdImm")
ir.Syms.Racefuncenter = typecheck.LookupRuntimeFunc("racefuncenter")
ir.Syms.Racefuncexit = typecheck.LookupRuntimeFunc("racefuncexit")
ir.Syms.Raceread = typecheck.LookupRuntimeFunc("raceread")
@ -165,9 +166,10 @@ func InitConfig() {
ir.Syms.TypeAssert = typecheck.LookupRuntimeFunc("typeAssert")
ir.Syms.WBZero = typecheck.LookupRuntimeFunc("wbZero")
ir.Syms.WBMove = typecheck.LookupRuntimeFunc("wbMove")
ir.Syms.X86HasAVX = typecheck.LookupRuntimeVar("x86HasAVX") // bool
ir.Syms.X86HasFMA = typecheck.LookupRuntimeVar("x86HasFMA") // bool
ir.Syms.X86HasPOPCNT = typecheck.LookupRuntimeVar("x86HasPOPCNT") // bool
ir.Syms.X86HasSSE41 = typecheck.LookupRuntimeVar("x86HasSSE41") // bool
ir.Syms.X86HasFMA = typecheck.LookupRuntimeVar("x86HasFMA") // bool
ir.Syms.ARMHasVFPv4 = typecheck.LookupRuntimeVar("armHasVFPv4") // bool
ir.Syms.ARM64HasATOMICS = typecheck.LookupRuntimeVar("arm64HasATOMICS") // bool
ir.Syms.Loong64HasLAMCAS = typecheck.LookupRuntimeVar("loong64HasLAMCAS") // bool
@ -600,6 +602,9 @@ func buildssa(fn *ir.Func, worker int, isPgoHot bool) *ssa.Func {
// TODO figure out exactly what's unused, don't spill it. Make liveness fine-grained, also.
for _, p := range params.InParams() {
typs, offs := p.RegisterTypesAndOffsets()
if len(offs) < len(typs) {
s.Fatalf("len(offs)=%d < len(typs)=%d, params=\n%s", len(offs), len(typs), params)
}
for i, t := range typs {
o := offs[i] // offset within parameter
fo := p.FrameOffset(params) // offset of parameter in frame
@ -1333,6 +1338,11 @@ func (s *state) newValue4(op ssa.Op, t *types.Type, arg0, arg1, arg2, arg3 *ssa.
return s.curBlock.NewValue4(s.peekPos(), op, t, arg0, arg1, arg2, arg3)
}
// newValue4A adds a new value with four arguments and an aux value to the current block.
func (s *state) newValue4A(op ssa.Op, t *types.Type, aux ssa.Aux, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue4A(s.peekPos(), op, t, aux, arg0, arg1, arg2, arg3)
}
// newValue4I adds a new value with four arguments and an auxint value to the current block.
func (s *state) newValue4I(op ssa.Op, t *types.Type, aux int64, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
return s.curBlock.NewValue4I(s.peekPos(), op, t, aux, arg0, arg1, arg2, arg3)
@ -1462,7 +1472,7 @@ func (s *state) instrument(t *types.Type, addr *ssa.Value, kind instrumentKind)
// If it is instrumenting for MSAN or ASAN and t is a struct type, it instruments
// operation for each field, instead of for the whole struct.
func (s *state) instrumentFields(t *types.Type, addr *ssa.Value, kind instrumentKind) {
if !(base.Flag.MSan || base.Flag.ASan) || !t.IsStruct() {
if !(base.Flag.MSan || base.Flag.ASan) || !isStructNotSIMD(t) {
s.instrument(t, addr, kind)
return
}
@ -4585,7 +4595,7 @@ func (s *state) zeroVal(t *types.Type) *ssa.Value {
return s.constInterface(t)
case t.IsSlice():
return s.constSlice(t)
case t.IsStruct():
case isStructNotSIMD(t):
n := t.NumFields()
v := s.entryNewValue0(ssa.OpStructMake, t)
for i := 0; i < n; i++ {
@ -4599,6 +4609,8 @@ func (s *state) zeroVal(t *types.Type) *ssa.Value {
case 1:
return s.entryNewValue1(ssa.OpArrayMake1, t, s.zeroVal(t.Elem()))
}
case t.IsSIMD():
return s.newValue0(ssa.OpZeroSIMD, t)
}
s.Fatalf("zero for type %v not implemented", t)
return nil
@ -5578,7 +5590,7 @@ func (s *state) storeType(t *types.Type, left, right *ssa.Value, skip skipMask,
// do *left = right for all scalar (non-pointer) parts of t.
func (s *state) storeTypeScalars(t *types.Type, left, right *ssa.Value, skip skipMask) {
switch {
case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex():
case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex() || t.IsSIMD():
s.store(t, left, right)
case t.IsPtrShaped():
if t.IsPtr() && t.Elem().NotInHeap() {
@ -5607,7 +5619,7 @@ func (s *state) storeTypeScalars(t *types.Type, left, right *ssa.Value, skip ski
// itab field doesn't need a write barrier (even though it is a pointer).
itab := s.newValue1(ssa.OpITab, s.f.Config.Types.BytePtr, right)
s.store(types.Types[types.TUINTPTR], left, itab)
case t.IsStruct():
case isStructNotSIMD(t):
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
@ -5644,7 +5656,7 @@ func (s *state) storeTypePtrs(t *types.Type, left, right *ssa.Value) {
idata := s.newValue1(ssa.OpIData, s.f.Config.Types.BytePtr, right)
idataAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.BytePtrPtr, s.config.PtrSize, left)
s.store(s.f.Config.Types.BytePtr, idataAddr, idata)
case t.IsStruct():
case isStructNotSIMD(t):
n := t.NumFields()
for i := 0; i < n; i++ {
ft := t.FieldType(i)
@ -6757,7 +6769,7 @@ func EmitArgInfo(f *ir.Func, abiInfo *abi.ABIParamResultInfo) *obj.LSym {
uintptrTyp := types.Types[types.TUINTPTR]
isAggregate := func(t *types.Type) bool {
return t.IsStruct() || t.IsArray() || t.IsComplex() || t.IsInterface() || t.IsString() || t.IsSlice()
return isStructNotSIMD(t) || t.IsArray() || t.IsComplex() || t.IsInterface() || t.IsString() || t.IsSlice()
}
wOff := 0
@ -6817,7 +6829,7 @@ func EmitArgInfo(f *ir.Func, abiInfo *abi.ABIParamResultInfo) *obj.LSym {
}
baseOffset += t.Elem().Size()
}
case t.IsStruct():
case isStructNotSIMD(t):
if t.NumFields() == 0 {
n++ // {} counts as a component
break
@ -7837,7 +7849,7 @@ func (s *State) UseArgs(n int64) {
// fieldIdx finds the index of the field referred to by the ODOT node n.
func fieldIdx(n *ir.SelectorExpr) int {
t := n.X.Type()
if !t.IsStruct() {
if !isStructNotSIMD(t) {
panic("ODOT's LHS is not a struct")
}
@ -8045,4 +8057,8 @@ func SpillSlotAddr(spill ssa.Spill, baseReg int16, extraOffset int64) obj.Addr {
}
}
func isStructNotSIMD(t *types.Type) bool {
return t.IsStruct() && !t.IsSIMD()
}
var BoundsCheckFunc [ssa.BoundsKindCount]*obj.LSym


@ -0,0 +1,41 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package test
import (
"cmd/compile/internal/ssa"
"cmd/compile/internal/types"
"internal/buildcfg"
"testing"
)
// This file contains tests for ssa values, types and their utility functions.
func TestCanSSA(t *testing.T) {
i64 := types.Types[types.TINT64]
v128 := types.TypeVec128
s1 := mkstruct(i64, mkstruct(i64, i64, i64, i64))
if ssa.CanSSA(s1) {
// Test size check for struct.
t.Errorf("CanSSA(%v) returned true, expected false", s1)
}
a1 := types.NewArray(s1, 1)
if ssa.CanSSA(a1) {
// Test size check for array.
t.Errorf("CanSSA(%v) returned true, expected false", a1)
}
if buildcfg.Experiment.SIMD {
s2 := mkstruct(v128, v128, v128, v128)
if !ssa.CanSSA(s2) {
// Test size check for SIMD struct special case.
t.Errorf("CanSSA(%v) returned false, expected true", s2)
}
a2 := types.NewArray(s2, 1)
if !ssa.CanSSA(a2) {
// Test size check for SIMD array special case.
t.Errorf("CanSSA(%v) returned false, expected true", a2)
}
}
}


@ -292,9 +292,10 @@ func libfuzzerHookEqualFold(string, string, uint)
func addCovMeta(p unsafe.Pointer, len uint32, hash [16]byte, pkpath string, pkgId int, cmode uint8, cgran uint8) uint32
// architecture variants
var x86HasAVX bool
var x86HasFMA bool
var x86HasPOPCNT bool
var x86HasSSE41 bool
var x86HasFMA bool
var armHasVFPv4 bool
var arm64HasATOMICS bool
var loong64HasLAMCAS bool


@ -239,9 +239,10 @@ var runtimeDecls = [...]struct {
{"libfuzzerHookStrCmp", funcTag, 163},
{"libfuzzerHookEqualFold", funcTag, 163},
{"addCovMeta", funcTag, 165},
{"x86HasAVX", varTag, 6},
{"x86HasFMA", varTag, 6},
{"x86HasPOPCNT", varTag, 6},
{"x86HasSSE41", varTag, 6},
{"x86HasFMA", varTag, 6},
{"armHasVFPv4", varTag, 6},
{"arm64HasATOMICS", varTag, 6},
{"loong64HasLAMCAS", varTag, 6},


@ -10,6 +10,7 @@ import (
"cmd/compile/internal/base"
"cmd/internal/src"
"internal/buildcfg"
"internal/types/errors"
)
@ -452,6 +453,31 @@ func CalcSize(t *Type) {
ResumeCheckSize()
}
// simdify marks a type as "SIMD", either as a tag field,
// or as having the SIMD attribute. The tag field is a marker
// type used to identify a struct that is not really a struct.
// A SIMD type is allocated to a vector register (on amd64,
// xmm, ymm, or zmm). The fields of a SIMD type are ignored
// by the compiler except for the space that they reserve.
func simdify(st *Type, isTag bool) {
st.align = 8
st.alg = ANOALG // not comparable with ==
st.intRegs = 0
st.isSIMD = true
if isTag {
st.width = 0
st.isSIMDTag = true
st.floatRegs = 0
} else {
st.floatRegs = 1
}
// if st.Sym() != nil {
// base.Warn("Simdify %s, %v, %d", st.Sym().Name, isTag, st.width)
// } else {
// base.Warn("Simdify %v, %v, %d", st, isTag, st.width)
// }
}
// CalcStructSize calculates the size of t,
// filling in t.width, t.align, t.intRegs, and t.floatRegs,
// even if size calculation is otherwise disabled.
@ -464,10 +490,27 @@ func CalcStructSize(t *Type) {
switch {
case sym.Name == "align64" && isAtomicStdPkg(sym.Pkg):
maxAlign = 8
case buildcfg.Experiment.SIMD && (sym.Pkg.Path == "internal/simd" || sym.Pkg.Path == "simd") && len(t.Fields()) >= 1:
// This gates the experiment -- without it, no user-visible types can be "simd".
// The SSA-visible SIMD types remain.
// TODO after simd has been moved to package simd, remove internal/simd.
switch sym.Name {
case "v128":
simdify(t, true)
return
case "v256":
simdify(t, true)
return
case "v512":
simdify(t, true)
return
}
}
}
fields := t.Fields()
size := calcStructOffset(t, fields, 0)
// For non-zero-sized structs which end in a zero-sized field, we
@ -540,6 +583,11 @@ func CalcStructSize(t *Type) {
break
}
}
if len(t.Fields()) >= 1 && t.Fields()[0].Type.isSIMDTag {
// this catches `type Foo simd.Whatever` -- Foo is also SIMD.
simdify(t, false)
}
}
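To make the name matching and the tag propagation above concrete, a hedged sketch of the kind of declarations they target; the field layout of v128 and every name other than v128/v256/v512 are assumptions for illustration, not taken from this diff:

package simdsketch // stand-in only; the real declarations live in package simd

// v128 plays the role of the marker ("tag") struct matched by name above;
// CalcStructSize runs simdify(t, true) on it, giving it width 0.
type v128 struct{ _ [16]byte }

// Int32x4 plays the role of a user-visible vector type: its first field has
// the tag type, so the t.Fields()[0].Type.isSIMDTag check applies
// simdify(t, false) and the whole value is allocated to a vector register.
type Int32x4 struct{ _ v128 }

// Foo is also SIMD, as the `type Foo simd.Whatever` comment notes: a defined
// type with the same underlying struct keeps the tag-typed first field.
type Foo Int32x4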
// CalcArraySize calculates the size of t,


@ -202,6 +202,7 @@ type Type struct {
flags bitset8
alg AlgKind // valid if Align > 0
isSIMDTag, isSIMD bool // tag is the marker type, isSIMD means has marker type
// size of prefix of object that contains all pointers. valid if Align > 0.
// Note that for pointers, this is always PtrSize even if the element type
@ -594,6 +595,12 @@ func newSSA(name string) *Type {
return t
}
func newSIMD(name string) *Type {
t := newSSA(name)
t.isSIMD = true
return t
}
// NewMap returns a new map Type with key type k and element (aka value) type v.
func NewMap(k, v *Type) *Type {
t := newType(TMAP)
@ -982,17 +989,16 @@ func (t *Type) ArgWidth() int64 {
return t.extra.(*Func).Argwid
}
// Size returns the width of t in bytes.
func (t *Type) Size() int64 {
if t.kind == TSSA {
if t == TypeInt128 {
return 16
}
return 0
return t.width
}
CalcSize(t)
return t.width
}
// Alignment returns the alignment of t in bytes.
func (t *Type) Alignment() int64 {
CalcSize(t)
return int64(t.align)
@ -1598,12 +1604,26 @@ var (
TypeFlags = newSSA("flags")
TypeVoid = newSSA("void")
TypeInt128 = newSSA("int128")
TypeVec128 = newSIMD("vec128")
TypeVec256 = newSIMD("vec256")
TypeVec512 = newSIMD("vec512")
TypeMask = newSIMD("mask") // not a vector, not 100% sure what this should be.
TypeResultMem = newResults([]*Type{TypeMem})
)
func init() {
TypeInt128.width = 16
TypeInt128.align = 8
TypeVec128.width = 16
TypeVec128.align = 8
TypeVec256.width = 32
TypeVec256.align = 8
TypeVec512.width = 64
TypeVec512.align = 8
TypeMask.width = 8 // This will depend on the architecture; spilling will be "interesting".
TypeMask.align = 8
}
// NewNamed returns a new named type for the given type name. obj should be an
@ -1963,3 +1983,7 @@ var SimType [NTYPE]Kind
// Fake package for shape types (see typecheck.Shapify()).
var ShapePkg = NewPkg("go.shape", "go.shape")
func (t *Type) IsSIMD() bool {
return t.isSIMD
}


@ -361,6 +361,8 @@ var excluded = map[string]bool{
"builtin": true,
"cmd/compile/internal/ssa/_gen": true,
"runtime/_mkmalloc": true,
"simd/_gen/simdgen": true,
"simd/_gen/unify": true,
}
// printPackageMu synchronizes the printing of type-checked package files in


@ -956,7 +956,9 @@ func (t *tester) registerTests() {
// which is darwin,linux,windows/amd64 and darwin/arm64.
//
// The same logic applies to the release notes that correspond to each api/next file.
if goos == "darwin" || ((goos == "linux" || goos == "windows") && goarch == "amd64") {
//
// TODO: remove the exclusion of goexperiment simd right before dev.simd branch is merged to master.
if goos == "darwin" || ((goos == "linux" || goos == "windows") && (goarch == "amd64" && !strings.Contains(goexperiment, "simd"))) {
t.registerTest("API release note check", &goTest{variant: "check", pkg: "cmd/relnote", testFlags: []string{"-check"}})
t.registerTest("API check", &goTest{variant: "check", pkg: "cmd/api", timeout: 5 * time.Minute, testFlags: []string{"-check"}})
}


@ -236,7 +236,7 @@ func progedit(ctxt *obj.Link, p *obj.Prog, newprog obj.ProgAlloc) {
// Rewrite float constants to values stored in memory.
switch p.As {
// Convert AMOVSS $(0), Xx to AXORPS Xx, Xx
case AMOVSS:
case AMOVSS, AVMOVSS:
if p.From.Type == obj.TYPE_FCONST {
// f == 0 can't be used here due to -0, so use Float64bits
if f := p.From.Val.(float64); math.Float64bits(f) == 0 {
@ -272,7 +272,7 @@ func progedit(ctxt *obj.Link, p *obj.Prog, newprog obj.ProgAlloc) {
p.From.Offset = 0
}
case AMOVSD:
case AMOVSD, AVMOVSD:
// Convert AMOVSD $(0), Xx to AXORPS Xx, Xx
if p.From.Type == obj.TYPE_FCONST {
// f == 0 can't be used here due to -0, so use Float64bits


@ -67,7 +67,7 @@ var (
// dirs are the directories to look for *.go files in.
// TODO(bradfitz): just use all directories?
dirs = []string{".", "ken", "chan", "interface", "internal/runtime/sys", "syntax", "dwarf", "fixedbugs", "codegen", "abi", "typeparam", "typeparam/mdempsky", "arenas"}
dirs = []string{".", "ken", "chan", "interface", "internal/runtime/sys", "syntax", "dwarf", "fixedbugs", "codegen", "abi", "typeparam", "typeparam/mdempsky", "arenas", "simd"}
)
// Test is the main entrypoint that runs tests in the GOROOT/test directory.


@ -54,6 +54,7 @@ var depsRules = `
internal/goexperiment,
internal/goos,
internal/goversion,
internal/itoa,
internal/nettrace,
internal/platform,
internal/profilerecord,
@ -71,6 +72,8 @@ var depsRules = `
internal/byteorder, internal/cpu, internal/goarch < internal/chacha8rand;
internal/goarch, math/bits < internal/strconv;
internal/cpu, internal/strconv < simd;
# RUNTIME is the core runtime group of packages, all of them very light-weight.
internal/abi,
internal/chacha8rand,
@ -80,6 +83,7 @@ var depsRules = `
internal/godebugs,
internal/goexperiment,
internal/goos,
internal/itoa,
internal/profilerecord,
internal/strconv,
internal/trace/tracev2,
@ -697,6 +701,9 @@ var depsRules = `
FMT, DEBUG, flag, runtime/trace, internal/sysinfo, math/rand
< testing;
testing, math
< simd/internal/test_helpers;
log/slog, testing
< testing/slogtest;


@ -19,6 +19,6 @@ echo "// Copyright 2022 The Go Authors. All rights reserved.
package comment
var stdPkgs = []string{"
go list std | grep -v / | sort | sed 's/.*/"&",/'
GOEXPERIMENT=none go list std | grep -v / | sort | sed 's/.*/"&",/'
echo "}"
) | gofmt >std.go.tmp && mv std.go.tmp std.go


@ -13,7 +13,9 @@ import (
)
func TestStd(t *testing.T) {
out, err := testenv.Command(t, testenv.GoToolPath(t), "list", "std").CombinedOutput()
cmd := testenv.Command(t, testenv.GoToolPath(t), "list", "std")
cmd.Env = append(cmd.Environ(), "GOEXPERIMENT=none")
out, err := cmd.CombinedOutput()
if err != nil {
t.Fatalf("%v\n%s", err, out)
}


@ -361,6 +361,8 @@ var excluded = map[string]bool{
"builtin": true,
"cmd/compile/internal/ssa/_gen": true,
"runtime/_mkmalloc": true,
"simd/_gen/simdgen": true,
"simd/_gen/unify": true,
}
// printPackageMu synchronizes the printing of type-checked package files in


@ -88,8 +88,6 @@ func ParseGOEXPERIMENT(goos, goarch, goexp string) (*ExperimentFlags, error) {
SizeSpecializedMalloc: true,
GreenTeaGC: true,
}
// Start with the statically enabled set of experiments.
flags := &ExperimentFlags{
Flags: baseline,
baseline: baseline,


@ -25,17 +25,22 @@ var X86 struct {
HasAES bool
HasADX bool
HasAVX bool
HasAVXVNNI bool
HasAVX2 bool
HasAVX512 bool // Virtual feature: F+CD+BW+DQ+VL
HasAVX512F bool
HasAVX512CD bool
HasAVX512BITALG bool
HasAVX512BW bool
HasAVX512DQ bool
HasAVX512VL bool
HasAVX512VPCLMULQDQ bool
HasAVX512GFNI bool
HasAVX512VAES bool
HasAVX512VNNI bool
HasAVX512VBMI bool
HasAVX512VBMI2 bool
HasAVX512BITALG bool
HasAVX512VPOPCNTDQ bool
HasAVX512VPCLMULQDQ bool
HasBMI1 bool
HasBMI2 bool
HasERMS bool


@ -6,8 +6,6 @@
package cpu
import _ "unsafe" // for linkname
func osInit() {
// macOS 12 moved these to the hw.optional.arm tree, but as of Go 1.24 we
// still support macOS 11. See [Determine Encryption Capabilities].
@ -29,24 +27,3 @@ func osInit() {
ARM64.HasSHA1 = true
ARM64.HasSHA2 = true
}
//go:noescape
func getsysctlbyname(name []byte) (int32, int32)
// sysctlEnabled should be an internal detail,
// but widely used packages access it using linkname.
// Notable members of the hall of shame include:
// - github.com/bytedance/gopkg
// - github.com/songzhibin97/gkit
//
// Do not remove or change the type signature.
// See go.dev/issue/67401.
//
//go:linkname sysctlEnabled
func sysctlEnabled(name []byte) bool {
ret, value := getsysctlbyname(name)
if ret < 0 {
return false
}
return value > 0
}


@ -0,0 +1,72 @@
// Copyright 2020 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:build darwin && !ios
package cpu
import _ "unsafe" // for linkname
// Pushed from runtime.
//
//go:noescape
func sysctlbynameInt32(name []byte) (int32, int32)
// Pushed from runtime.
//
//go:noescape
func sysctlbynameBytes(name, out []byte) int32
// sysctlEnabled should be an internal detail,
// but widely used packages access it using linkname.
// Notable members of the hall of shame include:
// - github.com/bytedance/gopkg
// - github.com/songzhibin97/gkit
//
// Do not remove or change the type signature.
// See go.dev/issue/67401.
//
//go:linkname sysctlEnabled
func sysctlEnabled(name []byte) bool {
ret, value := sysctlbynameInt32(name)
if ret < 0 {
return false
}
return value > 0
}
// darwinKernelVersionCheck reports if Darwin kernel version is at
// least major.minor.patch.
//
// Code borrowed from x/sys/cpu.
func darwinKernelVersionCheck(major, minor, patch int) bool {
var release [256]byte
ret := sysctlbynameBytes([]byte("kern.osrelease\x00"), release[:])
if ret < 0 {
return false
}
var mmp [3]int
c := 0
Loop:
for _, b := range release[:] {
switch {
case b >= '0' && b <= '9':
mmp[c] = 10*mmp[c] + int(b-'0')
case b == '.':
c++
if c > 2 {
return false
}
case b == 0:
break Loop
default:
return false
}
}
if c != 2 {
return false
}
return mmp[0] > major || mmp[0] == major && (mmp[1] > minor || mmp[1] == minor && mmp[2] >= patch)
}
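A quick reading of the final comparison, not part of the diff: it is a lexicographic compare of the parsed {major, minor, patch} triple against the requested version.

// versionAtLeast restates the return expression above for a parsed triple.
func versionAtLeast(mmp [3]int, major, minor, patch int) bool {
	return mmp[0] > major || mmp[0] == major && (mmp[1] > minor || mmp[1] == minor && mmp[2] >= patch)
}

// With kern.osrelease "24.1.0" (the Darwin 24 / macOS 15 case used by the
// Rosetta check later in this diff), versionAtLeast([3]int{24, 1, 0}, 24, 0, 0)
// is true; with "23.6.0" it is false.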


@ -18,11 +18,21 @@ func xgetbv() (eax, edx uint32)
func getGOAMD64level() int32
const (
// Bits returned in ECX for CPUID EAX=0x1 ECX=0x0
// eax bits
cpuid_AVXVNNI = 1 << 4
// ecx bits
cpuid_SSE3 = 1 << 0
cpuid_PCLMULQDQ = 1 << 1
cpuid_AVX512VBMI = 1 << 1
cpuid_AVX512VBMI2 = 1 << 6
cpuid_SSSE3 = 1 << 9
cpuid_AVX512GFNI = 1 << 8
cpuid_AVX512VAES = 1 << 9
cpuid_AVX512VNNI = 1 << 11
cpuid_AVX512BITALG = 1 << 12
cpuid_FMA = 1 << 12
cpuid_AVX512VPOPCNTDQ = 1 << 14
cpuid_SSE41 = 1 << 19
cpuid_SSE42 = 1 << 20
cpuid_POPCNT = 1 << 23
@ -105,6 +115,7 @@ func doinit() {
maxID, _, _, _ := cpuid(0, 0)
if maxID < 1 {
osInit()
return
}
@ -149,10 +160,11 @@ func doinit() {
X86.HasAVX = isSet(ecx1, cpuid_AVX) && osSupportsAVX
if maxID < 7 {
osInit()
return
}
_, ebx7, ecx7, edx7 := cpuid(7, 0)
eax7, ebx7, ecx7, edx7 := cpuid(7, 0)
X86.HasBMI1 = isSet(ebx7, cpuid_BMI1)
X86.HasAVX2 = isSet(ebx7, cpuid_AVX2) && osSupportsAVX
X86.HasBMI2 = isSet(ebx7, cpuid_BMI2)
@ -166,6 +178,13 @@ func doinit() {
X86.HasAVX512BW = isSet(ebx7, cpuid_AVX512BW)
X86.HasAVX512DQ = isSet(ebx7, cpuid_AVX512DQ)
X86.HasAVX512VL = isSet(ebx7, cpuid_AVX512VL)
X86.HasAVX512GFNI = isSet(ecx7, cpuid_AVX512GFNI)
X86.HasAVX512BITALG = isSet(ecx7, cpuid_AVX512BITALG)
X86.HasAVX512VPOPCNTDQ = isSet(ecx7, cpuid_AVX512VPOPCNTDQ)
X86.HasAVX512VBMI = isSet(ecx7, cpuid_AVX512VBMI)
X86.HasAVX512VBMI2 = isSet(ecx7, cpuid_AVX512VBMI2)
X86.HasAVX512VAES = isSet(ecx7, cpuid_AVX512VAES)
X86.HasAVX512VNNI = isSet(ecx7, cpuid_AVX512VNNI)
X86.HasAVX512VPCLMULQDQ = isSet(ecx7, cpuid_AVX512VPCLMULQDQ)
X86.HasAVX512VBMI = isSet(ecx7, cpuid_AVX512_VBMI)
X86.HasAVX512VBMI2 = isSet(ecx7, cpuid_AVX512_VBMI2)
@ -179,6 +198,7 @@ func doinit() {
maxExtendedInformation, _, _, _ = cpuid(0x80000000, 0)
if maxExtendedInformation < 0x80000001 {
osInit()
return
}
@ -195,6 +215,15 @@ func doinit() {
// included in AVX10.1.
X86.HasAVX512 = X86.HasAVX512F && X86.HasAVX512CD && X86.HasAVX512BW && X86.HasAVX512DQ && X86.HasAVX512VL
}
if eax7 >= 1 {
eax71, _, _, _ := cpuid(7, 1)
if X86.HasAVX {
X86.HasAVXVNNI = isSet(eax71, cpuid_AVXVNNI)
}
}
osInit()
}
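A hedged sketch of how the combined flag is typically consumed; internal/cpu is only importable from inside the standard library, and the package and function names here are illustrative:

package mypkg // hypothetical std-internal package; internal/cpu is not importable elsewhere

import "internal/cpu"

// useAVX512 reports whether the AVX-512 baseline (F+CD+BW+DQ+VL) may be used,
// mirroring the combined X86.HasAVX512 flag computed in doinit above.
func useAVX512() bool {
	return cpu.X86.HasAVX512
}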
func isSet(hwc uint32, value uint32) bool {

View file

@ -0,0 +1,23 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:build (386 || amd64) && darwin && !ios
package cpu
func osInit() {
if isRosetta() && darwinKernelVersionCheck(24, 0, 0) {
// Apparently, on macOS 15 (Darwin kernel version 24) or newer,
// Rosetta 2 supports AVX1 and 2. However, neither CPUID nor
// sysctl says it has AVX. Detect this situation here and report
// AVX1 and 2 as supported.
// TODO: check if any other feature is actually supported.
X86.HasAVX = true
X86.HasAVX2 = true
}
}
func isRosetta() bool {
return sysctlEnabled([]byte("sysctl.proc_translated\x00"))
}
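Outside the runtime, a comparable Rosetta 2 check can be written against golang.org/x/sys/unix; this is a sketch assuming sysctl.proc_translated behaves as described in the comment above (on systems without Rosetta the sysctl is absent and the call returns an error):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// isRosetta reports whether this amd64 process is running under Rosetta 2
// translation on an Apple Silicon Mac.
func isRosetta() bool {
	v, err := unix.SysctlUint32("sysctl.proc_translated")
	return err == nil && v == 1
}

func main() {
	fmt.Println("translated:", isRosetta())
}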

View file

@ -0,0 +1,9 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:build (386 || amd64) && (!darwin || ios)
package cpu
func osInit() {}

View file

@ -0,0 +1,8 @@
// Code generated by mkconsts.go. DO NOT EDIT.
//go:build !goexperiment.simd
package goexperiment
const SIMD = false
const SIMDInt = 0

View file

@ -0,0 +1,8 @@
// Code generated by mkconsts.go. DO NOT EDIT.
//go:build goexperiment.simd
package goexperiment
const SIMD = true
const SIMDInt = 1
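A brief sketch of how such generated goexperiment constants are consumed; the build-tag form matches the //go:build goexperiment.simd lines used elsewhere in this merge, and the package name below is illustrative:

//go:build goexperiment.simd

// Package example is a hypothetical std-internal package that only builds
// when GOEXPERIMENT=simd is enabled.
package example

import "internal/goexperiment"

// In a file guarded by the build tag, goexperiment.SIMD is the constant true;
// unguarded code can instead branch on the constant directly.
var enabled = goexperiment.SIMD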

View file

@ -121,4 +121,8 @@ type Flags struct {
// GoroutineLeakProfile enables the collection of goroutine leak profiles.
GoroutineLeakProfile bool
// SIMD enables the simd package and the compiler's handling
// of SIMD intrinsics.
SIMD bool
}

View file

@ -1049,6 +1049,9 @@ needm:
// there's no need to handle that. Clear R14 so that there's
// a bad value in there, in case needm tries to use it.
XORPS X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
XORQ R14, R14
MOVQ $runtime·needAndBindM<ABIInternal>(SB), AX
CALL AX
@ -1746,6 +1749,9 @@ TEXT ·sigpanic0(SB),NOSPLIT,$0-0
get_tls(R14)
MOVQ g(R14), R14
XORPS X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
JMP ·sigpanic<ABIInternal>(SB)
// gcWriteBarrier informs the GC about heap pointer writes.

View file

@ -28,9 +28,10 @@ const (
var (
// Set in runtime.cpuinit.
// TODO: deprecate these; use internal/cpu directly.
x86HasAVX bool
x86HasFMA bool
x86HasPOPCNT bool
x86HasSSE41 bool
x86HasFMA bool
armHasVFPv4 bool

View file

@ -0,0 +1,19 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package runtime_test
import (
"runtime"
"testing"
)
func TestHasAVX(t *testing.T) {
t.Parallel()
output := runTestProg(t, "testprog", "CheckAVX")
ok := output == "OK\n"
if *runtime.X86HasAVX != ok {
t.Fatalf("x86HasAVX: %v, CheckAVX got:\n%s", *runtime.X86HasAVX, output)
}
}

View file

@ -1978,6 +1978,8 @@ func TraceStack(gp *G, tab *TraceStackTable) {
traceStack(0, gp, (*traceStackTable)(tab))
}
var X86HasAVX = &x86HasAVX
var DebugDecorateMappings = &debug.decoratemappings
func SetVMANameSupported() bool { return setVMANameSupported() }

View file

@ -402,7 +402,7 @@ func genAMD64(g *gen) {
// Create layouts for X, Y, and Z registers.
const (
numXRegs = 16
numZRegs = 16 // TODO: If we start using upper registers, change to 32
numZRegs = 32
numKRegs = 8
)
lZRegs := layout{sp: xReg} // Non-GP registers

View file

@ -162,11 +162,22 @@ func sysctlbynameInt32(name []byte) (int32, int32) {
return ret, out
}
//go:linkname internal_cpu_getsysctlbyname internal/cpu.getsysctlbyname
func internal_cpu_getsysctlbyname(name []byte) (int32, int32) {
func sysctlbynameBytes(name, out []byte) int32 {
nout := uintptr(len(out))
ret := sysctlbyname(&name[0], &out[0], &nout, nil, 0)
return ret
}
//go:linkname internal_cpu_sysctlbynameInt32 internal/cpu.sysctlbynameInt32
func internal_cpu_sysctlbynameInt32(name []byte) (int32, int32) {
return sysctlbynameInt32(name)
}
//go:linkname internal_cpu_sysctlbynameBytes internal/cpu.sysctlbynameBytes
func internal_cpu_sysctlbynameBytes(name, out []byte) int32 {
return sysctlbynameBytes(name, out)
}
const (
_CTL_HW = 6
_HW_NCPU = 3

View file

@ -341,6 +341,13 @@ func panicmemAddr(addr uintptr) {
panic(errorAddressString{msg: "invalid memory address or nil pointer dereference", addr: addr})
}
var simdImmError = error(errorString("out-of-range immediate for simd intrinsic"))
func panicSimdImm() {
panicCheck2("simd immediate error")
panic(simdImmError)
}
// Create a new deferred function fn, which has no arguments and results.
// The compiler turns a defer statement into a call to this.
func deferproc(fn func()) {

View file

@ -19,6 +19,22 @@ type xRegs struct {
Z13 [64]byte
Z14 [64]byte
Z15 [64]byte
Z16 [64]byte
Z17 [64]byte
Z18 [64]byte
Z19 [64]byte
Z20 [64]byte
Z21 [64]byte
Z22 [64]byte
Z23 [64]byte
Z24 [64]byte
Z25 [64]byte
Z26 [64]byte
Z27 [64]byte
Z28 [64]byte
Z29 [64]byte
Z30 [64]byte
Z31 [64]byte
K0 uint64
K1 uint64
K2 uint64

View file

@ -95,14 +95,30 @@ saveAVX512:
VMOVDQU64 Z13, 832(AX)
VMOVDQU64 Z14, 896(AX)
VMOVDQU64 Z15, 960(AX)
KMOVQ K0, 1024(AX)
KMOVQ K1, 1032(AX)
KMOVQ K2, 1040(AX)
KMOVQ K3, 1048(AX)
KMOVQ K4, 1056(AX)
KMOVQ K5, 1064(AX)
KMOVQ K6, 1072(AX)
KMOVQ K7, 1080(AX)
VMOVDQU64 Z16, 1024(AX)
VMOVDQU64 Z17, 1088(AX)
VMOVDQU64 Z18, 1152(AX)
VMOVDQU64 Z19, 1216(AX)
VMOVDQU64 Z20, 1280(AX)
VMOVDQU64 Z21, 1344(AX)
VMOVDQU64 Z22, 1408(AX)
VMOVDQU64 Z23, 1472(AX)
VMOVDQU64 Z24, 1536(AX)
VMOVDQU64 Z25, 1600(AX)
VMOVDQU64 Z26, 1664(AX)
VMOVDQU64 Z27, 1728(AX)
VMOVDQU64 Z28, 1792(AX)
VMOVDQU64 Z29, 1856(AX)
VMOVDQU64 Z30, 1920(AX)
VMOVDQU64 Z31, 1984(AX)
KMOVQ K0, 2048(AX)
KMOVQ K1, 2056(AX)
KMOVQ K2, 2064(AX)
KMOVQ K3, 2072(AX)
KMOVQ K4, 2080(AX)
KMOVQ K5, 2088(AX)
KMOVQ K6, 2096(AX)
KMOVQ K7, 2104(AX)
JMP preempt
preempt:
CALL ·asyncPreempt2(SB)
@ -153,14 +169,30 @@ restoreAVX2:
VMOVDQU 0(AX), Y0
JMP restoreGPs
restoreAVX512:
KMOVQ 1080(AX), K7
KMOVQ 1072(AX), K6
KMOVQ 1064(AX), K5
KMOVQ 1056(AX), K4
KMOVQ 1048(AX), K3
KMOVQ 1040(AX), K2
KMOVQ 1032(AX), K1
KMOVQ 1024(AX), K0
KMOVQ 2104(AX), K7
KMOVQ 2096(AX), K6
KMOVQ 2088(AX), K5
KMOVQ 2080(AX), K4
KMOVQ 2072(AX), K3
KMOVQ 2064(AX), K2
KMOVQ 2056(AX), K1
KMOVQ 2048(AX), K0
VMOVDQU64 1984(AX), Z31
VMOVDQU64 1920(AX), Z30
VMOVDQU64 1856(AX), Z29
VMOVDQU64 1792(AX), Z28
VMOVDQU64 1728(AX), Z27
VMOVDQU64 1664(AX), Z26
VMOVDQU64 1600(AX), Z25
VMOVDQU64 1536(AX), Z24
VMOVDQU64 1472(AX), Z23
VMOVDQU64 1408(AX), Z22
VMOVDQU64 1344(AX), Z21
VMOVDQU64 1280(AX), Z20
VMOVDQU64 1216(AX), Z19
VMOVDQU64 1152(AX), Z18
VMOVDQU64 1088(AX), Z17
VMOVDQU64 1024(AX), Z16
VMOVDQU64 960(AX), Z15
VMOVDQU64 896(AX), Z14
VMOVDQU64 832(AX), Z13
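The new offsets follow from the widened xRegs layout: each Z slot is 64 bytes, so Z16 starts at 16*64 = 1024, Z31 at 31*64 = 1984, and the eight 8-byte K slots follow at 32*64 = 2048. A tiny sketch of that arithmetic (names illustrative):

package main

import "fmt"

const zregBytes = 64 // each Zn slot in xRegs is [64]byte

func zOff(n int) int { return n * zregBytes }      // byte offset of Zn, n in 0..31
func kOff(k int) int { return 32*zregBytes + 8*k } // K0..K7 follow the 32 Z slots

func main() {
	fmt.Println(zOff(16), zOff(31), kOff(0), kOff(7)) // 1024 1984 2048 2104
}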

View file

@ -763,9 +763,10 @@ func cpuinit(env string) {
// to guard execution of instructions that can not be assumed to be always supported.
switch GOARCH {
case "386", "amd64":
x86HasAVX = cpu.X86.HasAVX
x86HasFMA = cpu.X86.HasFMA
x86HasPOPCNT = cpu.X86.HasPOPCNT
x86HasSSE41 = cpu.X86.HasSSE41
x86HasFMA = cpu.X86.HasFMA
case "arm":
armHasVFPv4 = cpu.ARM.HasVFPv4

View file

@ -456,6 +456,9 @@ call:
// Back to Go world, set special registers.
// The g register (R14) is preserved in C.
XORPS X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
RET
// C->Go callback thunk that allows to call runtime·racesymbolize from C code.

View file

@ -177,6 +177,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -228,6 +228,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -265,6 +265,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking
@ -290,6 +293,9 @@ TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -340,6 +340,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking
@ -365,6 +368,9 @@ TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -310,6 +310,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -64,6 +64,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
get_tls(R12)
MOVQ g(R12), R14
PXOR X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
// Reserve space for spill slots.
NOP SP // disable vet stack checking

View file

@ -32,6 +32,9 @@ TEXT sigtramp<>(SB),NOSPLIT,$0-0
// R14 is cleared in case there's a non-zero value in there
// if called from a non-go thread.
XORPS X15, X15
CMPB internalcpu·X86+const_offsetX86HasAVX(SB), $1
JNE 2(PC)
VXORPS X15, X15, X15
XORQ R14, R14
get_tls(AX)

View file

@ -0,0 +1,18 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import "fmt"
func init() {
register("CheckAVX", CheckAVX)
}
func CheckAVX() {
checkAVX()
fmt.Println("OK")
}
func checkAVX()

View file

@ -0,0 +1,9 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
#include "textflag.h"
TEXT ·checkAVX(SB), NOSPLIT|NOFRAME, $0-0
VXORPS X1, X2, X3
RET

src/simd/_gen/go.mod
View file

@ -0,0 +1,8 @@
module simd/_gen
go 1.24
require (
golang.org/x/arch v0.20.0
gopkg.in/yaml.v3 v3.0.1
)

src/simd/_gen/go.sum
View file

@ -0,0 +1,6 @@
golang.org/x/arch v0.20.0 h1:dx1zTU0MAE98U+TQ8BLl7XsJbgze2WnNKF/8tGp/Q6c=
golang.org/x/arch v0.20.0/go.mod h1:bdwinDaKcfZUGpH09BB7ZmOfhalA8lQdzl62l8gGWsk=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

src/simd/_gen/main.go
View file

@ -0,0 +1,149 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Run all SIMD-related code generators.
package main
import (
"flag"
"fmt"
"os"
"os/exec"
"path/filepath"
"strings"
)
const defaultXedPath = "$XEDPATH" + string(filepath.ListSeparator) + "./simdgen/xeddata" + string(filepath.ListSeparator) + "$HOME/xed/obj/dgen"
var (
flagTmplgen = flag.Bool("tmplgen", true, "run tmplgen generator")
flagSimdgen = flag.Bool("simdgen", true, "run simdgen generator")
flagN = flag.Bool("n", false, "dry run")
flagXedPath = flag.String("xedPath", defaultXedPath, "load XED datafile from `path`, which must be the XED obj/dgen directory")
)
var goRoot string
func main() {
flag.Parse()
if flag.NArg() > 0 {
flag.Usage()
os.Exit(1)
}
if *flagXedPath == defaultXedPath {
// In general we want the shell to do variable expansion, but for the
// default value we don't get that, so do it ourselves.
*flagXedPath = os.ExpandEnv(defaultXedPath)
}
var err error
goRoot, err = resolveGOROOT()
if err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}
if *flagTmplgen {
doTmplgen()
}
if *flagSimdgen {
doSimdgen()
}
}
func doTmplgen() {
goRun("-C", "tmplgen", ".")
}
func doSimdgen() {
xedPath, err := resolveXEDPath(*flagXedPath)
if err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}
// Regenerate the XED-derived SIMD files
goRun("-C", "simdgen", ".", "-o", "godefs", "-goroot", goRoot, "-xedPath", prettyPath("./simdgen", xedPath), "go.yaml", "types.yaml", "categories.yaml")
// simdgen produces SSA rule files, so update the SSA files
goRun("-C", prettyPath(".", filepath.Join(goRoot, "src", "cmd", "compile", "internal", "ssa", "_gen")), ".")
}
func resolveXEDPath(pathList string) (xedPath string, err error) {
for _, path := range filepath.SplitList(pathList) {
if path == "" {
// Probably an unknown shell variable. Ignore.
continue
}
if _, err := os.Stat(filepath.Join(path, "all-dec-instructions.txt")); err == nil {
return filepath.Abs(path)
}
}
return "", fmt.Errorf("set $XEDPATH or -xedPath to the XED obj/dgen directory")
}
func resolveGOROOT() (goRoot string, err error) {
cmd := exec.Command("go", "env", "GOROOT")
cmd.Stderr = os.Stderr
out, err := cmd.Output()
if err != nil {
return "", fmt.Errorf("%s: %s", cmd, err)
}
goRoot = strings.TrimSuffix(string(out), "\n")
return goRoot, nil
}
func goRun(args ...string) {
exe := filepath.Join(goRoot, "bin", "go")
cmd := exec.Command(exe, append([]string{"run"}, args...)...)
run(cmd)
}
func run(cmd *exec.Cmd) {
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
fmt.Fprintf(os.Stderr, "%s\n", cmdString(cmd))
if *flagN {
return
}
if err := cmd.Run(); err != nil {
fmt.Fprintf(os.Stderr, "%s failed: %s\n", cmd, err)
}
}
func prettyPath(base, path string) string {
base, err := filepath.Abs(base)
if err != nil {
return path
}
p, err := filepath.Rel(base, path)
if err != nil {
return path
}
return p
}
func cmdString(cmd *exec.Cmd) string {
// TODO: Shell quoting?
// TODO: Environment.
var buf strings.Builder
cmdPath, err := exec.LookPath(filepath.Base(cmd.Path))
if err == nil && cmdPath == cmd.Path {
cmdPath = filepath.Base(cmdPath)
} else {
cmdPath = prettyPath(".", cmd.Path)
}
buf.WriteString(cmdPath)
for _, arg := range cmd.Args[1:] {
buf.WriteByte(' ')
buf.WriteString(arg)
}
return buf.String()
}

src/simd/_gen/simdgen/.gitignore
View file

@ -0,0 +1,3 @@
testdata/*
.gemini/*
.gemini*

View file

@ -0,0 +1 @@
!import ops/*/categories.yaml

View file

@ -0,0 +1,48 @@
#!/bin/bash
# This is an end-to-end test of Go SIMD. It updates all generated
# files in this repo and then runs several tests.
XEDDATA="${XEDDATA:-xeddata}"
if [[ ! -d "$XEDDATA" ]]; then
echo >&2 "Must either set \$XEDDATA or symlink xeddata/ to the XED obj/dgen directory."
exit 1
fi
which go >/dev/null || exit 1
goroot="$(go env GOROOT)"
if [[ ! ../../../.. -ef "$goroot" ]]; then
# We might be able to make this work but it's SO CONFUSING.
echo >&2 "go command in path has GOROOT $goroot"
exit 1
fi
if [[ $(go env GOEXPERIMENT) != simd ]]; then
echo >&2 "GOEXPERIMENT=$(go env GOEXPERIMENT), expected simd"
exit 1
fi
set -ex
# Regenerate SIMD files
go run . -o godefs -goroot "$goroot" -xedPath "$XEDDATA" go.yaml types.yaml categories.yaml
# Regenerate SSA files from SIMD rules
go run -C "$goroot"/src/cmd/compile/internal/ssa/_gen .
# Rebuild compiler
cd "$goroot"/src
go install cmd/compile
# Tests
GOARCH=amd64 go run -C simd/testdata .
GOARCH=amd64 go test -v simd
go test go/doc go/build
go test cmd/api -v -check -run ^TestCheck$
go test cmd/compile/internal/ssagen -simd=0
# Check tests without the GOEXPERIMENT
GOEXPERIMENT= go test go/doc go/build
GOEXPERIMENT= go test cmd/api -v -check -run ^TestCheck$
GOEXPERIMENT= go test cmd/compile/internal/ssagen -simd=0
# TODO: Add some tests of SIMD itself

View file

@ -0,0 +1,73 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"fmt"
"sort"
)
const simdGenericOpsTmpl = `
package main
func simdGenericOps() []opData {
return []opData{
{{- range .Ops }}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, commutative: {{.Comm}}},
{{- end }}
{{- range .OpsImm }}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, commutative: {{.Comm}}, aux: "UInt8"},
{{- end }}
}
}
`
// writeSIMDGenericOps generates the generic op definitions and returns them in a
// buffer to be written to the SSA generator's generic ops file.
func writeSIMDGenericOps(ops []Operation) *bytes.Buffer {
t := templateOf(simdGenericOpsTmpl, "simdgenericOps")
buffer := new(bytes.Buffer)
buffer.WriteString(generatedHeader)
type genericOpsData struct {
OpName string
OpInLen int
Comm bool
}
type opData struct {
Ops []genericOpsData
OpsImm []genericOpsData
}
var opsData opData
for _, op := range ops {
if op.NoGenericOps != nil && *op.NoGenericOps == "true" {
continue
}
if op.SkipMaskedMethod() {
continue
}
_, _, _, immType, gOp := op.shape()
gOpData := genericOpsData{gOp.GenericName(), len(gOp.In), op.Commutative}
if immType == VarImm || immType == ConstVarImm {
opsData.OpsImm = append(opsData.OpsImm, gOpData)
} else {
opsData.Ops = append(opsData.Ops, gOpData)
}
}
sort.Slice(opsData.Ops, func(i, j int) bool {
return compareNatural(opsData.Ops[i].OpName, opsData.Ops[j].OpName) < 0
})
sort.Slice(opsData.OpsImm, func(i, j int) bool {
return compareNatural(opsData.OpsImm[i].OpName, opsData.OpsImm[j].OpName) < 0
})
err := t.Execute(buffer, opsData)
if err != nil {
panic(fmt.Errorf("failed to execute template: %w", err))
}
return buffer
}

View file

@ -0,0 +1,156 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"fmt"
"slices"
)
const simdIntrinsicsTmpl = `
{{define "header"}}
package ssagen
import (
"cmd/compile/internal/ir"
"cmd/compile/internal/ssa"
"cmd/compile/internal/types"
"cmd/internal/sys"
)
const simdPackage = "` + simdPackage + `"
func simdIntrinsics(addF func(pkg, fn string, b intrinsicBuilder, archFamilies ...sys.ArchFamily)) {
{{end}}
{{define "op1"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen1(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op2"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen2(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op2_21"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen2_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op2_21Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op3"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen3(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op3_21"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen3_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op3_21Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op3_231Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3_231(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op3_31Zero3"}} addF(simdPackage, "{{(index .In 2).Go}}.{{.Go}}", opLen3_31Zero3(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op4"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen4(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op4_231Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen4_231(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op4_31"}} addF(simdPackage, "{{(index .In 2).Go}}.{{.Go}}", opLen4_31(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
{{end}}
{{define "op1Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen1Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op2Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op2Imm8_2I"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_2I(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op2Imm8_II"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_II(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op2Imm8_SHA1RNDS4"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_SHA1RNDS4(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op3Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op3Imm8_2I"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3Imm8_2I(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "op4Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen4Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
{{end}}
{{define "vectorConversion"}} addF(simdPackage, "{{.Tsrc.Name}}.As{{.Tdst.Name}}", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
{{end}}
{{define "loadStore"}} addF(simdPackage, "Load{{.Name}}", simdLoad(), sys.AMD64)
addF(simdPackage, "{{.Name}}.Store", simdStore(), sys.AMD64)
{{end}}
{{define "maskedLoadStore"}} addF(simdPackage, "LoadMasked{{.Name}}", simdMaskedLoad(ssa.OpLoadMasked{{.ElemBits}}), sys.AMD64)
addF(simdPackage, "{{.Name}}.StoreMasked", simdMaskedStore(ssa.OpStoreMasked{{.ElemBits}}), sys.AMD64)
{{end}}
{{define "mask"}} addF(simdPackage, "{{.Name}}.As{{.VectorCounterpart}}", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
addF(simdPackage, "{{.VectorCounterpart}}.asMask", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
addF(simdPackage, "{{.Name}}.And", opLen2(ssa.OpAnd{{.ReshapedVectorWithAndOr}}, types.TypeVec{{.Size}}), sys.AMD64)
addF(simdPackage, "{{.Name}}.Or", opLen2(ssa.OpOr{{.ReshapedVectorWithAndOr}}, types.TypeVec{{.Size}}), sys.AMD64)
addF(simdPackage, "{{.Name}}FromBits", simdCvtVToMask({{.ElemBits}}, {{.Lanes}}), sys.AMD64)
addF(simdPackage, "{{.Name}}.ToBits", simdCvtMaskToV({{.ElemBits}}, {{.Lanes}}), sys.AMD64)
{{end}}
{{define "footer"}}}
{{end}}
`
// writeSIMDIntrinsics generates the intrinsic mappings and returns them in a
// buffer to be written to simdintrinsics.go.
func writeSIMDIntrinsics(ops []Operation, typeMap simdTypeMap) *bytes.Buffer {
t := templateOf(simdIntrinsicsTmpl, "simdintrinsics")
buffer := new(bytes.Buffer)
buffer.WriteString(generatedHeader)
if err := t.ExecuteTemplate(buffer, "header", nil); err != nil {
panic(fmt.Errorf("failed to execute header template: %w", err))
}
slices.SortFunc(ops, compareOperations)
for _, op := range ops {
if op.NoTypes != nil && *op.NoTypes == "true" {
continue
}
if op.SkipMaskedMethod() {
continue
}
if s, op, err := classifyOp(op); err == nil {
if err := t.ExecuteTemplate(buffer, s, op); err != nil {
panic(fmt.Errorf("failed to execute template %s for op %s: %w", s, op.Go, err))
}
} else {
panic(fmt.Errorf("failed to classify op %v: %w", op.Go, err))
}
}
for _, conv := range vConvertFromTypeMap(typeMap) {
if err := t.ExecuteTemplate(buffer, "vectorConversion", conv); err != nil {
panic(fmt.Errorf("failed to execute vectorConversion template: %w", err))
}
}
for _, typ := range typesFromTypeMap(typeMap) {
if typ.Type != "mask" {
if err := t.ExecuteTemplate(buffer, "loadStore", typ); err != nil {
panic(fmt.Errorf("failed to execute loadStore template: %w", err))
}
}
}
for _, typ := range typesFromTypeMap(typeMap) {
if typ.MaskedLoadStoreFilter() {
if err := t.ExecuteTemplate(buffer, "maskedLoadStore", typ); err != nil {
panic(fmt.Errorf("failed to execute maskedLoadStore template: %w", err))
}
}
}
for _, mask := range masksFromTypeMap(typeMap) {
if err := t.ExecuteTemplate(buffer, "mask", mask); err != nil {
panic(fmt.Errorf("failed to execute mask template: %w", err))
}
}
if err := t.ExecuteTemplate(buffer, "footer", nil); err != nil {
panic(fmt.Errorf("failed to execute footer template: %w", err))
}
return buffer
}

View file

@ -0,0 +1,256 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"fmt"
"log"
"sort"
"strings"
)
const simdMachineOpsTmpl = `
package main
func simdAMD64Ops(v11, v21, v2k, vkv, v2kv, v2kk, v31, v3kv, vgpv, vgp, vfpv, vfpkv, w11, w21, w2k, wkw, w2kw, w2kk, w31, w3kw, wgpw, wgp, wfpw, wfpkw,
wkwload, v21load, v31load, v11load, w21load, w31load, w2kload, w2kwload, w11load, w3kwload, w2kkload, v31x0AtIn2 regInfo) []opData {
return []opData{
{{- range .OpsData }}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", resultInArg0: {{.ResultInArg0}}},
{{- end }}
{{- range .OpsDataImm }}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", aux: "UInt8", commutative: {{.Comm}}, typ: "{{.Type}}", resultInArg0: {{.ResultInArg0}}},
{{- end }}
{{- range .OpsDataLoad}}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", aux: "SymOff", symEffect: "Read", resultInArg0: {{.ResultInArg0}}},
{{- end}}
{{- range .OpsDataImmLoad}}
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", aux: "SymValAndOff", symEffect: "Read", resultInArg0: {{.ResultInArg0}}},
{{- end}}
{{- range .OpsDataMerging }}
{name: "{{.OpName}}Merging", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: false, typ: "{{.Type}}", resultInArg0: true},
{{- end }}
{{- range .OpsDataImmMerging }}
{name: "{{.OpName}}Merging", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", aux: "UInt8", commutative: false, typ: "{{.Type}}", resultInArg0: true},
{{- end }}
}
}
`
// writeSIMDMachineOps generates the machine ops and returns them in a
// buffer to be written to simdAMD64ops.go.
func writeSIMDMachineOps(ops []Operation) *bytes.Buffer {
t := templateOf(simdMachineOpsTmpl, "simdAMD64Ops")
buffer := new(bytes.Buffer)
buffer.WriteString(generatedHeader)
type opData struct {
OpName string
Asm string
OpInLen int
RegInfo string
Comm bool
Type string
ResultInArg0 bool
}
type machineOpsData struct {
OpsData []opData
OpsDataImm []opData
OpsDataLoad []opData
OpsDataImmLoad []opData
OpsDataMerging []opData
OpsDataImmMerging []opData
}
regInfoSet := map[string]bool{
"v11": true, "v21": true, "v2k": true, "v2kv": true, "v2kk": true, "vkv": true, "v31": true, "v3kv": true, "vgpv": true, "vgp": true, "vfpv": true, "vfpkv": true,
"w11": true, "w21": true, "w2k": true, "w2kw": true, "w2kk": true, "wkw": true, "w31": true, "w3kw": true, "wgpw": true, "wgp": true, "wfpw": true, "wfpkw": true,
"wkwload": true, "v21load": true, "v31load": true, "v11load": true, "w21load": true, "w31load": true, "w2kload": true, "w2kwload": true, "w11load": true,
"w3kwload": true, "w2kkload": true, "v31x0AtIn2": true}
opsData := make([]opData, 0)
opsDataImm := make([]opData, 0)
opsDataLoad := make([]opData, 0)
opsDataImmLoad := make([]opData, 0)
opsDataMerging := make([]opData, 0)
opsDataImmMerging := make([]opData, 0)
// Determine the "best" version of an instruction to use
best := make(map[string]Operation)
var mOpOrder []string
countOverrides := func(s []Operand) int {
a := 0
for _, o := range s {
if o.OverwriteBase != nil {
a++
}
}
return a
}
for _, op := range ops {
_, _, maskType, _, gOp := op.shape()
asm := machineOpName(maskType, gOp)
other, ok := best[asm]
if !ok {
best[asm] = op
mOpOrder = append(mOpOrder, asm)
continue
}
// see if "op" is better than "other"
if countOverrides(op.In)+countOverrides(op.Out) < countOverrides(other.In)+countOverrides(other.Out) {
best[asm] = op
}
}
regInfoErrs := make([]error, 0)
regInfoMissing := make(map[string]bool, 0)
for _, asm := range mOpOrder {
op := best[asm]
shapeIn, shapeOut, maskType, _, gOp := op.shape()
// TODO: all our masked operations are currently zeroing; we need to generate machine ops
// with merging masks, perhaps by copying one here with a "Merging" name suffix. The
// rewrite rules will need them.
makeRegInfo := func(op Operation, mem memShape) (string, error) {
regInfo, err := op.regShape(mem)
if err != nil {
panic(err)
}
regInfo, err = rewriteVecAsScalarRegInfo(op, regInfo)
if err != nil {
if mem == NoMem || mem == InvalidMem {
panic(err)
}
return "", err
}
if regInfo == "v01load" {
regInfo = "vload"
}
// Makes AVX512 operations use upper registers
if strings.Contains(op.CPUFeature, "AVX512") {
regInfo = strings.ReplaceAll(regInfo, "v", "w")
}
if _, ok := regInfoSet[regInfo]; !ok {
regInfoErrs = append(regInfoErrs, fmt.Errorf("unsupported register constraint, please update the template and AMD64Ops.go: %s. Op is %s", regInfo, op))
regInfoMissing[regInfo] = true
}
return regInfo, nil
}
regInfo, err := makeRegInfo(op, NoMem)
if err != nil {
panic(err)
}
var outType string
if shapeOut == OneVregOut || shapeOut == OneVregOutAtIn || gOp.Out[0].OverwriteClass != nil {
// If class overwrite is happening, that's not really a mask but a vreg.
outType = fmt.Sprintf("Vec%d", *gOp.Out[0].Bits)
} else if shapeOut == OneGregOut {
outType = gOp.GoType() // this is a straight Go type, not a VecNNN type
} else if shapeOut == OneKmaskOut {
outType = "Mask"
} else {
panic(fmt.Errorf("simdgen does not recognize this output shape: %d", shapeOut))
}
resultInArg0 := false
if shapeOut == OneVregOutAtIn {
resultInArg0 = true
}
var memOpData *opData
regInfoMerging := regInfo
hasMerging := false
if op.MemFeatures != nil && *op.MemFeatures == "vbcst" {
// Right now we only have vbcst case
// Make a full vec memory variant.
opMem := rewriteLastVregToMem(op)
regInfo, err := makeRegInfo(opMem, VregMemIn)
if err != nil {
// Just skip it if the error is non-nil;
// it could be triggered by [checkVecAsScalar].
// TODO: make [checkVecAsScalar] aware of mem ops.
if *Verbose {
log.Printf("Seen error: %e", err)
}
} else {
memOpData = &opData{asm + "load", gOp.Asm, len(gOp.In) + 1, regInfo, false, outType, resultInArg0}
}
}
hasMerging = gOp.hasMaskedMerging(maskType, shapeOut)
if hasMerging && !resultInArg0 {
// We have to copy the slice here because the sort will be visible through other
// aliases when no reslicing is happening.
newIn := make([]Operand, len(op.In), len(op.In)+1)
copy(newIn, op.In)
op.In = newIn
op.In = append(op.In, op.Out[0])
op.sortOperand()
regInfoMerging, err = makeRegInfo(op, NoMem)
if err != nil {
panic(err)
}
}
if shapeIn == OneImmIn || shapeIn == OneKmaskImmIn {
opsDataImm = append(opsDataImm, opData{asm, gOp.Asm, len(gOp.In), regInfo, gOp.Commutative, outType, resultInArg0})
if memOpData != nil {
if *op.MemFeatures != "vbcst" {
panic("simdgen only knows vbcst for mem ops for now")
}
opsDataImmLoad = append(opsDataImmLoad, *memOpData)
}
if hasMerging {
mergingLen := len(gOp.In)
if !resultInArg0 {
mergingLen++
}
opsDataImmMerging = append(opsDataImmMerging, opData{asm, gOp.Asm, mergingLen, regInfoMerging, gOp.Commutative, outType, resultInArg0})
}
} else {
opsData = append(opsData, opData{asm, gOp.Asm, len(gOp.In), regInfo, gOp.Commutative, outType, resultInArg0})
if memOpData != nil {
if *op.MemFeatures != "vbcst" {
panic("simdgen only knows vbcst for mem ops for now")
}
opsDataLoad = append(opsDataLoad, *memOpData)
}
if hasMerging {
mergingLen := len(gOp.In)
if !resultInArg0 {
mergingLen++
}
opsDataMerging = append(opsDataMerging, opData{asm, gOp.Asm, mergingLen, regInfoMerging, gOp.Commutative, outType, resultInArg0})
}
}
}
if len(regInfoErrs) != 0 {
for _, e := range regInfoErrs {
log.Printf("Errors: %e\n", e)
}
panic(fmt.Errorf("these regInfo unseen: %v", regInfoMissing))
}
sort.Slice(opsData, func(i, j int) bool {
return compareNatural(opsData[i].OpName, opsData[j].OpName) < 0
})
sort.Slice(opsDataImm, func(i, j int) bool {
return compareNatural(opsDataImm[i].OpName, opsDataImm[j].OpName) < 0
})
sort.Slice(opsDataLoad, func(i, j int) bool {
return compareNatural(opsDataLoad[i].OpName, opsDataLoad[j].OpName) < 0
})
sort.Slice(opsDataImmLoad, func(i, j int) bool {
return compareNatural(opsDataImmLoad[i].OpName, opsDataImmLoad[j].OpName) < 0
})
sort.Slice(opsDataMerging, func(i, j int) bool {
return compareNatural(opsDataMerging[i].OpName, opsDataMerging[j].OpName) < 0
})
sort.Slice(opsDataImmMerging, func(i, j int) bool {
return compareNatural(opsDataImmMerging[i].OpName, opsDataImmMerging[j].OpName) < 0
})
err := t.Execute(buffer, machineOpsData{opsData, opsDataImm, opsDataLoad, opsDataImmLoad,
opsDataMerging, opsDataImmMerging})
if err != nil {
panic(fmt.Errorf("failed to execute template: %w", err))
}
return buffer
}

View file

@ -0,0 +1,658 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"cmp"
"fmt"
"maps"
"slices"
"sort"
"strings"
"unicode"
)
type simdType struct {
Name string // The go type name of this simd type, for example Int32x4.
Lanes int // The number of elements in this vector/mask.
Base string // The element's type, like for Int32x4 it will be int32.
Fields string // The struct fields, already formatted for emission.
Type string // Either "mask" or "vreg"
VectorCounterpart string // For mask use only: [simdType.Name] with "Mask" replaced by "Int".
ReshapedVectorWithAndOr string // For mask use only: vector AND and OR are only available in shapes with element width 32.
Size int // The size of the vector type in bits
}
func (x simdType) ElemBits() int {
return x.Size / x.Lanes
}
// LanesContainer returns the smallest int/uint bit size that is
// large enough to hold one bit for each lane. E.g., Mask32x4
// is 4 lanes, and a uint8 is the smallest uint that has 4 bits.
func (x simdType) LanesContainer() int {
if x.Lanes > 64 {
panic("too many lanes")
}
if x.Lanes > 32 {
return 64
}
if x.Lanes > 16 {
return 32
}
if x.Lanes > 8 {
return 16
}
return 8
}
// MaskedLoadStoreFilter encodes which simd types currently
// get masked loads/stores generated; it is used in two places,
// which forces coordination.
func (x simdType) MaskedLoadStoreFilter() bool {
return x.Size == 512 || x.ElemBits() >= 32 && x.Type != "mask"
}
func (x simdType) IntelSizeSuffix() string {
switch x.ElemBits() {
case 8:
return "B"
case 16:
return "W"
case 32:
return "D"
case 64:
return "Q"
}
panic("oops")
}
func (x simdType) MaskedLoadDoc() string {
if x.Size == 512 || x.ElemBits() < 32 {
return fmt.Sprintf("// Asm: VMOVDQU%d.Z, CPU Feature: AVX512", x.ElemBits())
} else {
return fmt.Sprintf("// Asm: VMASKMOV%s, CPU Feature: AVX2", x.IntelSizeSuffix())
}
}
func (x simdType) MaskedStoreDoc() string {
if x.Size == 512 || x.ElemBits() < 32 {
return fmt.Sprintf("// Asm: VMOVDQU%d, CPU Feature: AVX512", x.ElemBits())
} else {
return fmt.Sprintf("// Asm: VMASKMOV%s, CPU Feature: AVX2", x.IntelSizeSuffix())
}
}
func compareSimdTypes(x, y simdType) int {
// "vreg" then "mask"
if c := -compareNatural(x.Type, y.Type); c != 0 {
return c
}
// want "flo" < "int" < "uin" (and then 8 < 16 < 32 < 64),
// not "int16" < "int32" < "int64" < "int8")
// so limit comparison to first 3 bytes in string.
if c := compareNatural(x.Base[:3], y.Base[:3]); c != 0 {
return c
}
// base type size, 8 < 16 < 32 < 64
if c := x.ElemBits() - y.ElemBits(); c != 0 {
return c
}
// vector size last
return x.Size - y.Size
}
type simdTypeMap map[int][]simdType
type simdTypePair struct {
Tsrc simdType
Tdst simdType
}
func compareSimdTypePairs(x, y simdTypePair) int {
c := compareSimdTypes(x.Tsrc, y.Tsrc)
if c != 0 {
return c
}
return compareSimdTypes(x.Tdst, y.Tdst)
}
const simdPackageHeader = generatedHeader + `
//go:build goexperiment.simd
package simd
`
const simdTypesTemplates = `
{{define "sizeTmpl"}}
// v{{.}} is a tag type that tells the compiler that this is really {{.}}-bit SIMD
type v{{.}} struct {
_{{.}} [0]func() // uncomparable
}
{{end}}
{{define "typeTmpl"}}
// {{.Name}} is a {{.Size}}-bit SIMD vector of {{.Lanes}} {{.Base}}
type {{.Name}} struct {
{{.Fields}}
}
{{end}}
`
const simdFeaturesTemplate = `
import "internal/cpu"
type X86Features struct {}
var X86 X86Features
{{range .}}
{{- if eq .Feature "AVX512"}}
// {{.Feature}} returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
//
// These five CPU features are bundled together, and no use of AVX-512
// is allowed unless all of these features are supported together.
// Nearly every CPU that has shipped with any support for AVX-512 has
// supported all five of these features.
{{- else -}}
// {{.Feature}} returns whether the CPU supports the {{.Feature}} feature.
{{- end}}
//
// {{.Feature}} is defined on all GOARCHes, but will only return true on
// GOARCH {{.GoArch}}.
func (X86Features) {{.Feature}}() bool {
return cpu.X86.Has{{.Feature}}
}
{{end}}
`
const simdLoadStoreTemplate = `
// Len returns the number of elements in a {{.Name}}
func (x {{.Name}}) Len() int { return {{.Lanes}} }
// Load{{.Name}} loads a {{.Name}} from an array
//
//go:noescape
func Load{{.Name}}(y *[{{.Lanes}}]{{.Base}}) {{.Name}}
// Store stores a {{.Name}} to an array
//
//go:noescape
func (x {{.Name}}) Store(y *[{{.Lanes}}]{{.Base}})
`
const simdMaskFromValTemplate = `
// {{.Name}}FromBits constructs a {{.Name}} from a bitmap value, where 1 means set for the indexed element, 0 means unset.
{{- if ne .Lanes .LanesContainer}}
// Only the lower {{.Lanes}} bits of y are used.
{{- end}}
//
// Asm: KMOV{{.IntelSizeSuffix}}, CPU Feature: AVX512
func {{.Name}}FromBits(y uint{{.LanesContainer}}) {{.Name}}
// ToBits constructs a bitmap from a {{.Name}}, where 1 means set for the indexed element, 0 means unset.
{{- if ne .Lanes .LanesContainer}}
// Only the lower {{.Lanes}} bits of the result are used.
{{- end}}
//
// Asm: KMOV{{.IntelSizeSuffix}}, CPU Feature: AVX512
func (x {{.Name}}) ToBits() uint{{.LanesContainer}}
`
const simdMaskedLoadStoreTemplate = `
// LoadMasked{{.Name}} loads a {{.Name}} from an array,
// at those elements enabled by mask
//
{{.MaskedLoadDoc}}
//
//go:noescape
func LoadMasked{{.Name}}(y *[{{.Lanes}}]{{.Base}}, mask Mask{{.ElemBits}}x{{.Lanes}}) {{.Name}}
// StoreMasked stores a {{.Name}} to an array,
// at those elements enabled by mask
//
{{.MaskedStoreDoc}}
//
//go:noescape
func (x {{.Name}}) StoreMasked(y *[{{.Lanes}}]{{.Base}}, mask Mask{{.ElemBits}}x{{.Lanes}})
`
const simdStubsTmpl = `
{{define "op1"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}() {{.GoType}}
{{end}}
{{define "op2"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op2_21"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op2_21Type1"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op3"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op3_31Zero3"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op2NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op3_21"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op3_21Type1"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op3_231Type1"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.Op0NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op2VecAsScalar"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}(y uint{{(index .In 1).TreatLikeAScalarOfSize}}) {{(index .Out 0).Go}}
{{end}}
{{define "op3VecAsScalar"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}(y uint{{(index .In 1).TreatLikeAScalarOfSize}}, {{.Op2NameAndType "z"}}) {{(index .Out 0).Go}}
{{end}}
{{define "op4"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op2NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
{{end}}
{{define "op4_231Type1"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.Op0NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
{{end}}
{{define "op4_31"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op2NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op0NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
{{end}}
{{define "op1Imm8"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8) {{.GoType}}
{{end}}
{{define "op2Imm8"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op2Imm8_2I"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.ImmName}} uint8) {{.GoType}}
{{end}}
{{define "op2Imm8_II"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} result in better performance when they are constants, non-constant values will be translated into a jump table.
// {{.ImmName}} should be between 0 and 3, inclusive; other values may result in a runtime panic.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op2Imm8_SHA1RNDS4"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
{{end}}
{{define "op3Imm8"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}, {{.Op3NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op3Imm8_2I"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.ImmName}} uint8, {{.Op3NameAndType "z"}}) {{.GoType}}
{{end}}
{{define "op4Imm8"}}
{{if .Documentation}}{{.Documentation}}
//{{end}}
// {{.ImmName}} results in better performance when it's a constant, a non-constant value will be translated into a jump table.
//
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}, {{.Op3NameAndType "z"}}, {{.Op4NameAndType "u"}}) {{.GoType}}
{{end}}
{{define "vectorConversion"}}
// As{{.Tdst.Name}} converts from {{.Tsrc.Name}} to {{.Tdst.Name}}
func (from {{.Tsrc.Name}}) As{{.Tdst.Name}}() (to {{.Tdst.Name}})
{{end}}
{{define "mask"}}
// As{{.VectorCounterpart}} converts from {{.Name}} to {{.VectorCounterpart}}
func (from {{.Name}}) As{{.VectorCounterpart}}() (to {{.VectorCounterpart}})
// asMask converts from {{.VectorCounterpart}} to {{.Name}}
func (from {{.VectorCounterpart}}) asMask() (to {{.Name}})
func (x {{.Name}}) And(y {{.Name}}) {{.Name}}
func (x {{.Name}}) Or(y {{.Name}}) {{.Name}}
{{end}}
`
// parseSIMDTypes groups Go simd types by their vector sizes, and returns
// a map keyed by vector size whose values are the simd types of that size.
func parseSIMDTypes(ops []Operation) simdTypeMap {
// TODO: maybe instead of going over ops, go over types.yaml.
ret := map[int][]simdType{}
seen := map[string]struct{}{}
processArg := func(arg Operand) {
if arg.Class == "immediate" || arg.Class == "greg" {
// Immediates and general registers are not encoded as vector types.
return
}
if _, ok := seen[*arg.Go]; ok {
return
}
seen[*arg.Go] = struct{}{}
lanes := *arg.Lanes
base := fmt.Sprintf("%s%d", *arg.Base, *arg.ElemBits)
tagFieldNameS := fmt.Sprintf("%sx%d", base, lanes)
tagFieldS := fmt.Sprintf("%s v%d", tagFieldNameS, *arg.Bits)
valFieldS := fmt.Sprintf("vals%s[%d]%s", strings.Repeat(" ", len(tagFieldNameS)-3), lanes, base)
fields := fmt.Sprintf("\t%s\n\t%s", tagFieldS, valFieldS)
if arg.Class == "mask" {
vectorCounterpart := strings.ReplaceAll(*arg.Go, "Mask", "Int")
reshapedVectorWithAndOr := fmt.Sprintf("Int32x%d", *arg.Bits/32)
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{*arg.Go, lanes, base, fields, arg.Class, vectorCounterpart, reshapedVectorWithAndOr, *arg.Bits})
// In case the vector counterpart of a mask is not present, put its vector counterpart typedef into the map as well.
if _, ok := seen[vectorCounterpart]; !ok {
seen[vectorCounterpart] = struct{}{}
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{vectorCounterpart, lanes, base, fields, "vreg", "", "", *arg.Bits})
}
} else {
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{*arg.Go, lanes, base, fields, arg.Class, "", "", *arg.Bits})
}
}
for _, op := range ops {
for _, arg := range op.In {
processArg(arg)
}
for _, arg := range op.Out {
processArg(arg)
}
}
return ret
}
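For reference, roughly what sizeTmpl and typeTmpl above expand to for a 256-bit vector of int32; spacing is approximate and the exact generated file is not reproduced here:

//go:build goexperiment.simd

package simd

// v256 is a tag type that tells the compiler that this is really 256-bit SIMD
type v256 struct {
	_256 [0]func() // uncomparable
}

// Int32x8 is a 256-bit SIMD vector of 8 int32
type Int32x8 struct {
	int32x8 v256
	vals    [8]int32
}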
func vConvertFromTypeMap(typeMap simdTypeMap) []simdTypePair {
v := []simdTypePair{}
for _, ts := range typeMap {
for i, tsrc := range ts {
for j, tdst := range ts {
if i != j && tsrc.Type == tdst.Type && tsrc.Type == "vreg" &&
tsrc.Lanes > 1 && tdst.Lanes > 1 {
v = append(v, simdTypePair{tsrc, tdst})
}
}
}
}
slices.SortFunc(v, compareSimdTypePairs)
return v
}
func masksFromTypeMap(typeMap simdTypeMap) []simdType {
m := []simdType{}
for _, ts := range typeMap {
for _, tsrc := range ts {
if tsrc.Type == "mask" {
m = append(m, tsrc)
}
}
}
slices.SortFunc(m, compareSimdTypes)
return m
}
func typesFromTypeMap(typeMap simdTypeMap) []simdType {
m := []simdType{}
for _, ts := range typeMap {
for _, tsrc := range ts {
if tsrc.Lanes > 1 {
m = append(m, tsrc)
}
}
}
slices.SortFunc(m, compareSimdTypes)
return m
}
// writeSIMDTypes generates the simd vector types into a bytes.Buffer
func writeSIMDTypes(typeMap simdTypeMap) *bytes.Buffer {
t := templateOf(simdTypesTemplates, "types_amd64")
loadStore := templateOf(simdLoadStoreTemplate, "loadstore_amd64")
maskedLoadStore := templateOf(simdMaskedLoadStoreTemplate, "maskedloadstore_amd64")
maskFromVal := templateOf(simdMaskFromValTemplate, "maskFromVal_amd64")
buffer := new(bytes.Buffer)
buffer.WriteString(simdPackageHeader)
sizes := make([]int, 0, len(typeMap))
for size, types := range typeMap {
slices.SortFunc(types, compareSimdTypes)
sizes = append(sizes, size)
}
sort.Ints(sizes)
for _, size := range sizes {
if size <= 64 {
// these are scalar
continue
}
if err := t.ExecuteTemplate(buffer, "sizeTmpl", size); err != nil {
panic(fmt.Errorf("failed to execute size template for size %d: %w", size, err))
}
for _, typeDef := range typeMap[size] {
if typeDef.Lanes == 1 {
continue
}
if err := t.ExecuteTemplate(buffer, "typeTmpl", typeDef); err != nil {
panic(fmt.Errorf("failed to execute type template for type %s: %w", typeDef.Name, err))
}
if typeDef.Type != "mask" {
if err := loadStore.ExecuteTemplate(buffer, "loadstore_amd64", typeDef); err != nil {
panic(fmt.Errorf("failed to execute loadstore template for type %s: %w", typeDef.Name, err))
}
// restrict to AVX2 masked loads/stores first.
if typeDef.MaskedLoadStoreFilter() {
if err := maskedLoadStore.ExecuteTemplate(buffer, "maskedloadstore_amd64", typeDef); err != nil {
panic(fmt.Errorf("failed to execute maskedloadstore template for type %s: %w", typeDef.Name, err))
}
}
} else {
if err := maskFromVal.ExecuteTemplate(buffer, "maskFromVal_amd64", typeDef); err != nil {
panic(fmt.Errorf("failed to execute maskFromVal template for type %s: %w", typeDef.Name, err))
}
}
}
}
return buffer
}
func writeSIMDFeatures(ops []Operation) *bytes.Buffer {
// Gather all features
type featureKey struct {
GoArch string
Feature string
}
featureSet := make(map[featureKey]struct{})
for _, op := range ops {
// Generate a feature check for each independent feature in a
// composite feature.
for feature := range strings.SplitSeq(op.CPUFeature, ",") {
feature = strings.TrimSpace(feature)
featureSet[featureKey{op.GoArch, feature}] = struct{}{}
}
}
features := slices.SortedFunc(maps.Keys(featureSet), func(a, b featureKey) int {
if c := cmp.Compare(a.GoArch, b.GoArch); c != 0 {
return c
}
return compareNatural(a.Feature, b.Feature)
})
// If we ever have the same feature name on more than one GOARCH, we'll have
// to be more careful about this.
t := templateOf(simdFeaturesTemplate, "features")
buffer := new(bytes.Buffer)
buffer.WriteString(simdPackageHeader)
if err := t.Execute(buffer, features); err != nil {
panic(fmt.Errorf("failed to execute features template: %w", err))
}
return buffer
}
// writeSIMDStubs returns two bytes.Buffers containing the declarations for the public
// and internal-use vector intrinsics.
func writeSIMDStubs(ops []Operation, typeMap simdTypeMap) (f, fI *bytes.Buffer) {
t := templateOf(simdStubsTmpl, "simdStubs")
f = new(bytes.Buffer)
fI = new(bytes.Buffer)
f.WriteString(simdPackageHeader)
fI.WriteString(simdPackageHeader)
slices.SortFunc(ops, compareOperations)
for i, op := range ops {
if op.NoTypes != nil && *op.NoTypes == "true" {
continue
}
if op.SkipMaskedMethod() {
continue
}
idxVecAsScalar, err := checkVecAsScalar(op)
if err != nil {
panic(err)
}
if s, op, err := classifyOp(op); err == nil {
if idxVecAsScalar != -1 {
if s == "op2" || s == "op3" {
s += "VecAsScalar"
} else {
panic(fmt.Errorf("simdgen only supports op2 or op3 with TreatLikeAScalarOfSize"))
}
}
if i == 0 || op.Go != ops[i-1].Go {
if unicode.IsUpper([]rune(op.Go)[0]) {
fmt.Fprintf(f, "\n/* %s */\n", op.Go)
} else {
fmt.Fprintf(fI, "\n/* %s */\n", op.Go)
}
}
if unicode.IsUpper([]rune(op.Go)[0]) {
if err := t.ExecuteTemplate(f, s, op); err != nil {
panic(fmt.Errorf("failed to execute template %s for op %v: %w", s, op, err))
}
} else {
if err := t.ExecuteTemplate(fI, s, op); err != nil {
panic(fmt.Errorf("failed to execute template %s for op %v: %w", s, op, err))
}
}
} else {
panic(fmt.Errorf("failed to classify op %v: %w", op.Go, err))
}
}
vectorConversions := vConvertFromTypeMap(typeMap)
for _, conv := range vectorConversions {
if err := t.ExecuteTemplate(f, "vectorConversion", conv); err != nil {
panic(fmt.Errorf("failed to execute vectorConversion template: %w", err))
}
}
masks := masksFromTypeMap(typeMap)
for _, mask := range masks {
if err := t.ExecuteTemplate(f, "mask", mask); err != nil {
panic(fmt.Errorf("failed to execute mask template for mask %s: %w", mask.Name, err))
}
}
return
}

View file

@ -0,0 +1,397 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"fmt"
"slices"
"strings"
"text/template"
)
type tplRuleData struct {
tplName string // e.g. "sftimm"
GoOp string // e.g. "ShiftAllLeft"
GoType string // e.g. "Uint32x8"
Args string // e.g. "x y"
Asm string // e.g. "VPSLLD256"
ArgsOut string // e.g. "x y"
MaskInConvert string // e.g. "VPMOVVec32x8ToM"
MaskOutConvert string // e.g. "VPMOVMToVec32x8"
ElementSize int // e.g. 32
Size int // e.g. 128
ArgsLoadAddr string // [Args] with its last vreg arg replaced by a concrete "(VMOVDQUload* ptr mem)"; may contain a mask.
ArgsAddr string // [Args] with its last vreg arg replaced by "ptr"; may contain a mask, and has "mem" appended at the end.
FeatCheck string // e.g. "v.Block.CPUfeatures.hasFeature(CPUavx512)" -- for a ssa/_gen rules file.
}
var (
ruleTemplates = template.Must(template.New("simdRules").Parse(`
{{define "pureVreg"}}({{.GoOp}}{{.GoType}} {{.Args}}) => ({{.Asm}} {{.ArgsOut}})
{{end}}
{{define "maskIn"}}({{.GoOp}}{{.GoType}} {{.Args}} mask) => ({{.Asm}} {{.ArgsOut}} ({{.MaskInConvert}} <types.TypeMask> mask))
{{end}}
{{define "maskOut"}}({{.GoOp}}{{.GoType}} {{.Args}}) => ({{.MaskOutConvert}} ({{.Asm}} {{.ArgsOut}}))
{{end}}
{{define "maskInMaskOut"}}({{.GoOp}}{{.GoType}} {{.Args}} mask) => ({{.MaskOutConvert}} ({{.Asm}} {{.ArgsOut}} ({{.MaskInConvert}} <types.TypeMask> mask)))
{{end}}
{{define "sftimm"}}({{.Asm}} x (MOVQconst [c])) => ({{.Asm}}const [uint8(c)] x)
{{end}}
{{define "masksftimm"}}({{.Asm}} x (MOVQconst [c]) mask) => ({{.Asm}}const [uint8(c)] x mask)
{{end}}
{{define "vregMem"}}({{.Asm}} {{.ArgsLoadAddr}}) && canMergeLoad(v, l) && clobber(l) => ({{.Asm}}load {{.ArgsAddr}})
{{end}}
{{define "vregMemFeatCheck"}}({{.Asm}} {{.ArgsLoadAddr}}) && {{.FeatCheck}} && canMergeLoad(v, l) && clobber(l)=> ({{.Asm}}load {{.ArgsAddr}})
{{end}}
`))
)
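To make the rule shapes concrete, here is a small self-contained sketch that renders just the pureVreg template with the example values from the tplRuleData field comments (ShiftAllLeft/Uint32x8/VPSLLD256 come from those comments, not from actual generator output):

package main

import (
	"os"
	"text/template"
)

func main() {
	// Same shape as the "pureVreg" rule template above.
	tmpl := template.Must(template.New("pureVreg").Parse(
		"({{.GoOp}}{{.GoType}} {{.Args}}) => ({{.Asm}} {{.ArgsOut}})\n"))
	data := struct {
		GoOp, GoType, Args, Asm, ArgsOut string
	}{"ShiftAllLeft", "Uint32x8", "x y", "VPSLLD256", "x y"}
	if err := tmpl.Execute(os.Stdout, data); err != nil {
		panic(err)
	}
	// Prints: (ShiftAllLeftUint32x8 x y) => (VPSLLD256 x y)
}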
func (d tplRuleData) MaskOptimization(asmCheck map[string]bool) string {
asmNoMask := d.Asm
if i := strings.Index(asmNoMask, "Masked"); i == -1 {
return ""
}
asmNoMask = strings.ReplaceAll(asmNoMask, "Masked", "")
if asmCheck[asmNoMask] == false {
return ""
}
for _, nope := range []string{"VMOVDQU", "VPCOMPRESS", "VCOMPRESS", "VPEXPAND", "VEXPAND", "VPBLENDM", "VMOVUP"} {
if strings.HasPrefix(asmNoMask, nope) {
return ""
}
}
size := asmNoMask[len(asmNoMask)-3:]
if strings.HasSuffix(asmNoMask, "const") {
sufLen := len("128const")
size = asmNoMask[len(asmNoMask)-sufLen:][:3]
}
switch size {
case "128", "256", "512":
default:
panic("Unexpected operation size on " + d.Asm)
}
switch d.ElementSize {
case 8, 16, 32, 64:
default:
panic(fmt.Errorf("Unexpected operation width %d on %v", d.ElementSize, d.Asm))
}
return fmt.Sprintf("(VMOVDQU%dMasked%s (%s %s) mask) => (%s %s mask)\n", d.ElementSize, size, asmNoMask, d.Args, d.Asm, d.Args)
}
// SSA rewrite rules need to appear in a most-to-least-specific order. This works for that.
var tmplOrder = map[string]int{
"masksftimm": 0,
"sftimm": 1,
"maskInMaskOut": 2,
"maskOut": 3,
"maskIn": 4,
"pureVreg": 5,
"vregMem": 6,
}
func compareTplRuleData(x, y tplRuleData) int {
if c := compareNatural(x.GoOp, y.GoOp); c != 0 {
return c
}
if c := compareNatural(x.GoType, y.GoType); c != 0 {
return c
}
if c := compareNatural(x.Args, y.Args); c != 0 {
return c
}
if x.tplName == y.tplName {
return 0
}
xo, xok := tmplOrder[x.tplName]
yo, yok := tmplOrder[y.tplName]
if !xok {
panic(fmt.Errorf("Unexpected template name %s, please add to tmplOrder", x.tplName))
}
if !yok {
panic(fmt.Errorf("Unexpected template name %s, please add to tmplOrder", y.tplName))
}
return xo - yo
}
// writeSIMDRules generates the lowering and rewrite rules for ssa and writes it to simdAMD64.rules
// within the specified directory.
func writeSIMDRules(ops []Operation) *bytes.Buffer {
buffer := new(bytes.Buffer)
buffer.WriteString(generatedHeader + "\n")
// asm -> masked merging rules
maskedMergeOpts := make(map[string]string)
s2n := map[int]string{8: "B", 16: "W", 32: "D", 64: "Q"}
asmCheck := map[string]bool{}
var allData []tplRuleData
var optData []tplRuleData // for mask peephole optimizations, and other misc
var memOptData []tplRuleData // for memory peephole optimizations
memOpSeen := make(map[string]bool)
for _, opr := range ops {
opInShape, opOutShape, maskType, immType, gOp := opr.shape()
asm := machineOpName(maskType, gOp)
vregInCnt := len(gOp.In)
if maskType == OneMask {
vregInCnt--
}
data := tplRuleData{
GoOp: gOp.Go,
Asm: asm,
}
if vregInCnt == 1 {
data.Args = "x"
data.ArgsOut = data.Args
} else if vregInCnt == 2 {
data.Args = "x y"
data.ArgsOut = data.Args
} else if vregInCnt == 3 {
data.Args = "x y z"
data.ArgsOut = data.Args
} else {
panic(fmt.Errorf("simdgen does not support more than 3 vreg in inputs"))
}
if immType == ConstImm {
data.ArgsOut = fmt.Sprintf("[%s] %s", *opr.In[0].Const, data.ArgsOut)
} else if immType == VarImm {
data.Args = fmt.Sprintf("[a] %s", data.Args)
data.ArgsOut = fmt.Sprintf("[a] %s", data.ArgsOut)
} else if immType == ConstVarImm {
data.Args = fmt.Sprintf("[a] %s", data.Args)
data.ArgsOut = fmt.Sprintf("[a+%s] %s", *opr.In[0].Const, data.ArgsOut)
}
goType := func(op Operation) string {
if op.OperandOrder != nil {
switch *op.OperandOrder {
case "21Type1", "231Type1":
// Permute uses operand[1] for method receiver.
return *op.In[1].Go
}
}
return *op.In[0].Go
}
var tplName string
// If class overwrite is happening, that's not really a mask but a vreg.
if opOutShape == OneVregOut || opOutShape == OneVregOutAtIn || gOp.Out[0].OverwriteClass != nil {
switch opInShape {
case OneImmIn:
tplName = "pureVreg"
data.GoType = goType(gOp)
case PureVregIn:
tplName = "pureVreg"
data.GoType = goType(gOp)
case OneKmaskImmIn:
fallthrough
case OneKmaskIn:
tplName = "maskIn"
data.GoType = goType(gOp)
rearIdx := len(gOp.In) - 1
// Mask is at the end.
width := *gOp.In[rearIdx].ElemBits
data.MaskInConvert = fmt.Sprintf("VPMOVVec%dx%dToM", width, *gOp.In[rearIdx].Lanes)
data.ElementSize = width
case PureKmaskIn:
panic(fmt.Errorf("simdgen does not support pure k mask instructions, they should be generated by compiler optimizations"))
}
} else if opOutShape == OneGregOut {
tplName = "pureVreg" // TODO this will be wrong
data.GoType = goType(gOp)
} else {
// OneKmaskOut case
data.MaskOutConvert = fmt.Sprintf("VPMOVMToVec%dx%d", *gOp.Out[0].ElemBits, *gOp.In[0].Lanes)
switch opInShape {
case OneImmIn:
fallthrough
case PureVregIn:
tplName = "maskOut"
data.GoType = goType(gOp)
case OneKmaskImmIn:
fallthrough
case OneKmaskIn:
tplName = "maskInMaskOut"
data.GoType = goType(gOp)
rearIdx := len(gOp.In) - 1
data.MaskInConvert = fmt.Sprintf("VPMOVVec%dx%dToM", *gOp.In[rearIdx].ElemBits, *gOp.In[rearIdx].Lanes)
case PureKmaskIn:
panic(fmt.Errorf("simdgen does not support pure k mask instructions, they should be generated by compiler optimizations"))
}
}
if gOp.SpecialLower != nil {
if *gOp.SpecialLower == "sftimm" {
if data.GoType[0] == 'I' {
// Only do these for signed types; for unsigned it would be a duplicate rewrite.
sftImmData := data
if tplName == "maskIn" {
sftImmData.tplName = "masksftimm"
} else {
sftImmData.tplName = "sftimm"
}
allData = append(allData, sftImmData)
asmCheck[sftImmData.Asm+"const"] = true
}
} else {
panic("simdgen sees unknwon special lower " + *gOp.SpecialLower + ", maybe implement it?")
}
}
if gOp.MemFeatures != nil && *gOp.MemFeatures == "vbcst" {
// sanity check
selected := true
for _, a := range gOp.In {
if a.TreatLikeAScalarOfSize != nil {
selected = false
break
}
}
if _, ok := memOpSeen[data.Asm]; ok {
selected = false
}
if selected {
memOpSeen[data.Asm] = true
lastVreg := gOp.In[vregInCnt-1]
// sanity check
if lastVreg.Class != "vreg" {
panic(fmt.Errorf("simdgen expects vbcst replaced operand to be a vreg, but %v found", lastVreg))
}
memOpData := data
// Remove the last vreg from the arg and change it to a load.
origArgs := data.Args[:len(data.Args)-1]
// Prepare imm args.
immArg := ""
immArgCombineOff := " [off] "
if immType != NoImm && immType != InvalidImm {
_, after, found := strings.Cut(origArgs, "]")
if found {
origArgs = after
}
immArg = "[c] "
immArgCombineOff = " [makeValAndOff(int32(int8(c)),off)] "
}
memOpData.ArgsLoadAddr = immArg + origArgs + fmt.Sprintf("l:(VMOVDQUload%d {sym} [off] ptr mem)", *lastVreg.Bits)
// Remove the last vreg from the arg and change it to "ptr".
memOpData.ArgsAddr = "{sym}" + immArgCombineOff + origArgs + "ptr"
if maskType == OneMask {
memOpData.ArgsAddr += " mask"
memOpData.ArgsLoadAddr += " mask"
}
memOpData.ArgsAddr += " mem"
if gOp.MemFeaturesData != nil {
_, feat2 := getVbcstData(*gOp.MemFeaturesData)
knownFeatChecks := map[string]string{
"AVX": "v.Block.CPUfeatures.hasFeature(CPUavx)",
"AVX2": "v.Block.CPUfeatures.hasFeature(CPUavx2)",
"AVX512": "v.Block.CPUfeatures.hasFeature(CPUavx512)",
}
memOpData.FeatCheck = knownFeatChecks[feat2]
memOpData.tplName = "vregMemFeatCheck"
} else {
memOpData.tplName = "vregMem"
}
memOptData = append(memOptData, memOpData)
asmCheck[memOpData.Asm+"load"] = true
}
}
// Generate the masked merging optimization rules
if gOp.hasMaskedMerging(maskType, opOutShape) {
// TODO: handle customized operand order and special lower.
maskElem := gOp.In[len(gOp.In)-1]
if maskElem.Bits == nil {
panic("mask has no bits")
}
if maskElem.ElemBits == nil {
panic("mask has no elemBits")
}
if maskElem.Lanes == nil {
panic("mask has no lanes")
}
switch *maskElem.Bits {
case 128, 256:
// VPBLENDVB cases.
noMaskName := machineOpName(NoMask, gOp)
ruleExisting, ok := maskedMergeOpts[noMaskName]
rule := fmt.Sprintf("(VPBLENDVB%d dst (%s %s) mask) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (%sMerging dst %s (VPMOVVec%dx%dToM <types.TypeMask> mask))\n",
*maskElem.Bits, noMaskName, data.Args, data.Asm, data.Args, *maskElem.ElemBits, *maskElem.Lanes)
if ok && ruleExisting != rule {
panic("multiple masked merge rules for one op")
} else {
maskedMergeOpts[noMaskName] = rule
}
case 512:
// VPBLENDM[BWDQ] cases.
noMaskName := machineOpName(NoMask, gOp)
ruleExisting, ok := maskedMergeOpts[noMaskName]
rule := fmt.Sprintf("(VPBLENDM%sMasked%d dst (%s %s) mask) => (%sMerging dst %s mask)\n",
s2n[*maskElem.ElemBits], *maskElem.Bits, noMaskName, data.Args, data.Asm, data.Args)
if ok && ruleExisting != rule {
panic("multiple masked merge rules for one op")
} else {
maskedMergeOpts[noMaskName] = rule
}
}
}
if tplName == "pureVreg" && data.Args == data.ArgsOut {
data.Args = "..."
data.ArgsOut = "..."
}
data.tplName = tplName
if opr.NoGenericOps != nil && *opr.NoGenericOps == "true" ||
opr.SkipMaskedMethod() {
optData = append(optData, data)
continue
}
allData = append(allData, data)
asmCheck[data.Asm] = true
}
slices.SortFunc(allData, compareTplRuleData)
for _, data := range allData {
if err := ruleTemplates.ExecuteTemplate(buffer, data.tplName, data); err != nil {
panic(fmt.Errorf("failed to execute template %s for %s: %w", data.tplName, data.GoOp+data.GoType, err))
}
}
seen := make(map[string]bool)
for _, data := range optData {
if data.tplName == "maskIn" {
rule := data.MaskOptimization(asmCheck)
if seen[rule] {
continue
}
seen[rule] = true
buffer.WriteString(rule)
}
}
maskedMergeOptsRules := []string{}
for asm, rule := range maskedMergeOpts {
if !asmCheck[asm] {
continue
}
maskedMergeOptsRules = append(maskedMergeOptsRules, rule)
}
slices.Sort(maskedMergeOptsRules)
for _, rule := range maskedMergeOptsRules {
buffer.WriteString(rule)
}
for _, data := range memOptData {
if err := ruleTemplates.ExecuteTemplate(buffer, data.tplName, data); err != nil {
panic(fmt.Errorf("failed to execute template %s for %s: %w", data.tplName, data.Asm, err))
}
}
return buffer
}


@@ -0,0 +1,236 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bytes"
"fmt"
"log"
"strings"
"text/template"
)
var (
ssaTemplates = template.Must(template.New("simdSSA").Parse(`
{{define "header"}}// Code generated by x/arch/internal/simdgen using 'go run . -xedPath $XED_PATH -o godefs -goroot $GOROOT go.yaml types.yaml categories.yaml'; DO NOT EDIT.
package amd64
import (
"cmd/compile/internal/ssa"
"cmd/compile/internal/ssagen"
"cmd/internal/obj"
"cmd/internal/obj/x86"
)
func ssaGenSIMDValue(s *ssagen.State, v *ssa.Value) bool {
var p *obj.Prog
switch v.Op {{"{"}}{{end}}
{{define "case"}}
case {{.Cases}}:
p = {{.Helper}}(s, v)
{{end}}
{{define "footer"}}
default:
// Unknown reg shape
return false
}
{{end}}
{{define "zeroing"}}
// Masked operations are always compiled with zeroing.
switch v.Op {
case {{.}}:
x86.ParseSuffix(p, "Z")
}
{{end}}
{{define "ending"}}
return true
}
{{end}}`))
)
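// Illustrative rendering (a sketch): a "case" entry with
// Cases="ssa.OpAMD64VPADDD256" and Helper="simdV21" expands to roughly
// case ssa.OpAMD64VPADDD256:
// p = simdV21(s, v)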
type tplSSAData struct {
Cases string
Helper string
}
// writeSIMDSSA generates the ssa to prog lowering codes and writes it to simdssa.go
// within the specified directory.
func writeSIMDSSA(ops []Operation) *bytes.Buffer {
var ZeroingMask []string
regInfoKeys := []string{
"v11",
"v21",
"v2k",
"v2kv",
"v2kk",
"vkv",
"v31",
"v3kv",
"v11Imm8",
"vkvImm8",
"v21Imm8",
"v2kImm8",
"v2kkImm8",
"v31ResultInArg0",
"v3kvResultInArg0",
"vfpv",
"vfpkv",
"vgpvImm8",
"vgpImm8",
"v2kvImm8",
"vkvload",
"v21load",
"v31loadResultInArg0",
"v3kvloadResultInArg0",
"v2kvload",
"v2kload",
"v11load",
"v11loadImm8",
"vkvloadImm8",
"v21loadImm8",
"v2kloadImm8",
"v2kkloadImm8",
"v2kvloadImm8",
"v31ResultInArg0Imm8",
"v31loadResultInArg0Imm8",
"v21ResultInArg0",
"v21ResultInArg0Imm8",
"v31x0AtIn2ResultInArg0",
"v2kvResultInArg0",
}
regInfoSet := map[string][]string{}
for _, key := range regInfoKeys {
regInfoSet[key] = []string{}
}
seen := map[string]struct{}{}
allUnseen := make(map[string][]Operation)
allUnseenCaseStr := make(map[string][]string)
classifyOp := func(op Operation, maskType maskShape, shapeIn inShape, shapeOut outShape, caseStr string, mem memShape) error {
regShape, err := op.regShape(mem)
if err != nil {
return err
}
if regShape == "v01load" {
regShape = "vload"
}
if shapeOut == OneVregOutAtIn {
regShape += "ResultInArg0"
}
if shapeIn == OneImmIn || shapeIn == OneKmaskImmIn {
regShape += "Imm8"
}
regShape, err = rewriteVecAsScalarRegInfo(op, regShape)
if err != nil {
return err
}
if _, ok := regInfoSet[regShape]; !ok {
allUnseen[regShape] = append(allUnseen[regShape], op)
allUnseenCaseStr[regShape] = append(allUnseenCaseStr[regShape], caseStr)
}
regInfoSet[regShape] = append(regInfoSet[regShape], caseStr)
if mem == NoMem && op.hasMaskedMerging(maskType, shapeOut) {
regShapeMerging := regShape
if shapeOut != OneVregOutAtIn {
// We have to copy the slice here because the sort would otherwise be visible
// through other aliases when no reslicing happens.
newIn := make([]Operand, len(op.In), len(op.In)+1)
copy(newIn, op.In)
op.In = newIn
op.In = append(op.In, op.Out[0])
op.sortOperand()
regShapeMerging, err = op.regShape(mem)
regShapeMerging += "ResultInArg0"
}
if err != nil {
return err
}
if _, ok := regInfoSet[regShapeMerging]; !ok {
allUnseen[regShapeMerging] = append(allUnseen[regShapeMerging], op)
allUnseenCaseStr[regShapeMerging] = append(allUnseenCaseStr[regShapeMerging], caseStr+"Merging")
}
regInfoSet[regShapeMerging] = append(regInfoSet[regShapeMerging], caseStr+"Merging")
}
return nil
}
for _, op := range ops {
shapeIn, shapeOut, maskType, _, gOp := op.shape()
asm := machineOpName(maskType, gOp)
if _, ok := seen[asm]; ok {
continue
}
seen[asm] = struct{}{}
caseStr := fmt.Sprintf("ssa.OpAMD64%s", asm)
isZeroMasking := false
if shapeIn == OneKmaskIn || shapeIn == OneKmaskImmIn {
if gOp.Zeroing == nil || *gOp.Zeroing {
ZeroingMask = append(ZeroingMask, caseStr)
isZeroMasking = true
}
}
if err := classifyOp(op, maskType, shapeIn, shapeOut, caseStr, NoMem); err != nil {
panic(err)
}
if op.MemFeatures != nil && *op.MemFeatures == "vbcst" {
// Make a full vec memory variant
op = rewriteLastVregToMem(op)
// Ignore the error; it could be triggered by [checkVecAsScalar].
// TODO: make [checkVecAsScalar] aware of mem ops.
if err := classifyOp(op, maskType, shapeIn, shapeOut, caseStr+"load", VregMemIn); err != nil {
if *Verbose {
log.Printf("Seen error: %e", err)
}
} else if isZeroMasking {
ZeroingMask = append(ZeroingMask, caseStr+"load")
}
}
}
if len(allUnseen) != 0 {
allKeys := make([]string, 0)
for k := range allUnseen {
allKeys = append(allKeys, k)
}
panic(fmt.Errorf("unsupported register constraint for prog, please update gen_simdssa.go and amd64/ssa.go: %+v\nAll keys: %v\n, cases: %v\n", allUnseen, allKeys, allUnseenCaseStr))
}
buffer := new(bytes.Buffer)
if err := ssaTemplates.ExecuteTemplate(buffer, "header", nil); err != nil {
panic(fmt.Errorf("failed to execute header template: %w", err))
}
for _, regShape := range regInfoKeys {
// Stable traversal of regInfoSet
cases := regInfoSet[regShape]
if len(cases) == 0 {
continue
}
data := tplSSAData{
Cases: strings.Join(cases, ",\n\t\t"),
Helper: "simd" + capitalizeFirst(regShape),
}
if err := ssaTemplates.ExecuteTemplate(buffer, "case", data); err != nil {
panic(fmt.Errorf("failed to execute case template for %s: %w", regShape, err))
}
}
if err := ssaTemplates.ExecuteTemplate(buffer, "footer", nil); err != nil {
panic(fmt.Errorf("failed to execute footer template: %w", err))
}
if len(ZeroingMask) != 0 {
if err := ssaTemplates.ExecuteTemplate(buffer, "zeroing", strings.Join(ZeroingMask, ",\n\t\t")); err != nil {
panic(fmt.Errorf("failed to execute footer template: %w", err))
}
}
if err := ssaTemplates.ExecuteTemplate(buffer, "ending", nil); err != nil {
panic(fmt.Errorf("failed to execute footer template: %w", err))
}
return buffer
}


@@ -0,0 +1,830 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"bufio"
"bytes"
"fmt"
"go/format"
"log"
"os"
"path/filepath"
"reflect"
"slices"
"sort"
"strings"
"text/template"
"unicode"
)
func templateOf(temp, name string) *template.Template {
t, err := template.New(name).Parse(temp)
if err != nil {
panic(fmt.Errorf("failed to parse template %s: %w", name, err))
}
return t
}
func createPath(goroot string, file string) (*os.File, error) {
fp := filepath.Join(goroot, file)
dir := filepath.Dir(fp)
err := os.MkdirAll(dir, 0755)
if err != nil {
return nil, fmt.Errorf("failed to create directory %s: %w", dir, err)
}
f, err := os.Create(fp)
if err != nil {
return nil, fmt.Errorf("failed to create file %s: %w", fp, err)
}
return f, nil
}
func formatWriteAndClose(out *bytes.Buffer, goroot string, file string) {
b, err := format.Source(out.Bytes())
if err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
fmt.Fprintf(os.Stderr, "%s\n", numberLines(out.Bytes()))
fmt.Fprintf(os.Stderr, "%v\n", err)
panic(err)
} else {
writeAndClose(b, goroot, file)
}
}
func writeAndClose(b []byte, goroot string, file string) {
ofile, err := createPath(goroot, file)
if err != nil {
panic(err)
}
ofile.Write(b)
ofile.Close()
}
// numberLines takes a slice of bytes, and returns a string where each line
// is numbered, starting from 1.
func numberLines(data []byte) string {
var buf bytes.Buffer
r := bytes.NewReader(data)
s := bufio.NewScanner(r)
for i := 1; s.Scan(); i++ {
fmt.Fprintf(&buf, "%d: %s\n", i, s.Text())
}
return buf.String()
}
type inShape uint8
type outShape uint8
type maskShape uint8
type immShape uint8
type memShape uint8
const (
InvalidIn inShape = iota
PureVregIn // vector register input only
OneKmaskIn // vector and kmask input
OneImmIn // vector and immediate input
OneKmaskImmIn // vector, kmask, and immediate inputs
PureKmaskIn // only mask inputs.
)
const (
InvalidOut outShape = iota
NoOut // no output
OneVregOut // (one) vector register output
OneGregOut // (one) general register output
OneKmaskOut // mask output
OneVregOutAtIn // the first input is also the output
)
const (
InvalidMask maskShape = iota
NoMask // no mask
OneMask // with mask (K1 to K7)
AllMasks // a K mask instruction (K0-K7)
)
const (
InvalidImm immShape = iota
NoImm // no immediate
ConstImm // const only immediate
VarImm // pure imm argument provided by the users
ConstVarImm // a combination of user arg and const
)
const (
InvalidMem memShape = iota
NoMem
VregMemIn // The instruction contains a mem input which is loading a vreg.
)
// opShape returns the several integers describing the shape of the operation,
// and modified versions of the op:
//
// opNoImm is op with its inputs excluding the const imm.
//
// This function does not modify op.
func (op *Operation) shape() (shapeIn inShape, shapeOut outShape, maskType maskShape, immType immShape,
opNoImm Operation) {
if len(op.Out) > 1 {
panic(fmt.Errorf("simdgen only supports 1 output: %s", op))
}
var outputReg int
if len(op.Out) == 1 {
outputReg = op.Out[0].AsmPos
if op.Out[0].Class == "vreg" {
shapeOut = OneVregOut
} else if op.Out[0].Class == "greg" {
shapeOut = OneGregOut
} else if op.Out[0].Class == "mask" {
shapeOut = OneKmaskOut
} else {
panic(fmt.Errorf("simdgen only supports output of class vreg or mask: %s", op))
}
} else {
shapeOut = NoOut
// TODO: are these only Load/Stores?
// We manually support two Load and Store ops; are those enough?
panic(fmt.Errorf("simdgen only supports 1 output: %s", op))
}
hasImm := false
maskCount := 0
hasVreg := false
for _, in := range op.In {
if in.AsmPos == outputReg {
if shapeOut != OneVregOutAtIn && in.AsmPos == 0 && in.Class == "vreg" {
shapeOut = OneVregOutAtIn
} else {
panic(fmt.Errorf("simdgen only support output and input sharing the same position case of \"the first input is vreg and the only output\": %s", op))
}
}
if in.Class == "immediate" {
// A manual check on the XED data found that AMD64 SIMD instructions have at
// most 1 immediate, so we don't need to check that here.
if *in.Bits != 8 {
panic(fmt.Errorf("simdgen only supports immediates of 8 bits: %s", op))
}
hasImm = true
} else if in.Class == "mask" {
maskCount++
} else {
hasVreg = true
}
}
opNoImm = *op
removeImm := func(o *Operation) {
o.In = o.In[1:]
}
if hasImm {
removeImm(&opNoImm)
if op.In[0].Const != nil {
if op.In[0].ImmOffset != nil {
immType = ConstVarImm
} else {
immType = ConstImm
}
} else if op.In[0].ImmOffset != nil {
immType = VarImm
} else {
panic(fmt.Errorf("simdgen requires imm to have at least one of ImmOffset or Const set: %s", op))
}
} else {
immType = NoImm
}
if maskCount == 0 {
maskType = NoMask
} else {
maskType = OneMask
}
checkPureMask := func() bool {
if hasImm {
panic(fmt.Errorf("simdgen does not support immediates in pure mask operations: %s", op))
}
if hasVreg {
panic(fmt.Errorf("simdgen does not support more than 1 masks in non-pure mask operations: %s", op))
}
return false
}
if !hasImm && maskCount == 0 {
shapeIn = PureVregIn
} else if !hasImm && maskCount > 0 {
if maskCount == 1 {
shapeIn = OneKmaskIn
} else {
if checkPureMask() {
return
}
shapeIn = PureKmaskIn
maskType = AllMasks
}
} else if hasImm && maskCount == 0 {
shapeIn = OneImmIn
} else {
if maskCount == 1 {
shapeIn = OneKmaskImmIn
} else {
checkPureMask()
return
}
}
return
}
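// Illustrative classification (a sketch for a typical AVX-512 masked add such as
// VPADDD with two vreg inputs, one k-mask input, and one separate vreg output):
// shape() reports shapeIn=OneKmaskIn, shapeOut=OneVregOut, maskType=OneMask,
// and immType=NoImm, and opNoImm is the operation unchanged.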
// regShape returns a string representation of the register shape.
func (op *Operation) regShape(mem memShape) (string, error) {
_, _, _, _, gOp := op.shape()
var regInfo, fixedName string
var vRegInCnt, gRegInCnt, kMaskInCnt, vRegOutCnt, gRegOutCnt, kMaskOutCnt, memInCnt, memOutCnt int
for i, in := range gOp.In {
switch in.Class {
case "vreg":
vRegInCnt++
case "greg":
gRegInCnt++
case "mask":
kMaskInCnt++
case "memory":
if mem != VregMemIn {
panic("simdgen only knows VregMemIn in regShape")
}
memInCnt++
vRegInCnt++
}
if in.FixedReg != nil {
fixedName = fmt.Sprintf("%sAtIn%d", *in.FixedReg, i)
}
}
for i, out := range gOp.Out {
// If class overwrite is happening, that's not really a mask but a vreg.
if out.Class == "vreg" || out.OverwriteClass != nil {
vRegOutCnt++
} else if out.Class == "greg" {
gRegOutCnt++
} else if out.Class == "mask" {
kMaskOutCnt++
} else if out.Class == "memory" {
if mem != VregMemIn {
panic("simdgen only knows VregMemIn in regShape")
}
vRegOutCnt++
memOutCnt++
}
if out.FixedReg != nil {
fixedName = fmt.Sprintf("%sAtIn%d", *out.FixedReg, i)
}
}
var inRegs, inMasks, outRegs, outMasks string
rmAbbrev := func(s string, i int) string {
if i == 0 {
return ""
}
if i == 1 {
return s
}
return fmt.Sprintf("%s%d", s, i)
}
inRegs = rmAbbrev("v", vRegInCnt)
inRegs += rmAbbrev("gp", gRegInCnt)
inMasks = rmAbbrev("k", kMaskInCnt)
outRegs = rmAbbrev("v", vRegOutCnt)
outRegs += rmAbbrev("gp", gRegOutCnt)
outMasks = rmAbbrev("k", kMaskOutCnt)
if kMaskInCnt == 0 && kMaskOutCnt == 0 && gRegInCnt == 0 && gRegOutCnt == 0 {
// For pure v we can abbreviate it as v%d%d.
regInfo = fmt.Sprintf("v%d%d", vRegInCnt, vRegOutCnt)
} else if kMaskInCnt == 0 && kMaskOutCnt == 0 {
regInfo = fmt.Sprintf("%s%s", inRegs, outRegs)
} else {
regInfo = fmt.Sprintf("%s%s%s%s", inRegs, inMasks, outRegs, outMasks)
}
if memInCnt > 0 {
if memInCnt == 1 {
regInfo += "load"
} else {
panic("simdgen does not understand more than 1 mem op as of now")
}
}
if memOutCnt > 0 {
panic("simdgen does not understand memory as output as of now")
}
regInfo += fixedName
return regInfo, nil
}
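// Illustrative shapes (assuming typical operand sets): two vreg inputs and one
// vreg output yield "v21"; adding a k-mask input yields "v2kv"; turning the last
// vreg input of the "v21" case into a memory operand yields "v21load".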
// sortOperand sorts op.In by putting immediates first, then vregs, and masks last.
// TODO: verify that this is a safe assumption about the prog structure.
// From observation, in the assembly encoding immediates always come first and
// masks always come last, with vregs in between.
func (op *Operation) sortOperand() {
priority := map[string]int{"immediate": 0, "vreg": 1, "greg": 1, "mask": 2}
sort.SliceStable(op.In, func(i, j int) bool {
pi := priority[op.In[i].Class]
pj := priority[op.In[j].Class]
if pi != pj {
return pi < pj
}
return op.In[i].AsmPos < op.In[j].AsmPos
})
}
// goNormalType returns the Go type name for the result of an Op that
// does not return a vector, i.e., that returns a result in a general
// register. Currently there's only one family of Ops in Go's simd library
// that does this (GetElem), and so this is specialized to work for that,
// but the problem (mismatch between hardware register width and Go type
// width) seems likely to recur if there are any other cases.
func (op Operation) goNormalType() string {
if op.Go == "GetElem" {
// GetElem returns an element of the vector into a general register
// but as far as the hardware is concerned, that result is either 32
// or 64 bits wide, no matter what the vector element width is.
// This is not "wrong" but it is not the right answer for Go source code.
// To get the Go type right, combine the base type ("int", "uint", "float"),
// with the input vector element width in bits (8,16,32,64).
at := 0 // proper value of at depends on whether immediate was stripped or not
if op.In[at].Class == "immediate" {
at++
}
return fmt.Sprintf("%s%d", *op.Out[0].Base, *op.In[at].ElemBits)
}
panic(fmt.Errorf("Implement goNormalType for %v", op))
}
// SSAType returns the string for the type reference in SSA generation,
// for example in the intrinsics generating template.
func (op Operation) SSAType() string {
if op.Out[0].Class == "greg" {
return fmt.Sprintf("types.Types[types.T%s]", strings.ToUpper(op.goNormalType()))
}
return fmt.Sprintf("types.TypeVec%d", *op.Out[0].Bits)
}
// GoType returns the Go type returned by this operation (relative to the simd package),
// for example "int32" or "Int8x16". This is used in a template.
func (op Operation) GoType() string {
if op.Out[0].Class == "greg" {
return op.goNormalType()
}
return *op.Out[0].Go
}
// ImmName returns the name to use for an operation's immediate operand.
// This can be overridden in the yaml with "name" on an operand,
// otherwise, for now, "constant"
func (op Operation) ImmName() string {
return op.Op0Name("constant")
}
func (o Operand) OpName(s string) string {
if n := o.Name; n != nil {
return *n
}
if o.Class == "mask" {
return "mask"
}
return s
}
func (o Operand) OpNameAndType(s string) string {
return o.OpName(s) + " " + *o.Go
}
// GoExported returns [Go] with first character capitalized.
func (op Operation) GoExported() string {
return capitalizeFirst(op.Go)
}
// DocumentationExported returns [Documentation] with method name capitalized.
func (op Operation) DocumentationExported() string {
return strings.ReplaceAll(op.Documentation, op.Go, op.GoExported())
}
// Op0Name returns the name to use for the 0 operand,
// if any is present, otherwise the parameter is used.
func (op Operation) Op0Name(s string) string {
return op.In[0].OpName(s)
}
// Op1Name returns the name to use for the 1 operand,
// if any is present, otherwise the parameter is used.
func (op Operation) Op1Name(s string) string {
return op.In[1].OpName(s)
}
// Op2Name returns the name to use for the 2 operand,
// if any is present, otherwise the parameter is used.
func (op Operation) Op2Name(s string) string {
return op.In[2].OpName(s)
}
// Op3Name returns the name to use for the 3 operand,
// if any is present, otherwise the parameter is used.
func (op Operation) Op3Name(s string) string {
return op.In[3].OpName(s)
}
// Op0NameAndType returns the name and type to use for
// the 0 operand, if a name is provided, otherwise
// the parameter value is used as the default.
func (op Operation) Op0NameAndType(s string) string {
return op.In[0].OpNameAndType(s)
}
// Op1NameAndType returns the name and type to use for
// the 1 operand, if a name is provided, otherwise
// the parameter value is used as the default.
func (op Operation) Op1NameAndType(s string) string {
return op.In[1].OpNameAndType(s)
}
// Op2NameAndType returns the name and type to use for
// the 2 operand, if a name is provided, otherwise
// the parameter value is used as the default.
func (op Operation) Op2NameAndType(s string) string {
return op.In[2].OpNameAndType(s)
}
// Op3NameAndType returns the name and type to use for
// the 3 operand, if a name is provided, otherwise
// the parameter value is used as the default.
func (op Operation) Op3NameAndType(s string) string {
return op.In[3].OpNameAndType(s)
}
// Op4NameAndType returns the name and type to use for
// the 4 operand, if a name is provided, otherwise
// the parameter value is used as the default.
func (op Operation) Op4NameAndType(s string) string {
return op.In[4].OpNameAndType(s)
}
var immClasses []string = []string{"BAD0Imm", "BAD1Imm", "op1Imm8", "op2Imm8", "op3Imm8", "op4Imm8"}
var classes []string = []string{"BAD0", "op1", "op2", "op3", "op4"}
// classifyOp returns a classification string, modified operation, and perhaps error based
// on the stub and intrinsic shape for the operation.
// The classification string is in the regular expression set "op[1234](Imm8)?(_<order>)?"
// where the "<order>" suffix is optionally attached to the Operation in its input yaml.
// The classification string is used to select a template or a clause of a template
// for intrinsics declarations and the ssagen intrinsics glue code in the compiler.
func classifyOp(op Operation) (string, Operation, error) {
_, _, _, immType, gOp := op.shape()
var class string
if immType == VarImm || immType == ConstVarImm {
switch l := len(op.In); l {
case 1:
return "", op, fmt.Errorf("simdgen does not recognize this operation of only immediate input: %s", op)
case 2, 3, 4, 5:
class = immClasses[l]
default:
return "", op, fmt.Errorf("simdgen does not recognize this operation of input length %d: %s", len(op.In), op)
}
if order := op.OperandOrder; order != nil {
class += "_" + *order
}
return class, op, nil
} else {
switch l := len(gOp.In); l {
case 1, 2, 3, 4:
class = classes[l]
default:
return "", op, fmt.Errorf("simdgen does not recognize this operation of input length %d: %s", len(op.In), op)
}
if order := op.OperandOrder; order != nil {
class += "_" + *order
}
return class, gOp, nil
}
}
func checkVecAsScalar(op Operation) (idx int, err error) {
idx = -1
sSize := 0
for i, o := range op.In {
if o.TreatLikeAScalarOfSize != nil {
if idx == -1 {
idx = i
sSize = *o.TreatLikeAScalarOfSize
} else {
err = fmt.Errorf("simdgen only supports one TreatLikeAScalarOfSize in the arg list: %s", op)
return
}
}
}
if idx >= 0 {
if sSize != 8 && sSize != 16 && sSize != 32 && sSize != 64 {
err = fmt.Errorf("simdgen does not recognize this uint size: %d, %s", sSize, op)
return
}
}
return
}
func rewriteVecAsScalarRegInfo(op Operation, regInfo string) (string, error) {
idx, err := checkVecAsScalar(op)
if err != nil {
return "", err
}
if idx != -1 {
if regInfo == "v21" {
regInfo = "vfpv"
} else if regInfo == "v2kv" {
regInfo = "vfpkv"
} else if regInfo == "v31" {
regInfo = "v2fpv"
} else if regInfo == "v3kv" {
regInfo = "v2fpkv"
} else {
return "", fmt.Errorf("simdgen does not recognize uses of treatLikeAScalarOfSize with op regShape %s in op: %s", regInfo, op)
}
}
return regInfo, nil
}
func rewriteLastVregToMem(op Operation) Operation {
newIn := make([]Operand, len(op.In))
lastVregIdx := -1
for i := range len(op.In) {
newIn[i] = op.In[i]
if op.In[i].Class == "vreg" {
lastVregIdx = i
}
}
// vbcst operations always place their mem operand in the last vreg position.
if lastVregIdx == -1 {
panic("simdgen cannot find one vreg in the mem op vreg original")
}
newIn[lastVregIdx].Class = "memory"
op.In = newIn
return op
}
// dedup deduplicates operations at the whole-structure level.
func dedup(ops []Operation) (deduped []Operation) {
for _, op := range ops {
seen := false
for _, dop := range deduped {
if reflect.DeepEqual(op, dop) {
seen = true
break
}
}
if !seen {
deduped = append(deduped, op)
}
}
return
}
func (op Operation) GenericName() string {
if op.OperandOrder != nil {
switch *op.OperandOrder {
case "21Type1", "231Type1":
// Permute uses operand[1] for method receiver.
return op.Go + *op.In[1].Go
}
}
if op.In[0].Class == "immediate" {
return op.Go + *op.In[1].Go
}
return op.Go + *op.In[0].Go
}
// dedupGodef deduplicates operations at the [Op.Go]+[*Op.In[0].Go] level.
// Deduplication picks the least advanced architecture that satisfies the requirement;
// AVX512 is least preferred.
// If FlagReportDup is set, it instead reports the duplicates to the console.
func dedupGodef(ops []Operation) ([]Operation, error) {
seen := map[string][]Operation{}
for _, op := range ops {
_, _, _, _, gOp := op.shape()
gN := gOp.GenericName()
seen[gN] = append(seen[gN], op)
}
if *FlagReportDup {
for gName, dup := range seen {
if len(dup) > 1 {
log.Printf("Duplicate for %s:\n", gName)
for _, op := range dup {
log.Printf("%s\n", op)
}
}
}
return ops, nil
}
isAVX512 := func(op Operation) bool {
return strings.Contains(op.CPUFeature, "AVX512")
}
deduped := []Operation{}
for _, dup := range seen {
if len(dup) > 1 {
slices.SortFunc(dup, func(i, j Operation) int {
// Put non-AVX512 candidates at the beginning
if !isAVX512(i) && isAVX512(j) {
return -1
}
if isAVX512(i) && !isAVX512(j) {
return 1
}
if i.CPUFeature != j.CPUFeature {
return strings.Compare(i.CPUFeature, j.CPUFeature)
}
// Weirdly, Intel sometimes has duplicated definitions for the same instruction,
// which confuses the XED mem-op merge logic: [MemFeature] is attached to an
// instruction only once, so for essentially duplicated instructions only one
// copy will have the proper [MemFeature] set. We have to make this sort
// deterministic with respect to [MemFeature].
if i.MemFeatures != nil && j.MemFeatures == nil {
return -1
}
if i.MemFeatures == nil && j.MemFeatures != nil {
return 1
}
// Their order does not matter anymore, at least for now.
return 0
})
}
deduped = append(deduped, dup[0])
}
slices.SortFunc(deduped, compareOperations)
return deduped, nil
}
// Copy op.ConstImm to op.In[0].Const
// This is a hack to reduce the size of defs we need for const imm operations.
func copyConstImm(ops []Operation) error {
for _, op := range ops {
if op.ConstImm == nil {
continue
}
_, _, _, immType, _ := op.shape()
if immType == ConstImm || immType == ConstVarImm {
op.In[0].Const = op.ConstImm
}
// Otherwise, just don't port it - e.g. {VPCMP[BWDQ] imm=0} and {VPCMPEQ[BWDQ]} are
// the same operation "Equal"; [dedupGodef] should be able to distinguish them.
}
return nil
}
func capitalizeFirst(s string) string {
if s == "" {
return ""
}
// Convert the string to a slice of runes to handle multi-byte characters correctly.
r := []rune(s)
r[0] = unicode.ToUpper(r[0])
return string(r)
}
// overwrite corrects some errors due to:
//   - the XED data being wrong
//   - Go's SIMD API requirements, for example AVX2 compares should also produce masks.
// This rewrite has strict constraints; please see the error messages.
// These constraints are also exploited in [writeSIMDRules], [writeSIMDMachineOps]
// and [writeSIMDSSA], so please be careful when updating them.
func overwrite(ops []Operation) error {
hasClassOverwrite := false
overwrite := func(op []Operand, idx int, o Operation) error {
if op[idx].OverwriteElementBits != nil {
if op[idx].ElemBits == nil {
panic(fmt.Errorf("ElemBits is nil at operand %d of %v", idx, o))
}
*op[idx].ElemBits = *op[idx].OverwriteElementBits
*op[idx].Lanes = *op[idx].Bits / *op[idx].ElemBits
*op[idx].Go = fmt.Sprintf("%s%dx%d", capitalizeFirst(*op[idx].Base), *op[idx].ElemBits, *op[idx].Lanes)
}
if op[idx].OverwriteClass != nil {
if op[idx].OverwriteBase == nil {
panic(fmt.Errorf("simdgen: [OverwriteClass] must be set together with [OverwriteBase]: %s", op[idx]))
}
oBase := *op[idx].OverwriteBase
oClass := *op[idx].OverwriteClass
if oClass != "mask" {
panic(fmt.Errorf("simdgen: [Class] overwrite only supports overwritting to mask: %s", op[idx]))
}
if oBase != "int" {
panic(fmt.Errorf("simdgen: [Class] overwrite must set [OverwriteBase] to int: %s", op[idx]))
}
if op[idx].Class != "vreg" {
panic(fmt.Errorf("simdgen: [Class] overwrite must be overwriting [Class] from vreg: %s", op[idx]))
}
hasClassOverwrite = true
*op[idx].Base = oBase
op[idx].Class = oClass
*op[idx].Go = fmt.Sprintf("Mask%dx%d", *op[idx].ElemBits, *op[idx].Lanes)
} else if op[idx].OverwriteBase != nil {
oBase := *op[idx].OverwriteBase
*op[idx].Go = strings.ReplaceAll(*op[idx].Go, capitalizeFirst(*op[idx].Base), capitalizeFirst(oBase))
if op[idx].Class == "greg" {
*op[idx].Go = strings.ReplaceAll(*op[idx].Go, *op[idx].Base, oBase)
}
*op[idx].Base = oBase
}
return nil
}
for i, o := range ops {
hasClassOverwrite = false
for j := range ops[i].In {
if err := overwrite(ops[i].In, j, o); err != nil {
return err
}
if hasClassOverwrite {
return fmt.Errorf("simdgen does not support [OverwriteClass] in inputs: %s", ops[i])
}
}
for j := range ops[i].Out {
if err := overwrite(ops[i].Out, j, o); err != nil {
return err
}
}
if hasClassOverwrite {
for _, in := range ops[i].In {
if in.Class == "mask" {
return fmt.Errorf("simdgen only supports [OverwriteClass] for operations without mask inputs")
}
}
}
}
return nil
}
// reportXEDInconsistency reports potential XED inconsistencies.
// We can add more fields to [Operation] to enable more checks and implement it here.
// Supported checks:
// [NameAndSizeCheck]: NAME[BWDQ] should set the elemBits accordingly.
// This check is useful to find inconsistencies, then we can add overwrite fields to
// those defs to correct them manually.
func reportXEDInconsistency(ops []Operation) error {
for _, o := range ops {
if o.NameAndSizeCheck != nil {
suffixSizeMap := map[byte]int{'B': 8, 'W': 16, 'D': 32, 'Q': 64}
checkOperand := func(opr Operand) error {
if opr.ElemBits == nil {
return fmt.Errorf("simdgen expects elemBits to be set when performing NameAndSizeCheck")
}
if v, ok := suffixSizeMap[o.Asm[len(o.Asm)-1]]; !ok {
return fmt.Errorf("simdgen expects asm to end with [BWDQ] when performing NameAndSizeCheck")
} else {
if v != *opr.ElemBits {
return fmt.Errorf("simdgen finds NameAndSizeCheck inconsistency in def: %s", o)
}
}
return nil
}
for _, in := range o.In {
if in.Class != "vreg" && in.Class != "mask" {
continue
}
if in.TreatLikeAScalarOfSize != nil {
// This is an irregular operand, don't check it.
continue
}
if err := checkOperand(in); err != nil {
return err
}
}
for _, out := range o.Out {
if err := checkOperand(out); err != nil {
return err
}
}
}
}
return nil
}
func (o *Operation) hasMaskedMerging(maskType maskShape, outType outShape) bool {
// BLEND and VMOVDQU are not user-facing ops so we should filter them out.
return o.OperandOrder == nil && o.SpecialLower == nil && maskType == OneMask && outType == OneVregOut &&
len(o.InVariant) == 1 && !strings.Contains(o.Asm, "BLEND") && !strings.Contains(o.Asm, "VMOVDQU")
}
func getVbcstData(s string) (feat1Match, feat2Match string) {
_, err := fmt.Sscanf(s, "feat1=%[^;];feat2=%s", &feat1Match, &feat2Match)
if err != nil {
panic(err)
}
return
}
func (o Operation) String() string {
return pprints(o)
}
func (op Operand) String() string {
return pprints(op)
}


@@ -0,0 +1 @@
!import ops/*/go.yaml


@@ -0,0 +1,438 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
import (
"fmt"
"log"
"regexp"
"slices"
"strconv"
"strings"
"unicode"
"simd/_gen/unify"
)
type Operation struct {
rawOperation
// Go is the Go method name of this operation.
//
// It is derived from the raw Go method name by adding optional suffixes.
// Currently, "Masked" is the only suffix.
Go string
// Documentation is the doc string for this API.
//
// It is computed from the raw documentation:
//
// - "NAME" is replaced by the Go method name.
//
// - For masked operation, a sentence about masking is added.
Documentation string
// In is the sequence of parameters to the Go method.
//
// For masked operations, this will have the mask operand appended.
In []Operand
}
// rawOperation is the unifier representation of an [Operation]. It is
// translated into a more parsed form after unifier decoding.
type rawOperation struct {
Go string // Base Go method name
GoArch string // GOARCH for this definition
Asm string // Assembly mnemonic
OperandOrder *string // optional Operand order for better Go declarations
// Optional tag to indicate this operation is paired with special generic->machine ssa lowering rules.
// Should be paired with special templates in gen_simdrules.go
SpecialLower *string
In []Operand // Parameters
InVariant []Operand // Optional parameters
Out []Operand // Results
MemFeatures *string // The memory operand feature this operation supports
MemFeaturesData *string // Additional data associated with MemFeatures
Commutative bool // Commutativity
CPUFeature string // CPUID/Has* feature name
Zeroing *bool // nil => use asm suffix ".Z"; false => do not use asm suffix ".Z"
Documentation *string // Documentation will be appended to the stubs comments.
AddDoc *string // Additional doc to be appended.
// ConstImm is a hack to reduce the size of defs the user writes for const-immediate operations.
// If present, it will be copied to [In[0].Const].
ConstImm *string
// NameAndSizeCheck is used to check [BWDQ] maps to (8|16|32|64) elemBits.
NameAndSizeCheck *bool
// If non-nil, all generation in gen_simdTypes.go and gen_intrinsics will be skipped.
NoTypes *string
// If non-nil, all generation in gen_simdGenericOps and gen_simdrules will be skipped.
NoGenericOps *string
// If non-nil, this string will be attached to the machine ssa op name. E.g. "const"
SSAVariant *string
// If true, do not emit method declarations, generic ops, or intrinsics for masked variants;
// DO emit the architecture-specific opcodes and optimizations.
HideMaskMethods *bool
}
func (o *Operation) IsMasked() bool {
if len(o.InVariant) == 0 {
return false
}
if len(o.InVariant) == 1 && o.InVariant[0].Class == "mask" {
return true
}
panic(fmt.Errorf("unknown inVariant"))
}
func (o *Operation) SkipMaskedMethod() bool {
if o.HideMaskMethods == nil {
return false
}
if *o.HideMaskMethods && o.IsMasked() {
return true
}
return false
}
var reForName = regexp.MustCompile(`\bNAME\b`)
func (o *Operation) DecodeUnified(v *unify.Value) error {
if err := v.Decode(&o.rawOperation); err != nil {
return err
}
isMasked := o.IsMasked()
// Compute full Go method name.
o.Go = o.rawOperation.Go
if isMasked {
o.Go += "Masked"
}
// Compute doc string.
if o.rawOperation.Documentation != nil {
o.Documentation = *o.rawOperation.Documentation
} else {
o.Documentation = "// UNDOCUMENTED"
}
o.Documentation = reForName.ReplaceAllString(o.Documentation, o.Go)
if isMasked {
o.Documentation += "\n//\n// This operation is applied selectively under a write mask."
// Suppress generic op and method declaration for exported methods, if a mask is present.
if unicode.IsUpper([]rune(o.Go)[0]) {
trueVal := "true"
o.NoGenericOps = &trueVal
o.NoTypes = &trueVal
}
}
if o.rawOperation.AddDoc != nil {
o.Documentation += "\n" + reForName.ReplaceAllString(*o.rawOperation.AddDoc, o.Go)
}
o.In = append(o.rawOperation.In, o.rawOperation.InVariant...)
return nil
}
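// For example (illustrative): a raw def with go: Add and a single mask inVariant
// decodes with Go="AddMasked", gets the write-mask sentence appended to its
// documentation, and has the mask operand appended to In; if the resulting method
// name is exported, NoGenericOps and NoTypes are also forced to "true".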
func (o *Operation) VectorWidth() int {
out := o.Out[0]
if out.Class == "vreg" {
return *out.Bits
} else if out.Class == "greg" || out.Class == "mask" {
for i := range o.In {
if o.In[i].Class == "vreg" {
return *o.In[i].Bits
}
}
}
panic(fmt.Errorf("Figure out what the vector width is for %v and implement it", *o))
}
// Right now simdgen computes the machine op name for most instructions
// as $Name$OutputSize; under this naming, some instructions are "overloaded".
// For example:
// (Uint16x8) ConvertToInt8
// (Uint16x16) ConvertToInt8
// are both VPMOVWB128.
// To make them distinguishable we need to append the input size as well.
// TODO: document them well in the generated code.
var demotingConvertOps = map[string]bool{
"VPMOVQD128": true, "VPMOVSQD128": true, "VPMOVUSQD128": true, "VPMOVQW128": true, "VPMOVSQW128": true,
"VPMOVUSQW128": true, "VPMOVDW128": true, "VPMOVSDW128": true, "VPMOVUSDW128": true, "VPMOVQB128": true,
"VPMOVSQB128": true, "VPMOVUSQB128": true, "VPMOVDB128": true, "VPMOVSDB128": true, "VPMOVUSDB128": true,
"VPMOVWB128": true, "VPMOVSWB128": true, "VPMOVUSWB128": true,
"VPMOVQDMasked128": true, "VPMOVSQDMasked128": true, "VPMOVUSQDMasked128": true, "VPMOVQWMasked128": true, "VPMOVSQWMasked128": true,
"VPMOVUSQWMasked128": true, "VPMOVDWMasked128": true, "VPMOVSDWMasked128": true, "VPMOVUSDWMasked128": true, "VPMOVQBMasked128": true,
"VPMOVSQBMasked128": true, "VPMOVUSQBMasked128": true, "VPMOVDBMasked128": true, "VPMOVSDBMasked128": true, "VPMOVUSDBMasked128": true,
"VPMOVWBMasked128": true, "VPMOVSWBMasked128": true, "VPMOVUSWBMasked128": true,
}
func machineOpName(maskType maskShape, gOp Operation) string {
asm := gOp.Asm
if maskType == OneMask {
asm += "Masked"
}
asm = fmt.Sprintf("%s%d", asm, gOp.VectorWidth())
if gOp.SSAVariant != nil {
asm += *gOp.SSAVariant
}
if demotingConvertOps[asm] {
// Need to append the size of the source as well.
// TODO: should be "%sto%d".
asm = fmt.Sprintf("%s_%d", asm, *gOp.In[0].Bits)
}
return asm
}
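// For example (illustrative): a 256-bit VPADDD with a mask operand becomes
// "VPADDDMasked256", and the demoting conversion VPMOVWB with a 256-bit source
// becomes "VPMOVWB128_256" so it is distinguishable from the 128-bit-source form.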
func compareStringPointers(x, y *string) int {
if x != nil && y != nil {
return compareNatural(*x, *y)
}
if x == nil && y == nil {
return 0
}
if x == nil {
return -1
}
return 1
}
func compareIntPointers(x, y *int) int {
if x != nil && y != nil {
return *x - *y
}
if x == nil && y == nil {
return 0
}
if x == nil {
return -1
}
return 1
}
func compareOperations(x, y Operation) int {
if c := compareNatural(x.Go, y.Go); c != 0 {
return c
}
xIn, yIn := x.In, y.In
if len(xIn) > len(yIn) && xIn[len(xIn)-1].Class == "mask" {
xIn = xIn[:len(xIn)-1]
} else if len(xIn) < len(yIn) && yIn[len(yIn)-1].Class == "mask" {
yIn = yIn[:len(yIn)-1]
}
if len(xIn) < len(yIn) {
return -1
}
if len(xIn) > len(yIn) {
return 1
}
if len(x.Out) < len(y.Out) {
return -1
}
if len(x.Out) > len(y.Out) {
return 1
}
for i := range xIn {
ox, oy := &xIn[i], &yIn[i]
if c := compareOperands(ox, oy); c != 0 {
return c
}
}
return 0
}
func compareOperands(x, y *Operand) int {
if c := compareNatural(x.Class, y.Class); c != 0 {
return c
}
if x.Class == "immediate" {
return compareStringPointers(x.ImmOffset, y.ImmOffset)
} else {
if c := compareStringPointers(x.Base, y.Base); c != 0 {
return c
}
if c := compareIntPointers(x.ElemBits, y.ElemBits); c != 0 {
return c
}
if c := compareIntPointers(x.Bits, y.Bits); c != 0 {
return c
}
return 0
}
}
type Operand struct {
Class string // One of "mask", "immediate", "vreg", "greg", and "memory"
Go *string // Go type of this operand
AsmPos int // Position of this operand in the assembly instruction
Base *string // Base Go type ("int", "uint", "float")
ElemBits *int // Element bit width
Bits *int // Total vector bit width
Const *string // Optional constant value for immediates.
// Optional immediate arg offsets. If this field is non-nil,
// This operand will be an immediate operand:
// The compiler will right-shift the user-passed value by ImmOffset and set it as the AuxInt
// field of the operation.
ImmOffset *string
Name *string // optional name in the Go intrinsic declaration
Lanes *int // *Lanes equals Bits/ElemBits except for scalars, when *Lanes == 1
// TreatLikeAScalarOfSize means only the lower $TreatLikeAScalarOfSize bits of the vector
// is used, so at the API level we can make it just a scalar value of this size; Then we
// can overwrite it to a vector of the right size during intrinsics stage.
TreatLikeAScalarOfSize *int
// If non-nil, it means the [Class] field is overwritten here, right now this is used to
// overwrite the results of AVX2 compares to masks.
OverwriteClass *string
// If non-nil, it means the [Base] field is overwritten here. This field exists solely
// because Intel's XED data is inconsistent, e.g. VANDNP[SD] marks its operands as int.
OverwriteBase *string
// If non-nil, it means the [ElemBits] field is overwritten. This field exists solely
// because Intel's XED data is inconsistent, e.g. AVX512 VPMADDUBSW marks its operands'
// elemBits as 16 when it should be 8.
OverwriteElementBits *int
// FixedReg is the name of the fixed register, if any.
FixedReg *string
}
// isDigit returns true if the byte is an ASCII digit.
func isDigit(b byte) bool {
return b >= '0' && b <= '9'
}
// compareNatural performs a "natural sort" comparison of two strings.
// It compares non-digit sections lexicographically and digit sections
// numerically. In the case of string-unequal "equal" strings like
// "a01b" and "a1b", strings.Compare breaks the tie.
//
// It returns:
//
// -1 if s1 < s2
// 0 if s1 == s2
// +1 if s1 > s2
func compareNatural(s1, s2 string) int {
i, j := 0, 0
len1, len2 := len(s1), len(s2)
for i < len1 && j < len2 {
// Find a non-digit segment or a number segment in both strings.
if isDigit(s1[i]) && isDigit(s2[j]) {
// Number segment comparison.
numStart1 := i
for i < len1 && isDigit(s1[i]) {
i++
}
num1, _ := strconv.Atoi(s1[numStart1:i])
numStart2 := j
for j < len2 && isDigit(s2[j]) {
j++
}
num2, _ := strconv.Atoi(s2[numStart2:j])
if num1 < num2 {
return -1
}
if num1 > num2 {
return 1
}
// If numbers are equal, continue to the next segment.
} else {
// Non-digit comparison.
if s1[i] < s2[j] {
return -1
}
if s1[i] > s2[j] {
return 1
}
i++
j++
}
}
// deal with a01b vs a1b; there needs to be an order.
return strings.Compare(s1, s2)
}
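// Illustrative behavior (a sketch, not part of this change):
// compareNatural("Int8x16", "Int8x32") // < 0: 16 sorts before 32 numerically
// compareNatural("x10", "x9")          // > 0: 10 > 9, even though '1' < '9' lexically
// compareNatural("a1b", "a01b")        // != 0: strings.Compare breaks the tie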
const generatedHeader = `// Code generated by x/arch/internal/simdgen using 'go run . -xedPath $XED_PATH -o godefs -goroot $GOROOT go.yaml types.yaml categories.yaml'; DO NOT EDIT.
`
func writeGoDefs(path string, cl unify.Closure) error {
// TODO: Merge operations with the same signature but multiple
// implementations (e.g., SSE vs AVX)
var ops []Operation
for def := range cl.All() {
var op Operation
if !def.Exact() {
continue
}
if err := def.Decode(&op); err != nil {
log.Println(err.Error())
log.Println(def)
continue
}
// TODO: verify that this is safe.
op.sortOperand()
ops = append(ops, op)
}
slices.SortFunc(ops, compareOperations)
// The parsed XED data might contain duplicates, like
// 512 bits VPADDP.
deduped := dedup(ops)
slices.SortFunc(deduped, compareOperations)
if *Verbose {
log.Printf("dedup len: %d\n", len(ops))
}
var err error
if err = overwrite(deduped); err != nil {
return err
}
if *Verbose {
log.Printf("dedup len: %d\n", len(deduped))
}
if *Verbose {
log.Printf("dedup len: %d\n", len(deduped))
}
if !*FlagNoDedup {
// TODO: This can hide mistakes in the API definitions, especially when
// multiple patterns result in the same API unintentionally. Make it stricter.
if deduped, err = dedupGodef(deduped); err != nil {
return err
}
}
if *Verbose {
log.Printf("dedup len: %d\n", len(deduped))
}
if !*FlagNoConstImmPorting {
if err = copyConstImm(deduped); err != nil {
return err
}
}
if *Verbose {
log.Printf("dedup len: %d\n", len(deduped))
}
reportXEDInconsistency(deduped)
typeMap := parseSIMDTypes(deduped)
formatWriteAndClose(writeSIMDTypes(typeMap), path, "src/"+simdPackage+"/types_amd64.go")
formatWriteAndClose(writeSIMDFeatures(deduped), path, "src/"+simdPackage+"/cpu.go")
f, fI := writeSIMDStubs(deduped, typeMap)
formatWriteAndClose(f, path, "src/"+simdPackage+"/ops_amd64.go")
formatWriteAndClose(fI, path, "src/"+simdPackage+"/ops_internal_amd64.go")
formatWriteAndClose(writeSIMDIntrinsics(deduped, typeMap), path, "src/cmd/compile/internal/ssagen/simdintrinsics.go")
formatWriteAndClose(writeSIMDGenericOps(deduped), path, "src/cmd/compile/internal/ssa/_gen/simdgenericOps.go")
formatWriteAndClose(writeSIMDMachineOps(deduped), path, "src/cmd/compile/internal/ssa/_gen/simdAMD64ops.go")
formatWriteAndClose(writeSIMDSSA(deduped), path, "src/cmd/compile/internal/amd64/simdssa.go")
writeAndClose(writeSIMDRules(deduped).Bytes(), path, "src/cmd/compile/internal/ssa/_gen/simdAMD64.rules")
return nil
}


@@ -0,0 +1,281 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// simdgen is an experiment in generating Go <-> asm SIMD mappings.
//
// Usage: simdgen [-xedPath=path] [-q=query] input.yaml...
//
// If -xedPath is provided, one of the inputs is a sum of op-code definitions
// generated from the Intel XED data at path.
//
// If input YAML files are provided, each file is read as an input value. See
// [unify.Closure.UnmarshalYAML] or "go doc unify.Closure.UnmarshalYAML" for the
// format of these files.
//
// TODO: Example definitions and values.
//
// The command unifies across all of the inputs and prints all possible results
// of this unification.
//
// If the -q flag is provided, its string value is parsed as a value and treated
// as another input to unification. This is intended as a way to "query" the
// result, typically by narrowing it down to a small subset of results.
//
// Typical usage:
//
// go run . -xedPath $XEDPATH *.yaml
//
// To see just the definitions generated from XED, run:
//
// go run . -xedPath $XEDPATH
//
// (This works because if there's only one input, there's nothing to unify it
// with, so the result is simply itself.)
//
// To see just the definitions for VPADDQ:
//
// go run . -xedPath $XEDPATH -q '{asm: VPADDQ}'
//
// simdgen can also generate Go definitions of the SIMD mappings.
// To generate Go files into the Go root, run:
//
// go run . -xedPath $XEDPATH -o godefs -goroot $PATH/TO/go go.yaml categories.yaml types.yaml
//
// types.yaml is already written; it specifies the shapes of vectors.
// categories.yaml and go.yaml contain definitions that unify with types.yaml and the XED
// data; you can find an example in ops/AddSub/.
//
// When generating Go definitions, simdgen does 3 "magics":
//   - It splits masked operations (with the op's [Masked] field set) into const and non-const:
//   - One is a normal masked operation, the original.
//   - The other has its mask operand's [Const] field set to "K0".
//   - This way the user does not need to provide a separate "K0"-masked operation def.
//
//   - It deduplicates intrinsic names that have duplicates:
//   - If two operations share the same signature and one is AVX512 while the other
//     predates AVX512, the non-AVX512 one will be selected.
//   - This happens often when an operation is defined both before and after AVX512.
//     This way the user does not need to provide a separate "K0" operation for the
//     AVX512 counterpart.
//
//   - It copies the op's [ConstImm] field to its immediate operand's [Const] field.
//   - This way the user does not need to provide verbose op definitions that differ
//     only in the const immediate field. This is useful to reduce the verbosity of
//     compares with immediate control predicates.
//
// These 3 magics can be disabled with the -nosplitmask, -nodedup or
// -noconstimmporting flags.
//
// simdgen right now only supports amd64; -arch=$OTHERARCH will trigger a fatal error.
package main
// Big TODOs:
//
// - This can produce duplicates, which can also lead to less efficient
// environment merging. Add hashing and use it for deduplication. Be careful
// about how this shows up in debug traces, since it could make things
// confusing if we don't show it happening.
//
// - Do I need Closure, Value, and Domain? It feels like I should only need two
// types.
import (
"cmp"
"flag"
"fmt"
"log"
"maps"
"os"
"path/filepath"
"runtime/pprof"
"slices"
"strings"
"simd/_gen/unify"
"gopkg.in/yaml.v3"
)
var (
xedPath = flag.String("xedPath", "", "load XED datafiles from `path`")
flagQ = flag.String("q", "", "query: read `def` as another input (skips final validation)")
flagO = flag.String("o", "yaml", "output type: yaml, godefs (generate definitions into a Go source tree)")
flagGoDefRoot = flag.String("goroot", ".", "the path to the Go dev directory that will receive the generated files")
FlagNoDedup = flag.Bool("nodedup", false, "disable deduplicating godefs of 2 qualifying operations from different extensions")
FlagNoConstImmPorting = flag.Bool("noconstimmporting", false, "disable const immediate porting from op to imm operand")
FlagArch = flag.String("arch", "amd64", "the target architecture")
Verbose = flag.Bool("v", false, "verbose")
flagDebugXED = flag.Bool("debug-xed", false, "show XED instructions")
flagDebugUnify = flag.Bool("debug-unify", false, "print unification trace")
flagDebugHTML = flag.String("debug-html", "", "write unification trace to `file.html`")
FlagReportDup = flag.Bool("reportdup", false, "report the duplicate godefs")
flagCPUProfile = flag.String("cpuprofile", "", "write CPU profile to `file`")
flagMemProfile = flag.String("memprofile", "", "write memory profile to `file`")
)
const simdPackage = "simd"
func main() {
flag.Parse()
if *flagCPUProfile != "" {
f, err := os.Create(*flagCPUProfile)
if err != nil {
log.Fatalf("-cpuprofile: %s", err)
}
defer f.Close()
pprof.StartCPUProfile(f)
defer pprof.StopCPUProfile()
}
if *flagMemProfile != "" {
f, err := os.Create(*flagMemProfile)
if err != nil {
log.Fatalf("-memprofile: %s", err)
}
defer func() {
pprof.WriteHeapProfile(f)
f.Close()
}()
}
var inputs []unify.Closure
if *FlagArch != "amd64" {
log.Fatalf("simdgen only supports amd64")
}
// Load XED into a defs set.
if *xedPath != "" {
xedDefs := loadXED(*xedPath)
inputs = append(inputs, unify.NewSum(xedDefs...))
}
// Load query.
if *flagQ != "" {
r := strings.NewReader(*flagQ)
def, err := unify.Read(r, "<query>", unify.ReadOpts{})
if err != nil {
log.Fatalf("parsing -q: %s", err)
}
inputs = append(inputs, def)
}
// Load defs files.
must := make(map[*unify.Value]struct{})
for _, path := range flag.Args() {
defs, err := unify.ReadFile(path, unify.ReadOpts{})
if err != nil {
log.Fatal(err)
}
inputs = append(inputs, defs)
if filepath.Base(path) == "go.yaml" {
// These must all be used in the final result
for def := range defs.Summands() {
must[def] = struct{}{}
}
}
}
// Prepare for unification
if *flagDebugUnify {
unify.Debug.UnifyLog = os.Stderr
}
if *flagDebugHTML != "" {
f, err := os.Create(*flagDebugHTML)
if err != nil {
log.Fatal(err)
}
unify.Debug.HTML = f
defer f.Close()
}
// Unify!
unified, err := unify.Unify(inputs...)
if err != nil {
log.Fatal(err)
}
// Validate results.
//
// Don't validate if this is a command-line query because that tends to
// eliminate lots of required defs and is used in cases where maybe defs
// aren't enumerable anyway.
if *flagQ == "" && len(must) > 0 {
validate(unified, must)
}
// Print results.
switch *flagO {
case "yaml":
// Produce a result that looks like encoding a slice, but stream it.
fmt.Println("!sum")
var val1 [1]*unify.Value
for val := range unified.All() {
val1[0] = val
// We have to make a new encoder each time or it'll print a document
// separator between each object.
enc := yaml.NewEncoder(os.Stdout)
if err := enc.Encode(val1); err != nil {
log.Fatal(err)
}
enc.Close()
}
case "godefs":
if err := writeGoDefs(*flagGoDefRoot, unified); err != nil {
log.Fatalf("Failed writing godefs: %+v", err)
}
default:
log.Fatalf("unknown -o output type %q", *flagO)
}
if !*Verbose && *xedPath != "" {
if operandRemarks == 0 {
fmt.Fprintf(os.Stderr, "XED decoding generated no errors, which is unusual.\n")
} else {
fmt.Fprintf(os.Stderr, "XED decoding generated %d \"errors\" which is not cause for alarm, use -v for details.\n", operandRemarks)
}
}
}

func validate(cl unify.Closure, required map[*unify.Value]struct{}) {
// Validate that:
// 1. All final defs are exact
// 2. All required defs are used
for def := range cl.All() {
if _, ok := def.Domain.(unify.Def); !ok {
fmt.Fprintf(os.Stderr, "%s: expected Def, got %T\n", def.PosString(), def.Domain)
continue
}
if !def.Exact() {
fmt.Fprintf(os.Stderr, "%s: def not reduced to an exact value, why is %s:\n", def.PosString(), def.WhyNotExact())
fmt.Fprintf(os.Stderr, "\t%s\n", strings.ReplaceAll(def.String(), "\n", "\n\t"))
}
for root := range def.Provenance() {
delete(required, root)
}
}
// Report unused defs
unused := slices.SortedFunc(maps.Keys(required),
func(a, b *unify.Value) int {
return cmp.Or(
cmp.Compare(a.Pos().Path, b.Pos().Path),
cmp.Compare(a.Pos().Line, b.Pos().Line),
)
})
for _, def := range unused {
// TODO: Can we say anything more actionable? This is always a problem
// with unification: if it fails, it's very hard to point a finger at
// any particular reason. We could go back and try unifying this again
// with each subset of the inputs (starting with individual inputs) to
// at least say "it doesn't unify with anything in x.yaml". That's a lot
// of work, but if we have trouble debugging unification failure it may
// be worth it.
fmt.Fprintf(os.Stderr, "%s: def required, but did not unify (%v)\n",
def.PosString(), def)
}
}


@@ -0,0 +1,37 @@
!sum
- go: Add
commutative: true
documentation: !string |-
// NAME adds corresponding elements of two vectors.
- go: AddSaturated
commutative: true
documentation: !string |-
// NAME adds corresponding elements of two vectors with saturation.
- go: Sub
commutative: false
documentation: !string |-
// NAME subtracts corresponding elements of two vectors.
- go: SubSaturated
commutative: false
documentation: !string |-
// NAME subtracts corresponding elements of two vectors with saturation.
- go: AddPairs
commutative: false
documentation: !string |-
// NAME horizontally adds adjacent pairs of elements.
// For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
- go: SubPairs
commutative: false
documentation: !string |-
// NAME horizontally subtracts adjacent pairs of elements.
// For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
- go: AddPairsSaturated
commutative: false
documentation: !string |-
// NAME horizontally adds adjacent pairs of elements with saturation.
// For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
- go: SubPairsSaturated
commutative: false
documentation: !string |-
// NAME horizontally subtracts adjacent pairs of elements with saturation.
// For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
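
As a rough illustration of how these defs are consumed: the godefs backend substitutes the concrete method name for NAME in the documentation template above. A minimal sketch of the kind of declaration that could result for the Add entry follows; the receiver type Int32x4 and the exact signature are assumptions made here for illustration, not taken from the generated sources.

	// Hypothetical sketch only; the generated code, build tags, and any
	// CPU-feature annotations may differ.

	// Add adds corresponding elements of two vectors.
	func (x Int32x4) Add(y Int32x4) Int32x4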

(Some files were not shown because too many files have changed in this diff.)