Mirror of https://github.com/golang/go.git, synced 2025-12-08 06:10:04 +00:00
all: REVERSE MERGE dev.simd (7d65463) into master
This commit is a REVERSE MERGE. It merges dev.simd back into its parent branch, master. The development of simd will continue on (only) dev.simd, and it will be merged to the master branch when necessary. Merge List: + 2025-11-247d65463a54[dev.simd] all: merge master (e704b09) into dev.simd + 2025-11-24afd1721fc5[dev.simd] all: merge master (02d1f3a) into dev.simd + 2025-11-24a9914886da[dev.simd] internal/buildcfg: don't enable SIMD experiment by default + 2025-11-2461a5a6b016[dev.simd] simd: add goexperiment tag to generate.go + 2025-11-24f045ed4110[dev.simd] go/doc/comment: don't include experimental packages in std list + 2025-11-24220d73cc44[dev.simd] all: merge master (8dd5b13) into dev.simd + 2025-11-240c69e77343Revert "[dev.simd] internal/runtime/gc: add simd package based greentea kernels" + 2025-11-21da92168ec8[dev.simd] internal/runtime/gc: add simd package based greentea kernels + 2025-11-213fdd183aef[dev.simd] cmd/compile, simd: update conversion API names + 2025-11-21d3a0321dba[dev.simd] cmd/compile: fix incorrect mapping of SHA256MSG2128 + 2025-11-2074ebdd28d1[dev.simd] simd, cmd/compile: add more element types for Select128FromPair + 2025-11-204d26d66a49[dev.simd] simd: fix signatures for PermuteConstant* methods + 2025-11-20e3d4645693[dev.simd] all: merge master (ca37d24) into dev.simd + 2025-11-2095b4ad525f[dev.simd] simd: reorganize internal tests so that simd does not import testing + 2025-11-183fe246ae0f[dev.simd] simd: make 'go generate' generate everything + 2025-11-18cf45adf140[dev.simd] simd: move template code generator into _gen + 2025-11-1819b4a30899[dev.simd] simd/_gen/simdgen: remove outdated asm.yaml.toy + 2025-11-189461db5c59[dev.simd] simd: fix comment in file generator + 2025-11-184004ff3523[dev.simd] simd: remove FlattenedTranspose from exports + 2025-11-18896f293a25[dev.simd] cmd/compile, simd: change DotProductQuadruple and add peepholes + 2025-11-18be9c50c6a0[dev.simd] cmd/compile, simd: change SHA ops names and types + 2025-11-170978935a99[dev.simd] cmd/compile, simd: change AES op names and add missing size + 2025-11-1795871e4a00[dev.simd] cmd/compile, simd: add VPALIGNR + 2025-11-17934dbcea1a[dev.simd] simd: update CPU feature APIs + 2025-11-17e4d9484220[dev.simd] cmd/compile: fix unstable output + 2025-11-13d7a0c45642[dev.simd] all: merge master (57362e9) into dev.simd + 2025-11-1186b4fe31d9[dev.simd] cmd/compile: add masked merging ops and optimizations + 2025-11-10771a1dc216[dev.simd] cmd/compile: add peepholes for all masked ops and bug fixes + 2025-11-10972732b245[dev.simd] simd, cmd/compile: remove move from API + 2025-11-10bf77323efa[dev.simd] simd: put unexported methods to another file + 2025-11-04fe040658b2[dev.simd] simd/_gen: fix sorting ops slices + 2025-10-29e452f4ac7d[dev.simd] cmd/compile: enhance inlining for closure-of-SIMD + 2025-10-27ca1264ac50[dev.simd] test: add some trickier cases to ternary-boolean simd test + 2025-10-24f6b4711095[dev.simd] cmd/compile, simd: add rewrite to convert logical expression trees into TERNLOG instructions + 2025-10-24cf7c1a4cbb[dev.simd] cmd/compile, simd: add SHA features + 2025-10-242b8eded4f4[dev.simd] simd/_gen: parse SHA features from XED + 2025-10-24c75965b666[dev.simd] simd: added String() method to SIMD vectors. 
+ 2025-10-22d03634f807[dev.simd] cmd/compile, simd: add definitions for VPTERNLOG[DQ] + 2025-10-2020b3339542[dev.simd] simd: add AES feature check + 2025-10-14fc3bc49337[dev.simd] simd: clean up mask load comments + 2025-10-14416332dba2[dev.simd] cmd/compile, simd: update DotProd to DotProduct + 2025-10-14647c790143[dev.simd] cmd/compile: peephole simd mask load/stores from bits + 2025-10-142e71cf1a2a[dev.simd] cmd/compile, simd: remove mask load and stores + 2025-10-13c4fbf3b4cf[dev.simd] simd/_gen: add mem peephole with feat mismatches + 2025-10-13ba72ee0f30[dev.simd] cmd/compile: more support for cpufeatures + 2025-10-09be57d94c4c[dev.simd] simd: add emulated Not method + 2025-10-07d2270bccbd[dev.simd] cmd/compile: track which CPU features are in scope + 2025-10-0348756abd3a[dev.simd] cmd/compile: inliner tweaks to favor simd-handling functions + 2025-10-03fb1749a3fe[dev.simd] all: merge master (adce7f1) into dev.simd + 2025-09-30703a5fbaad[dev.simd] cmd/compile, simd: add AES instructions + 2025-09-291c961c2fb2[dev.simd] simd: use new data movement instructions to do "fast" transposes + 2025-09-26fe4af1c067[dev.simd] simd: repair broken comments in generated ops_amd64.go + 2025-09-26ea3b2ecd28[dev.simd] cmd/compile, simd: add 64-bit select-from-pair methods + 2025-09-2625c36b95d1[dev.simd] simd, cmd/compile: add 128 bit select-from-pair + 2025-09-26f0e281e693[dev.simd] cmd/compile: don't require single use for SIMD load/store folding + 2025-09-26b4d1e018a8[dev.simd] cmd/compile: remove unnecessary code from early simd prototype + 2025-09-26578777bf7c[dev.simd] cmd/compile: make condtion of CanSSA smarter for SIMD fields + 2025-09-26c28b2a0ca1[dev.simd] simd: generalize select-float32-from-pair + 2025-09-25a693ae1e9a[dev.simd] all: merge master (d70ad4e) into dev.simd + 2025-09-255a78e1a4a1[dev.simd] simd, cmd/compile: mark simd vectors uncomparable + 2025-09-23bf00f5dfd6[dev.simd] simd, cmd/compile: added simd methods for VSHUFP[DS] + 2025-09-238e60feeb41[dev.simd] cmd/compile: improve slicemask removal + 2025-09-232b50ffe172[dev.simd] cmd/compile: remove stores to unread parameters + 2025-09-232d8cb80d7c[dev.simd] all: merge master (9b2d39b) into dev.simd + 2025-09-2263a09d6d3d[dev.simd] cmd/compile: fix SIMD const rematerialization condition + 2025-09-202ca96d218d[dev.simd] cmd/compile: enhance prove to infer bounds in slice len/cap calculations + 2025-09-19c0f031fcc3[dev.simd] cmd/compile: spill the correct SIMD register for morestack + 2025-09-1958fa1d023e[dev.simd] cmd/compile: enhance the chunked indexing case to include reslicing + 2025-09-187ae0eb2e80[dev.simd] cmd/compile: remove Add32x4 generic op + 2025-09-1831b664d40b[dev.simd] cmd/compile: widen index for simd intrinsics jumptable + 2025-09-18e34ad6de42[dev.simd] cmd/compile: optimize VPTEST for 2-operand cases + 2025-09-18f1e3651c33[dev.simd] cmd/compile, simd: add VPTEST + 2025-09-18d9751166a6[dev.simd] cmd/compile: handle rematerialized op for incompatible reg constraint + 2025-09-184eb5c6e07b[dev.simd] cmd/compile, simd/_gen: add rewrite for const load ops + 2025-09-18443b7aeddb[dev.simd] cmd/compile, simd/_gen: make rewrite rules consistent on CPU Features + 2025-09-16bdd30e25ca[dev.simd] all: merge master (ca0e035) into dev.simd + 2025-09-160e590a505d[dev.simd] cmd/compile: use the right type for spill slot + 2025-09-15dabe2bb4fb[dev.simd] cmd/compile: fix holes in mask peepholes + 2025-09-123ec0b25ab7[dev.simd] cmd/compile, simd/_gen/simdgen: add const load mops + 2025-09-121e5631d4e0[dev.simd] cmd/compile: peephole 
simd load + 2025-09-1148f366d826[dev.simd] cmd/compile: add memop peephole rules + 2025-09-119a349f8e72[dev.simd] all: merge master (cf5e993) into dev.simd + 2025-09-115a0446d449[dev.simd] simd/_gen/simdgen, cmd/compile: add memory op machine ops + 2025-09-08c39b2fdd1e[dev.simd] cmd/compile, simd: add VPLZCNT[DQ] + 2025-09-07832c1f76dc[dev.simd] cmd/compile: enhance prove to deal with double-offset IsInBounds checks + 2025-09-060b323350a5[dev.simd] simd/_gen/simdgen: merge memory ops + 2025-09-06f42c9261d3[dev.simd] simd/_gen/simdgen: parse memory operands + 2025-09-05356c48d8e9[dev.simd] cmd/compile, simd: add ClearAVXUpperBits + 2025-09-037c8b9115bc[dev.simd] all: merge master (4c4cefc) into dev.simd + 2025-09-029125351583[dev.simd] internal/cpu: report AVX1 and 2 as supported on macOS 15 Rosetta 2 + 2025-09-02b509516b2e[dev.simd] simd, cmd/compile: add Interleave{Hi,Lo} (VPUNPCK*) + 2025-09-026890aa2e20[dev.simd] cmd/compile: add instructions and rewrites for scalar-> vector moves + 2025-08-245ebe2d05d5[dev.simd] simd: correct SumAbsDiff documentation + 2025-08-22a5137ec92a[dev.simd] cmd/compile: sample peephole optimization for SIMD broadcast + 2025-08-2283714616aa[dev.simd] cmd/compile: remove VPADDD4 + 2025-08-224a3ea146ae[dev.simd] cmd/compile: correct register mask of some AVX512 ops + 2025-08-228d874834f1[dev.simd] cmd/compile: use X15 for zero value in AVX context + 2025-08-224c311aa38f[dev.simd] cmd/compile: ensure the whole X15 register is zeroed + 2025-08-22baea0c700b[dev.simd] cmd/compile, simd: complete AVX2? u?int shuffles + 2025-08-22fa1e78c9ad[dev.simd] cmd/compile, simd: make Permute 128-bit use AVX VPSHUFB + 2025-08-22bc217d4170[dev.simd] cmd/compile, simd: add packed saturated u?int conversions + 2025-08-224fa23b0d29[dev.simd] cmd/compile, simd: add saturated u?int conversions + 2025-08-213f6bab5791[dev.simd] simd: move tests to a subdirectory to declutter "simd" + 2025-08-21aea0a5e8d7[dev.simd] simd/_gen/unify: improve envSet doc comment + 2025-08-217fdb1da6b0[dev.simd] cmd/compile, simd: complete truncating u?int conversions. + 2025-08-21f4c41d9922[dev.simd] cmd/compile, simd: complete u?int widening conversions + 2025-08-216af8881adb[dev.simd] simd: reorganize cvt rules + 2025-08-2158cfc2a5f6[dev.simd] cmd/compile, simd: add VPSADBW + 2025-08-21f7c6fa709e[dev.simd] simd/_gen/unify: fix some missing environments + 2025-08-207c84e984e6[dev.simd] cmd/compile: rewrite to elide Slicemask from len==c>0 slicing + 2025-08-20cf31b15635[dev.simd] simd, cmd/compile: added .Masked() peephole opt for many operations. + 2025-08-201334285862[dev.simd] simd: template field name cleanup in genfiles + 2025-08-20af6475df73[dev.simd] simd: add testing hooks for size-changing conversions + 2025-08-20ede64cf0d8[dev.simd] simd, cmd/compile: sample peephole optimization for .Masked() + 2025-08-20103b6e39ca[dev.simd] all: merge master (9de69f6) into dev.simd + 2025-08-20728ac3e050[dev.simd] simd: tweaks to improve test disassembly + 2025-08-204fce49b86c[dev.simd] simd, cmd/compile: add widening unsigned converts 8->16->32 + 2025-08-190f660d675f[dev.simd] simd: make OpMasked machine ops only + 2025-08-19a034826e26[dev.simd] simd, cmd/compile: implement ToMask, unexport asMask. 
+ 2025-08-188ccd6c2034[dev.simd] simd, cmd/compile: mark BLEND instructions as not-zero-mask + 2025-08-189a934d5080[dev.simd] cmd/compile, simd: added methods for "float" GetElem + 2025-08-157380213a4e[dev.simd] cmd/compile: make move/load/store dependent only on reg and width + 2025-08-15908e3e8166[dev.simd] cmd/compile: make (most) move/load/store lowering use reg and width only + 2025-08-149783f86bc8[dev.simd] cmd/compile: accounts rematerialize ops's output reginfo + 2025-08-14a4ad41708d[dev.simd] all: merge master (924fe98) into dev.simd + 2025-08-138b90d48d8c[dev.simd] simd/_gen/simdgen: rewrite etetest.sh + 2025-08-13b7c8698549[dev.simd] simd/_gen: migrate simdgen from x/arch + 2025-08-13257c1356ec[dev.simd] go/types: exclude simd/_gen module from TestStdlib + 2025-08-13858a8d2276[dev.simd] simd: reorganize/rename generated emulation files + 2025-08-132080415aa2[dev.simd] simd: add emulations for missing AVX2 comparisons + 2025-08-13ddb689c7bb[dev.simd] simd, cmd/compile: generated code for Broadcast + 2025-08-13e001300cf2[dev.simd] cmd/compile: fix LoadReg so it is aware of register target + 2025-08-13d5dea86993[dev.simd] cmd/compile: fix isIntrinsic for methods; fix fp <-> gp moves + 2025-08-1308ab8e24a3[dev.simd] cmd/compile: generated code from 'fix generated rules for shifts' + 2025-08-11702ee2d51e[dev.simd] cmd/compile, simd: update generated files + 2025-08-11e33eb1a7a5[dev.simd] cmd/compile, simd: update generated files + 2025-08-11667add4f1c[dev.simd] cmd/compile, simd: update generated files + 2025-08-111755c2909d[dev.simd] cmd/compile, simd: update generated files + 2025-08-112fd49d8f30[dev.simd] simd: imm doc improve + 2025-08-11ce0e803ab9[dev.simd] cmd/compile: keep track of multiple rule file names in ssa/_gen + 2025-08-1138b76bf2a3[dev.simd] cmd/compile, simd: jump table for imm ops + 2025-08-0894d72355f6[dev.simd] simd: add emulations for bitwise ops and for mask/merge methods + 2025-08-078eb5f6020e[dev.simd] cmd/compile, simd: API interface fixes + 2025-08-07b226bcc4a9[dev.simd] cmd/compile, simd: add value conversion ToBits for mask + 2025-08-065b0ef7fcdc[dev.simd] cmd/compile, simd: add Expand + 2025-08-06d3cf582f8a[dev.simd] cmd/compile, simd: (Set|Get)(Lo|Hi) + 2025-08-057ca34599ec[dev.simd] simd, cmd/compile: generated files to add 'blend' and 'blendMasked' + 2025-08-0582d056ddd7[dev.simd] cmd/compile: add ShiftAll immediate variant + 2025-08-04775fb52745[dev.simd] all: merge master (7a1679d) into dev.simd + 2025-08-046b9b59e144[dev.simd] simd, cmd/compile: rename some methods + 2025-08-04d375b95357[dev.simd] simd: move lots of slice functions and methods to generated code + 2025-08-043f92aa1eca[dev.simd] cmd/compile, simd: make bitwise logic ops available to all u?int vectors + 2025-08-04c2d775d401[dev.simd] cmd/compile, simd: change PairDotProdAccumulate to AddDotProd + 2025-08-042c25f3e846[dev.simd] cmd/compile, simd: change Shift*AndFillUpperFrom to Shift*Concat + 2025-08-01c25e5c86b2[dev.simd] cmd/compile: generated code for K-mask-register slice load/stores + 2025-08-011ac5f3533f[dev.simd] cmd/compile: opcodes and rules and code generation to enable AVX512 masked loads/stores + 2025-08-01f39711a03d[dev.simd] cmd/compile: test for int-to-mask conversion + 2025-08-0108bec02907[dev.simd] cmd/compile: add register-to-mask moves, other simd glue + 2025-08-0109ff25e350[dev.simd] simd: add tests for simd conversions to Int32/Uint32. 
+ 2025-08-01a24ffe3379[dev.simd] simd: modify test generation to make it more flexible + 2025-08-01ec5c20ba5a[dev.simd] cmd/compile: generated simd code to add some conversions + 2025-08-01e62e377ed6[dev.simd] cmd/compile, simd: generated code from repaired simdgen sort + 2025-08-01761894d4a5[dev.simd] simd: add partial slice load/store for 32/64-bits on AVX2 + 2025-08-01acc1492b7d[dev.simd] cmd/compile: Generated code for AVX2 SIMD masked load/store + 2025-08-01a0b87a7478[dev.simd] cmd/compile: changes for AVX2 SIMD masked load/store + 2025-08-0188568519b4[dev.simd] simd: move test generation into Go repo + 2025-07-316f7a1164e7[dev.simd] cmd/compile, simd: support store to bits for mask + 2025-07-2141054cdb1c[dev.simd] simd, internal/cpu: support more AVX CPU Feature checks + 2025-07-21957f06c410[dev.simd] cmd/compile, simd: support load from bits for mask + 2025-07-21f0e9dc0975[dev.simd] cmd/compile: fix opLen(2|3)Imm8_2I intrinsic function + 2025-07-1703a3887f31[dev.simd] simd: clean up masked op doc + 2025-07-17c61743e4f0[dev.simd] cmd/compile, simd: reorder PairDotProdAccumulate + 2025-07-15ef5f6cc921[dev.simd] cmd/compile: adjust param order for AndNot + 2025-07-156d10680141[dev.simd] cmd/compile, simd: add Compress + 2025-07-1517baae72db[dev.simd] simd: default mask param's name to mask + 2025-07-1501f7f57025[dev.simd] cmd/compile, simd: add variable Permute + 2025-07-14f5f42753ab[dev.simd] cmd/compile, simd: add VDPPS + 2025-07-1408ffd66ab2[dev.simd] simd: updates CPU Feature in doc + 2025-07-143f789721d6[dev.simd] cmd/compile: mark SIMD types non-fat + 2025-07-11b69622b83e[dev.simd] cmd/compile, simd: adjust Shift.* operations + 2025-07-114993a91ae1[dev.simd] simd: change imm param name to constant + 2025-07-11bbb6dccd84[dev.simd] simd: fix documentations + 2025-07-111440ff7036[dev.simd] cmd/compile: exclude simd vars from merge local + 2025-07-11ccb43dcec7[dev.simd] cmd/compile: add VZEROUPPER and VZEROALL inst + 2025-07-1121596f2f75[dev.simd] all: merge master (88cf0c5) into dev.simd + 2025-07-10ab7f839280[dev.simd] cmd/compile: fix maskreg/simdreg chaos + 2025-07-0947b07a87a6[dev.simd] cmd/compile, simd: fix Int64x2 Greater output type to mask + 2025-07-0908cd62e9f5[dev.simd] cmd/compile: remove X15 from register mask + 2025-07-099ea33ed538[dev.simd] cmd/compile: output of simd generator, more ... rewrite rules + 2025-07-09aab8b173a9[dev.simd] cmd/compile, simd: Int64x2 Greater and Uint* Equal + 2025-07-098db7f41674[dev.simd] cmd/compile: use upper registers for AVX512 simd ops + 2025-07-09574854fd86[dev.simd] runtime: save Z16-Z31 registers in async preempt + 2025-07-095429328b0c[dev.simd] cmd/compile: change register mask names for simd ops + 2025-07-09029d7ec3e9[dev.simd] cmd/compile, simd: rename Masked$OP to $(OP)Masked. + 2025-07-09983e81ce57[dev.simd] simd: rename stubs_amd64.go to ops_amd64.go + 2025-07-0856ca67682b[dev.simd] cmd/compile, simd: remove FP bitwise logic operations. + 2025-07-080870ed04a3[dev.simd] cmd/compile: make compares between NaNs all false. + 2025-07-0824f2b8ae2e[dev.simd] simd: {Int,Uint}{8x{16,32},16x{8,16}} subvector loads/stores from slices. 
+ 2025-07-082bb45cb8a5[dev.simd] cmd/compile: minor tweak for race detector + 2025-07-0743a61aef56[dev.simd] cmd/compile: add EXTRACT[IF]128 instructions + 2025-07-07292db9b676[dev.simd] cmd/compile: add INSERT[IF]128 instructions + 2025-07-07d8fa853b37[dev.simd] cmd/compile: make regalloc simd aware on copy + 2025-07-07dfd75f82d4[dev.simd] cmd/compile: output of simdgen with invariant type order + 2025-07-0472c39ef834[dev.simd] cmd/compile: fix the "always panic" code to actually panic + 2025-07-011ee72a15a3[dev.simd] internal/cpu: add GFNI feature check + 2025-06-300710cce6eb[dev.simd] runtime: remove write barrier in xRegRestore + 2025-06-3059846af331[dev.simd] cmd/compile, simd: cleanup operations and documentations + 2025-06-30f849225b3b[dev.simd] all: merge master (740857f) into dev.simd + 2025-06-309eeb1e7a9a[dev.simd] runtime: save AVX2 and AVX-512 state on asynchronous preemption + 2025-06-30426cf36b4d[dev.simd] runtime: save scalar registers off stack in amd64 async preemption + 2025-06-30ead249a2e2[dev.simd] cmd/compile: reorder operands for some simd operations + 2025-06-3055665e1e37[dev.simd] cmd/compile: undoes reorder transform in prior commit, changes names + 2025-06-2610c9621936[dev.simd] cmd/compile, simd: add galois field operations + 2025-06-26e61ebfce56[dev.simd] cmd/compile, simd: add shift operations + 2025-06-2635b8cf7fed[dev.simd] cmd/compile: tweak sort order in generator + 2025-06-267fadfa9638[dev.simd] cmd/compile: add simd VPEXTRA* + 2025-06-260d8cb89f5c[dev.simd] cmd/compile: support simd(imm,fp) returns gp + 2025-06-25f4a7c124cc[dev.simd] all: merge master (f8ccda2) into dev.simd + 2025-06-254fda27c0cc[dev.simd] cmd/compile: glue codes for Shift and Rotate + 2025-06-2461c1183342[dev.simd] simd: add test wrappers + 2025-06-23e32488003d[dev.simd] cmd/compile: make simd regmask naming more like existing conventions + 2025-06-231fa4bcfcda[dev.simd] simd, cmd/compile: generated code for VPINSR[BWDQ], and test + 2025-06-23dd63b7aa0e[dev.simd] simd: add AVX512 aggregated check + 2025-06-230cdb2697d1[dev.simd] simd: add tests for intrinsic used as a func value and via reflection + 2025-06-2388c013d6ff[dev.simd] cmd/compile: generate function body for bodyless intrinsics + 2025-06-20a8669c78f5[dev.simd] sync: correct the type of runtime_StoreReluintptr + 2025-06-207c6ac35275[dev.simd] cmd/compile: add simdFp1gp1fp1Imm8 helper to amd64 code generation + 2025-06-204150372a5d[dev.simd] cmd/compile: don't treat devel compiler as a released compiler + 2025-06-181b87d52549[dev.simd] cmd/compile: add fp1gp1fp1 register mask for AMD64 + 2025-06-181313521f75[dev.simd] cmd/compile: remove fused mul/add/sub shapes. + 2025-06-171be5eb2686[dev.simd] cmd/compile: fix signature error of PairDotProdAccumulate. 
+ 2025-06-173a4d10bfca[dev.simd] cmd/compile: removed a map iteration from generator; tweaked type order + 2025-06-1721d6573154[dev.simd] cmd/compile: alphabetize SIMD intrinsics + 2025-06-16ee1d9f3f85[dev.simd] cmd/compile: reorder stubs + 2025-06-136c50c8b892[dev.simd] cmd/compile: move simd helpers into compiler, out of generated code + 2025-06-137392dfd43e[dev.simd] cmd/compile: generated simd*ops files weren't up to date + 2025-06-1300a8dacbe4[dev.simd] cmd/compile: remove unused simd intrinsics "helpers" + 2025-06-13b9a548775fcmd/compile: add up-to-date test for generated files + 2025-06-13ca01eab9c7[dev.simd] cmd/compile: add fused mul add sub ops + 2025-06-13ded6e0ac71[dev.simd] cmd/compile: add more dot products + 2025-06-133df41c856e[dev.simd] simd: update documentations + 2025-06-139ba7db36b5[dev.simd] cmd/compile: add dot product ops + 2025-06-1334a9cdef87[dev.simd] cmd/compile: add round simd ops + 2025-06-135289e0f24e[dev.simd] cmd/compile: updates simd ordering and docs + 2025-06-13c81cb05e3e[dev.simd] cmd/compile: add simdGen prog writer + 2025-06-139b9af3d638[dev.simd] internal/cpu: add AVX-512-CD and DQ, and derived "basic AVX-512" + 2025-06-13dfa6c74263[dev.simd] runtime: eliminate global state in mkpreempt.go + 2025-06-10b2e8ddba3c[dev.simd] all: merge master (773701a) into dev.simd + 2025-06-09884f646966[dev.simd] cmd/compile: add fp3m1fp1 shape to regalloc + 2025-06-096bc3505773[dev.simd] cmd/compile: add fp3fp1 regsiter shape + 2025-06-052eaa5a0703[dev.simd] simd: add functions+methods to load-from/store-to slices + 2025-06-058ecbd59ebb[dev.simd] cmd/compile: generated codes for amd64 SIMD + 2025-06-02baa72c25f1[dev.simd] all: merge master (711ff94) into dev.simd + 2025-05-300ff18a9cca[dev.simd] cmd/compile: disable intrinsics test for new simd stuff + 2025-05-307800f3813c[dev.simd] cmd/compile: flip sense of intrinsics test for SIMD + 2025-05-29eba2430c16[dev.simd] simd, cmd/compile, go build, go/doc: test tweaks + 2025-05-2971c0e550cd[dev.simd] cmd/dist: disable API check on dev branch + 2025-05-2962e1fccfb9[dev.simd] internal: delete unused internal/simd directory + 2025-05-291161228bf1[dev.simd] cmd/compile: add a fp1m1fp1 register shape to amd64 + 2025-05-28fdb067d946[dev.simd] simd: initialize directory to make it suitable for testing SIMD + 2025-05-2811d2b28bff[dev.simd] cmd/compile: add and fix k register supports + 2025-05-2804b1030ae4[dev.simd] cmd/compile: adapters for simd + 2025-05-272ef7106881[dev.simd] internal/buildcfg: enable SIMD GOEXPERIMENT for amd64 + 2025-05-224d2c71ebf9[dev.simd] internal/goexperiment: add SIMD goexperiment + 2025-05-223ac5f2f962[dev.simd] codereview.cfg: set up dev.simd branch Change-Id: I60f2cd2ea055384a3788097738c6989630207871
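Note: several entries in the merge list gate this work behind a GOEXPERIMENT (see "internal/goexperiment: add SIMD goexperiment" and "internal/buildcfg: don't enable SIMD experiment by default"). As a minimal sketch only — assuming the experiment keeps the name simd used in those entries; the simd package API itself is not shown in this commit and is not asserted here — user code opting in would be built with GOEXPERIMENT=simd and can guard files on the corresponding build tag:

    //go:build goexperiment.simd

    // This file participates in the build only when the toolchain enables the
    // simd experiment (GOEXPERIMENT=simd defines the goexperiment.simd tag).
    package usesimd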
This commit is contained in: commit d4f5650cc5
186 changed files with 146299 additions and 835 deletions
@@ -150,12 +150,12 @@ func appendParamTypes(rts []*types.Type, t *types.Type) []*types.Type {
 	if w == 0 {
 		return rts
 	}
-	if t.IsScalar() || t.IsPtrShaped() {
+	if t.IsScalar() || t.IsPtrShaped() || t.IsSIMD() {
 		if t.IsComplex() {
 			c := types.FloatForComplex(t)
 			return append(rts, c, c)
 		} else {
-			if int(t.Size()) <= types.RegSize {
+			if int(t.Size()) <= types.RegSize || t.IsSIMD() {
 				return append(rts, t)
 			}
 			// assume 64bit int on 32-bit machine
@@ -199,6 +199,9 @@ func appendParamOffsets(offsets []int64, at int64, t *types.Type) ([]int64, int64) {
 	if w == 0 {
 		return offsets, at
 	}
+	if t.IsSIMD() {
+		return append(offsets, at), at + w
+	}
 	if t.IsScalar() || t.IsPtrShaped() {
 		if t.IsComplex() || int(t.Size()) > types.RegSize { // complex and *int64 on 32-bit
 			s := w / 2
@@ -521,11 +524,11 @@ func (state *assignState) allocateRegs(regs []RegIndex, t *types.Type) []RegIndex {
 	}
 	ri := state.rUsed.intRegs
 	rf := state.rUsed.floatRegs
-	if t.IsScalar() || t.IsPtrShaped() {
+	if t.IsScalar() || t.IsPtrShaped() || t.IsSIMD() {
 		if t.IsComplex() {
 			regs = append(regs, RegIndex(rf+state.rTotal.intRegs), RegIndex(rf+1+state.rTotal.intRegs))
 			rf += 2
-		} else if t.IsFloat() {
+		} else if t.IsFloat() || t.IsSIMD() {
 			regs = append(regs, RegIndex(rf+state.rTotal.intRegs))
 			rf += 1
 		} else {
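The three hunks above teach the register ABI to treat a SIMD vector as a single unit: appendParamTypes and allocateRegs handle t.IsSIMD() like a value that lands in one floating-point/vector register, and appendParamOffsets records a single offset for the whole vector instead of splitting it into register-sized pieces. A standalone sketch of that effect (not compiler code; kinds and sizes are illustrative):

    package main

    import "fmt"

    const regSize = 8 // bytes per general-purpose register

    type paramKind int

    const (
        scalarKind paramKind = iota
        simdKind
    )

    type param struct {
        kind paramKind
        size int64 // bytes
    }

    // slots mirrors the idea above: a SIMD value always occupies exactly one
    // assignment slot (one vector register, one offset), while a plain value
    // wider than a machine register is split into register-sized pieces.
    func slots(p param) int {
        if p.kind == simdKind {
            return 1
        }
        return int((p.size + regSize - 1) / regSize)
    }

    func main() {
        fmt.Println(slots(param{scalarKind, 8}))  // 1: fits a register
        fmt.Println(slots(param{scalarKind, 16})) // 2: e.g. complex128 splits in two
        fmt.Println(slots(param{simdKind, 64}))   // 1: a 512-bit vector is still one unit
    }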
src/cmd/compile/internal/amd64/simdssa.go (new file, 3454 lines)
File diff suppressed because it is too large
@@ -18,6 +18,7 @@ import (
 	"cmd/internal/obj"
 	"cmd/internal/obj/x86"
 	"internal/abi"
+	"internal/buildcfg"
 )
 
 // ssaMarkMoves marks any MOVXconst ops that need to avoid clobbering flags.
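The new "internal/buildcfg" import is consumed further down in this diff: the zeroX15 helper added near the end checks buildcfg.GOAMD64 >= 3 to decide whether AVX can be assumed and the run-time X86HasAVX check skipped.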
@@ -43,11 +44,23 @@ func ssaMarkMoves(s *ssagen.State, b *ssa.Block) {
 		}
 	}
 }
 
-// loadByType returns the load instruction of the given type.
-func loadByType(t *types.Type) obj.As {
-	// Avoid partial register write
-	if !t.IsFloat() {
-		switch t.Size() {
+func isFPReg(r int16) bool {
+	return x86.REG_X0 <= r && r <= x86.REG_Z31
+}
+
+func isKReg(r int16) bool {
+	return x86.REG_K0 <= r && r <= x86.REG_K7
+}
+
+func isLowFPReg(r int16) bool {
+	return x86.REG_X0 <= r && r <= x86.REG_X15
+}
+
+// loadByRegWidth returns the load instruction of the given register of a given width.
+func loadByRegWidth(r int16, width int64) obj.As {
+	// Avoid partial register write for GPR
+	if !isFPReg(r) && !isKReg(r) {
+		switch width {
 		case 1:
 			return x86.AMOVBLZX
 		case 2:
@@ -55,20 +68,35 @@ func loadByType(t *types.Type) obj.As {
 		}
 	}
 	// Otherwise, there's no difference between load and store opcodes.
-	return storeByType(t)
+	return storeByRegWidth(r, width)
 }
 
-// storeByType returns the store instruction of the given type.
-func storeByType(t *types.Type) obj.As {
-	width := t.Size()
-	if t.IsFloat() {
+// storeByRegWidth returns the store instruction of the given register of a given width.
+// It's also used for loading const to a reg.
+func storeByRegWidth(r int16, width int64) obj.As {
+	if isFPReg(r) {
 		switch width {
 		case 4:
 			return x86.AMOVSS
 		case 8:
 			return x86.AMOVSD
+		case 16:
+			// int128s are in SSE registers
+			if isLowFPReg(r) {
+				return x86.AMOVUPS
+			} else {
+				return x86.AVMOVDQU
+			}
+		case 32:
+			return x86.AVMOVDQU
+		case 64:
+			return x86.AVMOVDQU64
 		}
-	} else {
-		switch width {
-		case 1:
-			return x86.AMOVB
+	}
+	if isKReg(r) {
+		return x86.AKMOVQ
+	}
+	// gp
+	switch width {
+	case 1:
+		return x86.AMOVB
@@ -78,23 +106,35 @@ func storeByType(t *types.Type) obj.As {
 		return x86.AMOVL
 	case 8:
 		return x86.AMOVQ
-	case 16:
-		return x86.AMOVUPS
 	}
-	}
-	panic(fmt.Sprintf("bad store type %v", t))
+	panic(fmt.Sprintf("bad store reg=%v, width=%d", r, width))
 }
 
-// moveByType returns the reg->reg move instruction of the given type.
-func moveByType(t *types.Type) obj.As {
-	if t.IsFloat() {
+// moveByRegsWidth returns the reg->reg move instruction of the given dest/src registers of a given width.
+func moveByRegsWidth(dest, src int16, width int64) obj.As {
+	// fp -> fp
+	if isFPReg(dest) && isFPReg(src) {
 		// Moving the whole sse2 register is faster
 		// than moving just the correct low portion of it.
 		// There is no xmm->xmm move with 1 byte opcode,
 		// so use movups, which has 2 byte opcode.
+		if isLowFPReg(dest) && isLowFPReg(src) && width <= 16 {
 			return x86.AMOVUPS
-	} else {
-		switch t.Size() {
+		}
+		if width <= 32 {
+			return x86.AVMOVDQU
+		}
+		return x86.AVMOVDQU64
+	}
+	// k -> gp, gp -> k, k -> k
+	if isKReg(dest) || isKReg(src) {
+		if isFPReg(dest) || isFPReg(src) {
+			panic(fmt.Sprintf("bad move, src=%v, dest=%v, width=%d", src, dest, width))
+		}
+		return x86.AKMOVQ
+	}
+	// gp -> fp, fp -> gp, gp -> gp
+	switch width {
 	case 1:
 		// Avoids partial register write
 		return x86.AMOVL
@@ -105,11 +145,18 @@ func moveByType(t *types.Type) obj.As {
 	case 8:
 		return x86.AMOVQ
 	case 16:
-		return x86.AMOVUPS // int128s are in SSE registers
-	default:
-		panic(fmt.Sprintf("bad int register width %d:%v", t.Size(), t))
+		if isLowFPReg(dest) && isLowFPReg(src) {
+			// int128s are in SSE registers
+			return x86.AMOVUPS
+		} else {
+			return x86.AVMOVDQU
+		}
+	case 32:
+		return x86.AVMOVDQU
+	case 64:
+		return x86.AVMOVDQU64
 	}
-	}
+	panic(fmt.Sprintf("bad move, src=%v, dest=%v, width=%d", src, dest, width))
 }
 
 // opregreg emits instructions for
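Taken together, loadByRegWidth, storeByRegWidth and moveByRegsWidth switch the opcode choice from the Go type to the register class plus width: after register allocation the same value may sit in X0–X15 (encodable with legacy SSE moves), in the AVX-512-only X16–X31/Y/Z registers (which need VEX/EVEX encodings such as VMOVDQU/VMOVDQU64), or in a K mask register (KMOVQ). A standalone sketch of that dispatch using mnemonic strings instead of obj.As constants (illustrative only, not the compiler's code):

    package main

    import "fmt"

    type regClass int

    const (
        gpReg   regClass = iota // general purpose
        loFPReg                 // X0–X15: legacy SSE encodings work
        hiFPReg                 // X16–X31 / Y / Z: VEX/EVEX encodings only
        kReg                    // AVX-512 mask register
    )

    // storeMnemonic mirrors the shape of storeByRegWidth above.
    func storeMnemonic(c regClass, width int64) string {
        switch c {
        case loFPReg, hiFPReg:
            switch width {
            case 4:
                return "MOVSS"
            case 8:
                return "MOVSD"
            case 16:
                if c == loFPReg {
                    return "MOVUPS" // legacy encoding is fine for X0–X15
                }
                return "VMOVDQU"
            case 32:
                return "VMOVDQU"
            case 64:
                return "VMOVDQU64" // 512-bit moves are EVEX only
            }
        case kReg:
            return "KMOVQ"
        case gpReg:
            switch width {
            case 1:
                return "MOVB"
            case 2:
                return "MOVW"
            case 4:
                return "MOVL"
            case 8:
                return "MOVQ"
            }
        }
        return "bad store"
    }

    func main() {
        fmt.Println(storeMnemonic(loFPReg, 16)) // MOVUPS
        fmt.Println(storeMnemonic(hiFPReg, 16)) // VMOVDQU
        fmt.Println(storeMnemonic(kReg, 8))     // KMOVQ
    }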
@@ -605,7 +652,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		// But this requires a way for regalloc to know that SRC might be
 		// clobbered by this instruction.
 		t := v.RegTmp()
-		opregreg(s, moveByType(v.Type), t, v.Args[1].Reg())
+		opregreg(s, moveByRegsWidth(t, v.Args[1].Reg(), v.Type.Size()), t, v.Args[1].Reg())
 
 		p := s.Prog(v.Op.Asm())
 		p.From.Type = obj.TYPE_REG
@@ -777,9 +824,14 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		p.From.Offset = v.AuxInt
 		p.To.Type = obj.TYPE_REG
 		p.To.Reg = x
 
 	case ssa.OpAMD64MOVSSconst, ssa.OpAMD64MOVSDconst:
 		x := v.Reg()
-		p := s.Prog(v.Op.Asm())
+		if !isFPReg(x) && v.AuxInt == 0 && v.Aux == nil {
+			opregreg(s, x86.AXORL, x, x)
+			break
+		}
+		p := s.Prog(storeByRegWidth(x, v.Type.Size()))
 		p.From.Type = obj.TYPE_FCONST
 		p.From.Val = math.Float64frombits(uint64(v.AuxInt))
 		p.To.Type = obj.TYPE_REG
@@ -1176,27 +1228,39 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		}
 		x := v.Args[0].Reg()
 		y := v.Reg()
+		if v.Type.IsSIMD() {
+			x = simdOrMaskReg(v.Args[0])
+			y = simdOrMaskReg(v)
+		}
 		if x != y {
-			opregreg(s, moveByType(v.Type), y, x)
+			opregreg(s, moveByRegsWidth(y, x, v.Type.Size()), y, x)
 		}
 	case ssa.OpLoadReg:
 		if v.Type.IsFlags() {
 			v.Fatalf("load flags not implemented: %v", v.LongString())
 			return
 		}
-		p := s.Prog(loadByType(v.Type))
+		r := v.Reg()
+		p := s.Prog(loadByRegWidth(r, v.Type.Size()))
 		ssagen.AddrAuto(&p.From, v.Args[0])
 		p.To.Type = obj.TYPE_REG
-		p.To.Reg = v.Reg()
+		if v.Type.IsSIMD() {
+			r = simdOrMaskReg(v)
+		}
+		p.To.Reg = r
 
 	case ssa.OpStoreReg:
 		if v.Type.IsFlags() {
 			v.Fatalf("store flags not implemented: %v", v.LongString())
 			return
 		}
-		p := s.Prog(storeByType(v.Type))
+		r := v.Args[0].Reg()
+		if v.Type.IsSIMD() {
+			r = simdOrMaskReg(v.Args[0])
+		}
+		p := s.Prog(storeByRegWidth(r, v.Type.Size()))
 		p.From.Type = obj.TYPE_REG
-		p.From.Reg = v.Args[0].Reg()
+		p.From.Reg = r
 		ssagen.AddrAuto(&p.To, v)
 	case ssa.OpAMD64LoweredHasCPUFeature:
 		p := s.Prog(x86.AMOVBLZX)
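The hunk above starts routing register names through simdOrMaskReg, and the next one uses simdRegBySize; neither helper is visible in this diff (they live in the new amd64 SIMD support files). As an assumption-labelled sketch of what such a helper is expected to do — keep the allocated register number but pick the 128/256/512-bit view of it from the value's width, with masks going to K registers — here is a standalone stand-in using register-name strings rather than the real cmd/internal/obj/x86 constants:

    package main

    import "fmt"

    // simdRegName is a stand-in for what simdRegBySize is assumed to do in the
    // compiler: keep the allocated register number n, but select the X/Y/Z name
    // (i.e. the 128/256/512-bit view of the same vector register) from the width.
    func simdRegName(n int, size int64) string {
        switch size {
        case 16:
            return fmt.Sprintf("X%d", n)
        case 32:
            return fmt.Sprintf("Y%d", n)
        case 64:
            return fmt.Sprintf("Z%d", n)
        }
        return fmt.Sprintf("?%d", n)
    }

    func main() {
        fmt.Println(simdRegName(3, 16)) // X3
        fmt.Println(simdRegName(3, 32)) // Y3
        fmt.Println(simdRegName(3, 64)) // Z3
    }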
@@ -1210,8 +1274,14 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		for _, ap := range v.Block.Func.RegArgs {
 			// Pass the spill/unspill information along to the assembler, offset by size of return PC pushed on stack.
 			addr := ssagen.SpillSlotAddr(ap, x86.REG_SP, v.Block.Func.Config.PtrSize)
+			reg := ap.Reg
+			t := ap.Type
+			sz := t.Size()
+			if t.IsSIMD() {
+				reg = simdRegBySize(reg, sz)
+			}
 			s.FuncInfo().AddSpill(
-				obj.RegSpill{Reg: ap.Reg, Addr: addr, Unspill: loadByType(ap.Type), Spill: storeByType(ap.Type)})
+				obj.RegSpill{Reg: reg, Addr: addr, Unspill: loadByRegWidth(reg, sz), Spill: storeByRegWidth(reg, sz)})
 		}
 		v.Block.Func.RegArgs = nil
 		ssagen.CheckArgReg(v)
@@ -1227,7 +1297,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 	case ssa.OpAMD64CALLstatic, ssa.OpAMD64CALLtail:
 		if s.ABI == obj.ABI0 && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABIInternal {
 			// zeroing X15 when entering ABIInternal from ABI0
-			opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
+			zeroX15(s)
 			// set G register from TLS
 			getgFromTLS(s, x86.REG_R14)
 		}
@@ -1238,7 +1308,7 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		s.Call(v)
 		if s.ABI == obj.ABIInternal && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABI0 {
 			// zeroing X15 when entering ABIInternal from ABI0
-			opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
+			zeroX15(s)
 			// set G register from TLS
 			getgFromTLS(s, x86.REG_R14)
 		}
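Context for the two zeroX15 call sites above: in the register-based Go ABI on amd64, X15 is the designated zero register and must be re-zeroed whenever control returns from ABI0 code. The old sequence used the legacy-SSE XORPS, which only clears the low 128 bits; once AVX/AVX-512 values can be live, the full Y15/Z15 has to be cleared with VXORPS instead, and when GOAMD64 < v3 the AVX availability has to be checked at run time — which is what the zeroX15 helper added later in this diff does.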
@@ -1643,10 +1713,683 @@ func ssaGenValue(s *ssagen.State, v *ssa.Value) {
 		p.From.Offset = int64(x)
 		p.To.Type = obj.TYPE_REG
 		p.To.Reg = v.Reg()
 
+	// SIMD ops
+	case ssa.OpAMD64VZEROUPPER, ssa.OpAMD64VZEROALL:
+		s.Prog(v.Op.Asm())
+
+	case ssa.OpAMD64Zero128, ssa.OpAMD64Zero256, ssa.OpAMD64Zero512: // no code emitted
+
+	case ssa.OpAMD64VMOVSSf2v, ssa.OpAMD64VMOVSDf2v:
+		// These are for initializing the least 32/64 bits of a SIMD register from a "float".
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = v.Args[0].Reg()
+		p.AddRestSourceReg(x86.REG_X15)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+
+	case ssa.OpAMD64VMOVQload, ssa.OpAMD64VMOVDload,
+		ssa.OpAMD64VMOVSSload, ssa.OpAMD64VMOVSDload:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_MEM
+		p.From.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.From, v)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+
+	case ssa.OpAMD64VMOVSSconst, ssa.OpAMD64VMOVSDconst:
+		// for loading constants directly into SIMD registers
+		x := simdReg(v)
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_FCONST
+		p.From.Val = math.Float64frombits(uint64(v.AuxInt))
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = x
+
+	case ssa.OpAMD64VMOVD, ssa.OpAMD64VMOVQ:
+		// These are for initializing the least 32/64 bits of a SIMD register from an "int".
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = v.Args[0].Reg()
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+
+	case ssa.OpAMD64VMOVDQUload128, ssa.OpAMD64VMOVDQUload256, ssa.OpAMD64VMOVDQUload512,
+		ssa.OpAMD64KMOVBload, ssa.OpAMD64KMOVWload, ssa.OpAMD64KMOVDload, ssa.OpAMD64KMOVQload:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_MEM
+		p.From.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.From, v)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdOrMaskReg(v)
+	case ssa.OpAMD64VMOVDQUstore128, ssa.OpAMD64VMOVDQUstore256, ssa.OpAMD64VMOVDQUstore512,
+		ssa.OpAMD64KMOVBstore, ssa.OpAMD64KMOVWstore, ssa.OpAMD64KMOVDstore, ssa.OpAMD64KMOVQstore:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = simdOrMaskReg(v.Args[1])
+		p.To.Type = obj.TYPE_MEM
+		p.To.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.To, v)
+
+	case ssa.OpAMD64VPMASK32load128, ssa.OpAMD64VPMASK64load128, ssa.OpAMD64VPMASK32load256, ssa.OpAMD64VPMASK64load256:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_MEM
+		p.From.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.From, v)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+		p.AddRestSourceReg(simdReg(v.Args[1])) // masking simd reg
+
+	case ssa.OpAMD64VPMASK32store128, ssa.OpAMD64VPMASK64store128, ssa.OpAMD64VPMASK32store256, ssa.OpAMD64VPMASK64store256:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = simdReg(v.Args[2])
+		p.To.Type = obj.TYPE_MEM
+		p.To.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.To, v)
+		p.AddRestSourceReg(simdReg(v.Args[1])) // masking simd reg
+
+	case ssa.OpAMD64VPMASK64load512, ssa.OpAMD64VPMASK32load512, ssa.OpAMD64VPMASK16load512, ssa.OpAMD64VPMASK8load512:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_MEM
+		p.From.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.From, v)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+		p.AddRestSourceReg(v.Args[1].Reg()) // simd mask reg
+		x86.ParseSuffix(p, "Z")             // must be zero if not in mask
+
+	case ssa.OpAMD64VPMASK64store512, ssa.OpAMD64VPMASK32store512, ssa.OpAMD64VPMASK16store512, ssa.OpAMD64VPMASK8store512:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = simdReg(v.Args[2])
+		p.To.Type = obj.TYPE_MEM
+		p.To.Reg = v.Args[0].Reg()
+		ssagen.AddAux(&p.To, v)
+		p.AddRestSourceReg(v.Args[1].Reg()) // simd mask reg
+
+	case ssa.OpAMD64VPMOVMToVec8x16,
+		ssa.OpAMD64VPMOVMToVec8x32,
+		ssa.OpAMD64VPMOVMToVec8x64,
+		ssa.OpAMD64VPMOVMToVec16x8,
+		ssa.OpAMD64VPMOVMToVec16x16,
+		ssa.OpAMD64VPMOVMToVec16x32,
+		ssa.OpAMD64VPMOVMToVec32x4,
+		ssa.OpAMD64VPMOVMToVec32x8,
+		ssa.OpAMD64VPMOVMToVec32x16,
+		ssa.OpAMD64VPMOVMToVec64x2,
+		ssa.OpAMD64VPMOVMToVec64x4,
+		ssa.OpAMD64VPMOVMToVec64x8:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = v.Args[0].Reg()
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v)
+
+	case ssa.OpAMD64VPMOVVec8x16ToM,
+		ssa.OpAMD64VPMOVVec8x32ToM,
+		ssa.OpAMD64VPMOVVec8x64ToM,
+		ssa.OpAMD64VPMOVVec16x8ToM,
+		ssa.OpAMD64VPMOVVec16x16ToM,
+		ssa.OpAMD64VPMOVVec16x32ToM,
+		ssa.OpAMD64VPMOVVec32x4ToM,
+		ssa.OpAMD64VPMOVVec32x8ToM,
+		ssa.OpAMD64VPMOVVec32x16ToM,
+		ssa.OpAMD64VPMOVVec64x2ToM,
+		ssa.OpAMD64VPMOVVec64x4ToM,
+		ssa.OpAMD64VPMOVVec64x8ToM:
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = simdReg(v.Args[0])
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = v.Reg()
+
+	case ssa.OpAMD64KMOVQk, ssa.OpAMD64KMOVDk, ssa.OpAMD64KMOVWk, ssa.OpAMD64KMOVBk,
+		ssa.OpAMD64KMOVQi, ssa.OpAMD64KMOVDi, ssa.OpAMD64KMOVWi, ssa.OpAMD64KMOVBi:
+		// See also ssa.OpAMD64KMOVQload
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = v.Args[0].Reg()
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = v.Reg()
+	case ssa.OpAMD64VPTEST:
+		// Some instructions setting flags put their second operand into the destination reg.
+		// See also CMP[BWDQ].
+		p := s.Prog(v.Op.Asm())
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = simdReg(v.Args[0])
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = simdReg(v.Args[1])
+
 	default:
+		if !ssaGenSIMDValue(s, v) {
 			v.Fatalf("genValue not implemented: %s", v.LongString())
+		}
 	}
 }
 
+// zeroX15 zeroes the X15 register.
+func zeroX15(s *ssagen.State) {
+	vxorps := func(s *ssagen.State) {
+		p := s.Prog(x86.AVXORPS)
+		p.From.Type = obj.TYPE_REG
+		p.From.Reg = x86.REG_X15
+		p.AddRestSourceReg(x86.REG_X15)
+		p.To.Type = obj.TYPE_REG
+		p.To.Reg = x86.REG_X15
+	}
+	if buildcfg.GOAMD64 >= 3 {
+		vxorps(s)
+		return
+	}
+	// AVX may not be available, check before zeroing the high bits.
+	p := s.Prog(x86.ACMPB)
+	p.From.Type = obj.TYPE_MEM
+	p.From.Name = obj.NAME_EXTERN
+	p.From.Sym = ir.Syms.X86HasAVX
+	p.To.Type = obj.TYPE_CONST
+	p.To.Offset = 1
+	jmp := s.Prog(x86.AJNE)
+	jmp.To.Type = obj.TYPE_BRANCH
+	vxorps(s)
+	sse := opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
+	jmp.To.SetTarget(sse)
+}
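The remainder of this hunk (below) adds the family of simd* emit helpers used by the generated SSA lowering. Their naming convention, as inferred from the helper bodies and the "Example instruction" comments that accompany them:

    // simdV<in><out>        – <in> vector register inputs, <out> vector outputs (simdV21: two in, one out)
    // ...k / ...kv / ...kk  – a K mask register appears among the operands (input mask and/or mask result)
    // ...Imm8               – an 8-bit immediate taken from the value's AuxInt is emitted first
    // ...load               – one input is a memory operand (ssagen.AddAux/AddAux2 fill in the address)
    // ...ResultInArg0       – the destination register must be the same as the first input's register
    // Operands are emitted right to left, e.g. VPSUBD X1, X2, X3 computes X3 = X2 - X1,
    // and any write-mask K register sits at the end of the operand list.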
|
|
||||||
|
// Example instruction: VRSQRTPS X1, X1
|
||||||
|
func simdV11(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[0])
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPSUBD X1, X2, X3
|
||||||
|
func simdV21(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
// Vector registers operands follows a right-to-left order.
|
||||||
|
// e.g. VPSUBD X1, X2, X3 means X3 = X2 - X1.
|
||||||
|
p.From.Reg = simdReg(v.Args[1])
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// This function is to accustomize the shifts.
|
||||||
|
// The 2nd arg is an XMM, and this function merely checks that.
|
||||||
|
// Example instruction: VPSLLQ Z1, X1, Z2
|
||||||
|
func simdVfpv(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
// Vector registers operands follows a right-to-left order.
|
||||||
|
// e.g. VPSUBD X1, X2, X3 means X3 = X2 - X1.
|
||||||
|
p.From.Reg = v.Args[1].Reg()
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPCMPEQW Z26, Z30, K4
|
||||||
|
func simdV2k(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[1])
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = maskReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPMINUQ X21, X3, K3, X31
|
||||||
|
func simdV2kv(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[1])
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
// These "simd*" series of functions assumes:
|
||||||
|
// Any "K" register that serves as the write-mask
|
||||||
|
// or "predicate" for "predicated AVX512 instructions"
|
||||||
|
// sits right at the end of the operand list.
|
||||||
|
// TODO: verify this assumption.
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPABSB X1, X2, K3 (masking merging)
|
||||||
|
func simdV2kvResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[1])
|
||||||
|
// These "simd*" series of functions assumes:
|
||||||
|
// Any "K" register that serves as the write-mask
|
||||||
|
// or "predicate" for "predicated AVX512 instructions"
|
||||||
|
// sits right at the end of the operand list.
|
||||||
|
// TODO: verify this assumption.
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// This function is to accustomize the shifts.
|
||||||
|
// The 2nd arg is an XMM, and this function merely checks that.
|
||||||
|
// Example instruction: VPSLLQ Z1, X1, K1, Z2
|
||||||
|
func simdVfpkv(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = v.Args[1].Reg()
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPCMPEQW Z26, Z30, K1, K4
|
||||||
|
func simdV2kk(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[1])
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = maskReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPOPCNTB X14, K4, X16
|
||||||
|
func simdVkv(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[0])
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[1]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VROUNDPD $7, X2, X2
|
||||||
|
func simdV11Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VREDUCEPD $126, X1, K3, X31
|
||||||
|
func simdVkvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[1]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VCMPPS $7, X2, X9, X2
|
||||||
|
func simdV21Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPINSRB $3, DX, X0, X0
|
||||||
|
func simdVgpvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(v.Args[1].Reg())
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPCMPD $1, Z1, Z2, K1
|
||||||
|
func simdV2kImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = maskReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VPCMPD $1, Z1, Z2, K2, K1
|
||||||
|
func simdV2kkImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = maskReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
func simdV2kvImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[0]))
|
||||||
|
p.AddRestSourceReg(maskReg(v.Args[2]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// Example instruction: VFMADD213PD Z2, Z1, Z0
|
||||||
|
func simdV31ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Type = obj.TYPE_REG
|
||||||
|
p.From.Reg = simdReg(v.Args[2])
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
func simdV31ResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
|
||||||
|
p := s.Prog(v.Op.Asm())
|
||||||
|
p.From.Offset = int64(v.AuxUInt8())
|
||||||
|
p.From.Type = obj.TYPE_CONST
|
||||||
|
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[2]))
|
||||||
|
p.AddRestSourceReg(simdReg(v.Args[1]))
|
||||||
|
// p.AddRestSourceReg(x86.REG_K0)
|
||||||
|
p.To.Type = obj.TYPE_REG
|
||||||
|
p.To.Reg = simdReg(v)
|
||||||
|
return p
|
||||||
|
}
|
||||||
|
|
||||||
|
// v31loadResultInArg0Imm8
|
||||||
|
// Example instruction:
|
||||||
|
+// for (VPTERNLOGD128load {sym} [makeValAndOff(int32(int8(c)),off)] x y ptr mem)
+func simdV31loadResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[2].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	return p
+}
+
+// Example instruction: VFMADD213PD Z2, Z1, K1, Z0
+func simdV3kvResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_REG
+	p.From.Reg = simdReg(v.Args[2])
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.AddRestSourceReg(maskReg(v.Args[3]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+func simdVgpImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Offset = int64(v.AuxUInt8())
+	p.From.Type = obj.TYPE_CONST
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = v.Reg()
+	return p
+}
+
+// Currently unused
+func simdV31(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_REG
+	p.From.Reg = simdReg(v.Args[2])
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Currently unused
+func simdV3kv(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_REG
+	p.From.Reg = simdReg(v.Args[2])
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.AddRestSourceReg(maskReg(v.Args[3]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VRCP14PS (DI), K6, X22
+func simdVkvload(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[0].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(maskReg(v.Args[1]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPSLLVD (DX), X7, X18
+func simdV21load(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[1].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPDPWSSD (SI), X24, X18
+func simdV31loadResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[2].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPDPWSSD (SI), X24, K1, X18
+func simdV3kvloadResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[2].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.AddRestSourceReg(maskReg(v.Args[3]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPSLLVD (SI), X1, K1, X2
+func simdV2kvload(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[1].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.AddRestSourceReg(maskReg(v.Args[2]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPCMPEQD (SI), X1, K1
+func simdV2kload(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[1].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = maskReg(v)
+	return p
+}
+
+// Example instruction: VCVTTPS2DQ (BX), X2
+func simdV11load(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_MEM
+	p.From.Reg = v.Args[0].Reg()
+	ssagen.AddAux(&p.From, v)
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPSHUFD $7, (BX), X11
+func simdV11loadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[0].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPRORD $81, -15(R14), K7, Y1
+func simdVkvloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[0].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.AddRestSourceReg(maskReg(v.Args[1]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VPSHLDD $82, 7(SI), Y21, Y3
+func simdV21loadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: VCMPPS $81, -7(DI), Y16, K3
+func simdV2kloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = maskReg(v)
+	return p
+}
+
+// Example instruction: VCMPPS $81, -7(DI), Y16, K1, K3
+func simdV2kkloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.AddRestSourceReg(maskReg(v.Args[2]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = maskReg(v)
+	return p
+}
+
+// Example instruction: VGF2P8AFFINEINVQB $64, -17(BP), X31, K3, X26
+func simdV2kvloadImm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	sc := v.AuxValAndOff()
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_CONST
+	p.From.Offset = sc.Val64()
+	m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[1].Reg()}
+	ssagen.AddAux2(&m, v, sc.Off64())
+	p.AddRestSource(m)
+	p.AddRestSourceReg(simdReg(v.Args[0]))
+	p.AddRestSourceReg(maskReg(v.Args[2]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: SHA1NEXTE X2, X2
+func simdV21ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Type = obj.TYPE_REG
+	p.From.Reg = simdReg(v.Args[1])
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: SHA1RNDS4 $1, X2, X2
+func simdV21ResultInArg0Imm8(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	p := s.Prog(v.Op.Asm())
+	p.From.Offset = int64(v.AuxUInt8())
+	p.From.Type = obj.TYPE_CONST
+	p.AddRestSourceReg(simdReg(v.Args[1]))
+	p.To.Type = obj.TYPE_REG
+	p.To.Reg = simdReg(v)
+	return p
+}
+
+// Example instruction: SHA256RNDS2 X0, X11, X2
+func simdV31x0AtIn2ResultInArg0(s *ssagen.State, v *ssa.Value) *obj.Prog {
+	return simdV31ResultInArg0(s, v)
+}
var blockJump = [...]struct {
	asm, invasm obj.As

@@ -1732,7 +2475,7 @@ func ssaGenBlock(s *ssagen.State, b, next *ssa.Block) {
}

func loadRegResult(s *ssagen.State, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
-	p := s.Prog(loadByType(t))
+	p := s.Prog(loadByRegWidth(reg, t.Size()))
	p.From.Type = obj.TYPE_MEM
	p.From.Name = obj.NAME_AUTO
	p.From.Sym = n.Linksym()

@@ -1743,7 +2486,7 @@ func loadRegResult(s *ssagen.State, f *ssa.Func, t *types.Type, reg int16, n *ir
}

func spillArgReg(pp *objw.Progs, p *obj.Prog, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
-	p = pp.Append(p, storeByType(t), obj.TYPE_REG, reg, 0, obj.TYPE_MEM, 0, n.FrameOffset()+off)
+	p = pp.Append(p, storeByRegWidth(reg, t.Size()), obj.TYPE_REG, reg, 0, obj.TYPE_MEM, 0, n.FrameOffset()+off)
	p.To.Name = obj.NAME_PARAM
	p.To.Sym = n.Linksym()
	p.Pos = p.Pos.WithNotStmt()

@@ -1778,3 +2521,58 @@ func move16(s *ssagen.State, src, dst, tmp int16, off int64) {
	p.To.Reg = dst
	p.To.Offset = off
}
+
+// XXX maybe make this part of v.Reg?
+// On the other hand, it is architecture-specific.
+func simdReg(v *ssa.Value) int16 {
+	t := v.Type
+	if !t.IsSIMD() {
+		base.Fatalf("simdReg: not a simd type; v=%s, b=b%d, f=%s", v.LongString(), v.Block.ID, v.Block.Func.Name)
+	}
+	return simdRegBySize(v.Reg(), t.Size())
+}
+
+func simdRegBySize(reg int16, size int64) int16 {
+	switch size {
+	case 16:
+		return reg
+	case 32:
+		return reg + (x86.REG_Y0 - x86.REG_X0)
+	case 64:
+		return reg + (x86.REG_Z0 - x86.REG_X0)
+	}
+	panic("simdRegBySize: bad size")
+}
+
+// XXX k mask
+func maskReg(v *ssa.Value) int16 {
+	t := v.Type
+	if !t.IsSIMD() {
+		base.Fatalf("maskReg: not a simd type; v=%s, b=b%d, f=%s", v.LongString(), v.Block.ID, v.Block.Func.Name)
+	}
+	switch t.Size() {
+	case 8:
+		return v.Reg()
+	}
+	panic("unreachable")
+}
+
+// XXX k mask + vec
+func simdOrMaskReg(v *ssa.Value) int16 {
+	t := v.Type
+	if t.Size() <= 8 {
+		return maskReg(v)
+	}
+	return simdReg(v)
+}
+
+// XXX this is used for shift operations only.
+// regalloc will issue OpCopy with incorrect type, but the assigned
+// register should be correct, and this function is merely checking
+// the sanity of this part.
+func simdCheckRegOnly(v *ssa.Value, regStart, regEnd int16) int16 {
+	if v.Reg() > regEnd || v.Reg() < regStart {
+		panic("simdCheckRegOnly: not the desired register")
+	}
+	return v.Reg()
+}
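Not part of this diff: a hedged sketch of how helpers with these shapes get used. The generated dispatch file registered further down as genSIMDfile (../../amd64/simdssa.go) switches on the SSA op and calls the helper whose shape matches the op's operand pattern; the op names below are made up for illustration and are not the generated names.

// Illustrative only; the real switch lives in the generated simdssa.go.
func ssaGenSIMDValueSketch(s *ssagen.State, v *ssa.Value) bool {
	switch v.Op {
	case ssa.OpAMD64VRCP14PSMaskedload128: // hypothetical name; "Vkvload" shape: ptr, mask, mem -> vec
		simdVkvload(s, v)
	case ssa.OpAMD64VPSLLVDload128: // hypothetical name; "V21load" shape: vec, ptr, mem -> vec
		simdV21load(s, v)
	default:
		return false // not a SIMD op; the regular ssaGenValue handles it
	}
	return true
}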
@@ -29,7 +29,7 @@ var (
	compilequeue []*ir.Func // functions waiting to be compiled
)

-func enqueueFunc(fn *ir.Func) {
+func enqueueFunc(fn *ir.Func, symABIs *ssagen.SymABIs) {
	if ir.CurFunc != nil {
		base.FatalfAt(fn.Pos(), "enqueueFunc %v inside %v", fn, ir.CurFunc)
	}

@@ -49,6 +49,13 @@ func enqueueFunc(fn *ir.Func) {
	}

	if len(fn.Body) == 0 {
+		if ir.IsIntrinsicSym(fn.Sym()) && fn.Sym().Linkname == "" && !symABIs.HasDef(fn.Sym()) {
+			// Generate the function body for a bodyless intrinsic, in case it
+			// is used in a non-call context (e.g. as a function pointer).
+			// We skip functions defined in assembly, or that have a linkname (which
+			// could be defined in another package).
+			ssagen.GenIntrinsicBody(fn)
+		} else {
		// Initialize ABI wrappers if necessary.
		ir.InitLSym(fn, false)
		types.CalcSize(fn.Type())

@@ -66,6 +73,7 @@ func enqueueFunc(fn *ir.Func) {
		}
		return
	}
+	}

	errorsBefore := base.Errors()
@@ -188,9 +188,9 @@ func Main(archInit func(*ssagen.ArchInfo)) {

	ir.EscFmt = escape.Fmt
	ir.IsIntrinsicCall = ssagen.IsIntrinsicCall
+	ir.IsIntrinsicSym = ssagen.IsIntrinsicSym
	inline.SSADumpInline = ssagen.DumpInline
	ssagen.InitEnv()
-	ssagen.InitTables()

	types.PtrSize = ssagen.Arch.LinkArch.PtrSize
	types.RegSize = ssagen.Arch.LinkArch.RegSize

@@ -204,6 +204,11 @@ func Main(archInit func(*ssagen.ArchInfo)) {
	typecheck.InitRuntime()
	rttype.Init()

+	// Some intrinsics (notably, the simd intrinsics) mention
+	// types "eagerly", thus ssagen must be initialized AFTER
+	// the type system is ready.
+	ssagen.InitTables()
+
	// Parse and typecheck input.
	noder.LoadPackage(flag.Args())

@@ -309,7 +314,7 @@ func Main(archInit func(*ssagen.ArchInfo)) {
	}

		if nextFunc < len(typecheck.Target.Funcs) {
-			enqueueFunc(typecheck.Target.Funcs[nextFunc])
+			enqueueFunc(typecheck.Target.Funcs[nextFunc], symABIs)
			nextFunc++
			continue
		}
@@ -179,6 +179,25 @@ func CanInlineFuncs(funcs []*ir.Func, profile *pgoir.Profile) {
	})
}

+func simdCreditMultiplier(fn *ir.Func) int32 {
+	for _, field := range fn.Type().RecvParamsResults() {
+		if field.Type.IsSIMD() {
+			return 3
+		}
+	}
+	// Sometimes code uses closures, that do not take simd
+	// parameters, to perform repetitive SIMD operations in
+	// fn. These really need to be inlined, or the anticipated
+	// awesome SIMD performance will be missed.
+	for _, v := range fn.ClosureVars {
+		if v.Type().IsSIMD() {
+			return 11 // 11 ought to be enough.
+		}
+	}
+
+	return 1
+}
+
// inlineBudget determines the max budget for function 'fn' prior to
// analyzing the hairiness of the body of 'fn'. We pass in the pgo
// profile if available (which can change the budget), also a
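Not from the CL itself: a sketch of the closure pattern the ClosureVars check above is aimed at, assuming the experimental simd package's Int32x8 type and its Add method (GOEXPERIMENT=simd); the surrounding names are illustrative.

package sum

import "simd"

// accumulate's inner closure captures acc, an Int32x8-typed closure
// variable, so the hairy visitor gives that closure an 11x budget and it
// should inline, keeping acc in a vector register across the loop.
func accumulate(chunks []simd.Int32x8) simd.Int32x8 {
	var acc simd.Int32x8
	add := func(i int) { acc = acc.Add(chunks[i]) }
	for i := range chunks {
		add(i)
	}
	return acc
}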
@@ -186,9 +205,14 @@ func CanInlineFuncs(funcs []*ir.Func, profile *pgoir.Profile) {
// possibility that a call to the function might have its score
// adjusted downwards. If 'verbose' is set, then print a remark where
// we boost the budget due to PGO.
+// Note that inlineCostOK has the final say on whether an inline will
+// happen; changes here merely make inlines possible.
func inlineBudget(fn *ir.Func, profile *pgoir.Profile, relaxed bool, verbose bool) int32 {
	// Update the budget for profile-guided inlining.
	budget := int32(inlineMaxBudget)

+	budget *= simdCreditMultiplier(fn)
+
	if IsPgoHotFunc(fn, profile) {
		budget = inlineHotMaxBudget
		if verbose {

@@ -202,6 +226,7 @@ func inlineBudget(fn *ir.Func, profile *pgoir.Profile, relaxed bool, verbose boo
		// be very liberal here, if the closure is only called once, the budget is large
		budget = max(budget, inlineClosureCalledOnceCost)
	}

	return budget
}
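As a worked example (assuming the long-standing default inlineMaxBudget of 80, which this diff does not change): a function whose signature mentions a SIMD type starts from a budget of 80 * 3 = 240, a closure that merely captures a SIMD variable starts from 80 * 11 = 880, and everything else keeps the usual 80.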
@@ -263,6 +288,7 @@ func CanInline(fn *ir.Func, profile *pgoir.Profile) {

	visitor := hairyVisitor{
		curFunc:   fn,
+		debug:     isDebugFn(fn),
		isBigFunc: IsBigFunc(fn),
		budget:    budget,
		maxBudget: budget,

@@ -407,6 +433,7 @@ type hairyVisitor struct {
	// This is needed to access the current caller in the doNode function.
	curFunc   *ir.Func
	isBigFunc bool
+	debug     bool
	budget    int32
	maxBudget int32
	reason    string

@@ -416,6 +443,16 @@ type hairyVisitor struct {
	profile *pgoir.Profile
}

+func isDebugFn(fn *ir.Func) bool {
+	// if n := fn.Nname; n != nil {
+	// 	if n.Sym().Name == "Int32x8.Transpose8" && n.Sym().Pkg.Path == "simd" {
+	// 		fmt.Printf("isDebugFn '%s' DOT '%s'\n", n.Sym().Pkg.Path, n.Sym().Name)
+	// 		return true
+	// 	}
+	// }
+	return false
+}
+
func (v *hairyVisitor) tooHairy(fn *ir.Func) bool {
	v.do = v.doNode // cache closure
	if ir.DoChildren(fn, v.do) {
@@ -434,6 +471,9 @@ func (v *hairyVisitor) doNode(n ir.Node) bool {
	if n == nil {
		return false
	}
+	if v.debug {
+		fmt.Printf("%v: doNode %v budget is %d\n", ir.Line(n), n.Op(), v.budget)
+	}
opSwitch:
	switch n.Op() {
	// Call is okay if inlinable and we have the budget for the body.

@@ -551,12 +591,19 @@ opSwitch:
		}

		if cheap {
+			if v.debug {
+				if ir.IsIntrinsicCall(n) {
+					fmt.Printf("%v: cheap call is also intrinsic, %v\n", ir.Line(n), n)
+				}
+			}
			break // treat like any other node, that is, cost of 1
		}

		if ir.IsIntrinsicCall(n) {
-			// Treat like any other node.
-			break
+			if v.debug {
+				fmt.Printf("%v: intrinsic call, %v\n", ir.Line(n), n)
+			}
+			break // Treat like any other node.
		}

		if callee := inlCallee(v.curFunc, n.Fun, v.profile, false); callee != nil && typecheck.HaveInlineBody(callee) {

@@ -583,6 +630,10 @@ opSwitch:
			}
		}

+		if v.debug {
+			fmt.Printf("%v: costly OCALLFUNC %v\n", ir.Line(n), n)
+		}
+
		// Call cost for non-leaf inlining.
		v.budget -= extraCost

@@ -592,6 +643,9 @@ opSwitch:
	// Things that are too hairy, irrespective of the budget
	case ir.OCALL, ir.OCALLINTER:
		// Call cost for non-leaf inlining.
+		if v.debug {
+			fmt.Printf("%v: costly OCALL %v\n", ir.Line(n), n)
+		}
		v.budget -= v.extraCallCost

	case ir.OPANIC:

@@ -754,7 +808,7 @@ opSwitch:
	v.budget--

-	if v.budget < 0 && base.Flag.LowerM < 2 && !logopt.Enabled() {
+	// When debugging, don't stop early, to get full cost of inlining this function
+	if v.budget < 0 && base.Flag.LowerM < 2 && !logopt.Enabled() && !v.debug {
		v.reason = "too expensive"
		return true
	}

@@ -914,6 +968,8 @@ func inlineCostOK(n *ir.CallExpr, caller, callee *ir.Func, bigCaller, closureCal
		maxCost = inlineBigFunctionMaxCost
	}

+	simdMaxCost := simdCreditMultiplier(callee) * maxCost
+
	if callee.ClosureParent != nil {
		maxCost *= 2 // favor inlining closures
		if closureCalledOnce { // really favor inlining the one call to this closure

@@ -921,6 +977,8 @@ func inlineCostOK(n *ir.CallExpr, caller, callee *ir.Func, bigCaller, closureCal
		}
	}

+	maxCost = max(maxCost, simdMaxCost)
+
	metric := callee.Inl.Cost
	if inlheur.Enabled() {
		score, ok := inlheur.GetCallSiteScore(caller, n)
@@ -1031,6 +1031,9 @@ func StaticCalleeName(n Node) *Name {
// IsIntrinsicCall reports whether the compiler back end will treat the call as an intrinsic operation.
var IsIntrinsicCall = func(*CallExpr) bool { return false }

+// IsIntrinsicSym reports whether the compiler back end will treat a call to this symbol as an intrinsic operation.
+var IsIntrinsicSym = func(*types.Sym) bool { return false }
+
// SameSafeExpr checks whether it is safe to reuse one of l and r
// instead of computing both. SameSafeExpr assumes that l and r are
// used in the same statement or expression. In order for it to be

@@ -1149,6 +1152,14 @@ func ParamNames(ft *types.Type) []Node {
	return args
}

+func RecvParamNames(ft *types.Type) []Node {
+	args := make([]Node, ft.NumRecvs()+ft.NumParams())
+	for i, f := range ft.RecvParams() {
+		args[i] = f.Nname.(*Name)
+	}
+	return args
+}
+
// MethodSym returns the method symbol representing a method name
// associated with a specific receiver type.
//
@@ -53,6 +53,7 @@ type symsStruct struct {
	PanicdottypeI   *obj.LSym
	Panicnildottype *obj.LSym
	Panicoverflow   *obj.LSym
+	PanicSimdImm    *obj.LSym
	Racefuncenter   *obj.LSym
	Racefuncexit    *obj.LSym
	Raceread        *obj.LSym

@@ -76,6 +77,7 @@ type symsStruct struct {
	Loong64HasLAM_BH *obj.LSym
	Loong64HasLSX    *obj.LSym
	RISCV64HasZbb    *obj.LSym
+	X86HasAVX        *obj.LSym
	X86HasFMA        *obj.LSym
	X86HasPOPCNT     *obj.LSym
	X86HasSSE41      *obj.LSym
@@ -1534,6 +1534,9 @@ func isfat(t *types.Type) bool {
		}
		return true
	case types.TSTRUCT:
+		if t.IsSIMD() {
+			return false
+		}
		// Struct with 1 field, check if field is fat
		if t.NumFields() == 1 {
			return isfat(t.Field(0).Type)
@@ -1657,3 +1657,171 @@

// If we don't use the flags any more, just use the standard op.
(Select0 a:(ADD(Q|L)constflags [c] x)) && a.Uses == 1 => (ADD(Q|L)const [c] x)
+
+// SIMD lowering rules
+
+// Mask conversions
+// integers to masks
+(Cvt16toMask8x16 <t> x) => (VPMOVMToVec8x16 <types.TypeVec128> (KMOVWk <t> x))
+(Cvt32toMask8x32 <t> x) => (VPMOVMToVec8x32 <types.TypeVec256> (KMOVDk <t> x))
+(Cvt64toMask8x64 <t> x) => (VPMOVMToVec8x64 <types.TypeVec512> (KMOVQk <t> x))
+
+(Cvt8toMask16x8 <t> x) => (VPMOVMToVec16x8 <types.TypeVec128> (KMOVBk <t> x))
+(Cvt16toMask16x16 <t> x) => (VPMOVMToVec16x16 <types.TypeVec256> (KMOVWk <t> x))
+(Cvt32toMask16x32 <t> x) => (VPMOVMToVec16x32 <types.TypeVec512> (KMOVDk <t> x))
+
+(Cvt8toMask32x4 <t> x) => (VPMOVMToVec32x4 <types.TypeVec128> (KMOVBk <t> x))
+(Cvt8toMask32x8 <t> x) => (VPMOVMToVec32x8 <types.TypeVec256> (KMOVBk <t> x))
+(Cvt16toMask32x16 <t> x) => (VPMOVMToVec32x16 <types.TypeVec512> (KMOVWk <t> x))
+
+(Cvt8toMask64x2 <t> x) => (VPMOVMToVec64x2 <types.TypeVec128> (KMOVBk <t> x))
+(Cvt8toMask64x4 <t> x) => (VPMOVMToVec64x4 <types.TypeVec256> (KMOVBk <t> x))
+(Cvt8toMask64x8 <t> x) => (VPMOVMToVec64x8 <types.TypeVec512> (KMOVBk <t> x))
+
+// masks to integers
+(CvtMask8x16to16 <t> x) => (KMOVWi <t> (VPMOVVec8x16ToM <types.TypeMask> x))
+(CvtMask8x32to32 <t> x) => (KMOVDi <t> (VPMOVVec8x32ToM <types.TypeMask> x))
+(CvtMask8x64to64 <t> x) => (KMOVQi <t> (VPMOVVec8x64ToM <types.TypeMask> x))
+
+(CvtMask16x8to8 <t> x) => (KMOVBi <t> (VPMOVVec16x8ToM <types.TypeMask> x))
+(CvtMask16x16to16 <t> x) => (KMOVWi <t> (VPMOVVec16x16ToM <types.TypeMask> x))
+(CvtMask16x32to32 <t> x) => (KMOVDi <t> (VPMOVVec16x32ToM <types.TypeMask> x))
+
+(CvtMask32x4to8 <t> x) => (KMOVBi <t> (VPMOVVec32x4ToM <types.TypeMask> x))
+(CvtMask32x8to8 <t> x) => (KMOVBi <t> (VPMOVVec32x8ToM <types.TypeMask> x))
+(CvtMask32x16to16 <t> x) => (KMOVWi <t> (VPMOVVec32x16ToM <types.TypeMask> x))
+
+(CvtMask64x2to8 <t> x) => (KMOVBi <t> (VPMOVVec64x2ToM <types.TypeMask> x))
+(CvtMask64x4to8 <t> x) => (KMOVBi <t> (VPMOVVec64x4ToM <types.TypeMask> x))
+(CvtMask64x8to8 <t> x) => (KMOVBi <t> (VPMOVVec64x8ToM <types.TypeMask> x))
+
+// optimizations
+(MOVBstore [off] {sym} ptr (KMOVBi mask) mem) => (KMOVBstore [off] {sym} ptr mask mem)
+(MOVWstore [off] {sym} ptr (KMOVWi mask) mem) => (KMOVWstore [off] {sym} ptr mask mem)
+(MOVLstore [off] {sym} ptr (KMOVDi mask) mem) => (KMOVDstore [off] {sym} ptr mask mem)
+(MOVQstore [off] {sym} ptr (KMOVQi mask) mem) => (KMOVQstore [off] {sym} ptr mask mem)
+
+(KMOVBk l:(MOVBload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVBload [off] {sym} ptr mem)
+(KMOVWk l:(MOVWload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVWload [off] {sym} ptr mem)
+(KMOVDk l:(MOVLload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVDload [off] {sym} ptr mem)
+(KMOVQk l:(MOVQload [off] {sym} ptr mem)) && canMergeLoad(v, l) && clobber(l) => (KMOVQload [off] {sym} ptr mem)
+
+// SIMD vector loads and stores
+(Load <t> ptr mem) && t.Size() == 16 => (VMOVDQUload128 ptr mem)
+(Store {t} ptr val mem) && t.Size() == 16 => (VMOVDQUstore128 ptr val mem)
+
+(Load <t> ptr mem) && t.Size() == 32 => (VMOVDQUload256 ptr mem)
+(Store {t} ptr val mem) && t.Size() == 32 => (VMOVDQUstore256 ptr val mem)
+
+(Load <t> ptr mem) && t.Size() == 64 => (VMOVDQUload512 ptr mem)
+(Store {t} ptr val mem) && t.Size() == 64 => (VMOVDQUstore512 ptr val mem)
+
+// SIMD vector integer-vector-masked loads and stores.
+(LoadMasked32 <t> ptr mask mem) && t.Size() == 16 => (VPMASK32load128 ptr mask mem)
+(LoadMasked32 <t> ptr mask mem) && t.Size() == 32 => (VPMASK32load256 ptr mask mem)
+(LoadMasked64 <t> ptr mask mem) && t.Size() == 16 => (VPMASK64load128 ptr mask mem)
+(LoadMasked64 <t> ptr mask mem) && t.Size() == 32 => (VPMASK64load256 ptr mask mem)
+
+(StoreMasked32 {t} ptr mask val mem) && t.Size() == 16 => (VPMASK32store128 ptr mask val mem)
+(StoreMasked32 {t} ptr mask val mem) && t.Size() == 32 => (VPMASK32store256 ptr mask val mem)
+(StoreMasked64 {t} ptr mask val mem) && t.Size() == 16 => (VPMASK64store128 ptr mask val mem)
+(StoreMasked64 {t} ptr mask val mem) && t.Size() == 32 => (VPMASK64store256 ptr mask val mem)
+
+// Misc
+(IsZeroVec x) => (SETEQ (VPTEST x x))
+
+// SIMD vector K-masked loads and stores
+
+(LoadMasked64 <t> ptr mask mem) && t.Size() == 64 => (VPMASK64load512 ptr (VPMOVVec64x8ToM <types.TypeMask> mask) mem)
+(LoadMasked32 <t> ptr mask mem) && t.Size() == 64 => (VPMASK32load512 ptr (VPMOVVec32x16ToM <types.TypeMask> mask) mem)
+(LoadMasked16 <t> ptr mask mem) && t.Size() == 64 => (VPMASK16load512 ptr (VPMOVVec16x32ToM <types.TypeMask> mask) mem)
+(LoadMasked8 <t> ptr mask mem) && t.Size() == 64 => (VPMASK8load512 ptr (VPMOVVec8x64ToM <types.TypeMask> mask) mem)
+
+(StoreMasked64 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK64store512 ptr (VPMOVVec64x8ToM <types.TypeMask> mask) val mem)
+(StoreMasked32 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK32store512 ptr (VPMOVVec32x16ToM <types.TypeMask> mask) val mem)
+(StoreMasked16 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK16store512 ptr (VPMOVVec16x32ToM <types.TypeMask> mask) val mem)
+(StoreMasked8 {t} ptr mask val mem) && t.Size() == 64 => (VPMASK8store512 ptr (VPMOVVec8x64ToM <types.TypeMask> mask) val mem)
+
+(ZeroSIMD <t>) && t.Size() == 16 => (Zero128 <t>)
+(ZeroSIMD <t>) && t.Size() == 32 => (Zero256 <t>)
+(ZeroSIMD <t>) && t.Size() == 64 => (Zero512 <t>)
+
+(VPMOVVec8x16ToM (VPMOVMToVec8x16 x)) => x
+(VPMOVVec8x32ToM (VPMOVMToVec8x32 x)) => x
+(VPMOVVec8x64ToM (VPMOVMToVec8x64 x)) => x
+
+(VPMOVVec16x8ToM (VPMOVMToVec16x8 x)) => x
+(VPMOVVec16x16ToM (VPMOVMToVec16x16 x)) => x
+(VPMOVVec16x32ToM (VPMOVMToVec16x32 x)) => x
+
+(VPMOVVec32x4ToM (VPMOVMToVec32x4 x)) => x
+(VPMOVVec32x8ToM (VPMOVMToVec32x8 x)) => x
+(VPMOVVec32x16ToM (VPMOVMToVec32x16 x)) => x
+
+(VPMOVVec64x2ToM (VPMOVMToVec64x2 x)) => x
+(VPMOVVec64x4ToM (VPMOVMToVec64x4 x)) => x
+(VPMOVVec64x8ToM (VPMOVMToVec64x8 x)) => x
+
+(VPANDQ512 x (VPMOVMToVec64x8 k)) => (VMOVDQU64Masked512 x k)
+(VPANDQ512 x (VPMOVMToVec32x16 k)) => (VMOVDQU32Masked512 x k)
+(VPANDQ512 x (VPMOVMToVec16x32 k)) => (VMOVDQU16Masked512 x k)
+(VPANDQ512 x (VPMOVMToVec8x64 k)) => (VMOVDQU8Masked512 x k)
+(VPANDD512 x (VPMOVMToVec64x8 k)) => (VMOVDQU64Masked512 x k)
+(VPANDD512 x (VPMOVMToVec32x16 k)) => (VMOVDQU32Masked512 x k)
+(VPANDD512 x (VPMOVMToVec16x32 k)) => (VMOVDQU16Masked512 x k)
+(VPANDD512 x (VPMOVMToVec8x64 k)) => (VMOVDQU8Masked512 x k)
+
+(VPAND128 x (VPMOVMToVec8x16 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU8Masked128 x k)
+(VPAND128 x (VPMOVMToVec16x8 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU16Masked128 x k)
+(VPAND128 x (VPMOVMToVec32x4 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU32Masked128 x k)
+(VPAND128 x (VPMOVMToVec64x2 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU64Masked128 x k)
+
+(VPAND256 x (VPMOVMToVec8x32 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU8Masked256 x k)
+(VPAND256 x (VPMOVMToVec16x16 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU16Masked256 x k)
+(VPAND256 x (VPMOVMToVec32x8 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU32Masked256 x k)
+(VPAND256 x (VPMOVMToVec64x4 k)) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (VMOVDQU64Masked256 x k)
+
+// Insert to zero of 32/64 bit floats and ints to a zero is just MOVS[SD]
+(VPINSRQ128 [0] (Zero128 <t>) y) && y.Type.IsFloat() => (VMOVSDf2v <types.TypeVec128> y)
+(VPINSRD128 [0] (Zero128 <t>) y) && y.Type.IsFloat() => (VMOVSSf2v <types.TypeVec128> y)
+(VPINSRQ128 [0] (Zero128 <t>) y) && !y.Type.IsFloat() => (VMOVQ <types.TypeVec128> y)
+(VPINSRD128 [0] (Zero128 <t>) y) && !y.Type.IsFloat() => (VMOVD <types.TypeVec128> y)
+
+// These rewrites can skip zero-extending the 8/16-bit inputs because they are
+// only used as the input to a broadcast; the potentially "bad" bits are ignored
+(VPBROADCASTB(128|256|512) x:(VPINSRB128 [0] (Zero128 <t>) y)) && x.Uses == 1 =>
+	(VPBROADCASTB(128|256|512) (VMOVQ <types.TypeVec128> y))
+(VPBROADCASTW(128|256|512) x:(VPINSRW128 [0] (Zero128 <t>) y)) && x.Uses == 1 =>
+	(VPBROADCASTW(128|256|512) (VMOVQ <types.TypeVec128> y))
+
+(VMOVQ x:(MOVQload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVQload <v.Type> [off] {sym} ptr mem)
+(VMOVD x:(MOVLload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVDload <v.Type> [off] {sym} ptr mem)
+
+(VMOVSDf2v x:(MOVSDload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVSDload <v.Type> [off] {sym} ptr mem)
+(VMOVSSf2v x:(MOVSSload [off] {sym} ptr mem)) && x.Uses == 1 && clobber(x) => @x.Block (VMOVSSload <v.Type> [off] {sym} ptr mem)
+
+(VMOVSDf2v x:(MOVSDconst [c])) => (VMOVSDconst [c])
+(VMOVSSf2v x:(MOVSSconst [c])) => (VMOVSSconst [c])
+
+(VMOVDQUload(128|256|512) [off1] {sym} x:(ADDQconst [off2] ptr) mem) && is32Bit(int64(off1)+int64(off2)) => (VMOVDQUload(128|256|512) [off1+off2] {sym} ptr mem)
+(VMOVDQUstore(128|256|512) [off1] {sym} x:(ADDQconst [off2] ptr) val mem) && is32Bit(int64(off1)+int64(off2)) => (VMOVDQUstore(128|256|512) [off1+off2] {sym} ptr val mem)
+(VMOVDQUload(128|256|512) [off1] {sym1} x:(LEAQ [off2] {sym2} base) mem) && is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) => (VMOVDQUload(128|256|512) [off1+off2] {mergeSym(sym1, sym2)} base mem)
+(VMOVDQUstore(128|256|512) [off1] {sym1} x:(LEAQ [off2] {sym2} base) val mem) && is32Bit(int64(off1)+int64(off2)) && canMergeSym(sym1, sym2) => (VMOVDQUstore(128|256|512) [off1+off2] {mergeSym(sym1, sym2)} base val mem)
+
+// 2-op VPTEST optimizations
+(SETEQ (VPTEST x:(VPAND(128|256) j k) y)) && x == y && x.Uses == 2 => (SETEQ (VPTEST j k))
+(SETEQ (VPTEST x:(VPAND(D|Q)512 j k) y)) && x == y && x.Uses == 2 => (SETEQ (VPTEST j k))
+(SETEQ (VPTEST x:(VPANDN(128|256) j k) y)) && x == y && x.Uses == 2 => (SETB (VPTEST k j)) // AndNot has swapped its operand order
+(SETEQ (VPTEST x:(VPANDN(D|Q)512 j k) y)) && x == y && x.Uses == 2 => (SETB (VPTEST k j)) // AndNot has swapped its operand order
+(EQ (VPTEST x:(VPAND(128|256) j k) y) yes no) && x == y && x.Uses == 2 => (EQ (VPTEST j k) yes no)
+(EQ (VPTEST x:(VPAND(D|Q)512 j k) y) yes no) && x == y && x.Uses == 2 => (EQ (VPTEST j k) yes no)
+(EQ (VPTEST x:(VPANDN(128|256) j k) y) yes no) && x == y && x.Uses == 2 => (ULT (VPTEST k j) yes no) // AndNot has swapped its operand order
+(EQ (VPTEST x:(VPANDN(D|Q)512 j k) y) yes no) && x == y && x.Uses == 2 => (ULT (VPTEST k j) yes no) // AndNot has swapped its operand order
+
+// DotProductQuadruple optimizations
+(VPADDD128 (VPDPBUSD128 (Zero128 <t>) x y) z) => (VPDPBUSD128 <t> z x y)
+(VPADDD256 (VPDPBUSD256 (Zero256 <t>) x y) z) => (VPDPBUSD256 <t> z x y)
+(VPADDD512 (VPDPBUSD512 (Zero512 <t>) x y) z) => (VPDPBUSD512 <t> z x y)
+(VPADDD128 (VPDPBUSDS128 (Zero128 <t>) x y) z) => (VPDPBUSDS128 <t> z x y)
+(VPADDD256 (VPDPBUSDS256 (Zero256 <t>) x y) z) => (VPDPBUSDS256 <t> z x y)
+(VPADDD512 (VPDPBUSDS512 (Zero512 <t>) x y) z) => (VPDPBUSDS512 <t> z x y)
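To make the (IsZeroVec x) => (SETEQ (VPTEST x x)) rule above concrete, here is a scalar model of what it computes; VPTEST sets ZF exactly when the AND of its two operands is all zero, so testing x against itself asks whether every bit of x is zero. This is an illustrative sketch, not code from the CL.

// isZeroVec128 models VPTEST x, x followed by SETEQ for a 128-bit vector.
func isZeroVec128(x [16]byte) bool {
	var or byte
	for _, b := range x {
		or |= b // any set bit anywhere makes the OR nonzero
	}
	return or == 0 // ZF would be set iff x AND x == 0
}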
@@ -62,7 +62,33 @@ var regNamesAMD64 = []string{
	"X13",
	"X14",
	"X15", // constant 0 in ABIInternal
+	"X16",
+	"X17",
+	"X18",
+	"X19",
+	"X20",
+	"X21",
+	"X22",
+	"X23",
+	"X24",
+	"X25",
+	"X26",
+	"X27",
+	"X28",
+	"X29",
+	"X30",
+	"X31",
+
+	// TODO: update asyncPreempt for K registers.
+	// asyncPreempt also needs to store Z0-Z15 properly.
+	"K0",
+	"K1",
+	"K2",
+	"K3",
+	"K4",
+	"K5",
+	"K6",
+	"K7",
	// If you add registers, update asyncPreempt in runtime

	// pseudo-registers
@@ -98,16 +124,28 @@ func init() {
		gp  = buildReg("AX CX DX BX BP SI DI R8 R9 R10 R11 R12 R13 R15")
		g   = buildReg("g")
		fp  = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14")
+		v   = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14")
+		w   = buildReg("X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31")
		x15 = buildReg("X15")
+		mask = buildReg("K1 K2 K3 K4 K5 K6 K7")
		gpsp    = gp | buildReg("SP")
		gpspsb  = gpsp | buildReg("SB")
		gpspsbg = gpspsb | g
		callerSave = gp | fp | g // runtime.setg (and anything calling it) may clobber g

+		vz = v | x15
+		wz = w | x15
+		x0 = buildReg("X0")
	)
	// Common slices of register masks
	var (
		gponly = []regMask{gp}
		fponly = []regMask{fp}
+		vonly    = []regMask{v}
+		wonly    = []regMask{w}
+		maskonly = []regMask{mask}
+		vzonly   = []regMask{vz}
+		wzonly   = []regMask{wz}
	)

	// Common regInfo
@@ -170,6 +208,67 @@ func init() {
		fpstore    = regInfo{inputs: []regMask{gpspsb, fp, 0}}
		fpstoreidx = regInfo{inputs: []regMask{gpspsb, gpsp, fp, 0}}

+		// masked loads/stores, vector register or mask register
+		vloadv  = regInfo{inputs: []regMask{gpspsb, v, 0}, outputs: vonly}
+		vstorev = regInfo{inputs: []regMask{gpspsb, v, v, 0}}
+		vloadk  = regInfo{inputs: []regMask{gpspsb, mask, 0}, outputs: vonly}
+		vstorek = regInfo{inputs: []regMask{gpspsb, mask, v, 0}}
+
+		v11  = regInfo{inputs: vonly, outputs: vonly}            // used in resultInArg0 ops, arg0 must not be x15
+		v21  = regInfo{inputs: []regMask{v, vz}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
+		vk   = regInfo{inputs: vzonly, outputs: maskonly}
+		kv   = regInfo{inputs: maskonly, outputs: vonly}
+		v2k  = regInfo{inputs: []regMask{vz, vz}, outputs: maskonly}
+		vkv  = regInfo{inputs: []regMask{vz, mask}, outputs: vonly}
+		v2kv = regInfo{inputs: []regMask{vz, vz, mask}, outputs: vonly}
+		v2kk = regInfo{inputs: []regMask{vz, vz, mask}, outputs: maskonly}
+		v31  = regInfo{inputs: []regMask{v, vz, vz}, outputs: vonly}       // used in resultInArg0 ops, arg0 must not be x15
+		v3kv = regInfo{inputs: []regMask{v, vz, vz, mask}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
+		vgpv = regInfo{inputs: []regMask{vz, gp}, outputs: vonly}
+		vgp  = regInfo{inputs: vonly, outputs: gponly}
+		vfpv  = regInfo{inputs: []regMask{vz, fp}, outputs: vonly}
+		vfpkv = regInfo{inputs: []regMask{vz, fp, mask}, outputs: vonly}
+		fpv  = regInfo{inputs: []regMask{fp}, outputs: vonly}
+		gpv  = regInfo{inputs: []regMask{gp}, outputs: vonly}
+		v2flags = regInfo{inputs: []regMask{vz, vz}}
+
+		w11  = regInfo{inputs: wonly, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
+		w21  = regInfo{inputs: []regMask{wz, wz}, outputs: wonly}
+		wk   = regInfo{inputs: wzonly, outputs: maskonly}
+		kw   = regInfo{inputs: maskonly, outputs: wonly}
+		w2k  = regInfo{inputs: []regMask{wz, wz}, outputs: maskonly}
+		wkw  = regInfo{inputs: []regMask{wz, mask}, outputs: wonly}
+		w2kw = regInfo{inputs: []regMask{w, wz, mask}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
+		w2kk = regInfo{inputs: []regMask{wz, wz, mask}, outputs: maskonly}
+		w31  = regInfo{inputs: []regMask{w, wz, wz}, outputs: wonly}       // used in resultInArg0 ops, arg0 must not be x15
+		w3kw = regInfo{inputs: []regMask{w, wz, wz, mask}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
+		wgpw = regInfo{inputs: []regMask{wz, gp}, outputs: wonly}
+		wgp  = regInfo{inputs: wzonly, outputs: gponly}
+		wfpw  = regInfo{inputs: []regMask{wz, fp}, outputs: wonly}
+		wfpkw = regInfo{inputs: []regMask{wz, fp, mask}, outputs: wonly}
+
+		// These register masks are used by SIMD only, they follow the pattern:
+		// Mem last, k mask second to last (if any), address right before mem and k mask.
+		wkwload  = regInfo{inputs: []regMask{gpspsb, mask, 0}, outputs: wonly}
+		v21load  = regInfo{inputs: []regMask{v, gpspsb, 0}, outputs: vonly}     // used in resultInArg0 ops, arg0 must not be x15
+		v31load  = regInfo{inputs: []regMask{v, vz, gpspsb, 0}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
+		v11load  = regInfo{inputs: []regMask{gpspsb, 0}, outputs: vonly}
+		w21load  = regInfo{inputs: []regMask{wz, gpspsb, 0}, outputs: wonly}
+		w31load  = regInfo{inputs: []regMask{w, wz, gpspsb, 0}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
+		w2kload  = regInfo{inputs: []regMask{wz, gpspsb, 0}, outputs: maskonly}
+		w2kwload = regInfo{inputs: []regMask{wz, gpspsb, mask, 0}, outputs: wonly}
+		w11load  = regInfo{inputs: []regMask{gpspsb, 0}, outputs: wonly}
+		w3kwload = regInfo{inputs: []regMask{w, wz, gpspsb, mask, 0}, outputs: wonly} // used in resultInArg0 ops, arg0 must not be x15
+		w2kkload = regInfo{inputs: []regMask{wz, gpspsb, mask, 0}, outputs: maskonly}
+		v31x0AtIn2 = regInfo{inputs: []regMask{v, vz, x0}, outputs: vonly} // used in resultInArg0 ops, arg0 must not be x15
+
+		kload  = regInfo{inputs: []regMask{gpspsb, 0}, outputs: maskonly}
+		kstore = regInfo{inputs: []regMask{gpspsb, mask, 0}}
+		gpk    = regInfo{inputs: gponly, outputs: maskonly}
+		kgp    = regInfo{inputs: maskonly, outputs: gponly}
+
+		x15only = regInfo{inputs: nil, outputs: []regMask{x15}}
+
		prefreg = regInfo{inputs: []regMask{gpspsbg}}
	)
@@ -1235,6 +1334,118 @@ func init() {
		//
		// output[i] = (input[i] >> 7) & 1
		{name: "PMOVMSKB", argLength: 1, reg: fpgp, asm: "PMOVMSKB"},
+
+		// SIMD ops
+		{name: "VMOVDQUload128", argLength: 2, reg: fpload, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1 = mem
+		{name: "VMOVDQUstore128", argLength: 3, reg: fpstore, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
+
+		{name: "VMOVDQUload256", argLength: 2, reg: fpload, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1 = mem
+		{name: "VMOVDQUstore256", argLength: 3, reg: fpstore, asm: "VMOVDQU", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
+
+		{name: "VMOVDQUload512", argLength: 2, reg: fpload, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1 = mem
+		{name: "VMOVDQUstore512", argLength: 3, reg: fpstore, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg1, arg2 = mem
+
+		// AVX2 32 and 64-bit element int-vector masked moves.
+		{name: "VPMASK32load128", argLength: 3, reg: vloadv, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
+		{name: "VPMASK32store128", argLength: 4, reg: vstorev, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
+		{name: "VPMASK64load128", argLength: 3, reg: vloadv, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
+		{name: "VPMASK64store128", argLength: 4, reg: vstorev, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
+
+		{name: "VPMASK32load256", argLength: 3, reg: vloadv, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
+		{name: "VPMASK32store256", argLength: 4, reg: vstorev, asm: "VPMASKMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
+		{name: "VPMASK64load256", argLength: 3, reg: vloadv, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=integer mask, arg2 = mem
+		{name: "VPMASK64store256", argLength: 4, reg: vstorev, asm: "VPMASKMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=integer mask, arg3 = mem
+
+		// AVX512 8-64-bit element mask-register masked moves
+		{name: "VPMASK8load512", argLength: 3, reg: vloadk, asm: "VMOVDQU8", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},     // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
+		{name: "VPMASK8store512", argLength: 4, reg: vstorek, asm: "VMOVDQU8", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},  // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
+		{name: "VPMASK16load512", argLength: 3, reg: vloadk, asm: "VMOVDQU16", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
+		{name: "VPMASK16store512", argLength: 4, reg: vstorek, asm: "VMOVDQU16", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
+		{name: "VPMASK32load512", argLength: 3, reg: vloadk, asm: "VMOVDQU32", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
+		{name: "VPMASK32store512", argLength: 4, reg: vstorek, asm: "VMOVDQU32", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
+		{name: "VPMASK64load512", argLength: 3, reg: vloadk, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},    // load from arg0+auxint+aux, arg1=k mask, arg2 = mem
+		{name: "VPMASK64store512", argLength: 4, reg: vstorek, asm: "VMOVDQU64", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"}, // store, *(arg0+auxint+aux) = arg2, arg1=k mask, arg3 = mem
+
+		{name: "VPMOVMToVec8x16", argLength: 1, reg: kv, asm: "VPMOVM2B"},
+		{name: "VPMOVMToVec8x32", argLength: 1, reg: kv, asm: "VPMOVM2B"},
+		{name: "VPMOVMToVec8x64", argLength: 1, reg: kw, asm: "VPMOVM2B"},
+
+		{name: "VPMOVMToVec16x8", argLength: 1, reg: kv, asm: "VPMOVM2W"},
+		{name: "VPMOVMToVec16x16", argLength: 1, reg: kv, asm: "VPMOVM2W"},
+		{name: "VPMOVMToVec16x32", argLength: 1, reg: kw, asm: "VPMOVM2W"},
+
+		{name: "VPMOVMToVec32x4", argLength: 1, reg: kv, asm: "VPMOVM2D"},
+		{name: "VPMOVMToVec32x8", argLength: 1, reg: kv, asm: "VPMOVM2D"},
+		{name: "VPMOVMToVec32x16", argLength: 1, reg: kw, asm: "VPMOVM2D"},
+
+		{name: "VPMOVMToVec64x2", argLength: 1, reg: kv, asm: "VPMOVM2Q"},
+		{name: "VPMOVMToVec64x4", argLength: 1, reg: kv, asm: "VPMOVM2Q"},
+		{name: "VPMOVMToVec64x8", argLength: 1, reg: kw, asm: "VPMOVM2Q"},
+
+		{name: "VPMOVVec8x16ToM", argLength: 1, reg: vk, asm: "VPMOVB2M"},
+		{name: "VPMOVVec8x32ToM", argLength: 1, reg: vk, asm: "VPMOVB2M"},
+		{name: "VPMOVVec8x64ToM", argLength: 1, reg: wk, asm: "VPMOVB2M"},
+
+		{name: "VPMOVVec16x8ToM", argLength: 1, reg: vk, asm: "VPMOVW2M"},
+		{name: "VPMOVVec16x16ToM", argLength: 1, reg: vk, asm: "VPMOVW2M"},
+		{name: "VPMOVVec16x32ToM", argLength: 1, reg: wk, asm: "VPMOVW2M"},
+
+		{name: "VPMOVVec32x4ToM", argLength: 1, reg: vk, asm: "VPMOVD2M"},
+		{name: "VPMOVVec32x8ToM", argLength: 1, reg: vk, asm: "VPMOVD2M"},
+		{name: "VPMOVVec32x16ToM", argLength: 1, reg: wk, asm: "VPMOVD2M"},
+
+		{name: "VPMOVVec64x2ToM", argLength: 1, reg: vk, asm: "VPMOVQ2M"},
+		{name: "VPMOVVec64x4ToM", argLength: 1, reg: vk, asm: "VPMOVQ2M"},
+		{name: "VPMOVVec64x8ToM", argLength: 1, reg: wk, asm: "VPMOVQ2M"},
+
+		{name: "Zero128", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
+		{name: "Zero256", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
+		{name: "Zero512", argLength: 0, reg: x15only, zeroWidth: true, fixedReg: true},
+
+		{name: "VMOVSDf2v", argLength: 1, reg: fpv, asm: "VMOVSD"},
+		{name: "VMOVSSf2v", argLength: 1, reg: fpv, asm: "VMOVSS"},
+		{name: "VMOVQ", argLength: 1, reg: gpv, asm: "VMOVQ"},
+		{name: "VMOVD", argLength: 1, reg: gpv, asm: "VMOVD"},
+
+		{name: "VMOVQload", argLength: 2, reg: fpload, asm: "VMOVQ", aux: "SymOff", typ: "UInt64", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "VMOVDload", argLength: 2, reg: fpload, asm: "VMOVD", aux: "SymOff", typ: "UInt32", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "VMOVSSload", argLength: 2, reg: fpload, asm: "VMOVSS", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "VMOVSDload", argLength: 2, reg: fpload, asm: "VMOVSD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+
+		{name: "VMOVSSconst", reg: fp01, asm: "VMOVSS", aux: "Float32", rematerializeable: true},
+		{name: "VMOVSDconst", reg: fp01, asm: "VMOVSD", aux: "Float64", rematerializeable: true},
+
+		{name: "VZEROUPPER", argLength: 1, reg: regInfo{clobbers: v}, asm: "VZEROUPPER"}, // arg=mem, returns mem
+		{name: "VZEROALL", argLength: 1, reg: regInfo{clobbers: v}, asm: "VZEROALL"},     // arg=mem, returns mem
+
+		// KMOVxload: loads masks
+		// Load (Q=8,D=4,W=2,B=1) bytes from (arg0+auxint+aux), arg1=mem.
+		// "+auxint+aux" == add auxint and the offset of the symbol in aux (if any) to the effective address
+		{name: "KMOVBload", argLength: 2, reg: kload, asm: "KMOVB", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "KMOVWload", argLength: 2, reg: kload, asm: "KMOVW", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "KMOVDload", argLength: 2, reg: kload, asm: "KMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+		{name: "KMOVQload", argLength: 2, reg: kload, asm: "KMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Read"},
+
+		// KMOVxstore: stores masks
+		// Store (Q=8,D=4,W=2,B=1) low bytes of arg1.
+		// Does *(arg0+auxint+aux) = arg1, arg2=mem.
+		{name: "KMOVBstore", argLength: 3, reg: kstore, asm: "KMOVB", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
+		{name: "KMOVWstore", argLength: 3, reg: kstore, asm: "KMOVW", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
+		{name: "KMOVDstore", argLength: 3, reg: kstore, asm: "KMOVD", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
+		{name: "KMOVQstore", argLength: 3, reg: kstore, asm: "KMOVQ", aux: "SymOff", faultOnNilArg0: true, symEffect: "Write"},
+
+		// Move GP directly to mask register
+		{name: "KMOVQk", argLength: 1, reg: gpk, asm: "KMOVQ"},
+		{name: "KMOVDk", argLength: 1, reg: gpk, asm: "KMOVD"},
+		{name: "KMOVWk", argLength: 1, reg: gpk, asm: "KMOVW"},
+		{name: "KMOVBk", argLength: 1, reg: gpk, asm: "KMOVB"},
+		{name: "KMOVQi", argLength: 1, reg: kgp, asm: "KMOVQ"},
+		{name: "KMOVDi", argLength: 1, reg: kgp, asm: "KMOVD"},
+		{name: "KMOVWi", argLength: 1, reg: kgp, asm: "KMOVW"},
+		{name: "KMOVBi", argLength: 1, reg: kgp, asm: "KMOVB"},
+
+		// VPTEST
+		{name: "VPTEST", asm: "VPTEST", argLength: 2, reg: v2flags, clobberFlags: true, typ: "Flags"},
	}

	var AMD64blocks = []blockData{
@@ -1266,14 +1477,17 @@ func init() {
		name:     "AMD64",
		pkg:      "cmd/internal/obj/x86",
		genfile:  "../../amd64/ssa.go",
-		ops:      AMD64ops,
+		genSIMDfile: "../../amd64/simdssa.go",
+		ops: append(AMD64ops, simdAMD64Ops(v11, v21, v2k, vkv, v2kv, v2kk, v31, v3kv, vgpv, vgp, vfpv, vfpkv,
+			w11, w21, w2k, wkw, w2kw, w2kk, w31, w3kw, wgpw, wgp, wfpw, wfpkw, wkwload, v21load, v31load, v11load,
+			w21load, w31load, w2kload, w2kwload, w11load, w3kwload, w2kkload, v31x0AtIn2)...), // AMD64ops,
		blocks:   AMD64blocks,
		regnames: regNamesAMD64,
		ParamIntRegNames:   "AX BX CX DI SI R8 R9 R10 R11",
		ParamFloatRegNames: "X0 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14",
		gpregmask: gp,
		fpregmask: fp,
-		specialregmask: x15,
+		specialregmask: mask,
		framepointerreg: int8(num["BP"]),
		linkreg:         -1, // not used
	})
@@ -941,7 +941,7 @@
 // struct operations
 (StructSelect [i] x:(StructMake ___)) => x.Args[i]
-(Load <t> _ _) && t.IsStruct() && CanSSA(t) => rewriteStructLoad(v)
+(Load <t> _ _) && t.IsStruct() && CanSSA(t) && !t.IsSIMD() => rewriteStructLoad(v)
 (Store _ (StructMake ___) _) => rewriteStructStore(v)

 (StructSelect [i] x:(Load <t> ptr mem)) && !CanSSA(t) =>
@@ -375,6 +375,18 @@ var genericOps = []opData{
 	{name: "Load", argLength: 2},        // Load from arg0. arg1=memory
 	{name: "Dereference", argLength: 2}, // Load from arg0. arg1=memory. Helper op for arg/result passing, result is an otherwise not-SSA-able "value".
 	{name: "Store", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.

+	// masked memory operations.
+	// TODO add 16 and 8
+	{name: "LoadMasked8", argLength: 3},  // Load from arg0, arg1 = mask of 8-bits, arg2 = memory
+	{name: "LoadMasked16", argLength: 3}, // Load from arg0, arg1 = mask of 16-bits, arg2 = memory
+	{name: "LoadMasked32", argLength: 3}, // Load from arg0, arg1 = mask of 32-bits, arg2 = memory
+	{name: "LoadMasked64", argLength: 3}, // Load from arg0, arg1 = mask of 64-bits, arg2 = memory
+	{name: "StoreMasked8", argLength: 4, typ: "Mem", aux: "Typ"},  // Store arg2 to arg0, arg1=mask of 8-bits, arg3 = memory
+	{name: "StoreMasked16", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 16-bits, arg3 = memory
+	{name: "StoreMasked32", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 32-bits, arg3 = memory
+	{name: "StoreMasked64", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 64-bits, arg3 = memory
+
 	// Normally we require that the source and destination of Move do not overlap.
 	// There is an exception when we know all the loads will happen before all
 	// the stores. In that case, overlap is ok. See
@@ -666,6 +678,40 @@ var genericOps = []opData{
 	// Prefetch instruction
 	{name: "PrefetchCache", argLength: 2, hasSideEffects: true},         // Do prefetch arg0 to cache. arg0=addr, arg1=memory.
 	{name: "PrefetchCacheStreamed", argLength: 2, hasSideEffects: true}, // Do non-temporal or streamed prefetch arg0 to cache. arg0=addr, arg1=memory.

+	// SIMD
+	{name: "ZeroSIMD", argLength: 0}, // zero value of a vector
+
+	// Convert integers to masks
+	{name: "Cvt16toMask8x16", argLength: 1},  // arg0 = integer mask value
+	{name: "Cvt32toMask8x32", argLength: 1},  // arg0 = integer mask value
+	{name: "Cvt64toMask8x64", argLength: 1},  // arg0 = integer mask value
+	{name: "Cvt8toMask16x8", argLength: 1},   // arg0 = integer mask value
+	{name: "Cvt16toMask16x16", argLength: 1}, // arg0 = integer mask value
+	{name: "Cvt32toMask16x32", argLength: 1}, // arg0 = integer mask value
+	{name: "Cvt8toMask32x4", argLength: 1},   // arg0 = integer mask value
+	{name: "Cvt8toMask32x8", argLength: 1},   // arg0 = integer mask value
+	{name: "Cvt16toMask32x16", argLength: 1}, // arg0 = integer mask value
+	{name: "Cvt8toMask64x2", argLength: 1},   // arg0 = integer mask value
+	{name: "Cvt8toMask64x4", argLength: 1},   // arg0 = integer mask value
+	{name: "Cvt8toMask64x8", argLength: 1},   // arg0 = integer mask value
+
+	// Convert masks to integers
+	{name: "CvtMask8x16to16", argLength: 1},  // arg0 = mask
+	{name: "CvtMask8x32to32", argLength: 1},  // arg0 = mask
+	{name: "CvtMask8x64to64", argLength: 1},  // arg0 = mask
+	{name: "CvtMask16x8to8", argLength: 1},   // arg0 = mask
+	{name: "CvtMask16x16to16", argLength: 1}, // arg0 = mask
+	{name: "CvtMask16x32to32", argLength: 1}, // arg0 = mask
+	{name: "CvtMask32x4to8", argLength: 1},   // arg0 = mask
+	{name: "CvtMask32x8to8", argLength: 1},   // arg0 = mask
+	{name: "CvtMask32x16to16", argLength: 1}, // arg0 = mask
+	{name: "CvtMask64x2to8", argLength: 1},   // arg0 = mask
+	{name: "CvtMask64x4to8", argLength: 1},   // arg0 = mask
+	{name: "CvtMask64x8to8", argLength: 1},   // arg0 = mask
+
+	// Returns true if arg0 is all zero.
+	{name: "IsZeroVec", argLength: 1},
 }

 // kind controls successors implicit exit
@@ -693,6 +739,7 @@ var genericBlocks = []blockData{
 }

 func init() {
+	genericOps = append(genericOps, simdGenericOps()...)
 	archs = append(archs, arch{
 		name: "generic",
 		ops:  genericOps,
@@ -32,6 +32,7 @@ type arch struct {
 	name        string
 	pkg         string // obj package to import for this arch.
 	genfile     string // source file containing opcode code generation.
+	genSIMDfile string // source file containing opcode code generation for SIMD.
 	ops         []opData
 	blocks      []blockData
 	regnames    []string
@@ -547,6 +548,15 @@ func genOp() {
 	if err != nil {
 		log.Fatalf("can't read %s: %v", a.genfile, err)
 	}
+	// Append the file of simd operations, too
+	if a.genSIMDfile != "" {
+		simdSrc, err := os.ReadFile(a.genSIMDfile)
+		if err != nil {
+			log.Fatalf("can't read %s: %v", a.genSIMDfile, err)
+		}
+		src = append(src, simdSrc...)
+	}

 	seen := make(map[string]bool, len(a.ops))
 	for _, m := range rxOp.FindAllSubmatch(src, -1) {
 		seen[string(m[1])] = true
new file (117 lines): src/cmd/compile/internal/ssa/_gen/multiscanner.go
@@ -0,0 +1,117 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import (
	"bufio"
	"io"
)

// NamedScanner is a simple struct to pair a name with a Scanner.
type NamedScanner struct {
	Name    string
	Scanner *bufio.Scanner
}

// NamedReader is a simple struct to pair a name with a Reader,
// which will be converted to a Scanner using bufio.NewScanner.
type NamedReader struct {
	Name   string
	Reader io.Reader
}

// MultiScanner scans over multiple bufio.Scanners as if they were a single stream.
// It also keeps track of the name of the current scanner and the line number.
type MultiScanner struct {
	scanners   []NamedScanner
	scannerIdx int
	line       int
	totalLine  int
	err        error
}

// NewMultiScanner creates a new MultiScanner from slice of NamedScanners.
func NewMultiScanner(scanners []NamedScanner) *MultiScanner {
	return &MultiScanner{
		scanners:   scanners,
		scannerIdx: -1, // Start before the first scanner
	}
}

// MultiScannerFromReaders creates a new MultiScanner from a slice of NamedReaders.
func MultiScannerFromReaders(readers []NamedReader) *MultiScanner {
	var scanners []NamedScanner
	for _, r := range readers {
		scanners = append(scanners, NamedScanner{
			Name:    r.Name,
			Scanner: bufio.NewScanner(r.Reader),
		})
	}
	return NewMultiScanner(scanners)
}

// Scan advances the scanner to the next token, which will then be
// available through the Text method. It returns false when the scan stops,
// either by reaching the end of the input or an error.
// After Scan returns false, the Err method will return any error that
// occurred during scanning, except that if it was io.EOF, Err
// will return nil.
func (ms *MultiScanner) Scan() bool {
	if ms.scannerIdx == -1 {
		ms.scannerIdx = 0
	}

	for ms.scannerIdx < len(ms.scanners) {
		current := ms.scanners[ms.scannerIdx]
		if current.Scanner.Scan() {
			ms.line++
			ms.totalLine++
			return true
		}
		if err := current.Scanner.Err(); err != nil {
			ms.err = err
			return false
		}
		// Move to the next scanner
		ms.scannerIdx++
		ms.line = 0
	}

	return false
}

// Text returns the most recent token generated by a call to Scan.
func (ms *MultiScanner) Text() string {
	if ms.scannerIdx < 0 || ms.scannerIdx >= len(ms.scanners) {
		return ""
	}
	return ms.scanners[ms.scannerIdx].Scanner.Text()
}

// Err returns the first non-EOF error that was encountered by the MultiScanner.
func (ms *MultiScanner) Err() error {
	return ms.err
}

// Name returns the name of the current scanner.
func (ms *MultiScanner) Name() string {
	if ms.scannerIdx < 0 {
		return "<before first>"
	}
	if ms.scannerIdx >= len(ms.scanners) {
		return "<after last>"
	}
	return ms.scanners[ms.scannerIdx].Name
}

// Line returns the current line number within the current scanner.
func (ms *MultiScanner) Line() int {
	return ms.line
}

// TotalLine returns the total number of lines scanned across all scanners.
func (ms *MultiScanner) TotalLine() int {
	return ms.totalLine
}
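A minimal usage sketch of the MultiScanner API above, assuming it sits next to multiscanner.go in the same package main and imports fmt, log, and strings; the two rule strings and the VPADDD spelling are illustrative only, not rules taken from this change:

func exampleMultiScan() {
	// Feed the generator's ordinary rules file plus the optional SIMD rules file as one stream.
	readers := []NamedReader{
		{Name: "AMD64.rules", Reader: strings.NewReader("(Add64 x y) => (ADDQ x y)\n")},
		{Name: "simdAMD64.rules", Reader: strings.NewReader("(AddInt32x4 x y) => (VPADDD x y)\n")},
	}
	ms := MultiScannerFromReaders(readers)
	for ms.Scan() {
		// Name and Line give per-file positions, which rulegen now uses for rule locations.
		fmt.Printf("%s:%d: %s\n", ms.Name(), ms.Line(), ms.Text())
	}
	if err := ms.Err(); err != nil {
		log.Fatalf("scanner failed: %v", err)
	}
}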
@@ -94,8 +94,11 @@ func genSplitLoadRules(arch arch) { genRulesSuffix(arch, "splitload") }
 func genLateLowerRules(arch arch) { genRulesSuffix(arch, "latelower") }

 func genRulesSuffix(arch arch, suff string) {
+	var readers []NamedReader
 	// Open input file.
-	text, err := os.Open(arch.name + suff + ".rules")
+	var text io.Reader
+	name := arch.name + suff + ".rules"
+	text, err := os.Open(name)
 	if err != nil {
 		if suff == "" {
 			// All architectures must have a plain rules file.

@@ -104,18 +107,28 @@ func genRulesSuffix(arch arch, suff string) {
 		// Some architectures have bonus rules files that others don't share. That's fine.
 		return
 	}
+	readers = append(readers, NamedReader{name, text})
+
+	// Check for file of SIMD rules to add
+	if suff == "" {
+		simdname := "simd" + arch.name + ".rules"
+		simdtext, err := os.Open(simdname)
+		if err == nil {
+			readers = append(readers, NamedReader{simdname, simdtext})
+		}
+	}

 	// oprules contains a list of rules for each block and opcode
 	blockrules := map[string][]Rule{}
 	oprules := map[string][]Rule{}

 	// read rule file
-	scanner := bufio.NewScanner(text)
+	scanner := MultiScannerFromReaders(readers)
 	rule := ""
 	var lineno int
 	var ruleLineno int // line number of "=>"
 	for scanner.Scan() {
-		lineno++
+		lineno = scanner.Line()
 		line := scanner.Text()
 		if i := strings.Index(line, "//"); i >= 0 {
 			// Remove comments. Note that this isn't string safe, so

@@ -142,7 +155,7 @@ func genRulesSuffix(arch arch, suff string) {
 			break // continuing the line can't help, and it will only make errors worse
 		}

-		loc := fmt.Sprintf("%s%s.rules:%d", arch.name, suff, ruleLineno)
+		loc := fmt.Sprintf("%s:%d", scanner.Name(), ruleLineno)
 		for _, rule2 := range expandOr(rule) {
 			r := Rule{Rule: rule2, Loc: loc}
 			if rawop := strings.Split(rule2, " ")[0][1:]; isBlock(rawop, arch) {

@@ -162,7 +175,7 @@ func genRulesSuffix(arch arch, suff string) {
 		log.Fatalf("scanner failed: %v\n", err)
 	}
 	if balance(rule) != 0 {
-		log.Fatalf("%s.rules:%d: unbalanced rule: %v\n", arch.name, lineno, rule)
+		log.Fatalf("%s:%d: unbalanced rule: %v\n", scanner.Name(), lineno, rule)
 	}

 	// Order all the ops.

@@ -862,7 +875,7 @@ func declReserved(name, value string) *Declare {
 	if !reservedNames[name] {
 		panic(fmt.Sprintf("declReserved call does not use a reserved name: %q", name))
 	}
-	return &Declare{name, exprf(value)}
+	return &Declare{name, exprf("%s", value)}
 }

 // breakf constructs a simple "if cond { break }" statement, using exprf for its

@@ -889,7 +902,7 @@ func genBlockRewrite(rule Rule, arch arch, data blockData) *RuleRewrite {
 		if vname == "" {
 			vname = fmt.Sprintf("v_%v", i)
 		}
-		rr.add(declf(rr.Loc, vname, cname))
+		rr.add(declf(rr.Loc, vname, "%s", cname))
 		p, op := genMatch0(rr, arch, expr, vname, nil, false) // TODO: pass non-nil cnt?
 		if op != "" {
 			check := fmt.Sprintf("%s.Op == %s", cname, op)

@@ -904,7 +917,7 @@ func genBlockRewrite(rule Rule, arch arch, data blockData) *RuleRewrite {
 			}
 			pos[i] = p
 		} else {
-			rr.add(declf(rr.Loc, arg, cname))
+			rr.add(declf(rr.Loc, arg, "%s", cname))
 			pos[i] = arg + ".Pos"
 		}
 	}
new file (2889 lines, diff suppressed because it is too large): src/cmd/compile/internal/ssa/_gen/simdAMD64.rules
new file (2423 lines, diff suppressed because it is too large): src/cmd/compile/internal/ssa/_gen/simdAMD64ops.go
new file (1310 lines, diff suppressed because it is too large): src/cmd/compile/internal/ssa/_gen/simdgenericOps.go
@@ -18,6 +18,9 @@ type Block struct {
 	// Source position for block's control operation
 	Pos src.XPos

+	// What cpu features (AVXnnn, SVEyyy) are implied to reach/execute this block?
+	CPUfeatures CPUfeatures
+
 	// The kind of block this is.
 	Kind BlockKind
@@ -449,3 +452,57 @@
 	HotPgoInitial          = HotPgo | HotInitial               // special case; single block loop, initial block is header block has a flow-in entry, but PGO says it is hot
 	HotPgoInitialNotFLowIn = HotPgo | HotInitial | HotNotFlowIn // PGO says it is hot, and the loop is rotated so flow enters loop with a branch
 )

+type CPUfeatures uint32
+
+const (
+	CPUNone CPUfeatures = 0
+	CPUAll  CPUfeatures = ^CPUfeatures(0)
+	CPUavx  CPUfeatures = 1 << iota
+	CPUavx2
+	CPUavxvnni
+	CPUavx512
+	CPUbitalg
+	CPUgfni
+	CPUvbmi
+	CPUvbmi2
+	CPUvpopcntdq
+	CPUavx512vnni
+
+	CPUneon
+	CPUsve2
+)
+
+func (f CPUfeatures) hasFeature(x CPUfeatures) bool {
+	return f&x == x
+}
+
+func (f CPUfeatures) String() string {
+	if f == CPUNone {
+		return "none"
+	}
+	if f == CPUAll {
+		return "all"
+	}
+	s := ""
+	foo := func(what string, feat CPUfeatures) {
+		if feat&f != 0 {
+			if s != "" {
+				s += "+"
+			}
+			s += what
+		}
+	}
+	foo("avx", CPUavx)
+	foo("avx2", CPUavx2)
+	foo("avx512", CPUavx512)
+	foo("avxvnni", CPUavxvnni)
+	foo("bitalg", CPUbitalg)
+	foo("gfni", CPUgfni)
+	foo("vbmi", CPUvbmi)
+	foo("vbmi2", CPUvbmi2)
+	foo("popcntdq", CPUvpopcntdq)
+	foo("avx512vnni", CPUavx512vnni)
+
+	return s
+}
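To illustrate the set semantics of hasFeature and String above, a small sketch in package ssa (fmt is assumed only for the printout; the feature combination mirrors the unions used by the cpufeatures pass later in this change):

func exampleCPUFeatures() {
	// A block known to execute 512-bit SIMD values carries the whole implied chain.
	feat := CPUavx512 | CPUavx2 | CPUavx

	fmt.Println(feat.hasFeature(CPUavx2))             // true: every requested bit is present
	fmt.Println(feat.hasFeature(CPUavx512 | CPUgfni)) // false: the GFNI bit is missing
	fmt.Println(feat)                                 // avx+avx2+avx512
}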
@@ -150,8 +150,9 @@ func checkFunc(f *Func) {
 		case auxInt128:
 			// AuxInt must be zero, so leave canHaveAuxInt set to false.
 		case auxUInt8:
-			if v.AuxInt != int64(uint8(v.AuxInt)) {
-				f.Fatalf("bad uint8 AuxInt value for %v", v)
+			// Cast to int8 due to requirement of AuxInt, check its comment for details.
+			if v.AuxInt != int64(int8(v.AuxInt)) {
+				f.Fatalf("bad uint8 AuxInt value for %v, saw %d but need %d", v, v.AuxInt, int64(int8(v.AuxInt)))
 			}
 			canHaveAuxInt = true
 		case auxFloat32:
@@ -488,6 +488,8 @@ var passes = [...]pass{
 	{name: "writebarrier", fn: writebarrier, required: true}, // expand write barrier ops
 	{name: "insert resched checks", fn: insertLoopReschedChecks,
 		disabled: !buildcfg.Experiment.PreemptibleLoops}, // insert resched checks in loops.
+	{name: "cpufeatures", fn: cpufeatures, required: buildcfg.Experiment.SIMD, disabled: !buildcfg.Experiment.SIMD},
+	{name: "rewrite tern", fn: rewriteTern, required: false, disabled: !buildcfg.Experiment.SIMD},
 	{name: "lower", fn: lower, required: true},
 	{name: "addressing modes", fn: addressingModes, required: false},
 	{name: "late lower", fn: lateLower, required: true},

@@ -596,6 +598,8 @@ var passOrder = [...]constraint{
 	{"branchelim", "late opt"},
 	// branchelim is an arch-independent pass.
 	{"branchelim", "lower"},
+	// lower needs cpu feature information (for SIMD)
+	{"cpufeatures", "lower"},
 }

 func init() {
@@ -88,6 +88,10 @@ type Types struct {
 	Float32Ptr *types.Type
 	Float64Ptr *types.Type
 	BytePtrPtr *types.Type
+	Vec128     *types.Type
+	Vec256     *types.Type
+	Vec512     *types.Type
+	Mask       *types.Type
 }

 // NewTypes creates and populates a Types.

@@ -122,6 +126,10 @@ func (t *Types) SetTypPtrs() {
 	t.Float32Ptr = types.NewPtr(types.Types[types.TFLOAT32])
 	t.Float64Ptr = types.NewPtr(types.Types[types.TFLOAT64])
 	t.BytePtrPtr = types.NewPtr(types.NewPtr(types.Types[types.TUINT8]))
+	t.Vec128 = types.TypeVec128
+	t.Vec256 = types.TypeVec256
+	t.Vec512 = types.TypeVec512
+	t.Mask = types.TypeMask
 }

 type Logger interface {
new file (262 lines): src/cmd/compile/internal/ssa/cpufeatures.go
@@ -0,0 +1,262 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa

import (
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"fmt"
	"internal/goarch"
)

type localEffect struct {
	start    CPUfeatures    // features present at beginning of block
	internal CPUfeatures    // features implied by execution of block
	end      [2]CPUfeatures // for BlockIf, features present on outgoing edges
	visited  bool           // On the first iteration this will be false for backedges.
}

func (e localEffect) String() string {
	return fmt.Sprintf("visited=%v, start=%v, internal=%v, end[0]=%v, end[1]=%v", e.visited, e.start, e.internal, e.end[0], e.end[1])
}

// ifEffect pattern matches for a BlockIf conditional on a load
// of a field from internal/cpu.X86 and returns the corresponding
// effect.
func ifEffect(b *Block) (features CPUfeatures, taken int) {
	// TODO generalize for other architectures.
	if b.Kind != BlockIf {
		return
	}
	c := b.Controls[0]

	if c.Op == OpNot {
		taken = 1
		c = c.Args[0]
	}
	if c.Op != OpLoad {
		return
	}
	offPtr := c.Args[0]
	if offPtr.Op != OpOffPtr {
		return
	}
	addr := offPtr.Args[0]
	if addr.Op != OpAddr || addr.Args[0].Op != OpSB {
		return
	}
	sym := addr.Aux.(*obj.LSym)
	if sym.Name != "internal/cpu.X86" {
		return
	}
	o := offPtr.AuxInt
	t := addr.Type
	if !t.IsPtr() {
		b.Func.Fatalf("The symbol %s is not a pointer, found %v instead", sym.Name, t)
	}
	t = t.Elem()
	if !t.IsStruct() {
		b.Func.Fatalf("The referent of symbol %s is not a struct, found %v instead", sym.Name, t)
	}
	match := ""
	for _, f := range t.Fields() {
		if o == f.Offset && f.Sym != nil {
			match = f.Sym.Name
			break
		}
	}

	switch match {

	case "HasAVX":
		features = CPUavx
	case "HasAVXVNNI":
		features = CPUavx | CPUavxvnni
	case "HasAVX2":
		features = CPUavx2 | CPUavx

	// Compiler currently treats these all alike.
	case "HasAVX512", "HasAVX512F", "HasAVX512CD", "HasAVX512BW",
		"HasAVX512DQ", "HasAVX512VL", "HasAVX512VPCLMULQDQ":
		features = CPUavx512 | CPUavx2 | CPUavx

	case "HasAVX512GFNI":
		features = CPUavx512 | CPUgfni | CPUavx2 | CPUavx
	case "HasAVX512VNNI":
		features = CPUavx512 | CPUavx512vnni | CPUavx2 | CPUavx
	case "HasAVX512VBMI":
		features = CPUavx512 | CPUvbmi | CPUavx2 | CPUavx
	case "HasAVX512VBMI2":
		features = CPUavx512 | CPUvbmi2 | CPUavx2 | CPUavx
	case "HasAVX512BITALG":
		features = CPUavx512 | CPUbitalg | CPUavx2 | CPUavx
	case "HasAVX512VPOPCNTDQ":
		features = CPUavx512 | CPUvpopcntdq | CPUavx2 | CPUavx

	case "HasBMI1":
		features = CPUvbmi
	case "HasBMI2":
		features = CPUvbmi2

	// Features that are not currently interesting to the compiler.
	case "HasAES", "HasADX", "HasERMS", "HasFSRM", "HasFMA", "HasGFNI", "HasOSXSAVE",
		"HasPCLMULQDQ", "HasPOPCNT", "HasRDTSCP", "HasSHA",
		"HasSSE3", "HasSSSE3", "HasSSE41", "HasSSE42":

	}
	if b.Func.pass.debug > 2 {
		b.Func.Warnl(b.Pos, "%s, block b%v has features offset %d, match is %s, features is %v", b.Func.Name, b.ID, o, match, features)
	}
	return
}

func cpufeatures(f *Func) {
	arch := f.Config.Ctxt().Arch.Family
	// TODO there are other SIMD architectures
	if arch != goarch.AMD64 {
		return
	}

	po := f.Postorder()

	effects := make([]localEffect, 1+f.NumBlocks(), 1+f.NumBlocks())

	features := func(t *types.Type) CPUfeatures {
		if t.IsSIMD() {
			switch t.Size() {
			case 16, 32:
				return CPUavx
			case 64:
				return CPUavx512 | CPUavx2 | CPUavx
			}
		}
		return CPUNone
	}

	// visit blocks in reverse post order
	// when b is visited, all of its predecessors (except for loop back edges)
	// will have been visited
	for i := len(po) - 1; i >= 0; i-- {
		b := po[i]

		var feat CPUfeatures

		if b == f.Entry {
			// Check the types of inputs and outputs, as well as annotations.
			// Start with none and union all that is implied by all the types seen.
			if f.Type != nil { // a problem for SSA tests
				for _, field := range f.Type.RecvParamsResults() {
					feat |= features(field.Type)
				}
			}

		} else {
			// Start with all and intersect over predecessors
			feat = CPUAll
			for _, p := range b.Preds {
				pb := p.Block()
				if !effects[pb.ID].visited {
					continue
				}
				pi := p.Index()
				if pb.Kind != BlockIf {
					pi = 0
				}
				feat &= effects[pb.ID].end[pi]
			}
		}

		e := localEffect{start: feat, visited: true}

		// Separately capture the internal effects of this block
		var internal CPUfeatures
		for _, v := range b.Values {
			// the rule applied here is, if the block contains any
			// instruction that would fault if the feature (avx, avx512)
			// were not present, then assume that the feature is present
			// for all the instructions in the block, a fault is a fault.
			t := v.Type
			if t.IsResults() {
				for i := 0; i < t.NumFields(); i++ {
					feat |= features(t.FieldType(i))
				}
			} else {
				internal |= features(v.Type)
			}
		}
		e.internal = internal
		feat |= internal

		branchEffect, taken := ifEffect(b)
		e.end = [2]CPUfeatures{feat, feat}
		e.end[taken] |= branchEffect

		effects[b.ID] = e
		if f.pass.debug > 1 && feat != CPUNone {
			f.Warnl(b.Pos, "%s, block b%v has features %v", b.Func.Name, b.ID, feat)
		}

		b.CPUfeatures = feat
		f.maxCPUFeatures |= feat // not necessary to refine this estimate below
	}

	// If the flow graph is irreducible, things can still change on backedges.
	change := true
	for change {
		change = false
		for i := len(po) - 1; i >= 0; i-- {
			b := po[i]

			if b == f.Entry {
				continue // cannot change
			}
			feat := CPUAll
			for _, p := range b.Preds {
				pb := p.Block()
				pi := p.Index()
				if pb.Kind != BlockIf {
					pi = 0
				}
				feat &= effects[pb.ID].end[pi]
			}
			e := effects[b.ID]
			if feat == e.start {
				continue
			}
			e.start = feat
			effects[b.ID] = e
			// uh-oh, something changed
			if f.pass.debug > 1 {
				f.Warnl(b.Pos, "%s, block b%v saw predecessor feature change", b.Func.Name, b.ID)
			}

			feat |= e.internal
			if feat == e.end[0]&e.end[1] {
				continue
			}

			branchEffect, taken := ifEffect(b)
			e.end = [2]CPUfeatures{feat, feat}
			e.end[taken] |= branchEffect

			effects[b.ID] = e
			b.CPUfeatures = feat
			if f.pass.debug > 1 {
				f.Warnl(b.Pos, "%s, block b%v has new features %v", b.Func.Name, b.ID, feat)
			}
			change = true
		}
	}
	if f.pass.debug > 0 {
		for _, b := range f.Blocks {
			if b.CPUfeatures != CPUNone {
				f.Warnl(b.Pos, "%s, block b%v has features %v", b.Func.Name, b.ID, b.CPUfeatures)
			}
		}
	}
}
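For orientation, a hedged sketch of the guard shape that ifEffect recognizes, written as it would look inside the standard library where internal/cpu is importable; the simd.Int32x16 type, its Add method, and the addFallback helper are illustrative assumptions, not part of this change:

func addChecked(a, b simd.Int32x16) simd.Int32x16 {
	if cpu.X86.HasAVX512 {
		// After inlining, this condition is a load of a field of the global
		// internal/cpu.X86 struct, i.e. (Load (OffPtr (Addr {internal/cpu.X86} SB)) mem),
		// so ifEffect reports CPUavx512|CPUavx2|CPUavx for the taken edge; the pass then
		// propagates it to blocks reachable only through that edge. A negated check
		// simply flips which edge is "taken".
		return a.Add(b)
	}
	return addFallback(a, b) // the not-taken edge gets no extra features
}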
@@ -100,7 +100,7 @@ func decomposeBuiltin(f *Func) {
 		}
 	case t.IsFloat():
 		// floats are never decomposed, even ones bigger than RegSize
-	case t.Size() > f.Config.RegSize:
+	case t.Size() > f.Config.RegSize && !t.IsSIMD():
 		f.Fatalf("undecomposed named type %s %v", name, t)
 	}
 }

@@ -135,7 +135,7 @@ func decomposeBuiltinPhi(v *Value) {
 		decomposeInterfacePhi(v)
 	case v.Type.IsFloat():
 		// floats are never decomposed, even ones bigger than RegSize
-	case v.Type.Size() > v.Block.Func.Config.RegSize:
+	case v.Type.Size() > v.Block.Func.Config.RegSize && !v.Type.IsSIMD():
 		v.Fatalf("%v undecomposed type %v", v, v.Type)
 	}
 }

@@ -248,7 +248,7 @@ func decomposeUser(f *Func) {
 	for _, name := range f.Names {
 		t := name.Type
 		switch {
-		case t.IsStruct():
+		case isStructNotSIMD(t):
 			newNames = decomposeUserStructInto(f, name, newNames)
 		case t.IsArray():
 			newNames = decomposeUserArrayInto(f, name, newNames)

@@ -293,7 +293,7 @@ func decomposeUserArrayInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*LocalSlot {
 	if t.Elem().IsArray() {
 		return decomposeUserArrayInto(f, elemName, slots)
-	} else if t.Elem().IsStruct() {
+	} else if isStructNotSIMD(t.Elem()) {
 		return decomposeUserStructInto(f, elemName, slots)
 	}

@@ -313,7 +313,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*LocalSlot {
 		fnames = append(fnames, fs)
 		// arrays and structs will be decomposed further, so
 		// there's no need to record a name
-		if !fs.Type.IsArray() && !fs.Type.IsStruct() {
+		if !fs.Type.IsArray() && !isStructNotSIMD(fs.Type) {
 			slots = maybeAppend(f, slots, fs)
 		}
 	}

@@ -339,7 +339,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*LocalSlot {
 	// now that this f.NamedValues contains values for the struct
 	// fields, recurse into nested structs
 	for i := 0; i < n; i++ {
-		if name.Type.FieldType(i).IsStruct() {
+		if isStructNotSIMD(name.Type.FieldType(i)) {
 			slots = decomposeUserStructInto(f, fnames[i], slots)
 			delete(f.NamedValues, *fnames[i])
 		} else if name.Type.FieldType(i).IsArray() {

@@ -351,7 +351,7 @@ func decomposeUserStructInto(f *Func, name *LocalSlot, slots []*LocalSlot) []*LocalSlot {
 }
 func decomposeUserPhi(v *Value) {
 	switch {
-	case v.Type.IsStruct():
+	case isStructNotSIMD(v.Type):
 		decomposeStructPhi(v)
 	case v.Type.IsArray():
 		decomposeArrayPhi(v)

@@ -458,3 +458,7 @@ func deleteNamedVals(f *Func, toDelete []namedVal) {
 	}
 	f.Names = f.Names[:end]
 }
+
+func isStructNotSIMD(t *types.Type) bool {
+	return t.IsStruct() && !t.IsSIMD()
+}
@@ -396,6 +396,9 @@ func (x *expandState) decomposeAsNecessary(pos src.XPos, b *Block, a, m0 *Value,
 		return mem

 	case types.TSTRUCT:
+		if at.IsSIMD() {
+			break // XXX
+		}
 		for i := 0; i < at.NumFields(); i++ {
 			et := at.Field(i).Type // might need to read offsets from the fields
 			e := b.NewValue1I(pos, OpStructSelect, et, int64(i), a)

@@ -551,6 +554,9 @@ func (x *expandState) rewriteSelectOrArg(pos src.XPos, b *Block, container, a, m
 	case types.TSTRUCT:
 		// Assume ssagen/ssa.go (in buildssa) spills large aggregates so they won't appear here.
+		if at.IsSIMD() {
+			break // XXX
+		}
 		for i := 0; i < at.NumFields(); i++ {
 			et := at.Field(i).Type
 			e := x.rewriteSelectOrArg(pos, b, container, nil, m0, et, rc.next(et))

@@ -717,6 +723,9 @@ func (x *expandState) rewriteWideSelectToStores(pos src.XPos, b *Block, containe
 	case types.TSTRUCT:
 		// Assume ssagen/ssa.go (in buildssa) spills large aggregates so they won't appear here.
+		if at.IsSIMD() {
+			break // XXX
+		}
 		for i := 0; i < at.NumFields(); i++ {
 			et := at.Field(i).Type
 			m0 = x.rewriteWideSelectToStores(pos, b, container, m0, et, rc.next(et))
@@ -41,6 +41,8 @@ type Func struct {
 	ABISelf    *abi.ABIConfig // ABI for function being compiled
 	ABIDefault *abi.ABIConfig // ABI for rtcall and other no-parsed-signature/pragma functions.

+	maxCPUFeatures CPUfeatures // union of all the CPU features in all the blocks.
+
 	scheduled bool // Values in Blocks are in final order
 	laidout   bool // Blocks are ordered
 	NoSplit   bool // true if function is marked as nosplit. Used by schedule check pass.

@@ -632,6 +634,19 @@ func (b *Block) NewValue4(pos src.XPos, op Op, t *types.Type, arg0, arg1, arg2,
 	return v
 }

+// NewValue4A returns a new value in the block with four arguments and zero aux values.
+func (b *Block) NewValue4A(pos src.XPos, op Op, t *types.Type, aux Aux, arg0, arg1, arg2, arg3 *Value) *Value {
+	v := b.Func.newValue(op, t, b, pos)
+	v.AuxInt = 0
+	v.Aux = aux
+	v.Args = []*Value{arg0, arg1, arg2, arg3}
+	arg0.Uses++
+	arg1.Uses++
+	arg2.Uses++
+	arg3.Uses++
+	return v
+}
+
 // NewValue4I returns a new value in the block with four arguments and auxint value.
 func (b *Block) NewValue4I(pos src.XPos, op Op, t *types.Type, auxint int64, arg0, arg1, arg2, arg3 *Value) *Value {
 	v := b.Func.newValue(op, t, b, pos)
File diff suppressed because it is too large.
@@ -931,6 +931,14 @@ func (s *regAllocState) compatRegs(t *types.Type) regMask {
 	if t.IsTuple() || t.IsFlags() {
 		return 0
 	}
+	if t.IsSIMD() {
+		if t.Size() > 8 {
+			return s.f.Config.fpRegMask & s.allocatable
+		} else {
+			// K mask
+			return s.f.Config.gpRegMask & s.allocatable
+		}
+	}
 	if t.IsFloat() || t == types.TypeInt128 {
 		if t.Kind() == types.TFLOAT32 && s.f.Config.fp32RegMask != 0 {
 			m = s.f.Config.fp32RegMask

@@ -1439,6 +1447,13 @@ func (s *regAllocState) regalloc(f *Func) {
 			s.sb = v.ID
 		case OpARM64ZERO, OpLOONG64ZERO, OpMIPS64ZERO:
 			s.assignReg(s.ZeroIntReg, v, v)
+		case OpAMD64Zero128, OpAMD64Zero256, OpAMD64Zero512:
+			regspec := s.regspec(v)
+			m := regspec.outputs[0].regs
+			if countRegs(m) != 1 {
+				f.Fatalf("bad fixed-register op %s", v)
+			}
+			s.assignReg(pickReg(m), v, v)
 		default:
 			f.Fatalf("unknown fixed-register op %s", v)
 		}
File diff suppressed because it is too large.
@@ -12416,11 +12416,11 @@ func rewriteValuegeneric_OpLoad(v *Value) bool {
 		return true
 	}
 	// match: (Load <t> _ _)
-	// cond: t.IsStruct() && CanSSA(t)
+	// cond: t.IsStruct() && CanSSA(t) && !t.IsSIMD()
 	// result: rewriteStructLoad(v)
 	for {
 		t := v.Type
-		if !(t.IsStruct() && CanSSA(t)) {
+		if !(t.IsStruct() && CanSSA(t) && !t.IsSIMD()) {
 			break
 		}
 		v.copyOf(rewriteStructLoad(v))
new file (292 lines): src/cmd/compile/internal/ssa/rewritetern.go
@@ -0,0 +1,292 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package ssa
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"internal/goarch"
|
||||||
|
"slices"
|
||||||
|
)
|
||||||
|
|
||||||
|
var truthTableValues [3]uint8 = [3]uint8{0b1111_0000, 0b1100_1100, 0b1010_1010}
|
||||||
|
|
||||||
|
func (slop SIMDLogicalOP) String() string {
|
||||||
|
if slop == sloInterior {
|
||||||
|
return "leaf"
|
||||||
|
}
|
||||||
|
interior := ""
|
||||||
|
if slop&sloInterior != 0 {
|
||||||
|
interior = "+interior"
|
||||||
|
}
|
||||||
|
switch slop &^ sloInterior {
|
||||||
|
case sloAnd:
|
||||||
|
return "and" + interior
|
||||||
|
case sloXor:
|
||||||
|
return "xor" + interior
|
||||||
|
case sloOr:
|
||||||
|
return "or" + interior
|
||||||
|
case sloAndNot:
|
||||||
|
return "andNot" + interior
|
||||||
|
case sloNot:
|
||||||
|
return "not" + interior
|
||||||
|
}
|
||||||
|
return "wrong"
|
||||||
|
}
|
||||||
|
|
||||||
|
func rewriteTern(f *Func) {
|
||||||
|
if f.maxCPUFeatures == CPUNone {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
arch := f.Config.Ctxt().Arch.Family
|
||||||
|
// TODO there are other SIMD architectures
|
||||||
|
if arch != goarch.AMD64 {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
boolExprTrees := make(map[*Value]SIMDLogicalOP)
|
||||||
|
|
||||||
|
// Find logical-expr expression trees, including leaves.
|
||||||
|
// interior nodes will be marked sloInterior,
|
||||||
|
// root nodes will not be marked sloInterior,
|
||||||
|
// leaf nodes are only marked sloInterior.
|
||||||
|
for _, b := range f.Blocks {
|
||||||
|
for _, v := range b.Values {
|
||||||
|
slo := classifyBooleanSIMD(v)
|
||||||
|
switch slo {
|
||||||
|
case sloOr,
|
||||||
|
sloAndNot,
|
||||||
|
sloXor,
|
||||||
|
sloAnd:
|
||||||
|
boolExprTrees[v.Args[1]] |= sloInterior
|
||||||
|
fallthrough
|
||||||
|
case sloNot:
|
||||||
|
boolExprTrees[v.Args[0]] |= sloInterior
|
||||||
|
boolExprTrees[v] |= slo
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// get a canonical sorted set of roots
|
||||||
|
var roots []*Value
|
||||||
|
for v, slo := range boolExprTrees {
|
||||||
|
if f.pass.debug > 1 {
|
||||||
|
f.Warnl(v.Pos, "%s has SLO %v", v.LongString(), slo)
|
||||||
|
}
|
||||||
|
|
||||||
|
if slo&sloInterior == 0 && v.Block.CPUfeatures.hasFeature(CPUavx512) {
|
||||||
|
roots = append(roots, v)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
slices.SortFunc(roots, func(u, v *Value) int { return int(u.ID - v.ID) }) // IDs are small enough to not care about overflow.
|
||||||
|
|
||||||
|
// This rewrite works by iterating over the root set.
|
||||||
|
// For each boolean expression, it walks the expression
|
||||||
|
// bottom up accumulating sets of variables mentioned in
|
||||||
|
// subexpressions, lazy-greedily finding the largest subexpressions
|
||||||
|
// of 3 inputs that can be rewritten to use ternary-truth-table instructions.
|
||||||
|
|
||||||
|
// rewrite recursively attempts to replace v and v's subexpressions with
|
||||||
|
// ternary-logic truth-table operations, returning a set of not more than 3
|
||||||
|
// subexpressions within v that may be combined into a parent's replacement.
|
||||||
|
// V need not have the CPU features that allow a ternary-logic operation;
|
||||||
|
// in that case, v will not be rewritten. Replacements also require
|
||||||
|
// exactly 3 different variable inputs to a boolean expression.
|
||||||
|
//
|
||||||
|
// Given the CPU feature and 3 inputs, v is replaced in the following
|
||||||
|
// cases:
|
||||||
|
//
|
||||||
|
// 1) v is a root
|
||||||
|
// 2) u = NOT(v) and u lacks the CPU feature
|
||||||
|
// 3) u = OP(v, w) and u lacks the CPU feature
|
||||||
|
// 4) u = OP(v, w) and u has more than 3 variable inputs. var rewrite func(v *Value) [3]*Value
|
||||||
|
var rewrite func(v *Value) [3]*Value
|
||||||
|
|
||||||
|
// computeTT returns the truth table for a boolean expression
|
||||||
|
// over the variables in vars, where vars[0] varies slowest in
|
||||||
|
// the truth table and vars[2] varies fastest.
|
||||||
|
// e.g. computeTT( "and(x, or(y, not(z)))", {x,y,z} ) returns
|
||||||
|
// (bit 0 first) 0 0 0 0 1 0 1 1 = (reversed) 1101_0000 = 0xD0
|
||||||
|
// x: 0 0 0 0 1 1 1 1
|
||||||
|
// y: 0 0 1 1 0 0 1 1
|
||||||
|
// z: 0 1 0 1 0 1 0 1
|
||||||
|
var computeTT func(v *Value, vars [3]*Value) uint8
|
||||||
|
|
||||||
|
// combine two sets of variables into one, returning ok/not
|
||||||
|
// if the two sets contained 3 or fewer elements. Combine
|
||||||
|
// ensures that the sets of Values never contain duplicates.
|
||||||
|
// (Duplicates would create less-efficient code, not incorrect code.)
|
||||||
|
combine := func(a, b [3]*Value) ([3]*Value, bool) {
|
||||||
|
var c [3]*Value
|
||||||
|
i := 0
|
||||||
|
for _, v := range a {
|
||||||
|
if v == nil {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
c[i] = v
|
||||||
|
i++
|
||||||
|
}
|
||||||
|
bloop:
|
||||||
|
for _, v := range b {
|
||||||
|
if v == nil {
|
||||||
|
break
|
||||||
|
}
|
||||||
|
for _, u := range a {
|
||||||
|
if v == u {
|
||||||
|
continue bloop
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if i == 3 {
|
||||||
|
return [3]*Value{}, false
|
||||||
|
}
|
||||||
|
c[i] = v
|
||||||
|
i++
|
||||||
|
}
|
||||||
|
return c, true
|
||||||
|
}
|
||||||
|
|
||||||
|
computeTT = func(v *Value, vars [3]*Value) uint8 {
|
||||||
|
i := 0
|
||||||
|
for ; i < len(vars); i++ {
|
||||||
|
if vars[i] == v {
|
||||||
|
return truthTableValues[i]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
slo := boolExprTrees[v] &^ sloInterior
|
||||||
|
a := computeTT(v.Args[0], vars)
|
||||||
|
switch slo {
|
||||||
|
case sloNot:
|
||||||
|
return ^a
|
||||||
|
case sloAnd:
|
||||||
|
return a & computeTT(v.Args[1], vars)
|
||||||
|
case sloXor:
|
||||||
|
return a ^ computeTT(v.Args[1], vars)
|
||||||
|
case sloOr:
|
||||||
|
return a | computeTT(v.Args[1], vars)
|
||||||
|
case sloAndNot:
|
||||||
|
return a & ^computeTT(v.Args[1], vars)
|
||||||
|
}
|
||||||
|
panic("switch should have covered all cases, or unknown var in logical expression")
|
||||||
|
}
|
||||||
|
|
||||||
|
replace := func(a0 *Value, vars0 [3]*Value) {
|
||||||
|
imm := computeTT(a0, vars0)
|
||||||
|
op := ternOpForLogical(a0.Op)
|
||||||
|
if op == a0.Op {
|
||||||
|
panic(fmt.Errorf("should have mapped away from input op, a0 is %s", a0.LongString()))
|
||||||
|
}
|
||||||
|
if f.pass.debug > 0 {
|
||||||
|
f.Warnl(a0.Pos, "Rewriting %s into %v of 0b%b %v %v %v", a0.LongString(), op, imm,
|
||||||
|
vars0[0], vars0[1], vars0[2])
|
||||||
|
}
|
||||||
|
a0.reset(op)
|
||||||
|
a0.SetArgs3(vars0[0], vars0[1], vars0[2])
|
||||||
|
a0.AuxInt = int64(int8(imm))
|
||||||
|
}
|
||||||
|
|
||||||
|
// addOne ensures the no-duplicates addition of a single value
|
||||||
|
// to a set that is not full. It seems possible that a shared
|
||||||
|
// subexpression in tricky combination with blocks lacking the
|
||||||
|
// AVX512 feature might permit this.
|
||||||
|
addOne := func(vars [3]*Value, v *Value) [3]*Value {
|
||||||
|
if vars[2] != nil {
|
||||||
|
panic("rewriteTern.addOne, vars[2] should be nil")
|
||||||
|
}
|
||||||
|
if v == vars[0] || v == vars[1] {
|
||||||
|
return vars
|
||||||
|
}
|
||||||
|
if vars[1] == nil {
|
||||||
|
vars[1] = v
|
||||||
|
} else {
|
||||||
|
vars[2] = v
|
||||||
|
}
|
||||||
|
return vars
|
||||||
|
}
|
||||||
|
|
||||||
|
rewrite = func(v *Value) [3]*Value {
|
||||||
|
slo := boolExprTrees[v]
|
||||||
|
if slo == sloInterior { // leaf node, i.e., a "variable"
|
||||||
|
return [3]*Value{v, nil, nil}
|
||||||
|
}
|
||||||
|
var vars [3]*Value
|
||||||
|
hasFeature := v.Block.CPUfeatures.hasFeature(CPUavx512)
|
||||||
|
if slo&sloNot == sloNot {
|
||||||
|
vars = rewrite(v.Args[0])
|
||||||
|
if !hasFeature {
|
||||||
|
if vars[2] != nil {
|
||||||
|
replace(v.Args[0], vars)
|
||||||
|
return [3]*Value{v, nil, nil}
|
||||||
|
}
|
||||||
|
return vars
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
var ok bool
|
||||||
|
a0, a1 := v.Args[0], v.Args[1]
|
||||||
|
vars0 := rewrite(a0)
|
||||||
|
vars1 := rewrite(a1)
|
||||||
|
vars, ok = combine(vars0, vars1)
|
||||||
|
|
||||||
|
if f.pass.debug > 1 {
|
||||||
|
f.Warnl(a0.Pos, "combine(%v, %v) -> %v, %v", vars0, vars1, vars, ok)
|
||||||
|
}
|
||||||
|
|
||||||
|
if !(ok && v.Block.CPUfeatures.hasFeature(CPUavx512)) {
|
||||||
|
// too many variables, or cannot rewrite current values.
|
||||||
|
// rewrite one or both subtrees if possible
|
||||||
|
if vars0[2] != nil && a0.Block.CPUfeatures.hasFeature(CPUavx512) {
|
||||||
|
replace(a0, vars0)
|
||||||
|
}
|
||||||
|
if vars1[2] != nil && a1.Block.CPUfeatures.hasFeature(CPUavx512) {
|
||||||
|
replace(a1, vars1)
|
||||||
|
}
|
||||||
|
|
||||||
|
// 3-element var arrays are either rewritten, or unable to be rewritten
|
||||||
|
// because of the features in effect in their block. Either way, they
|
||||||
|
// are treated as a "new var" if 3 elements are present.
|
||||||
|
|
||||||
|
if vars0[2] == nil {
|
||||||
|
if vars1[2] == nil {
|
||||||
|
// both subtrees are 2-element and were not rewritten.
|
||||||
|
//
|
||||||
|
// TODO a clever person would look at subtrees of inputs,
|
||||||
|
// e.g. rewrite
|
||||||
|
// ((a AND b) XOR b) XOR (d XOR (c AND d))
|
||||||
|
// to (((a AND b) XOR b) XOR d) XOR (c AND d)
|
||||||
|
// to v = TERNLOG(truthtable, a, b, d) XOR (c AND d)
|
||||||
|
// and return the variable set {v, c, d}
|
||||||
|
//
|
||||||
|
// But for now, just restart with a0 and a1.
|
||||||
|
return [3]*Value{a0, a1, nil}
|
||||||
|
} else {
|
||||||
|
// a1 (maybe) rewrote, a0 has room for another var
|
||||||
|
vars = addOne(vars0, a1)
|
||||||
|
}
|
||||||
|
} else if vars1[2] == nil {
|
||||||
|
// a0 (maybe) rewrote, a1 has room for another var
|
||||||
|
vars = addOne(vars1, a0)
|
||||||
|
} else if !ok {
|
||||||
|
// both (maybe) rewrote
|
||||||
|
// a0 and a1 are different because otherwise their variable
|
||||||
|
// sets would have combined "ok".
|
||||||
|
return [3]*Value{a0, a1, nil}
|
||||||
|
}
|
||||||
|
// continue with either the vars from "ok" or the updated set of vars.
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// if root and 3 vars and hasFeature, rewrite.
|
||||||
|
if slo&sloInterior == 0 && vars[2] != nil && hasFeature {
|
||||||
|
replace(v, vars)
|
||||||
|
return [3]*Value{v, nil, nil}
|
||||||
|
}
|
||||||
|
return vars
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, v := range roots {
|
||||||
|
if f.pass.debug > 1 {
|
||||||
|
f.Warnl(v.Pos, "SLO root %s", v.LongString())
|
||||||
|
}
|
||||||
|
rewrite(v)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
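The truth-table encoding documented for computeTT in rewritetern.go above can be checked with plain integer arithmetic; a minimal sketch, where the three constants mirror truthTableValues with vars[0] varying slowest:

func exampleTernImmediate() uint8 {
	const x, y, z = uint8(0b1111_0000), uint8(0b1100_1100), uint8(0b1010_1010)
	// and(x, or(y, not(z))) == 0b1101_0000 == 0xD0, matching the worked example in the
	// comment; this is the immediate that replace() stores in AuxInt for the tern op.
	return x & (y | ^z)
}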
@@ -21,7 +21,7 @@ func TestSizeof(t *testing.T) {
 		_64bit uintptr // size on 64bit platforms
 	}{
 		{Value{}, 72, 112},
-		{Block{}, 164, 304},
+		{Block{}, 168, 312},
 		{LocalSlot{}, 28, 40},
 		{valState{}, 28, 40},
 	}
new file (160 lines): src/cmd/compile/internal/ssa/tern_helpers.go
@@ -0,0 +1,160 @@
+// Code generated by 'go run genfiles.go'; DO NOT EDIT.
+
+package ssa
+
+type SIMDLogicalOP uint8
+
+const (
+	// boolean simd operations, for reducing expression to VPTERNLOG* instructions
+	// sloInterior is set for non-root nodes in logical-op expression trees.
+	// the operations are even-numbered.
+	sloInterior SIMDLogicalOP = 1
+	sloNone     SIMDLogicalOP = 2 * iota
+	sloAnd
+	sloOr
+	sloAndNot
+	sloXor
+	sloNot
+)
+
+func classifyBooleanSIMD(v *Value) SIMDLogicalOP {
+	switch v.Op {
+	case OpAndInt8x16, OpAndInt16x8, OpAndInt32x4, OpAndInt64x2, OpAndInt8x32, OpAndInt16x16, OpAndInt32x8, OpAndInt64x4, OpAndInt8x64, OpAndInt16x32, OpAndInt32x16, OpAndInt64x8:
+		return sloAnd
+
+	case OpOrInt8x16, OpOrInt16x8, OpOrInt32x4, OpOrInt64x2, OpOrInt8x32, OpOrInt16x16, OpOrInt32x8, OpOrInt64x4, OpOrInt8x64, OpOrInt16x32, OpOrInt32x16, OpOrInt64x8:
+		return sloOr
+
+	case OpAndNotInt8x16, OpAndNotInt16x8, OpAndNotInt32x4, OpAndNotInt64x2, OpAndNotInt8x32, OpAndNotInt16x16, OpAndNotInt32x8, OpAndNotInt64x4, OpAndNotInt8x64, OpAndNotInt16x32, OpAndNotInt32x16, OpAndNotInt64x8:
+		return sloAndNot
+	case OpXorInt8x16:
+		if y := v.Args[1]; y.Op == OpEqualInt8x16 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt16x8:
+		if y := v.Args[1]; y.Op == OpEqualInt16x8 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt32x4:
+		if y := v.Args[1]; y.Op == OpEqualInt32x4 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt64x2:
+		if y := v.Args[1]; y.Op == OpEqualInt64x2 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt8x32:
+		if y := v.Args[1]; y.Op == OpEqualInt8x32 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt16x16:
+		if y := v.Args[1]; y.Op == OpEqualInt16x16 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt32x8:
+		if y := v.Args[1]; y.Op == OpEqualInt32x8 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt64x4:
+		if y := v.Args[1]; y.Op == OpEqualInt64x4 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt8x64:
+		if y := v.Args[1]; y.Op == OpEqualInt8x64 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt16x32:
+		if y := v.Args[1]; y.Op == OpEqualInt16x32 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt32x16:
+		if y := v.Args[1]; y.Op == OpEqualInt32x16 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+	case OpXorInt64x8:
+		if y := v.Args[1]; y.Op == OpEqualInt64x8 &&
+			y.Args[0] == y.Args[1] {
+			return sloNot
+		}
+		return sloXor
+
+	}
+	return sloNone
+}
+
+func ternOpForLogical(op Op) Op {
+	switch op {
+	case OpAndInt8x16, OpOrInt8x16, OpXorInt8x16, OpAndNotInt8x16:
+		return OpternInt32x4
+	case OpAndUint8x16, OpOrUint8x16, OpXorUint8x16, OpAndNotUint8x16:
+		return OpternUint32x4
+	case OpAndInt16x8, OpOrInt16x8, OpXorInt16x8, OpAndNotInt16x8:
+		return OpternInt32x4
+	case OpAndUint16x8, OpOrUint16x8, OpXorUint16x8, OpAndNotUint16x8:
+		return OpternUint32x4
+	case OpAndInt32x4, OpOrInt32x4, OpXorInt32x4, OpAndNotInt32x4:
+		return OpternInt32x4
+	case OpAndUint32x4, OpOrUint32x4, OpXorUint32x4, OpAndNotUint32x4:
+		return OpternUint32x4
+	case OpAndInt64x2, OpOrInt64x2, OpXorInt64x2, OpAndNotInt64x2:
+		return OpternInt64x2
+	case OpAndUint64x2, OpOrUint64x2, OpXorUint64x2, OpAndNotUint64x2:
+		return OpternUint64x2
+	case OpAndInt8x32, OpOrInt8x32, OpXorInt8x32, OpAndNotInt8x32:
+		return OpternInt32x8
+	case OpAndUint8x32, OpOrUint8x32, OpXorUint8x32, OpAndNotUint8x32:
+		return OpternUint32x8
+	case OpAndInt16x16, OpOrInt16x16, OpXorInt16x16, OpAndNotInt16x16:
+		return OpternInt32x8
+	case OpAndUint16x16, OpOrUint16x16, OpXorUint16x16, OpAndNotUint16x16:
+		return OpternUint32x8
+	case OpAndInt32x8, OpOrInt32x8, OpXorInt32x8, OpAndNotInt32x8:
+		return OpternInt32x8
+	case OpAndUint32x8, OpOrUint32x8, OpXorUint32x8, OpAndNotUint32x8:
+		return OpternUint32x8
+	case OpAndInt64x4, OpOrInt64x4, OpXorInt64x4, OpAndNotInt64x4:
+		return OpternInt64x4
+	case OpAndUint64x4, OpOrUint64x4, OpXorUint64x4, OpAndNotUint64x4:
+		return OpternUint64x4
+	case OpAndInt8x64, OpOrInt8x64, OpXorInt8x64, OpAndNotInt8x64:
+		return OpternInt32x16
+	case OpAndUint8x64, OpOrUint8x64, OpXorUint8x64, OpAndNotUint8x64:
+		return OpternUint32x16
+	case OpAndInt16x32, OpOrInt16x32, OpXorInt16x32, OpAndNotInt16x32:
+		return OpternInt32x16
+	case OpAndUint16x32, OpOrUint16x32, OpXorUint16x32, OpAndNotUint16x32:
+		return OpternUint32x16
+	case OpAndInt32x16, OpOrInt32x16, OpXorInt32x16, OpAndNotInt32x16:
+		return OpternInt32x16
+	case OpAndUint32x16, OpOrUint32x16, OpXorUint32x16, OpAndNotUint32x16:
+		return OpternUint32x16
+	case OpAndInt64x8, OpOrInt64x8, OpXorInt64x8, OpAndNotInt64x8:
+		return OpternInt64x8
+	case OpAndUint64x8, OpOrUint64x8, OpXorUint64x8, OpAndNotUint64x8:
+		return OpternUint64x8
+
+	}
+	return op
+}
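The point of classifyBooleanSIMD and ternOpForLogical is that any boolean function of three vector inputs collapses into one VPTERNLOG whose 8-bit immediate is the function's truth table. A minimal standalone sketch of that encoding (the helper name ternImm and the input-to-bit mapping shown are illustrative, not taken from the patch):

package main

import "fmt"

// ternImm computes a VPTERNLOG-style truth-table immediate for a
// three-input boolean function f: bit i of the immediate is f evaluated
// on the i-th combination of inputs (here a is bit 2, b is bit 1, c is
// bit 0; the real instruction fixes this ordering by operand position).
func ternImm(f func(a, b, c bool) bool) uint8 {
	var imm uint8
	for i := 0; i < 8; i++ {
		a, b, c := i&4 != 0, i&2 != 0, i&1 != 0
		if f(a, b, c) {
			imm |= 1 << i
		}
	}
	return imm
}

func main() {
	// (a AND b) XOR c folds to a single immediate value.
	fmt.Printf("%#02x\n", ternImm(func(a, b, c bool) bool { return (a && b) != c }))
}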
@@ -9,6 +9,7 @@ import (
 	"cmd/compile/internal/types"
 	"cmd/internal/src"
 	"fmt"
+	"internal/buildcfg"
 	"math"
 	"sort"
 	"strings"
@@ -612,12 +613,18 @@ func AutoVar(v *Value) (*ir.Name, int64) {
 // CanSSA reports whether values of type t can be represented as a Value.
 func CanSSA(t *types.Type) bool {
 	types.CalcSize(t)
-	if t.Size() > int64(4*types.PtrSize) {
+	if t.IsSIMD() {
+		return true
+	}
+	sizeLimit := int64(MaxStruct * types.PtrSize)
+	if t.Size() > sizeLimit {
 		// 4*Widthptr is an arbitrary constant. We want it
 		// to be at least 3*Widthptr so slices can be registerized.
 		// Too big and we'll introduce too much register pressure.
+		if !buildcfg.Experiment.SIMD {
 			return false
 		}
+	}
 	switch t.Kind() {
 	case types.TARRAY:
 		// We can't do larger arrays because dynamic indexing is
@@ -636,7 +643,17 @@ func CanSSA(t *types.Type) bool {
 				return false
 			}
 		}
+		// Special check for SIMD. If the composite type
+		// contains SIMD vectors we can return true
+		// if it passes the checks below.
+		if !buildcfg.Experiment.SIMD {
 			return true
+		}
+		if t.Size() <= sizeLimit {
+			return true
+		}
+		i, f := t.Registers()
+		return i+f <= MaxStruct
 	default:
 		return true
 	}
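The practical effect of the relaxed CanSSA rule: a struct built only from SIMD vectors can exceed the old 4*PtrSize byte limit and still be kept in registers, as long as it needs no more than MaxStruct of them. A hedged sketch of user code that benefits (the simd package is the experimental one behind GOEXPERIMENT=simd; the struct and function names are made up for illustration):

package demo

import "simd"

// quad carries 64 bytes of payload but only four vector registers'
// worth, so the SIMD-aware size check lets values of this type be
// SSA'd instead of being forced onto the stack.
type quad struct {
	a, b, c, d simd.Float32x4
}

// swap exchanges the halves; with quad admitted by CanSSA this can
// compile down to register shuffling with no stores.
func swap(q quad) quad {
	return quad{q.c, q.d, q.a, q.b}
}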
@@ -99,6 +99,18 @@ func (s *SymABIs) ReadSymABIs(file string) {
 	}
 }
 
+// HasDef returns whether the given symbol has an assembly definition.
+func (s *SymABIs) HasDef(sym *types.Sym) bool {
+	symName := sym.Linkname
+	if symName == "" {
+		symName = sym.Pkg.Prefix + "." + sym.Name
+	}
+	symName = s.canonicalize(symName)
+
+	_, hasDefABI := s.defs[symName]
+	return hasDefABI
+}
+
 // GenABIWrappers applies ABI information to Funcs and generates ABI
 // wrapper functions where necessary.
 func (s *SymABIs) GenABIWrappers() {
@@ -12,6 +12,7 @@ import (
 	"cmd/compile/internal/base"
 	"cmd/compile/internal/ir"
 	"cmd/compile/internal/ssa"
+	"cmd/compile/internal/typecheck"
 	"cmd/compile/internal/types"
 	"cmd/internal/sys"
 )
@@ -1632,6 +1633,495 @@ func initIntrinsics(cfg *intrinsicBuildConfig) {
 			return s.newValue1(ssa.OpCvtBoolToUint8, types.Types[types.TUINT8], args[0])
 		},
 		all...)
+
+	if buildcfg.Experiment.SIMD {
+		// Only enable intrinsics, if SIMD experiment.
+		simdIntrinsics(addF)
+
+		addF("simd", "ClearAVXUpperBits",
+			func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+				s.vars[memVar] = s.newValue1(ssa.OpAMD64VZEROUPPER, types.TypeMem, s.mem())
+				return nil
+			},
+			sys.AMD64)
+
+		addF(simdPackage, "Int8x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int16x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int32x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int64x2.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint8x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint16x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint32x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint64x2.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int8x32.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int16x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int32x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Int64x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint8x32.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint16x16.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint32x8.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+		addF(simdPackage, "Uint64x4.IsZero", opLen1(ssa.OpIsZeroVec, types.Types[types.TBOOL]), sys.AMD64)
+
+		sfp4 := func(method string, hwop ssa.Op, vectype *types.Type) {
+			addF("simd", method,
+				func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+					x, a, b, c, d, y := args[0], args[1], args[2], args[3], args[4], args[5]
+					if a.Op == ssa.OpConst8 && b.Op == ssa.OpConst8 && c.Op == ssa.OpConst8 && d.Op == ssa.OpConst8 {
+						return select4FromPair(x, a, b, c, d, y, s, hwop, vectype)
+					} else {
+						return s.callResult(n, callNormal)
+					}
+				},
+				sys.AMD64)
+		}
+
+		sfp4("Int32x4.SelectFromPair", ssa.OpconcatSelectedConstantInt32x4, types.TypeVec128)
+		sfp4("Uint32x4.SelectFromPair", ssa.OpconcatSelectedConstantUint32x4, types.TypeVec128)
+		sfp4("Float32x4.SelectFromPair", ssa.OpconcatSelectedConstantFloat32x4, types.TypeVec128)
+
+		sfp4("Int32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt32x8, types.TypeVec256)
+		sfp4("Uint32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint32x8, types.TypeVec256)
+		sfp4("Float32x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat32x8, types.TypeVec256)
+
+		sfp4("Int32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt32x16, types.TypeVec512)
+		sfp4("Uint32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint32x16, types.TypeVec512)
+		sfp4("Float32x16.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat32x16, types.TypeVec512)
+
+		sfp2 := func(method string, hwop ssa.Op, vectype *types.Type, cscimm func(i, j uint8) int64) {
+			addF("simd", method,
+				func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+					x, a, b, y := args[0], args[1], args[2], args[3]
+					if a.Op == ssa.OpConst8 && b.Op == ssa.OpConst8 {
+						return select2FromPair(x, a, b, y, s, hwop, vectype, cscimm)
+					} else {
+						return s.callResult(n, callNormal)
+					}
+				},
+				sys.AMD64)
+		}
+
+		sfp2("Uint64x2.SelectFromPair", ssa.OpconcatSelectedConstantUint64x2, types.TypeVec128, cscimm2)
+		sfp2("Int64x2.SelectFromPair", ssa.OpconcatSelectedConstantInt64x2, types.TypeVec128, cscimm2)
+		sfp2("Float64x2.SelectFromPair", ssa.OpconcatSelectedConstantFloat64x2, types.TypeVec128, cscimm2)
+
+		sfp2("Uint64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint64x4, types.TypeVec256, cscimm2g2)
+		sfp2("Int64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt64x4, types.TypeVec256, cscimm2g2)
+		sfp2("Float64x4.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat64x4, types.TypeVec256, cscimm2g2)
+
+		sfp2("Uint64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedUint64x8, types.TypeVec512, cscimm2g4)
+		sfp2("Int64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedInt64x8, types.TypeVec512, cscimm2g4)
+		sfp2("Float64x8.SelectFromPairGrouped", ssa.OpconcatSelectedConstantGroupedFloat64x8, types.TypeVec512, cscimm2g4)
+
+	}
+}
+
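As read from the constant-folding helpers registered above, SelectFromPair builds each result lane from the concatenation of the receiver and the vector argument: indices 0-3 name the receiver's lanes and 4-7 the argument's, and they must be constants for the immediate fast path. A small illustrative use; the exact method signature is inferred from the intrinsic builder, not quoted from the simd package docs:

package demo

import "simd"

// lowHalves interleaves the low halves of two vectors: x[0], x[1], y[0], y[1].
// The index arguments are constants, so the builder above encodes them
// directly into the instruction's immediate.
func lowHalves(x, y simd.Float32x4) simd.Float32x4 {
	return x.SelectFromPair(0, 1, 4, 5, y)
}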
+func cscimm4(a, b, c, d uint8) int64 {
+	return se(a + b<<2 + c<<4 + d<<6)
+}
+
+func cscimm2(a, b uint8) int64 {
+	return se(a + b<<1)
+}
+
+func cscimm2g2(a, b uint8) int64 {
+	g := cscimm2(a, b)
+	return int64(int8(g + g<<2))
+}
+
+func cscimm2g4(a, b uint8) int64 {
+	g := cscimm2g2(a, b)
+	return int64(int8(g + g<<4))
+}
+
+const (
+	_LLLL = iota
+	_HLLL
+	_LHLL
+	_HHLL
+	_LLHL
+	_HLHL
+	_LHHL
+	_HHHL
+	_LLLH
+	_HLLH
+	_LHLH
+	_HHLH
+	_LLHH
+	_HLHH
+	_LHHH
+	_HHHH
+)
+
+const (
+	_LL = iota
+	_HL
+	_LH
+	_HH
+)
+
+func select2FromPair(x, _a, _b, y *ssa.Value, s *state, op ssa.Op, t *types.Type, csc func(a, b uint8) int64) *ssa.Value {
+	a, b := uint8(_a.AuxInt8()), uint8(_b.AuxInt8())
+	pattern := (a&2)>>1 + (b & 2)
+	a, b = a&1, b&1
+
+	switch pattern {
+	case _LL:
+		return s.newValue2I(op, t, csc(a, b), x, x)
+	case _HH:
+		return s.newValue2I(op, t, csc(a, b), y, y)
+	case _LH:
+		return s.newValue2I(op, t, csc(a, b), x, y)
+	case _HL:
+		return s.newValue2I(op, t, csc(a, b), y, x)
+	}
+	panic("The preceding switch should have been exhaustive")
+}
+
+func select4FromPair(x, _a, _b, _c, _d, y *ssa.Value, s *state, op ssa.Op, t *types.Type) *ssa.Value {
+	a, b, c, d := uint8(_a.AuxInt8()), uint8(_b.AuxInt8()), uint8(_c.AuxInt8()), uint8(_d.AuxInt8())
+	pattern := a>>2 + (b&4)>>1 + (c & 4) + (d&4)<<1
+
+	a, b, c, d = a&3, b&3, c&3, d&3
+
+	switch pattern {
+	case _LLLL:
+		// TODO DETECT 0,1,2,3, 0,0,0,0
+		return s.newValue2I(op, t, cscimm4(a, b, c, d), x, x)
+	case _HHHH:
+		// TODO DETECT 0,1,2,3, 0,0,0,0
+		return s.newValue2I(op, t, cscimm4(a, b, c, d), y, y)
+	case _LLHH:
+		return s.newValue2I(op, t, cscimm4(a, b, c, d), x, y)
+	case _HHLL:
+		return s.newValue2I(op, t, cscimm4(a, b, c, d), y, x)
+
+	case _HLLL:
+		z := s.newValue2I(op, t, cscimm4(a, a, b, b), y, x)
+		return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, x)
+	case _LHLL:
+		z := s.newValue2I(op, t, cscimm4(a, a, b, b), x, y)
+		return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, x)
+	case _HLHH:
+		z := s.newValue2I(op, t, cscimm4(a, a, b, b), y, x)
+		return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, y)
+	case _LHHH:
+		z := s.newValue2I(op, t, cscimm4(a, a, b, b), x, y)
+		return s.newValue2I(op, t, cscimm4(0, 2, c, d), z, y)
+
+	case _LLLH:
+		z := s.newValue2I(op, t, cscimm4(c, c, d, d), x, y)
+		return s.newValue2I(op, t, cscimm4(a, b, 0, 2), x, z)
+	case _LLHL:
+		z := s.newValue2I(op, t, cscimm4(c, c, d, d), y, x)
+		return s.newValue2I(op, t, cscimm4(a, b, 0, 2), x, z)
+
+	case _HHLH:
+		z := s.newValue2I(op, t, cscimm4(c, c, d, d), x, y)
+		return s.newValue2I(op, t, cscimm4(a, b, 0, 2), y, z)
+
+	case _HHHL:
+		z := s.newValue2I(op, t, cscimm4(c, c, d, d), y, x)
+		return s.newValue2I(op, t, cscimm4(a, b, 0, 2), y, z)
+
+	case _LHLH:
+		z := s.newValue2I(op, t, cscimm4(a, c, b, d), x, y)
+		return s.newValue2I(op, t, se(0b11_01_10_00), z, z)
+	case _HLHL:
+		z := s.newValue2I(op, t, cscimm4(b, d, a, c), x, y)
+		return s.newValue2I(op, t, se(0b01_11_00_10), z, z)
+	case _HLLH:
+		z := s.newValue2I(op, t, cscimm4(b, c, a, d), x, y)
+		return s.newValue2I(op, t, se(0b11_01_00_10), z, z)
+	case _LHHL:
+		z := s.newValue2I(op, t, cscimm4(a, d, b, c), x, y)
+		return s.newValue2I(op, t, se(0b01_11_10_00), z, z)
+	}
+	panic("The preceding switch should have been exhaustive")
+}
+
+// se smears the not-really-a-sign bit of a uint8 to conform to the conventions
+// for representing AuxInt in ssa.
+func se(x uint8) int64 {
+	return int64(int8(x))
+}
+
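The cscimm helpers pack 2-bit (or 1-bit) lane selectors into a SHUFP-style immediate, lowest lane first, and se then reinterprets the byte as a signed int8 because small SSA AuxInt immediates are stored sign-extended. A standalone check of the arithmetic; imm4 mirrors cscimm4 plus se and is not the compiler's own code:

package main

import "fmt"

// imm4 packs four 2-bit lane selectors low-to-high and sign-extends the
// resulting byte the way SSA stores 8-bit AuxInt immediates.
func imm4(a, b, c, d uint8) int64 {
	return int64(int8(a + b<<2 + c<<4 + d<<6))
}

func main() {
	// The identity selection 0,1,2,3 packs to 0b11100100 (0xE4),
	// which is carried in AuxInt as the sign-extended value -28.
	fmt.Println(imm4(0, 1, 2, 3)) // -28
}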
+func opLen1(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue1(op, t, args[0])
+	}
+}
+
+func opLen2(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue2(op, t, args[0], args[1])
+	}
+}
+
+func opLen2_21(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue2(op, t, args[1], args[0])
+	}
+}
+
+func opLen3(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue3(op, t, args[0], args[1], args[2])
+	}
+}
+
+var ssaVecBySize = map[int64]*types.Type{
+	16: types.TypeVec128,
+	32: types.TypeVec256,
+	64: types.TypeVec512,
+}
+
+func opLen3_31Zero3(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if t, ok := ssaVecBySize[args[1].Type.Size()]; !ok {
+			panic("unknown simd vector size")
+		} else {
+			return s.newValue3(op, t, s.newValue0(ssa.OpZeroSIMD, t), args[1], args[0])
+		}
+	}
+}
+
+func opLen3_21(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue3(op, t, args[1], args[0], args[2])
+	}
+}
+
+func opLen3_231(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue3(op, t, args[2], args[0], args[1])
+	}
+}
+
+func opLen4(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue4(op, t, args[0], args[1], args[2], args[3])
+	}
+}
+
+func opLen4_231(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue4(op, t, args[2], args[0], args[1], args[3])
+	}
+}
+
+func opLen4_31(op ssa.Op, t *types.Type) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue4(op, t, args[2], args[1], args[0], args[3])
+	}
+}
+
+func immJumpTable(s *state, idx *ssa.Value, intrinsicCall *ir.CallExpr, genOp func(*state, int)) *ssa.Value {
+	// Make blocks we'll need.
+	bEnd := s.f.NewBlock(ssa.BlockPlain)
+
+	if !idx.Type.IsKind(types.TUINT8) {
+		panic("immJumpTable expects uint8 value")
+	}
+
+	// We will exhaust 0-255, so no need to check the bounds.
+	t := types.Types[types.TUINTPTR]
+	idx = s.conv(nil, idx, idx.Type, t)
+
+	b := s.curBlock
+	b.Kind = ssa.BlockJumpTable
+	b.Pos = intrinsicCall.Pos()
+	if base.Flag.Cfg.SpectreIndex {
+		// Potential Spectre vulnerability hardening?
+		idx = s.newValue2(ssa.OpSpectreSliceIndex, t, idx, s.uintptrConstant(255))
+	}
+	b.SetControl(idx)
+	targets := [256]*ssa.Block{}
+	for i := range 256 {
+		t := s.f.NewBlock(ssa.BlockPlain)
+		targets[i] = t
+		b.AddEdgeTo(t)
+	}
+	s.endBlock()
+
+	for i, t := range targets {
+		s.startBlock(t)
+		genOp(s, i)
+		if t.Kind != ssa.BlockExit {
+			t.AddEdgeTo(bEnd)
+		}
+		s.endBlock()
+	}
+
+	s.startBlock(bEnd)
+	ret := s.variable(intrinsicCall, intrinsicCall.Type())
+	return ret
+}
+
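When the immediate operand is not a compile-time constant, immJumpTable materializes a 256-way jump table: one block per possible byte value, each generating the instruction with that literal immediate, all merging at a common exit. A rough user-level analogy of that control-flow shape (ordinary Go, not the SSA construction itself):

package demo

// dispatchImm mirrors the structure immJumpTable builds: the runtime
// byte idx selects one of 256 pre-built bodies, each of which has its
// value baked in as a constant.
func dispatchImm(idx uint8, gen func(imm uint8) uint64) uint64 {
	var table [256]func() uint64
	for i := range table {
		imm := uint8(i)
		table[i] = func() uint64 { return gen(imm) }
	}
	return table[idx]()
}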
+func opLen1Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 {
+			return s.newValue1I(op, t, args[1].AuxInt<<int64(offset), args[0])
+		}
+		return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue1I(op, t, int64(int8(idx<<offset)), args[0])
+		})
+	}
+}
+
+func opLen2Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 {
+			return s.newValue2I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2])
+		}
+		return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset)), args[0], args[2])
+		})
+	}
+}
+
+func opLen3Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 {
+			return s.newValue3I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2], args[3])
+		}
+		return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue3I(op, t, int64(int8(idx<<offset)), args[0], args[2], args[3])
+		})
+	}
+}
+
+func opLen2Imm8_2I(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[2].Op == ssa.OpConst8 {
+			return s.newValue2I(op, t, args[2].AuxInt<<int64(offset), args[0], args[1])
+		}
+		return immJumpTable(s, args[2], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset)), args[0], args[1])
+		})
+	}
+}
+
+// Two immediates instead of just 1. Offset is ignored, so it is a _ parameter instead.
+func opLen2Imm8_II(op ssa.Op, t *types.Type, _ int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 && args[2].Op == ssa.OpConst8 && args[1].AuxInt & ^3 == 0 && args[2].AuxInt & ^3 == 0 {
+			i1, i2 := args[1].AuxInt, args[2].AuxInt
+			return s.newValue2I(op, t, int64(int8(i1+i2<<4)), args[0], args[3])
+		}
+		four := s.constInt64(types.Types[types.TUINT8], 4)
+		shifted := s.newValue2(ssa.OpLsh8x8, types.Types[types.TUINT8], args[2], four)
+		combined := s.newValue2(ssa.OpAdd8, types.Types[types.TUINT8], args[1], shifted)
+		return immJumpTable(s, combined, n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			// TODO for "zeroing" values, panic instead.
+			if idx & ^(3+3<<4) == 0 {
+				s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx)), args[0], args[3])
+			} else {
+				sNew.rtcall(ir.Syms.PanicSimdImm, false, nil)
+			}
+		})
+	}
+}
+
+// The assembler requires the imm value of a SHA1RNDS4 instruction to be one of 0,1,2,3...
+func opLen2Imm8_SHA1RNDS4(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 {
+			return s.newValue2I(op, t, (args[1].AuxInt<<int64(offset))&0b11, args[0], args[2])
+		}
+		return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue2I(op, t, int64(int8(idx<<offset))&0b11, args[0], args[2])
+		})
+	}
+}
+
+func opLen3Imm8_2I(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[2].Op == ssa.OpConst8 {
+			return s.newValue3I(op, t, args[2].AuxInt<<int64(offset), args[0], args[1], args[3])
+		}
+		return immJumpTable(s, args[2], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue3I(op, t, int64(int8(idx<<offset)), args[0], args[1], args[3])
+		})
+	}
+}
+
+func opLen4Imm8(op ssa.Op, t *types.Type, offset int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		if args[1].Op == ssa.OpConst8 {
+			return s.newValue4I(op, t, args[1].AuxInt<<int64(offset), args[0], args[2], args[3], args[4])
+		}
+		return immJumpTable(s, args[1], n, func(sNew *state, idx int) {
+			// Encode as int8 due to requirement of AuxInt, check its comment for details.
+			s.vars[n] = sNew.newValue4I(op, t, int64(int8(idx<<offset)), args[0], args[2], args[3], args[4])
+		})
+	}
+}
+
+func simdLoad() func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue2(ssa.OpLoad, n.Type(), args[0], s.mem())
+	}
+}
+
+func simdStore() func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		s.store(args[0].Type, args[1], args[0])
+		return nil
+	}
+}
+
+var cvtVToMaskOpcodes = map[int]map[int]ssa.Op{
+	8:  {16: ssa.OpCvt16toMask8x16, 32: ssa.OpCvt32toMask8x32, 64: ssa.OpCvt64toMask8x64},
+	16: {8: ssa.OpCvt8toMask16x8, 16: ssa.OpCvt16toMask16x16, 32: ssa.OpCvt32toMask16x32},
+	32: {4: ssa.OpCvt8toMask32x4, 8: ssa.OpCvt8toMask32x8, 16: ssa.OpCvt16toMask32x16},
+	64: {2: ssa.OpCvt8toMask64x2, 4: ssa.OpCvt8toMask64x4, 8: ssa.OpCvt8toMask64x8},
+}
+
+var cvtMaskToVOpcodes = map[int]map[int]ssa.Op{
+	8:  {16: ssa.OpCvtMask8x16to16, 32: ssa.OpCvtMask8x32to32, 64: ssa.OpCvtMask8x64to64},
+	16: {8: ssa.OpCvtMask16x8to8, 16: ssa.OpCvtMask16x16to16, 32: ssa.OpCvtMask16x32to32},
+	32: {4: ssa.OpCvtMask32x4to8, 8: ssa.OpCvtMask32x8to8, 16: ssa.OpCvtMask32x16to16},
+	64: {2: ssa.OpCvtMask64x2to8, 4: ssa.OpCvtMask64x4to8, 8: ssa.OpCvtMask64x8to8},
+}
+
+func simdCvtVToMask(elemBits, lanes int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		op := cvtVToMaskOpcodes[elemBits][lanes]
+		if op == 0 {
+			panic(fmt.Sprintf("Unknown mask shape: Mask%dx%d", elemBits, lanes))
+		}
+		return s.newValue1(op, types.TypeMask, args[0])
+	}
+}
+
+func simdCvtMaskToV(elemBits, lanes int) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		op := cvtMaskToVOpcodes[elemBits][lanes]
+		if op == 0 {
+			panic(fmt.Sprintf("Unknown mask shape: Mask%dx%d", elemBits, lanes))
+		}
+		return s.newValue1(op, n.Type(), args[0])
+	}
+}
+
+func simdMaskedLoad(op ssa.Op) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		return s.newValue3(op, n.Type(), args[0], args[1], s.mem())
+	}
+}
+
+func simdMaskedStore(op ssa.Op) func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+	return func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value {
+		s.vars[memVar] = s.newValue4A(op, types.TypeMem, args[0].Type, args[1], args[2], args[0], s.mem())
+		return nil
+	}
 }
 
 // findIntrinsic returns a function which builds the SSA equivalent of the
@@ -1657,7 +2147,8 @@ func findIntrinsic(sym *types.Sym) intrinsicBuilder {
 
 	fn := sym.Name
 	if ssa.IntrinsicsDisable {
-		if pkg == "internal/runtime/sys" && (fn == "GetCallerPC" || fn == "GetCallerSP" || fn == "GetClosurePtr") {
+		if pkg == "internal/runtime/sys" && (fn == "GetCallerPC" || fn == "GetCallerSP" || fn == "GetClosurePtr") ||
+			pkg == "internal/simd" || pkg == "simd" { // TODO after simd has been moved to package simd, remove internal/simd
 			// These runtime functions don't have definitions, must be intrinsics.
 		} else {
 			return nil
@@ -1672,7 +2163,74 @@ func IsIntrinsicCall(n *ir.CallExpr) bool {
 	}
 	name, ok := n.Fun.(*ir.Name)
 	if !ok {
+		if n.Fun.Op() == ir.OMETHEXPR {
+			if meth := ir.MethodExprName(n.Fun); meth != nil {
+				if fn := meth.Func; fn != nil {
+					return IsIntrinsicSym(fn.Sym())
+				}
+			}
+		}
 		return false
 	}
-	return findIntrinsic(name.Sym()) != nil
+	return IsIntrinsicSym(name.Sym())
+}
+
+func IsIntrinsicSym(sym *types.Sym) bool {
+	return findIntrinsic(sym) != nil
+}
+
+// GenIntrinsicBody generates the function body for a bodyless intrinsic.
+// This is used when the intrinsic is used in a non-call context, e.g.
+// as a function pointer, or (for a method) being referenced from the type
+// descriptor.
+//
+// The compiler already recognizes a call to fn as an intrinsic and can
+// directly generate code for it. So we just fill in the body with a call
+// to fn.
+func GenIntrinsicBody(fn *ir.Func) {
+	if ir.CurFunc != nil {
+		base.FatalfAt(fn.Pos(), "enqueueFunc %v inside %v", fn, ir.CurFunc)
+	}
+
+	if base.Flag.LowerR != 0 {
+		fmt.Println("generate intrinsic for", ir.FuncName(fn))
+	}
+
+	pos := fn.Pos()
+	ft := fn.Type()
+	var ret ir.Node
+
+	// For a method, it usually starts with an ODOTMETH (pre-typecheck) or
+	// OMETHEXPR (post-typecheck) referencing the method symbol without the
+	// receiver type, and Walk rewrites it to a call directly to the
+	// type-qualified method symbol, moving the receiver to an argument.
+	// Here fn already has the type-qualified method symbol, and it is hard
+	// to get the unqualified symbol. So we just generate the post-Walk form
+	// and mark it typechecked and Walked.
+	call := ir.NewCallExpr(pos, ir.OCALLFUNC, fn.Nname, nil)
+	call.Args = ir.RecvParamNames(ft)
+	call.IsDDD = ft.IsVariadic()
+	typecheck.Exprs(call.Args)
+	call.SetTypecheck(1)
+	call.SetWalked(true)
+	ret = call
+	if ft.NumResults() > 0 {
+		if ft.NumResults() == 1 {
+			call.SetType(ft.Result(0).Type)
+		} else {
+			call.SetType(ft.ResultsTuple())
+		}
+		n := ir.NewReturnStmt(base.Pos, nil)
+		n.Results = []ir.Node{call}
+		ret = n
+	}
+	fn.Body.Append(ret)
+
+	if base.Flag.LowerR != 0 {
+		ir.DumpList("generate intrinsic body", fn.Body)
+	}
+
+	ir.CurFunc = fn
+	typecheck.Stmts(fn.Body)
+	ir.CurFunc = nil // we know CurFunc is nil at entry
 }
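GenIntrinsicBody matters when an intrinsic is referenced rather than called: a direct call is open-coded, but a method expression or function value needs an actual callable body. A hedged example of the triggering pattern (the Add method name is assumed from the experimental simd API rather than quoted from it):

package demo

import "simd"

// Taking the method as a value means the compiler cannot open-code the
// call sites it does not see, so the otherwise bodyless intrinsic gets a
// generated body that simply calls itself, and that call is in turn
// recognized as an intrinsic.
var add = simd.Int32x4.Add

func apply(a, b simd.Int32x4) simd.Int32x4 {
	return add(a, b)
}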
@@ -16,6 +16,9 @@ import (
 
 var updateIntrinsics = flag.Bool("update", false, "Print an updated intrinsics table")
 
+// TODO turn on after SIMD is stable. The time burned keeping this test happy during SIMD development has already well exceeded any plausible benefit.
+var simd = flag.Bool("simd", false, "Also check SIMD intrinsics; for now, it is noisy and not helpful")
+
 type testIntrinsicKey struct {
 	archName string
 	pkg      string

@@ -1403,13 +1406,13 @@ func TestIntrinsics(t *testing.T) {
 		gotIntrinsics[testIntrinsicKey{ik.arch.Name, ik.pkg, ik.fn}] = struct{}{}
 	}
 	for ik, _ := range gotIntrinsics {
-		if _, found := wantIntrinsics[ik]; !found {
+		if _, found := wantIntrinsics[ik]; !found && (ik.pkg != "simd" || *simd) {
 			t.Errorf("Got unwanted intrinsic %v %v.%v", ik.archName, ik.pkg, ik.fn)
 		}
 	}
 
 	for ik, _ := range wantIntrinsics {
-		if _, found := gotIntrinsics[ik]; !found {
+		if _, found := gotIntrinsics[ik]; !found && (ik.pkg != "simd" || *simd) {
 			t.Errorf("Want missing intrinsic %v %v.%v", ik.archName, ik.pkg, ik.fn)
 		}
 	}
src/cmd/compile/internal/ssagen/simdintrinsics.go (new file, 1771 lines)
File diff suppressed because it is too large
@@ -156,6 +156,7 @@ func InitConfig() {
 	ir.Syms.Panicnildottype = typecheck.LookupRuntimeFunc("panicnildottype")
 	ir.Syms.Panicoverflow = typecheck.LookupRuntimeFunc("panicoverflow")
 	ir.Syms.Panicshift = typecheck.LookupRuntimeFunc("panicshift")
+	ir.Syms.PanicSimdImm = typecheck.LookupRuntimeFunc("panicSimdImm")
 	ir.Syms.Racefuncenter = typecheck.LookupRuntimeFunc("racefuncenter")
 	ir.Syms.Racefuncexit = typecheck.LookupRuntimeFunc("racefuncexit")
 	ir.Syms.Raceread = typecheck.LookupRuntimeFunc("raceread")

@@ -165,9 +166,10 @@ func InitConfig() {
 	ir.Syms.TypeAssert = typecheck.LookupRuntimeFunc("typeAssert")
 	ir.Syms.WBZero = typecheck.LookupRuntimeFunc("wbZero")
 	ir.Syms.WBMove = typecheck.LookupRuntimeFunc("wbMove")
+	ir.Syms.X86HasAVX = typecheck.LookupRuntimeVar("x86HasAVX")       // bool
+	ir.Syms.X86HasFMA = typecheck.LookupRuntimeVar("x86HasFMA")       // bool
 	ir.Syms.X86HasPOPCNT = typecheck.LookupRuntimeVar("x86HasPOPCNT") // bool
 	ir.Syms.X86HasSSE41 = typecheck.LookupRuntimeVar("x86HasSSE41")   // bool
-	ir.Syms.X86HasFMA = typecheck.LookupRuntimeVar("x86HasFMA")       // bool
 	ir.Syms.ARMHasVFPv4 = typecheck.LookupRuntimeVar("armHasVFPv4")   // bool
 	ir.Syms.ARM64HasATOMICS = typecheck.LookupRuntimeVar("arm64HasATOMICS") // bool
 	ir.Syms.Loong64HasLAMCAS = typecheck.LookupRuntimeVar("loong64HasLAMCAS") // bool
@@ -600,6 +602,9 @@ func buildssa(fn *ir.Func, worker int, isPgoHot bool) *ssa.Func {
 			// TODO figure out exactly what's unused, don't spill it. Make liveness fine-grained, also.
 			for _, p := range params.InParams() {
 				typs, offs := p.RegisterTypesAndOffsets()
+				if len(offs) < len(typs) {
+					s.Fatalf("len(offs)=%d < len(typs)=%d, params=\n%s", len(offs), len(typs), params)
+				}
 				for i, t := range typs {
 					o := offs[i]                // offset within parameter
 					fo := p.FrameOffset(params) // offset of parameter in frame
@@ -1333,6 +1338,11 @@ func (s *state) newValue4(op ssa.Op, t *types.Type, arg0, arg1, arg2, arg3 *ssa.
 	return s.curBlock.NewValue4(s.peekPos(), op, t, arg0, arg1, arg2, arg3)
 }
 
+// newValue4A adds a new value with four arguments and an aux value to the current block.
+func (s *state) newValue4A(op ssa.Op, t *types.Type, aux ssa.Aux, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
+	return s.curBlock.NewValue4A(s.peekPos(), op, t, aux, arg0, arg1, arg2, arg3)
+}
+
 // newValue4I adds a new value with four arguments and an auxint value to the current block.
 func (s *state) newValue4I(op ssa.Op, t *types.Type, aux int64, arg0, arg1, arg2, arg3 *ssa.Value) *ssa.Value {
 	return s.curBlock.NewValue4I(s.peekPos(), op, t, aux, arg0, arg1, arg2, arg3)
@@ -1462,7 +1472,7 @@ func (s *state) instrument(t *types.Type, addr *ssa.Value, kind instrumentKind)
 // If it is instrumenting for MSAN or ASAN and t is a struct type, it instruments
 // operation for each field, instead of for the whole struct.
 func (s *state) instrumentFields(t *types.Type, addr *ssa.Value, kind instrumentKind) {
-	if !(base.Flag.MSan || base.Flag.ASan) || !t.IsStruct() {
+	if !(base.Flag.MSan || base.Flag.ASan) || !isStructNotSIMD(t) {
 		s.instrument(t, addr, kind)
 		return
 	}

@@ -4585,7 +4595,7 @@ func (s *state) zeroVal(t *types.Type) *ssa.Value {
 		return s.constInterface(t)
 	case t.IsSlice():
 		return s.constSlice(t)
-	case t.IsStruct():
+	case isStructNotSIMD(t):
 		n := t.NumFields()
 		v := s.entryNewValue0(ssa.OpStructMake, t)
 		for i := 0; i < n; i++ {

@@ -4599,6 +4609,8 @@ func (s *state) zeroVal(t *types.Type) *ssa.Value {
 		case 1:
 			return s.entryNewValue1(ssa.OpArrayMake1, t, s.zeroVal(t.Elem()))
 		}
+	case t.IsSIMD():
+		return s.newValue0(ssa.OpZeroSIMD, t)
 	}
 	s.Fatalf("zero for type %v not implemented", t)
 	return nil

@@ -5578,7 +5590,7 @@ func (s *state) storeType(t *types.Type, left, right *ssa.Value, skip skipMask,
 // do *left = right for all scalar (non-pointer) parts of t.
 func (s *state) storeTypeScalars(t *types.Type, left, right *ssa.Value, skip skipMask) {
 	switch {
-	case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex():
+	case t.IsBoolean() || t.IsInteger() || t.IsFloat() || t.IsComplex() || t.IsSIMD():
 		s.store(t, left, right)
 	case t.IsPtrShaped():
 		if t.IsPtr() && t.Elem().NotInHeap() {

@@ -5607,7 +5619,7 @@ func (s *state) storeTypeScalars(t *types.Type, left, right *ssa.Value, skip ski
 		// itab field doesn't need a write barrier (even though it is a pointer).
 		itab := s.newValue1(ssa.OpITab, s.f.Config.Types.BytePtr, right)
 		s.store(types.Types[types.TUINTPTR], left, itab)
-	case t.IsStruct():
+	case isStructNotSIMD(t):
 		n := t.NumFields()
 		for i := 0; i < n; i++ {
 			ft := t.FieldType(i)

@@ -5644,7 +5656,7 @@ func (s *state) storeTypePtrs(t *types.Type, left, right *ssa.Value) {
 		idata := s.newValue1(ssa.OpIData, s.f.Config.Types.BytePtr, right)
 		idataAddr := s.newValue1I(ssa.OpOffPtr, s.f.Config.Types.BytePtrPtr, s.config.PtrSize, left)
 		s.store(s.f.Config.Types.BytePtr, idataAddr, idata)
-	case t.IsStruct():
+	case isStructNotSIMD(t):
 		n := t.NumFields()
 		for i := 0; i < n; i++ {
 			ft := t.FieldType(i)

@@ -6757,7 +6769,7 @@ func EmitArgInfo(f *ir.Func, abiInfo *abi.ABIParamResultInfo) *obj.LSym {
 	uintptrTyp := types.Types[types.TUINTPTR]
 
 	isAggregate := func(t *types.Type) bool {
-		return t.IsStruct() || t.IsArray() || t.IsComplex() || t.IsInterface() || t.IsString() || t.IsSlice()
+		return isStructNotSIMD(t) || t.IsArray() || t.IsComplex() || t.IsInterface() || t.IsString() || t.IsSlice()
 	}
 
 	wOff := 0

@@ -6817,7 +6829,7 @@ func EmitArgInfo(f *ir.Func, abiInfo *abi.ABIParamResultInfo) *obj.LSym {
 			}
 			baseOffset += t.Elem().Size()
 		}
-	case t.IsStruct():
+	case isStructNotSIMD(t):
 		if t.NumFields() == 0 {
 			n++ // {} counts as a component
 			break

@@ -7837,7 +7849,7 @@ func (s *State) UseArgs(n int64) {
 // fieldIdx finds the index of the field referred to by the ODOT node n.
 func fieldIdx(n *ir.SelectorExpr) int {
 	t := n.X.Type()
-	if !t.IsStruct() {
+	if !isStructNotSIMD(t) {
 		panic("ODOT's LHS is not a struct")
 	}
 
@@ -8045,4 +8057,8 @@ func SpillSlotAddr(spill ssa.Spill, baseReg int16, extraOffset int64) obj.Addr {
 	}
 }
 
+func isStructNotSIMD(t *types.Type) bool {
+	return t.IsStruct() && !t.IsSIMD()
+}
+
 var BoundsCheckFunc [ssa.BoundsKindCount]*obj.LSym
src/cmd/compile/internal/test/value_test.go (new file, 41 lines)
@@ -0,0 +1,41 @@
+// Copyright 2025 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package test
+
+import (
+	"cmd/compile/internal/ssa"
+	"cmd/compile/internal/types"
+	"internal/buildcfg"
+	"testing"
+)
+
+// This file contains tests for ssa values, types and their utility functions.
+
+func TestCanSSA(t *testing.T) {
+	i64 := types.Types[types.TINT64]
+	v128 := types.TypeVec128
+	s1 := mkstruct(i64, mkstruct(i64, i64, i64, i64))
+	if ssa.CanSSA(s1) {
+		// Test size check for struct.
+		t.Errorf("CanSSA(%v) returned true, expected false", s1)
+	}
+	a1 := types.NewArray(s1, 1)
+	if ssa.CanSSA(a1) {
+		// Test size check for array.
+		t.Errorf("CanSSA(%v) returned true, expected false", a1)
+	}
+	if buildcfg.Experiment.SIMD {
+		s2 := mkstruct(v128, v128, v128, v128)
+		if !ssa.CanSSA(s2) {
+			// Test size check for SIMD struct special case.
+			t.Errorf("CanSSA(%v) returned false, expected true", s2)
+		}
+		a2 := types.NewArray(s2, 1)
+		if !ssa.CanSSA(a2) {
+			// Test size check for SIMD array special case.
+			t.Errorf("CanSSA(%v) returned false, expected true", a2)
+		}
+	}
+}
@@ -292,9 +292,10 @@ func libfuzzerHookEqualFold(string, string, uint)
 func addCovMeta(p unsafe.Pointer, len uint32, hash [16]byte, pkpath string, pkgId int, cmode uint8, cgran uint8) uint32
 
 // architecture variants
+var x86HasAVX bool
+var x86HasFMA bool
 var x86HasPOPCNT bool
 var x86HasSSE41 bool
-var x86HasFMA bool
 var armHasVFPv4 bool
 var arm64HasATOMICS bool
 var loong64HasLAMCAS bool
@@ -239,9 +239,10 @@ var runtimeDecls = [...]struct {
 	{"libfuzzerHookStrCmp", funcTag, 163},
 	{"libfuzzerHookEqualFold", funcTag, 163},
 	{"addCovMeta", funcTag, 165},
+	{"x86HasAVX", varTag, 6},
+	{"x86HasFMA", varTag, 6},
 	{"x86HasPOPCNT", varTag, 6},
 	{"x86HasSSE41", varTag, 6},
-	{"x86HasFMA", varTag, 6},
 	{"armHasVFPv4", varTag, 6},
 	{"arm64HasATOMICS", varTag, 6},
 	{"loong64HasLAMCAS", varTag, 6},
@@ -10,6 +10,7 @@ import (
 
 	"cmd/compile/internal/base"
 	"cmd/internal/src"
+	"internal/buildcfg"
 	"internal/types/errors"
 )
 
@@ -452,6 +453,31 @@ func CalcSize(t *Type) {
 	ResumeCheckSize()
 }
 
+// simdify marks a type as "SIMD", either as a tag field,
+// or having the SIMD attribute. The tag field is a marker
+// type used to identify a struct that is not really a struct.
+// A SIMD type is allocated to a vector register (on amd64,
+// xmm, ymm, or zmm). The fields of a SIMD type are ignored
+// by the compiler except for the space that they reserve.
+func simdify(st *Type, isTag bool) {
+	st.align = 8
+	st.alg = ANOALG // not comparable with ==
+	st.intRegs = 0
+	st.isSIMD = true
+	if isTag {
+		st.width = 0
+		st.isSIMDTag = true
+		st.floatRegs = 0
+	} else {
+		st.floatRegs = 1
+	}
+	// if st.Sym() != nil {
+	// 	base.Warn("Simdify %s, %v, %d", st.Sym().Name, isTag, st.width)
+	// } else {
+	// 	base.Warn("Simdify %v, %v, %d", st, isTag, st.width)
+	// }
+}
+
 // CalcStructSize calculates the size of t,
 // filling in t.width, t.align, t.intRegs, and t.floatRegs,
 // even if size calculation is otherwise disabled.
@@ -464,10 +490,27 @@ func CalcStructSize(t *Type) {
 		switch {
 		case sym.Name == "align64" && isAtomicStdPkg(sym.Pkg):
 			maxAlign = 8
+
+		case buildcfg.Experiment.SIMD && (sym.Pkg.Path == "internal/simd" || sym.Pkg.Path == "simd") && len(t.Fields()) >= 1:
+			// This gates the experiment -- without it, no user-visible types can be "simd".
+			// The SSA-visible SIMD types remain.
+			// TODO after simd has been moved to package simd, remove internal/simd.
+			switch sym.Name {
+			case "v128":
+				simdify(t, true)
+				return
+			case "v256":
+				simdify(t, true)
+				return
+			case "v512":
+				simdify(t, true)
+				return
+			}
 		}
 	}
 
 	fields := t.Fields()
 
 	size := calcStructOffset(t, fields, 0)
 
 	// For non-zero-sized structs which end in a zero-sized field, we

@@ -540,6 +583,11 @@ func CalcStructSize(t *Type) {
 			break
 		}
 	}
+
+	if len(t.Fields()) >= 1 && t.Fields()[0].Type.isSIMDTag {
+		// this catches `type Foo simd.Whatever` -- Foo is also SIMD.
+		simdify(t, false)
+	}
 }
 
 // CalcArraySize calculates the size of t,
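The marking is two-step: the v128/v256/v512 tag types declared in the simd package get isSIMDTag, and any struct whose first field has a tag type, including the package's exported vector types and user-defined types based on them, is itself marked SIMD. A sketch of the type shapes this implies; the field layout inside the simd package is assumed, since only the names and the package path matter to the check above:

// Assumed shape of the experimental simd package's declarations.
package simd

// v128 is the 128-bit marker type; CalcStructSize calls simdify(t, true)
// for it, so its declared fields only reserve space and are otherwise ignored.
type v128 struct{ _ [16]byte }

// Float32x4's first field is a tag type, so the first-field rule marks it
// SIMD as well (simdify(t, false)). A user's `type Pixel simd.Float32x4`
// copies the same fields and is therefore also SIMD.
type Float32x4 struct{ v128 }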
@@ -202,6 +202,7 @@ type Type struct {
 
     flags bitset8
     alg   AlgKind // valid if Align > 0
+    isSIMDTag, isSIMD bool // tag is the marker type, isSIMD means has marker type
 
     // size of prefix of object that contains all pointers. valid if Align > 0.
     // Note that for pointers, this is always PtrSize even if the element type

@@ -594,6 +595,12 @@ func newSSA(name string) *Type {
     return t
 }
 
+func newSIMD(name string) *Type {
+    t := newSSA(name)
+    t.isSIMD = true
+    return t
+}
+
 // NewMap returns a new map Type with key type k and element (aka value) type v.
 func NewMap(k, v *Type) *Type {
     t := newType(TMAP)

@@ -982,17 +989,16 @@ func (t *Type) ArgWidth() int64 {
     return t.extra.(*Func).Argwid
 }
 
+// Size returns the width of t in bytes.
 func (t *Type) Size() int64 {
     if t.kind == TSSA {
-        if t == TypeInt128 {
-            return 16
-        }
-        return 0
+        return t.width
     }
     CalcSize(t)
     return t.width
 }
 
+// Alignment returns the alignment of t in bytes.
 func (t *Type) Alignment() int64 {
     CalcSize(t)
     return int64(t.align)

@@ -1598,12 +1604,26 @@ var (
     TypeFlags     = newSSA("flags")
     TypeVoid      = newSSA("void")
     TypeInt128    = newSSA("int128")
+    TypeVec128    = newSIMD("vec128")
+    TypeVec256    = newSIMD("vec256")
+    TypeVec512    = newSIMD("vec512")
+    TypeMask      = newSIMD("mask") // not a vector, not 100% sure what this should be.
     TypeResultMem = newResults([]*Type{TypeMem})
 )
 
 func init() {
     TypeInt128.width = 16
     TypeInt128.align = 8
+
+    TypeVec128.width = 16
+    TypeVec128.align = 8
+    TypeVec256.width = 32
+    TypeVec256.align = 8
+    TypeVec512.width = 64
+    TypeVec512.align = 8
+
+    TypeMask.width = 8 // This will depend on the architecture; spilling will be "interesting".
+    TypeMask.align = 8
 }
 
 // NewNamed returns a new named type for the given type name. obj should be an
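The widths set in init() line up with the registers named in the simdify comment above: 16, 32, and 64 bytes are the 128-, 256-, and 512-bit xmm, ymm, and zmm sizes, while TypeMask's 8-byte width is explicitly provisional per its own comment.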
@@ -1963,3 +1983,7 @@ var SimType [NTYPE]Kind
 
 // Fake package for shape types (see typecheck.Shapify()).
 var ShapePkg = NewPkg("go.shape", "go.shape")
+
+func (t *Type) IsSIMD() bool {
+    return t.isSIMD
+}

@@ -361,6 +361,8 @@ var excluded = map[string]bool{
     "builtin": true,
     "cmd/compile/internal/ssa/_gen": true,
     "runtime/_mkmalloc": true,
+    "simd/_gen/simdgen": true,
+    "simd/_gen/unify": true,
 }
 
 // printPackageMu synchronizes the printing of type-checked package files in

src/cmd/dist/test.go
@@ -956,7 +956,9 @@ func (t *tester) registerTests() {
     // which is darwin,linux,windows/amd64 and darwin/arm64.
     //
     // The same logic applies to the release notes that correspond to each api/next file.
-    if goos == "darwin" || ((goos == "linux" || goos == "windows") && goarch == "amd64") {
+    //
+    // TODO: remove the exclusion of goexperiment simd right before dev.simd branch is merged to master.
+    if goos == "darwin" || ((goos == "linux" || goos == "windows") && (goarch == "amd64" && !strings.Contains(goexperiment, "simd"))) {
         t.registerTest("API release note check", &goTest{variant: "check", pkg: "cmd/relnote", testFlags: []string{"-check"}})
         t.registerTest("API check", &goTest{variant: "check", pkg: "cmd/api", timeout: 5 * time.Minute, testFlags: []string{"-check"}})
     }

@@ -236,7 +236,7 @@ func progedit(ctxt *obj.Link, p *obj.Prog, newprog obj.ProgAlloc) {
     // Rewrite float constants to values stored in memory.
     switch p.As {
     // Convert AMOVSS $(0), Xx to AXORPS Xx, Xx
-    case AMOVSS:
+    case AMOVSS, AVMOVSS:
         if p.From.Type == obj.TYPE_FCONST {
             // f == 0 can't be used here due to -0, so use Float64bits
             if f := p.From.Val.(float64); math.Float64bits(f) == 0 {

@@ -272,7 +272,7 @@ func progedit(ctxt *obj.Link, p *obj.Prog, newprog obj.ProgAlloc) {
             p.From.Offset = 0
         }
 
-    case AMOVSD:
+    case AMOVSD, AVMOVSD:
         // Convert AMOVSD $(0), Xx to AXORPS Xx, Xx
         if p.From.Type == obj.TYPE_FCONST {
             // f == 0 can't be used here due to -0, so use Float64bits

@@ -67,7 +67,7 @@ var (
 
     // dirs are the directories to look for *.go files in.
     // TODO(bradfitz): just use all directories?
-    dirs = []string{".", "ken", "chan", "interface", "internal/runtime/sys", "syntax", "dwarf", "fixedbugs", "codegen", "abi", "typeparam", "typeparam/mdempsky", "arenas"}
+    dirs = []string{".", "ken", "chan", "interface", "internal/runtime/sys", "syntax", "dwarf", "fixedbugs", "codegen", "abi", "typeparam", "typeparam/mdempsky", "arenas", "simd"}
 )
 
 // Test is the main entrypoint that runs tests in the GOROOT/test directory.

@@ -54,6 +54,7 @@ var depsRules = `
     internal/goexperiment,
     internal/goos,
     internal/goversion,
+    internal/itoa,
     internal/nettrace,
     internal/platform,
     internal/profilerecord,

@@ -71,6 +72,8 @@ var depsRules = `
     internal/byteorder, internal/cpu, internal/goarch < internal/chacha8rand;
     internal/goarch, math/bits < internal/strconv;
 
+    internal/cpu, internal/strconv < simd;
+
     # RUNTIME is the core runtime group of packages, all of them very light-weight.
     internal/abi,
     internal/chacha8rand,

@@ -80,6 +83,7 @@ var depsRules = `
     internal/godebugs,
     internal/goexperiment,
     internal/goos,
+    internal/itoa,
     internal/profilerecord,
     internal/strconv,
     internal/trace/tracev2,

@@ -697,6 +701,9 @@ var depsRules = `
     FMT, DEBUG, flag, runtime/trace, internal/sysinfo, math/rand
     < testing;
 
+    testing, math
+    < simd/internal/test_helpers;
+
     log/slog, testing
     < testing/slogtest;
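Read as the other rules in this file are read, "a, b < c" means package c may import a and b, so the two added rules let the new simd package build on internal/cpu and internal/strconv, and let simd/internal/test_helpers build on testing and math.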
@@ -19,6 +19,6 @@ echo "// Copyright 2022 The Go Authors. All rights reserved.
 package comment
 
 var stdPkgs = []string{"
-go list std | grep -v / | sort | sed 's/.*/"&",/'
+GOEXPERIMENT=none go list std | grep -v / | sort | sed 's/.*/"&",/'
 echo "}"
 ) | gofmt >std.go.tmp && mv std.go.tmp std.go

@@ -13,7 +13,9 @@ import (
 )
 
 func TestStd(t *testing.T) {
-    out, err := testenv.Command(t, testenv.GoToolPath(t), "list", "std").CombinedOutput()
+    cmd := testenv.Command(t, testenv.GoToolPath(t), "list", "std")
+    cmd.Env = append(cmd.Environ(), "GOEXPERIMENT=none")
+    out, err := cmd.CombinedOutput()
     if err != nil {
         t.Fatalf("%v\n%s", err, out)
     }

@@ -361,6 +361,8 @@ var excluded = map[string]bool{
     "builtin": true,
     "cmd/compile/internal/ssa/_gen": true,
     "runtime/_mkmalloc": true,
+    "simd/_gen/simdgen": true,
+    "simd/_gen/unify": true,
 }
 
 // printPackageMu synchronizes the printing of type-checked package files in

@@ -88,8 +88,6 @@ func ParseGOEXPERIMENT(goos, goarch, goexp string) (*ExperimentFlags, error) {
         SizeSpecializedMalloc: true,
         GreenTeaGC:            true,
     }
-
-    // Start with the statically enabled set of experiments.
     flags := &ExperimentFlags{
         Flags:    baseline,
         baseline: baseline,

@@ -25,17 +25,22 @@ var X86 struct {
     HasAES              bool
     HasADX              bool
     HasAVX              bool
+    HasAVXVNNI          bool
     HasAVX2             bool
     HasAVX512           bool // Virtual feature: F+CD+BW+DQ+VL
     HasAVX512F          bool
     HasAVX512CD         bool
-    HasAVX512BITALG     bool
     HasAVX512BW         bool
     HasAVX512DQ         bool
     HasAVX512VL         bool
+    HasAVX512GFNI       bool
+    HasAVX512VAES       bool
+    HasAVX512VNNI       bool
     HasAVX512VBMI       bool
     HasAVX512VBMI2      bool
+    HasAVX512BITALG     bool
+    HasAVX512VPOPCNTDQ  bool
+    HasAVX512VPCLMULQDQ bool
     HasBMI1             bool
     HasBMI2             bool
     HasERMS             bool

@@ -6,8 +6,6 @@
 
 package cpu
 
-import _ "unsafe" // for linkname
-
 func osInit() {
     // macOS 12 moved these to the hw.optional.arm tree, but as of Go 1.24 we
     // still support macOS 11. See [Determine Encryption Capabilities].

@@ -29,24 +27,3 @@ func osInit() {
     ARM64.HasSHA1 = true
     ARM64.HasSHA2 = true
 }
-
-//go:noescape
-func getsysctlbyname(name []byte) (int32, int32)
-
-// sysctlEnabled should be an internal detail,
-// but widely used packages access it using linkname.
-// Notable members of the hall of shame include:
-// - github.com/bytedance/gopkg
-// - github.com/songzhibin97/gkit
-//
-// Do not remove or change the type signature.
-// See go.dev/issue/67401.
-//
-//go:linkname sysctlEnabled
-func sysctlEnabled(name []byte) bool {
-    ret, value := getsysctlbyname(name)
-    if ret < 0 {
-        return false
-    }
-    return value > 0
-}

src/internal/cpu/cpu_darwin.go (new file)
@@ -0,0 +1,72 @@
// Copyright 2020 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:build darwin && !ios

package cpu

import _ "unsafe" // for linkname

// Pushed from runtime.
//
//go:noescape
func sysctlbynameInt32(name []byte) (int32, int32)

// Pushed from runtime.
//
//go:noescape
func sysctlbynameBytes(name, out []byte) int32

// sysctlEnabled should be an internal detail,
// but widely used packages access it using linkname.
// Notable members of the hall of shame include:
// - github.com/bytedance/gopkg
// - github.com/songzhibin97/gkit
//
// Do not remove or change the type signature.
// See go.dev/issue/67401.
//
//go:linkname sysctlEnabled
func sysctlEnabled(name []byte) bool {
    ret, value := sysctlbynameInt32(name)
    if ret < 0 {
        return false
    }
    return value > 0
}

// darwinKernelVersionCheck reports if Darwin kernel version is at
// least major.minor.patch.
//
// Code borrowed from x/sys/cpu.
func darwinKernelVersionCheck(major, minor, patch int) bool {
    var release [256]byte
    ret := sysctlbynameBytes([]byte("kern.osrelease\x00"), release[:])
    if ret < 0 {
        return false
    }

    var mmp [3]int
    c := 0
Loop:
    for _, b := range release[:] {
        switch {
        case b >= '0' && b <= '9':
            mmp[c] = 10*mmp[c] + int(b-'0')
        case b == '.':
            c++
            if c > 2 {
                return false
            }
        case b == 0:
            break Loop
        default:
            return false
        }
    }
    if c != 2 {
        return false
    }
    return mmp[0] > major || mmp[0] == major && (mmp[1] > minor || mmp[1] == minor && mmp[2] >= patch)
}
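Worked example of the parser above: a kern.osrelease value of "24.1.0" yields mmp = [24, 1, 0], so darwinKernelVersionCheck(24, 0, 0) reports true; per the Rosetta 2 comment in cpu_x86_darwin.go further below, Darwin kernel 24 corresponds to macOS 15.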
@@ -18,11 +18,21 @@ func xgetbv() (eax, edx uint32)
 func getGOAMD64level() int32
 
 const (
-    // Bits returned in ECX for CPUID EAX=0x1 ECX=0x0
+    // eax bits
+    cpuid_AVXVNNI = 1 << 4
+
+    // ecx bits
     cpuid_SSE3      = 1 << 0
     cpuid_PCLMULQDQ = 1 << 1
+    cpuid_AVX512VBMI  = 1 << 1
+    cpuid_AVX512VBMI2 = 1 << 6
     cpuid_SSSE3     = 1 << 9
+    cpuid_AVX512GFNI   = 1 << 8
+    cpuid_AVX512VAES   = 1 << 9
+    cpuid_AVX512VNNI   = 1 << 11
+    cpuid_AVX512BITALG = 1 << 12
     cpuid_FMA       = 1 << 12
+    cpuid_AVX512VPOPCNTDQ = 1 << 14
     cpuid_SSE41  = 1 << 19
     cpuid_SSE42  = 1 << 20
     cpuid_POPCNT = 1 << 23

@@ -105,6 +115,7 @@ func doinit() {
     maxID, _, _, _ := cpuid(0, 0)
 
     if maxID < 1 {
+        osInit()
         return
     }

@@ -149,10 +160,11 @@ func doinit() {
     X86.HasAVX = isSet(ecx1, cpuid_AVX) && osSupportsAVX
 
     if maxID < 7 {
+        osInit()
         return
     }
 
-    _, ebx7, ecx7, edx7 := cpuid(7, 0)
+    eax7, ebx7, ecx7, edx7 := cpuid(7, 0)
     X86.HasBMI1 = isSet(ebx7, cpuid_BMI1)
     X86.HasAVX2 = isSet(ebx7, cpuid_AVX2) && osSupportsAVX
     X86.HasBMI2 = isSet(ebx7, cpuid_BMI2)

@@ -166,6 +178,13 @@ func doinit() {
     X86.HasAVX512BW = isSet(ebx7, cpuid_AVX512BW)
     X86.HasAVX512DQ = isSet(ebx7, cpuid_AVX512DQ)
     X86.HasAVX512VL = isSet(ebx7, cpuid_AVX512VL)
+    X86.HasAVX512GFNI = isSet(ecx7, cpuid_AVX512GFNI)
+    X86.HasAVX512BITALG = isSet(ecx7, cpuid_AVX512BITALG)
+    X86.HasAVX512VPOPCNTDQ = isSet(ecx7, cpuid_AVX512VPOPCNTDQ)
+    X86.HasAVX512VBMI = isSet(ecx7, cpuid_AVX512VBMI)
+    X86.HasAVX512VBMI2 = isSet(ecx7, cpuid_AVX512VBMI2)
+    X86.HasAVX512VAES = isSet(ecx7, cpuid_AVX512VAES)
+    X86.HasAVX512VNNI = isSet(ecx7, cpuid_AVX512VNNI)
     X86.HasAVX512VPCLMULQDQ = isSet(ecx7, cpuid_AVX512VPCLMULQDQ)
     X86.HasAVX512VBMI = isSet(ecx7, cpuid_AVX512_VBMI)
     X86.HasAVX512VBMI2 = isSet(ecx7, cpuid_AVX512_VBMI2)

@@ -179,6 +198,7 @@ func doinit() {
     maxExtendedInformation, _, _, _ = cpuid(0x80000000, 0)
 
     if maxExtendedInformation < 0x80000001 {
+        osInit()
         return
     }

@@ -195,6 +215,15 @@ func doinit() {
         // included in AVX10.1.
         X86.HasAVX512 = X86.HasAVX512F && X86.HasAVX512CD && X86.HasAVX512BW && X86.HasAVX512DQ && X86.HasAVX512VL
     }
 
+    if eax7 >= 1 {
+        eax71, _, _, _ := cpuid(7, 1)
+        if X86.HasAVX {
+            X86.HasAVXVNNI = isSet(4, eax71)
+        }
+    }
+
+    osInit()
 }
 
 func isSet(hwc uint32, value uint32) bool {
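The feature assignments above all funnel through isSet, whose signature appears as trailing context. A minimal sketch of the test it performs, assuming the existing implementation is a plain mask check:

    func isSet(hwc uint32, value uint32) bool {
        return hwc&value != 0
    }

    // e.g. X86.HasAVX512VNNI = isSet(ecx7, cpuid_AVX512VNNI) asks whether
    // bit 11 of the ECX output of CPUID leaf 7 is set.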
src/internal/cpu/cpu_x86_darwin.go (new file)
@@ -0,0 +1,23 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:build (386 || amd64) && darwin && !ios

package cpu

func osInit() {
    if isRosetta() && darwinKernelVersionCheck(24, 0, 0) {
        // Apparently, on macOS 15 (Darwin kernel version 24) or newer,
        // Rosetta 2 supports AVX1 and 2. However, neither CPUID nor
        // sysctl says it has AVX. Detect this situation here and report
        // AVX1 and 2 as supported.
        // TODO: check if any other feature is actually supported.
        X86.HasAVX = true
        X86.HasAVX2 = true
    }
}

func isRosetta() bool {
    return sysctlEnabled([]byte("sysctl.proc_translated\x00"))
}

src/internal/cpu/cpu_x86_other.go (new file)
@@ -0,0 +1,9 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:build (386 || amd64) && (!darwin || ios)

package cpu

func osInit() {}

src/internal/goexperiment/exp_simd_off.go (new file)
@@ -0,0 +1,8 @@
// Code generated by mkconsts.go. DO NOT EDIT.

//go:build !goexperiment.simd

package goexperiment

const SIMD = false
const SIMDInt = 0

src/internal/goexperiment/exp_simd_on.go (new file)
@@ -0,0 +1,8 @@
// Code generated by mkconsts.go. DO NOT EDIT.

//go:build goexperiment.simd

package goexperiment

const SIMD = true
const SIMDInt = 1

@@ -121,4 +121,8 @@ type Flags struct {
 
     // GoroutineLeakProfile enables the collection of goroutine leak profiles.
     GoroutineLeakProfile bool
+
+    // SIMD enables the simd package and the compiler's handling
+    // of SIMD intrinsics.
+    SIMD bool
 }
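A minimal sketch of how code can key off the experiment (hypothetical file, not part of this CL): the generated constants above back both a package-level constant and a build tag, so an experiment-only file can be guarded like this:

    //go:build goexperiment.simd

    // Hypothetical package; this file is compiled only when the build runs
    // with GOEXPERIMENT=simd, matching exp_simd_on.go above.
    package mypkg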
@@ -1049,6 +1049,9 @@ needm:
     // there's no need to handle that. Clear R14 so that there's
     // a bad value in there, in case needm tries to use it.
     XORPS X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
     XORQ R14, R14
     MOVQ $runtime·needAndBindM<ABIInternal>(SB), AX
     CALL AX

@@ -1746,6 +1749,9 @@ TEXT ·sigpanic0(SB),NOSPLIT,$0-0
     get_tls(R14)
     MOVQ g(R14), R14
     XORPS X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
     JMP ·sigpanic<ABIInternal>(SB)
 
 // gcWriteBarrier informs the GC about heap pointer writes.
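Presumably the extra three instructions matter because the legacy XORPS zeroing idiom only clears the low 128 bits of the register: when the CPU has AVX (checked via internal/cpu's HasAVX flag at the fixed offset above), the VEX-encoded VXORPS X15, X15, X15 also zeroes the upper YMM/ZMM bits, keeping X15 a true zero register now that SIMD code can leave wide registers live. The same three-instruction pattern is repeated in the signal trampolines below.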
@@ -28,9 +28,10 @@ const (
 var (
     // Set in runtime.cpuinit.
     // TODO: deprecate these; use internal/cpu directly.
+    x86HasAVX    bool
+    x86HasFMA    bool
     x86HasPOPCNT bool
     x86HasSSE41  bool
-    x86HasFMA    bool
 
     armHasVFPv4 bool

src/runtime/cpuflags_amd64_test.go (new file)
@@ -0,0 +1,19 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package runtime_test

import (
    "runtime"
    "testing"
)

func TestHasAVX(t *testing.T) {
    t.Parallel()
    output := runTestProg(t, "testprog", "CheckAVX")
    ok := output == "OK\n"
    if *runtime.X86HasAVX != ok {
        t.Fatalf("x86HasAVX: %v, CheckAVX got:\n%s", *runtime.X86HasAVX, output)
    }
}

@@ -1978,6 +1978,8 @@ func TraceStack(gp *G, tab *TraceStackTable) {
     traceStack(0, gp, (*traceStackTable)(tab))
 }
 
+var X86HasAVX = &x86HasAVX
+
 var DebugDecorateMappings = &debug.decoratemappings
 
 func SetVMANameSupported() bool { return setVMANameSupported() }
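The test relies on the testprog harness added later in this diff (testdata/testprog/cpuflags_amd64.go and .s): CheckAVX executes a VEX-encoded VXORPS and prints "OK", so on a machine without AVX the child process presumably dies with an illegal-instruction error instead, and the test checks that the outcome matches the runtime's exported x86HasAVX flag.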
@@ -402,7 +402,7 @@ func genAMD64(g *gen) {
     // Create layouts for X, Y, and Z registers.
     const (
         numXRegs = 16
-        numZRegs = 16 // TODO: If we start using upper registers, change to 32
+        numZRegs = 32
         numKRegs = 8
     )
     lZRegs := layout{sp: xReg} // Non-GP registers

@@ -162,11 +162,22 @@ func sysctlbynameInt32(name []byte) (int32, int32) {
     return ret, out
 }
 
-//go:linkname internal_cpu_getsysctlbyname internal/cpu.getsysctlbyname
-func internal_cpu_getsysctlbyname(name []byte) (int32, int32) {
+func sysctlbynameBytes(name, out []byte) int32 {
+    nout := uintptr(len(out))
+    ret := sysctlbyname(&name[0], &out[0], &nout, nil, 0)
+    return ret
+}
+
+//go:linkname internal_cpu_sysctlbynameInt32 internal/cpu.sysctlbynameInt32
+func internal_cpu_sysctlbynameInt32(name []byte) (int32, int32) {
     return sysctlbynameInt32(name)
 }
 
+//go:linkname internal_cpu_sysctlbynameBytes internal/cpu.sysctlbynameBytes
+func internal_cpu_sysctlbynameBytes(name, out []byte) int32 {
+    return sysctlbynameBytes(name, out)
+}
+
 const (
     _CTL_HW  = 6
     _HW_NCPU = 3

@@ -341,6 +341,13 @@ func panicmemAddr(addr uintptr) {
     panic(errorAddressString{msg: "invalid memory address or nil pointer dereference", addr: addr})
 }
 
+var simdImmError = error(errorString("out-of-range immediate for simd intrinsic"))
+
+func panicSimdImm() {
+    panicCheck2("simd immediate error")
+    panic(simdImmError)
+}
+
 // Create a new deferred function fn, which has no arguments and results.
 // The compiler turns a defer statement into a call to this.
 func deferproc(fn func()) {
@@ -19,6 +19,22 @@ type xRegs struct {
     Z13 [64]byte
     Z14 [64]byte
     Z15 [64]byte
+    Z16 [64]byte
+    Z17 [64]byte
+    Z18 [64]byte
+    Z19 [64]byte
+    Z20 [64]byte
+    Z21 [64]byte
+    Z22 [64]byte
+    Z23 [64]byte
+    Z24 [64]byte
+    Z25 [64]byte
+    Z26 [64]byte
+    Z27 [64]byte
+    Z28 [64]byte
+    Z29 [64]byte
+    Z30 [64]byte
+    Z31 [64]byte
     K0  uint64
     K1  uint64
     K2  uint64

@@ -95,14 +95,30 @@ saveAVX512:
     VMOVDQU64 Z13, 832(AX)
     VMOVDQU64 Z14, 896(AX)
     VMOVDQU64 Z15, 960(AX)
-    KMOVQ K0, 1024(AX)
-    KMOVQ K1, 1032(AX)
-    KMOVQ K2, 1040(AX)
-    KMOVQ K3, 1048(AX)
-    KMOVQ K4, 1056(AX)
-    KMOVQ K5, 1064(AX)
-    KMOVQ K6, 1072(AX)
-    KMOVQ K7, 1080(AX)
+    VMOVDQU64 Z16, 1024(AX)
+    VMOVDQU64 Z17, 1088(AX)
+    VMOVDQU64 Z18, 1152(AX)
+    VMOVDQU64 Z19, 1216(AX)
+    VMOVDQU64 Z20, 1280(AX)
+    VMOVDQU64 Z21, 1344(AX)
+    VMOVDQU64 Z22, 1408(AX)
+    VMOVDQU64 Z23, 1472(AX)
+    VMOVDQU64 Z24, 1536(AX)
+    VMOVDQU64 Z25, 1600(AX)
+    VMOVDQU64 Z26, 1664(AX)
+    VMOVDQU64 Z27, 1728(AX)
+    VMOVDQU64 Z28, 1792(AX)
+    VMOVDQU64 Z29, 1856(AX)
+    VMOVDQU64 Z30, 1920(AX)
+    VMOVDQU64 Z31, 1984(AX)
+    KMOVQ K0, 2048(AX)
+    KMOVQ K1, 2056(AX)
+    KMOVQ K2, 2064(AX)
+    KMOVQ K3, 2072(AX)
+    KMOVQ K4, 2080(AX)
+    KMOVQ K5, 2088(AX)
+    KMOVQ K6, 2096(AX)
+    KMOVQ K7, 2104(AX)
     JMP preempt
 preempt:
     CALL ·asyncPreempt2(SB)

@@ -153,14 +169,30 @@ restoreAVX2:
     VMOVDQU 0(AX), Y0
     JMP restoreGPs
 restoreAVX512:
-    KMOVQ 1080(AX), K7
-    KMOVQ 1072(AX), K6
-    KMOVQ 1064(AX), K5
-    KMOVQ 1056(AX), K4
-    KMOVQ 1048(AX), K3
-    KMOVQ 1040(AX), K2
-    KMOVQ 1032(AX), K1
-    KMOVQ 1024(AX), K0
+    KMOVQ 2104(AX), K7
+    KMOVQ 2096(AX), K6
+    KMOVQ 2088(AX), K5
+    KMOVQ 2080(AX), K4
+    KMOVQ 2072(AX), K3
+    KMOVQ 2064(AX), K2
+    KMOVQ 2056(AX), K1
+    KMOVQ 2048(AX), K0
+    VMOVDQU64 1984(AX), Z31
+    VMOVDQU64 1920(AX), Z30
+    VMOVDQU64 1856(AX), Z29
+    VMOVDQU64 1792(AX), Z28
+    VMOVDQU64 1728(AX), Z27
+    VMOVDQU64 1664(AX), Z26
+    VMOVDQU64 1600(AX), Z25
+    VMOVDQU64 1536(AX), Z24
+    VMOVDQU64 1472(AX), Z23
+    VMOVDQU64 1408(AX), Z22
+    VMOVDQU64 1344(AX), Z21
+    VMOVDQU64 1280(AX), Z20
+    VMOVDQU64 1216(AX), Z19
+    VMOVDQU64 1152(AX), Z18
+    VMOVDQU64 1088(AX), Z17
+    VMOVDQU64 1024(AX), Z16
     VMOVDQU64 960(AX), Z15
     VMOVDQU64 896(AX), Z14
     VMOVDQU64 832(AX), Z13
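The new offsets follow directly from the register sizes: each Z register occupies 64 bytes, so Z16 starts at 16*64 = 1024 and Z31 at 31*64 = 1984; with all 32 Z registers saved, the eight 8-byte K mask registers move from offset 1024 to 32*64 = 2048 through 2104. This matches numZRegs changing from 16 to 32 in mkpreempt.go above.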
@@ -763,9 +763,10 @@ func cpuinit(env string) {
     // to guard execution of instructions that can not be assumed to be always supported.
     switch GOARCH {
     case "386", "amd64":
+        x86HasAVX = cpu.X86.HasAVX
+        x86HasFMA = cpu.X86.HasFMA
         x86HasPOPCNT = cpu.X86.HasPOPCNT
         x86HasSSE41 = cpu.X86.HasSSE41
-        x86HasFMA = cpu.X86.HasFMA
 
     case "arm":
         armHasVFPv4 = cpu.ARM.HasVFPv4

@@ -456,6 +456,9 @@ call:
     // Back to Go world, set special registers.
     // The g register (R14) is preserved in C.
     XORPS X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
     RET
 
 // C->Go callback thunk that allows to call runtime·racesymbolize from C code.

@@ -177,6 +177,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -228,6 +228,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -265,6 +265,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -290,6 +293,9 @@ TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -340,6 +340,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -365,6 +368,9 @@ TEXT runtime·sigprofNonGoWrapper<>(SB),NOSPLIT|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -310,6 +310,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -64,6 +64,9 @@ TEXT runtime·sigtramp(SB),NOSPLIT|TOPFRAME|NOFRAME,$0
     get_tls(R12)
     MOVQ g(R12), R14
     PXOR X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
 
     // Reserve space for spill slots.
     NOP SP // disable vet stack checking

@@ -32,6 +32,9 @@ TEXT sigtramp<>(SB),NOSPLIT,$0-0
     // R14 is cleared in case there's a non-zero value in there
     // if called from a non-go thread.
     XORPS X15, X15
+    CMPB internal∕cpu·X86+const_offsetX86HasAVX(SB), $1
+    JNE 2(PC)
+    VXORPS X15, X15, X15
     XORQ R14, R14
 
     get_tls(AX)

src/runtime/testdata/testprog/cpuflags_amd64.go (new file)
@@ -0,0 +1,18 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import "fmt"

func init() {
    register("CheckAVX", CheckAVX)
}

func CheckAVX() {
    checkAVX()
    fmt.Println("OK")
}

func checkAVX()

src/runtime/testdata/testprog/cpuflags_amd64.s (new file)
@@ -0,0 +1,9 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

#include "textflag.h"

TEXT ·checkAVX(SB), NOSPLIT|NOFRAME, $0-0
    VXORPS X1, X2, X3
    RET

src/simd/_gen/go.mod (new file)
@@ -0,0 +1,8 @@
module simd/_gen

go 1.24

require (
    golang.org/x/arch v0.20.0
    gopkg.in/yaml.v3 v3.0.1
)

src/simd/_gen/go.sum (new file)
@@ -0,0 +1,6 @@
golang.org/x/arch v0.20.0 h1:dx1zTU0MAE98U+TQ8BLl7XsJbgze2WnNKF/8tGp/Q6c=
golang.org/x/arch v0.20.0/go.mod h1:bdwinDaKcfZUGpH09BB7ZmOfhalA8lQdzl62l8gGWsk=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=

src/simd/_gen/main.go (new file)
@@ -0,0 +1,149 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Run all SIMD-related code generators.
package main

import (
    "flag"
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
    "strings"
)

const defaultXedPath = "$XEDPATH" + string(filepath.ListSeparator) + "./simdgen/xeddata" + string(filepath.ListSeparator) + "$HOME/xed/obj/dgen"

var (
    flagTmplgen = flag.Bool("tmplgen", true, "run tmplgen generator")
    flagSimdgen = flag.Bool("simdgen", true, "run simdgen generator")

    flagN       = flag.Bool("n", false, "dry run")
    flagXedPath = flag.String("xedPath", defaultXedPath, "load XED datafile from `path`, which must be the XED obj/dgen directory")
)

var goRoot string

func main() {
    flag.Parse()
    if flag.NArg() > 0 {
        flag.Usage()
        os.Exit(1)
    }

    if *flagXedPath == defaultXedPath {
        // In general we want the shell to do variable expansion, but for the
        // default value we don't get that, so do it ourselves.
        *flagXedPath = os.ExpandEnv(defaultXedPath)
    }

    var err error
    goRoot, err = resolveGOROOT()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }

    if *flagTmplgen {
        doTmplgen()
    }
    if *flagSimdgen {
        doSimdgen()
    }
}

func doTmplgen() {
    goRun("-C", "tmplgen", ".")
}

func doSimdgen() {
    xedPath, err := resolveXEDPath(*flagXedPath)
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }

    // Regenerate the XED-derived SIMD files
    goRun("-C", "simdgen", ".", "-o", "godefs", "-goroot", goRoot, "-xedPath", prettyPath("./simdgen", xedPath), "go.yaml", "types.yaml", "categories.yaml")

    // simdgen produces SSA rule files, so update the SSA files
    goRun("-C", prettyPath(".", filepath.Join(goRoot, "src", "cmd", "compile", "internal", "ssa", "_gen")), ".")
}

func resolveXEDPath(pathList string) (xedPath string, err error) {
    for _, path := range filepath.SplitList(pathList) {
        if path == "" {
            // Probably an unknown shell variable. Ignore.
            continue
        }
        if _, err := os.Stat(filepath.Join(path, "all-dec-instructions.txt")); err == nil {
            return filepath.Abs(path)
        }
    }
    return "", fmt.Errorf("set $XEDPATH or -xedPath to the XED obj/dgen directory")
}

func resolveGOROOT() (goRoot string, err error) {
    cmd := exec.Command("go", "env", "GOROOT")
    cmd.Stderr = os.Stderr
    out, err := cmd.Output()
    if err != nil {
        return "", fmt.Errorf("%s: %s", cmd, err)
    }
    goRoot = strings.TrimSuffix(string(out), "\n")
    return goRoot, nil
}

func goRun(args ...string) {
    exe := filepath.Join(goRoot, "bin", "go")
    cmd := exec.Command(exe, append([]string{"run"}, args...)...)
    run(cmd)
}

func run(cmd *exec.Cmd) {
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    fmt.Fprintf(os.Stderr, "%s\n", cmdString(cmd))
    if *flagN {
        return
    }
    if err := cmd.Run(); err != nil {
        fmt.Fprintf(os.Stderr, "%s failed: %s\n", cmd, err)
    }
}

func prettyPath(base, path string) string {
    base, err := filepath.Abs(base)
    if err != nil {
        return path
    }
    p, err := filepath.Rel(base, path)
    if err != nil {
        return path
    }
    return p
}

func cmdString(cmd *exec.Cmd) string {
    // TODO: Shell quoting?
    // TODO: Environment.

    var buf strings.Builder

    cmdPath, err := exec.LookPath(filepath.Base(cmd.Path))
    if err == nil && cmdPath == cmd.Path {
        cmdPath = filepath.Base(cmdPath)
    } else {
        cmdPath = prettyPath(".", cmd.Path)
    }
    buf.WriteString(cmdPath)

    for _, arg := range cmd.Args[1:] {
        buf.WriteByte(' ')
        buf.WriteString(arg)
    }

    return buf.String()
}
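Per the flags and default path above, the generators are run from the _gen module root, e.g. "go run . -xedPath=$HOME/xed/obj/dgen" (or with $XEDPATH set); -n prints the commands without running them, and -tmplgen=false or -simdgen=false skips either generator.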
src/simd/_gen/simdgen/.gitignore (new file)
@@ -0,0 +1,3 @@
testdata/*
.gemini/*
.gemini*

src/simd/_gen/simdgen/categories.yaml (new file)
@@ -0,0 +1 @@
!import ops/*/categories.yaml

src/simd/_gen/simdgen/etetest.sh (new executable file)
@@ -0,0 +1,48 @@
#!/bin/bash

# This is an end-to-end test of Go SIMD. It updates all generated
# files in this repo and then runs several tests.

XEDDATA="${XEDDATA:-xeddata}"
if [[ ! -d "$XEDDATA" ]]; then
    echo >&2 "Must either set \$XEDDATA or symlink xeddata/ to the XED obj/dgen directory."
    exit 1
fi

which go >/dev/null || exit 1
goroot="$(go env GOROOT)"
if [[ ! ../../../.. -ef "$goroot" ]]; then
    # We might be able to make this work but it's SO CONFUSING.
    echo >&2 "go command in path has GOROOT $goroot"
    exit 1
fi

if [[ $(go env GOEXPERIMENT) != simd ]]; then
    echo >&2 "GOEXPERIMENT=$(go env GOEXPERIMENT), expected simd"
    exit 1
fi

set -ex

# Regenerate SIMD files
go run . -o godefs -goroot "$goroot" -xedPath "$XEDDATA" go.yaml types.yaml categories.yaml
# Regenerate SSA files from SIMD rules
go run -C "$goroot"/src/cmd/compile/internal/ssa/_gen .

# Rebuild compiler
cd "$goroot"/src
go install cmd/compile

# Tests
GOARCH=amd64 go run -C simd/testdata .
GOARCH=amd64 go test -v simd
go test go/doc go/build
go test cmd/api -v -check -run ^TestCheck$
go test cmd/compile/internal/ssagen -simd=0

# Check tests without the GOEXPERIMENT
GOEXPERIMENT= go test go/doc go/build
GOEXPERIMENT= go test cmd/api -v -check -run ^TestCheck$
GOEXPERIMENT= go test cmd/compile/internal/ssagen -simd=0

# TODO: Add some tests of SIMD itself

src/simd/_gen/simdgen/gen_simdGenericOps.go (new file)
@@ -0,0 +1,73 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

import (
    "bytes"
    "fmt"
    "sort"
)

const simdGenericOpsTmpl = `
package main

func simdGenericOps() []opData {
    return []opData{
{{- range .Ops }}
        {name: "{{.OpName}}", argLength: {{.OpInLen}}, commutative: {{.Comm}}},
{{- end }}
{{- range .OpsImm }}
        {name: "{{.OpName}}", argLength: {{.OpInLen}}, commutative: {{.Comm}}, aux: "UInt8"},
{{- end }}
    }
}
`

// writeSIMDGenericOps generates the generic ops and writes it to simdAMD64ops.go
// within the specified directory.
func writeSIMDGenericOps(ops []Operation) *bytes.Buffer {
    t := templateOf(simdGenericOpsTmpl, "simdgenericOps")
    buffer := new(bytes.Buffer)
    buffer.WriteString(generatedHeader)

    type genericOpsData struct {
        OpName  string
        OpInLen int
        Comm    bool
    }
    type opData struct {
        Ops    []genericOpsData
        OpsImm []genericOpsData
    }
    var opsData opData
    for _, op := range ops {
        if op.NoGenericOps != nil && *op.NoGenericOps == "true" {
            continue
        }
        if op.SkipMaskedMethod() {
            continue
        }
        _, _, _, immType, gOp := op.shape()
        gOpData := genericOpsData{gOp.GenericName(), len(gOp.In), op.Commutative}
        if immType == VarImm || immType == ConstVarImm {
            opsData.OpsImm = append(opsData.OpsImm, gOpData)
        } else {
            opsData.Ops = append(opsData.Ops, gOpData)
        }
    }
    sort.Slice(opsData.Ops, func(i, j int) bool {
        return compareNatural(opsData.Ops[i].OpName, opsData.Ops[j].OpName) < 0
    })
    sort.Slice(opsData.OpsImm, func(i, j int) bool {
        return compareNatural(opsData.OpsImm[i].OpName, opsData.OpsImm[j].OpName) < 0
    })

    err := t.Execute(buffer, opsData)
    if err != nil {
        panic(fmt.Errorf("failed to execute template: %w", err))
    }

    return buffer
}
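For each operation that survives the filters above, the template emits one opData literal into the generated simdGenericOps() slice, e.g. {name: "AddFloat32x4", argLength: 2, commutative: true}, where the op name here is only a made-up illustration; ops that carry an immediate additionally get aux: "UInt8".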
156
src/simd/_gen/simdgen/gen_simdIntrinsics.go
Normal file
156
src/simd/_gen/simdgen/gen_simdIntrinsics.go
Normal file
|
|
@ -0,0 +1,156 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"slices"
|
||||||
|
)
|
||||||
|
|
||||||
|
const simdIntrinsicsTmpl = `
|
||||||
|
{{define "header"}}
|
||||||
|
package ssagen
|
||||||
|
|
||||||
|
import (
|
||||||
|
"cmd/compile/internal/ir"
|
||||||
|
"cmd/compile/internal/ssa"
|
||||||
|
"cmd/compile/internal/types"
|
||||||
|
"cmd/internal/sys"
|
||||||
|
)
|
||||||
|
|
||||||
|
const simdPackage = "` + simdPackage + `"
|
||||||
|
|
||||||
|
func simdIntrinsics(addF func(pkg, fn string, b intrinsicBuilder, archFamilies ...sys.ArchFamily)) {
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op1"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen1(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen2(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2_21"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen2_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2_21Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen3(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3_21"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen3_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3_21Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3_21(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3_231Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3_231(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3_31Zero3"}} addF(simdPackage, "{{(index .In 2).Go}}.{{.Go}}", opLen3_31Zero3(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op4"}} addF(simdPackage, "{{(index .In 0).Go}}.{{.Go}}", opLen4(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op4_231Type1"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen4_231(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op4_31"}} addF(simdPackage, "{{(index .In 2).Go}}.{{.Go}}", opLen4_31(ssa.Op{{.GenericName}}, {{.SSAType}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op1Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen1Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2Imm8_2I"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_2I(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2Imm8_II"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_II(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op2Imm8_SHA1RNDS4"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen2Imm8_SHA1RNDS4(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op3Imm8_2I"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen3Imm8_2I(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
{{define "op4Imm8"}} addF(simdPackage, "{{(index .In 1).Go}}.{{.Go}}", opLen4Imm8(ssa.Op{{.GenericName}}, {{.SSAType}}, {{(index .In 0).ImmOffset}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "vectorConversion"}} addF(simdPackage, "{{.Tsrc.Name}}.As{{.Tdst.Name}}", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "loadStore"}} addF(simdPackage, "Load{{.Name}}", simdLoad(), sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}.Store", simdStore(), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "maskedLoadStore"}} addF(simdPackage, "LoadMasked{{.Name}}", simdMaskedLoad(ssa.OpLoadMasked{{.ElemBits}}), sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}.StoreMasked", simdMaskedStore(ssa.OpStoreMasked{{.ElemBits}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "mask"}} addF(simdPackage, "{{.Name}}.As{{.VectorCounterpart}}", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.VectorCounterpart}}.asMask", func(s *state, n *ir.CallExpr, args []*ssa.Value) *ssa.Value { return args[0] }, sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}.And", opLen2(ssa.OpAnd{{.ReshapedVectorWithAndOr}}, types.TypeVec{{.Size}}), sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}.Or", opLen2(ssa.OpOr{{.ReshapedVectorWithAndOr}}, types.TypeVec{{.Size}}), sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}FromBits", simdCvtVToMask({{.ElemBits}}, {{.Lanes}}), sys.AMD64)
|
||||||
|
addF(simdPackage, "{{.Name}}.ToBits", simdCvtMaskToV({{.ElemBits}}, {{.Lanes}}), sys.AMD64)
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "footer"}}}
|
||||||
|
{{end}}
|
||||||
|
`
|
||||||
|
|
||||||
|
// writeSIMDIntrinsics generates the intrinsic mappings and writes it to simdintrinsics.go
|
||||||
|
// within the specified directory.
|
||||||
|
func writeSIMDIntrinsics(ops []Operation, typeMap simdTypeMap) *bytes.Buffer {
|
||||||
|
t := templateOf(simdIntrinsicsTmpl, "simdintrinsics")
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
buffer.WriteString(generatedHeader)
|
||||||
|
|
||||||
|
if err := t.ExecuteTemplate(buffer, "header", nil); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute header template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
slices.SortFunc(ops, compareOperations)
|
||||||
|
|
||||||
|
for _, op := range ops {
|
||||||
|
if op.NoTypes != nil && *op.NoTypes == "true" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if op.SkipMaskedMethod() {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if s, op, err := classifyOp(op); err == nil {
|
||||||
|
if err := t.ExecuteTemplate(buffer, s, op); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template %s for op %s: %w", s, op.Go, err))
|
||||||
|
}
|
||||||
|
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("failed to classify op %v: %w", op.Go, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, conv := range vConvertFromTypeMap(typeMap) {
|
||||||
|
if err := t.ExecuteTemplate(buffer, "vectorConversion", conv); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute vectorConversion template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, typ := range typesFromTypeMap(typeMap) {
|
||||||
|
if typ.Type != "mask" {
|
||||||
|
if err := t.ExecuteTemplate(buffer, "loadStore", typ); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute loadStore template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, typ := range typesFromTypeMap(typeMap) {
|
||||||
|
if typ.MaskedLoadStoreFilter() {
|
||||||
|
if err := t.ExecuteTemplate(buffer, "maskedLoadStore", typ); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute maskedLoadStore template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, mask := range masksFromTypeMap(typeMap) {
|
||||||
|
if err := t.ExecuteTemplate(buffer, "mask", mask); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute mask template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := t.ExecuteTemplate(buffer, "footer", nil); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute footer template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
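All of the write* generators in this tool follow the same pattern: parse a set of named templates once, then execute them one at a time into a shared bytes.Buffer, panicking on any template error. Below is a minimal, self-contained sketch of that pattern; the template body and data values are illustrative only, not the real simdgen templates.

package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// demoTmpl is a toy stand-in for the real simdgen templates.
const demoTmpl = `
{{define "header"}}package demo
{{end}}
{{define "entry"}}// {{.Name}} has {{.Lanes}} lanes
{{end}}
`

type entry struct {
	Name  string
	Lanes int
}

func main() {
	t := template.Must(template.New("demo").Parse(demoTmpl))
	buf := new(bytes.Buffer)
	// Execute named sub-templates in order, accumulating all output in one buffer.
	if err := t.ExecuteTemplate(buf, "header", nil); err != nil {
		panic(err)
	}
	for _, e := range []entry{{"Int32x4", 4}, {"Float64x2", 2}} {
		if err := t.ExecuteTemplate(buf, "entry", e); err != nil {
			panic(err)
		}
	}
	fmt.Print(buf.String())
}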
src/simd/_gen/simdgen/gen_simdMachineOps.go (new file, 256 lines)
@@ -0,0 +1,256 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
)
|
||||||
|
|
||||||
|
const simdMachineOpsTmpl = `
|
||||||
|
package main
|
||||||
|
|
||||||
|
func simdAMD64Ops(v11, v21, v2k, vkv, v2kv, v2kk, v31, v3kv, vgpv, vgp, vfpv, vfpkv, w11, w21, w2k, wkw, w2kw, w2kk, w31, w3kw, wgpw, wgp, wfpw, wfpkw,
|
||||||
|
wkwload, v21load, v31load, v11load, w21load, w31load, w2kload, w2kwload, w11load, w3kwload, w2kkload, v31x0AtIn2 regInfo) []opData {
|
||||||
|
return []opData{
|
||||||
|
{{- range .OpsData }}
|
||||||
|
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", resultInArg0: {{.ResultInArg0}}},
|
||||||
|
{{- end }}
|
||||||
|
{{- range .OpsDataImm }}
|
||||||
|
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", aux: "UInt8", commutative: {{.Comm}}, typ: "{{.Type}}", resultInArg0: {{.ResultInArg0}}},
|
||||||
|
{{- end }}
|
||||||
|
{{- range .OpsDataLoad}}
|
||||||
|
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", aux: "SymOff", symEffect: "Read", resultInArg0: {{.ResultInArg0}}},
|
||||||
|
{{- end}}
|
||||||
|
{{- range .OpsDataImmLoad}}
|
||||||
|
{name: "{{.OpName}}", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: {{.Comm}}, typ: "{{.Type}}", aux: "SymValAndOff", symEffect: "Read", resultInArg0: {{.ResultInArg0}}},
|
||||||
|
{{- end}}
|
||||||
|
{{- range .OpsDataMerging }}
|
||||||
|
{name: "{{.OpName}}Merging", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", commutative: false, typ: "{{.Type}}", resultInArg0: true},
|
||||||
|
{{- end }}
|
||||||
|
{{- range .OpsDataImmMerging }}
|
||||||
|
{name: "{{.OpName}}Merging", argLength: {{.OpInLen}}, reg: {{.RegInfo}}, asm: "{{.Asm}}", aux: "UInt8", commutative: false, typ: "{{.Type}}", resultInArg0: true},
|
||||||
|
{{- end }}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
`
|
||||||
|
|
||||||
|
// writeSIMDMachineOps generates the machine ops and writes them to simdAMD64ops.go
// within the specified directory.
|
||||||
|
func writeSIMDMachineOps(ops []Operation) *bytes.Buffer {
|
||||||
|
t := templateOf(simdMachineOpsTmpl, "simdAMD64Ops")
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
buffer.WriteString(generatedHeader)
|
||||||
|
|
||||||
|
type opData struct {
|
||||||
|
OpName string
|
||||||
|
Asm string
|
||||||
|
OpInLen int
|
||||||
|
RegInfo string
|
||||||
|
Comm bool
|
||||||
|
Type string
|
||||||
|
ResultInArg0 bool
|
||||||
|
}
|
||||||
|
type machineOpsData struct {
|
||||||
|
OpsData []opData
|
||||||
|
OpsDataImm []opData
|
||||||
|
OpsDataLoad []opData
|
||||||
|
OpsDataImmLoad []opData
|
||||||
|
OpsDataMerging []opData
|
||||||
|
OpsDataImmMerging []opData
|
||||||
|
}
|
||||||
|
|
||||||
|
regInfoSet := map[string]bool{
|
||||||
|
"v11": true, "v21": true, "v2k": true, "v2kv": true, "v2kk": true, "vkv": true, "v31": true, "v3kv": true, "vgpv": true, "vgp": true, "vfpv": true, "vfpkv": true,
|
||||||
|
"w11": true, "w21": true, "w2k": true, "w2kw": true, "w2kk": true, "wkw": true, "w31": true, "w3kw": true, "wgpw": true, "wgp": true, "wfpw": true, "wfpkw": true,
|
||||||
|
"wkwload": true, "v21load": true, "v31load": true, "v11load": true, "w21load": true, "w31load": true, "w2kload": true, "w2kwload": true, "w11load": true,
|
||||||
|
"w3kwload": true, "w2kkload": true, "v31x0AtIn2": true}
|
||||||
|
opsData := make([]opData, 0)
|
||||||
|
opsDataImm := make([]opData, 0)
|
||||||
|
opsDataLoad := make([]opData, 0)
|
||||||
|
opsDataImmLoad := make([]opData, 0)
|
||||||
|
opsDataMerging := make([]opData, 0)
|
||||||
|
opsDataImmMerging := make([]opData, 0)
|
||||||
|
|
||||||
|
// Determine the "best" version of an instruction to use
|
||||||
|
best := make(map[string]Operation)
|
||||||
|
var mOpOrder []string
|
||||||
|
countOverrides := func(s []Operand) int {
|
||||||
|
a := 0
|
||||||
|
for _, o := range s {
|
||||||
|
if o.OverwriteBase != nil {
|
||||||
|
a++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return a
|
||||||
|
}
|
||||||
|
for _, op := range ops {
|
||||||
|
_, _, maskType, _, gOp := op.shape()
|
||||||
|
asm := machineOpName(maskType, gOp)
|
||||||
|
other, ok := best[asm]
|
||||||
|
if !ok {
|
||||||
|
best[asm] = op
|
||||||
|
mOpOrder = append(mOpOrder, asm)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
// see if "op" is better than "other"
|
||||||
|
if countOverrides(op.In)+countOverrides(op.Out) < countOverrides(other.In)+countOverrides(other.Out) {
|
||||||
|
best[asm] = op
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
regInfoErrs := make([]error, 0)
|
||||||
|
regInfoMissing := make(map[string]bool, 0)
|
||||||
|
for _, asm := range mOpOrder {
|
||||||
|
op := best[asm]
|
||||||
|
shapeIn, shapeOut, maskType, _, gOp := op.shape()
|
||||||
|
|
||||||
|
// TODO: all our masked operations are currently zeroing; we need to generate machine ops with
// merging masks, perhaps by copying one here with a "Merging" name suffix. The rewrite rules will need them.
|
||||||
|
makeRegInfo := func(op Operation, mem memShape) (string, error) {
|
||||||
|
regInfo, err := op.regShape(mem)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
regInfo, err = rewriteVecAsScalarRegInfo(op, regInfo)
|
||||||
|
if err != nil {
|
||||||
|
if mem == NoMem || mem == InvalidMem {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
if regInfo == "v01load" {
|
||||||
|
regInfo = "vload"
|
||||||
|
}
|
||||||
|
// Makes AVX512 operations use upper registers
|
||||||
|
if strings.Contains(op.CPUFeature, "AVX512") {
|
||||||
|
regInfo = strings.ReplaceAll(regInfo, "v", "w")
|
||||||
|
}
|
||||||
|
if _, ok := regInfoSet[regInfo]; !ok {
|
||||||
|
regInfoErrs = append(regInfoErrs, fmt.Errorf("unsupported register constraint, please update the template and AMD64Ops.go: %s. Op is %s", regInfo, op))
|
||||||
|
regInfoMissing[regInfo] = true
|
||||||
|
}
|
||||||
|
return regInfo, nil
|
||||||
|
}
|
||||||
|
regInfo, err := makeRegInfo(op, NoMem)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
var outType string
|
||||||
|
if shapeOut == OneVregOut || shapeOut == OneVregOutAtIn || gOp.Out[0].OverwriteClass != nil {
|
||||||
|
// If class overwrite is happening, that's not really a mask but a vreg.
|
||||||
|
outType = fmt.Sprintf("Vec%d", *gOp.Out[0].Bits)
|
||||||
|
} else if shapeOut == OneGregOut {
|
||||||
|
outType = gOp.GoType() // this is a straight Go type, not a VecNNN type
|
||||||
|
} else if shapeOut == OneKmaskOut {
|
||||||
|
outType = "Mask"
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen does not recognize this output shape: %d", shapeOut))
|
||||||
|
}
|
||||||
|
resultInArg0 := false
|
||||||
|
if shapeOut == OneVregOutAtIn {
|
||||||
|
resultInArg0 = true
|
||||||
|
}
|
||||||
|
var memOpData *opData
|
||||||
|
regInfoMerging := regInfo
|
||||||
|
hasMerging := false
|
||||||
|
if op.MemFeatures != nil && *op.MemFeatures == "vbcst" {
|
||||||
|
// Right now we only have vbcst case
|
||||||
|
// Make a full vec memory variant.
|
||||||
|
opMem := rewriteLastVregToMem(op)
|
||||||
|
regInfo, err := makeRegInfo(opMem, VregMemIn)
|
||||||
|
if err != nil {
|
||||||
|
// Just skip it if the error is non-nil;
// an error could be triggered by [checkVecAsScalar].
// TODO: make [checkVecAsScalar] aware of mem ops.
|
||||||
|
if *Verbose {
|
||||||
|
log.Printf("Seen error: %e", err)
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
memOpData = &opData{asm + "load", gOp.Asm, len(gOp.In) + 1, regInfo, false, outType, resultInArg0}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
hasMerging = gOp.hasMaskedMerging(maskType, shapeOut)
|
||||||
|
if hasMerging && !resultInArg0 {
|
||||||
|
// We have to copy the slice here because the sort will be visible from other
// aliases when no reslicing is happening.
|
||||||
|
newIn := make([]Operand, len(op.In), len(op.In)+1)
|
||||||
|
copy(newIn, op.In)
|
||||||
|
op.In = newIn
|
||||||
|
op.In = append(op.In, op.Out[0])
|
||||||
|
op.sortOperand()
|
||||||
|
regInfoMerging, err = makeRegInfo(op, NoMem)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if shapeIn == OneImmIn || shapeIn == OneKmaskImmIn {
|
||||||
|
opsDataImm = append(opsDataImm, opData{asm, gOp.Asm, len(gOp.In), regInfo, gOp.Commutative, outType, resultInArg0})
|
||||||
|
if memOpData != nil {
|
||||||
|
if *op.MemFeatures != "vbcst" {
|
||||||
|
panic("simdgen only knows vbcst for mem ops for now")
|
||||||
|
}
|
||||||
|
opsDataImmLoad = append(opsDataImmLoad, *memOpData)
|
||||||
|
}
|
||||||
|
if hasMerging {
|
||||||
|
mergingLen := len(gOp.In)
|
||||||
|
if !resultInArg0 {
|
||||||
|
mergingLen++
|
||||||
|
}
|
||||||
|
opsDataImmMerging = append(opsDataImmMerging, opData{asm, gOp.Asm, mergingLen, regInfoMerging, gOp.Commutative, outType, resultInArg0})
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
opsData = append(opsData, opData{asm, gOp.Asm, len(gOp.In), regInfo, gOp.Commutative, outType, resultInArg0})
|
||||||
|
if memOpData != nil {
|
||||||
|
if *op.MemFeatures != "vbcst" {
|
||||||
|
panic("simdgen only knows vbcst for mem ops for now")
|
||||||
|
}
|
||||||
|
opsDataLoad = append(opsDataLoad, *memOpData)
|
||||||
|
}
|
||||||
|
if hasMerging {
|
||||||
|
mergingLen := len(gOp.In)
|
||||||
|
if !resultInArg0 {
|
||||||
|
mergingLen++
|
||||||
|
}
|
||||||
|
opsDataMerging = append(opsDataMerging, opData{asm, gOp.Asm, mergingLen, regInfoMerging, gOp.Commutative, outType, resultInArg0})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if len(regInfoErrs) != 0 {
|
||||||
|
for _, e := range regInfoErrs {
|
||||||
|
log.Printf("Errors: %e\n", e)
|
||||||
|
}
|
||||||
|
panic(fmt.Errorf("these regInfo unseen: %v", regInfoMissing))
|
||||||
|
}
|
||||||
|
sort.Slice(opsData, func(i, j int) bool {
|
||||||
|
return compareNatural(opsData[i].OpName, opsData[j].OpName) < 0
|
||||||
|
})
|
||||||
|
sort.Slice(opsDataImm, func(i, j int) bool {
|
||||||
|
return compareNatural(opsDataImm[i].OpName, opsDataImm[j].OpName) < 0
|
||||||
|
})
|
||||||
|
sort.Slice(opsDataLoad, func(i, j int) bool {
|
||||||
|
return compareNatural(opsDataLoad[i].OpName, opsDataLoad[j].OpName) < 0
|
||||||
|
})
|
||||||
|
sort.Slice(opsDataImmLoad, func(i, j int) bool {
|
||||||
|
return compareNatural(opsDataImmLoad[i].OpName, opsDataImmLoad[j].OpName) < 0
|
||||||
|
})
|
||||||
|
sort.Slice(opsDataMerging, func(i, j int) bool {
|
||||||
|
return compareNatural(opsDataMerging[i].OpName, opsDataMerging[j].OpName) < 0
|
||||||
|
})
|
||||||
|
sort.Slice(opsDataImmMerging, func(i, j int) bool {
|
||||||
|
return compareNatural(opsDataImmMerging[i].OpName, opsDataImmMerging[j].OpName) < 0
|
||||||
|
})
|
||||||
|
err := t.Execute(buffer, machineOpsData{opsData, opsDataImm, opsDataLoad, opsDataImmLoad,
|
||||||
|
opsDataMerging, opsDataImmMerging})
|
||||||
|
if err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
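The "best" selection above is a keep-the-minimum-by-key dedup: for each machine op name it keeps the candidate with the fewest operand-class overrides, while mOpOrder records first-seen order so the generated output stays deterministic. A minimal sketch of that pattern, with an illustrative score standing in for countOverrides:

package main

import "fmt"

type op struct {
	name  string
	score int // lower is better; stands in for countOverrides(op.In)+countOverrides(op.Out)
}

func main() {
	ops := []op{{"VPADDD", 2}, {"VPSUBD", 1}, {"VPADDD", 0}}
	best := make(map[string]op)
	var order []string // first-seen order keeps output deterministic
	for _, o := range ops {
		prev, ok := best[o.name]
		if !ok {
			best[o.name] = o
			order = append(order, o.name)
			continue
		}
		if o.score < prev.score {
			best[o.name] = o
		}
	}
	for _, name := range order {
		fmt.Println(name, best[name].score)
	}
}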
src/simd/_gen/simdgen/gen_simdTypes.go (new file, 658 lines)
@@ -0,0 +1,658 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"cmp"
|
||||||
|
"fmt"
|
||||||
|
"maps"
|
||||||
|
"slices"
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
"unicode"
|
||||||
|
)
|
||||||
|
|
||||||
|
type simdType struct {
|
||||||
|
Name string // The go type name of this simd type, for example Int32x4.
|
||||||
|
Lanes int // The number of elements in this vector/mask.
|
||||||
|
Base string // The element's type, like for Int32x4 it will be int32.
|
||||||
|
Fields string // The struct fields, it should be right formatted.
|
||||||
|
Type string // Either "mask" or "vreg"
|
||||||
|
VectorCounterpart string // For mask use only: just replacing the "Mask" in [simdType.Name] with "Int"
|
||||||
|
ReshapedVectorWithAndOr string // For mask use only: vector AND and OR are only available in some shape with element width 32.
|
||||||
|
Size int // The size of the vector type
|
||||||
|
}
|
||||||
|
|
||||||
|
func (x simdType) ElemBits() int {
|
||||||
|
return x.Size / x.Lanes
|
||||||
|
}
|
||||||
|
|
||||||
|
// LanesContainer returns the smallest int/uint bit size that is
|
||||||
|
// large enough to hold one bit for each lane. E.g., Mask32x4
|
||||||
|
// is 4 lanes, and a uint8 is the smallest uint that has 4 bits.
|
||||||
|
func (x simdType) LanesContainer() int {
|
||||||
|
if x.Lanes > 64 {
|
||||||
|
panic("too many lanes")
|
||||||
|
}
|
||||||
|
if x.Lanes > 32 {
|
||||||
|
return 64
|
||||||
|
}
|
||||||
|
if x.Lanes > 16 {
|
||||||
|
return 32
|
||||||
|
}
|
||||||
|
if x.Lanes > 8 {
|
||||||
|
return 16
|
||||||
|
}
|
||||||
|
return 8
|
||||||
|
}
|
||||||
|
|
||||||
|
// MaskedLoadStoreFilter reports whether this simd type currently
// gets masked loads/stores generated; it is used in two places,
// which forces coordination.
|
||||||
|
func (x simdType) MaskedLoadStoreFilter() bool {
|
||||||
|
return x.Size == 512 || x.ElemBits() >= 32 && x.Type != "mask"
|
||||||
|
}
|
||||||
|
|
||||||
|
func (x simdType) IntelSizeSuffix() string {
|
||||||
|
switch x.ElemBits() {
|
||||||
|
case 8:
|
||||||
|
return "B"
|
||||||
|
case 16:
|
||||||
|
return "W"
|
||||||
|
case 32:
|
||||||
|
return "D"
|
||||||
|
case 64:
|
||||||
|
return "Q"
|
||||||
|
}
|
||||||
|
panic("oops")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (x simdType) MaskedLoadDoc() string {
|
||||||
|
if x.Size == 512 || x.ElemBits() < 32 {
|
||||||
|
return fmt.Sprintf("// Asm: VMOVDQU%d.Z, CPU Feature: AVX512", x.ElemBits())
|
||||||
|
} else {
|
||||||
|
return fmt.Sprintf("// Asm: VMASKMOV%s, CPU Feature: AVX2", x.IntelSizeSuffix())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func (x simdType) MaskedStoreDoc() string {
|
||||||
|
if x.Size == 512 || x.ElemBits() < 32 {
|
||||||
|
return fmt.Sprintf("// Asm: VMOVDQU%d, CPU Feature: AVX512", x.ElemBits())
|
||||||
|
} else {
|
||||||
|
return fmt.Sprintf("// Asm: VMASKMOV%s, CPU Feature: AVX2", x.IntelSizeSuffix())
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareSimdTypes(x, y simdType) int {
|
||||||
|
// "vreg" then "mask"
|
||||||
|
if c := -compareNatural(x.Type, y.Type); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
// want "flo" < "int" < "uin" (and then 8 < 16 < 32 < 64),
|
||||||
|
// not "int16" < "int32" < "int64" < "int8")
|
||||||
|
// so limit comparison to first 3 bytes in string.
|
||||||
|
if c := compareNatural(x.Base[:3], y.Base[:3]); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
// base type size, 8 < 16 < 32 < 64
|
||||||
|
if c := x.ElemBits() - y.ElemBits(); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
// vector size last
|
||||||
|
return x.Size - y.Size
|
||||||
|
}
|
||||||
|
|
||||||
|
type simdTypeMap map[int][]simdType
|
||||||
|
|
||||||
|
type simdTypePair struct {
|
||||||
|
Tsrc simdType
|
||||||
|
Tdst simdType
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareSimdTypePairs(x, y simdTypePair) int {
|
||||||
|
c := compareSimdTypes(x.Tsrc, y.Tsrc)
|
||||||
|
if c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
return compareSimdTypes(x.Tdst, y.Tdst)
|
||||||
|
}
|
||||||
|
|
||||||
|
const simdPackageHeader = generatedHeader + `
|
||||||
|
//go:build goexperiment.simd
|
||||||
|
|
||||||
|
package simd
|
||||||
|
`
|
||||||
|
|
||||||
|
const simdTypesTemplates = `
|
||||||
|
{{define "sizeTmpl"}}
|
||||||
|
// v{{.}} is a tag type that tells the compiler that this is really {{.}}-bit SIMD
|
||||||
|
type v{{.}} struct {
|
||||||
|
_{{.}} [0]func() // uncomparable
|
||||||
|
}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "typeTmpl"}}
|
||||||
|
// {{.Name}} is a {{.Size}}-bit SIMD vector of {{.Lanes}} {{.Base}}
|
||||||
|
type {{.Name}} struct {
|
||||||
|
{{.Fields}}
|
||||||
|
}
|
||||||
|
|
||||||
|
{{end}}
|
||||||
|
`
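For reference, with the field strings assembled by parseSIMDTypes later in this file, the typeTmpl above expands for a 128-bit vector of 4 int32 into roughly the following declaration (the exact layout is whatever parseSIMDTypes emits):

// Int32x4 is a 128-bit SIMD vector of 4 int32
type Int32x4 struct {
	int32x4 v128
	vals    [4]int32
}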
|
||||||
|
|
||||||
|
const simdFeaturesTemplate = `
|
||||||
|
import "internal/cpu"
|
||||||
|
|
||||||
|
type X86Features struct {}
|
||||||
|
|
||||||
|
var X86 X86Features
|
||||||
|
|
||||||
|
{{range .}}
|
||||||
|
{{- if eq .Feature "AVX512"}}
|
||||||
|
// {{.Feature}} returns whether the CPU supports the AVX512F+CD+BW+DQ+VL features.
|
||||||
|
//
|
||||||
|
// These five CPU features are bundled together, and no use of AVX-512
|
||||||
|
// is allowed unless all of these features are supported together.
|
||||||
|
// Nearly every CPU that has shipped with any support for AVX-512 has
|
||||||
|
// supported all five of these features.
|
||||||
|
{{- else -}}
|
||||||
|
// {{.Feature}} returns whether the CPU supports the {{.Feature}} feature.
|
||||||
|
{{- end}}
|
||||||
|
//
|
||||||
|
// {{.Feature}} is defined on all GOARCHes, but will only return true on
|
||||||
|
// GOARCH {{.GoArch}}.
|
||||||
|
func (X86Features) {{.Feature}}() bool {
|
||||||
|
return cpu.X86.Has{{.Feature}}
|
||||||
|
}
|
||||||
|
{{end}}
|
||||||
|
`
|
||||||
|
|
||||||
|
const simdLoadStoreTemplate = `
|
||||||
|
// Len returns the number of elements in a {{.Name}}
|
||||||
|
func (x {{.Name}}) Len() int { return {{.Lanes}} }
|
||||||
|
|
||||||
|
// Load{{.Name}} loads a {{.Name}} from an array
|
||||||
|
//
|
||||||
|
//go:noescape
|
||||||
|
func Load{{.Name}}(y *[{{.Lanes}}]{{.Base}}) {{.Name}}
|
||||||
|
|
||||||
|
// Store stores a {{.Name}} to an array
|
||||||
|
//
|
||||||
|
//go:noescape
|
||||||
|
func (x {{.Name}}) Store(y *[{{.Lanes}}]{{.Base}})
|
||||||
|
`
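Assuming the generated simd package built with GOEXPERIMENT=simd (the build tag and package name come from simdPackageHeader above; the "simd" import path is assumed here), the Load/Store stubs give the usual array-pointer round trip; a small usage sketch:

//go:build goexperiment.simd

package example

import "simd"

// roundTrip loads four int32 lanes from src and stores them into dst.
func roundTrip(src, dst *[4]int32) {
	v := simd.LoadInt32x4(src)
	_ = v.Len() // 4 lanes
	v.Store(dst)
}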
|
||||||
|
|
||||||
|
const simdMaskFromValTemplate = `
|
||||||
|
// {{.Name}}FromBits constructs a {{.Name}} from a bitmap value, where 1 means set for the indexed element, 0 means unset.
|
||||||
|
{{- if ne .Lanes .LanesContainer}}
|
||||||
|
// Only the lower {{.Lanes}} bits of y are used.
|
||||||
|
{{- end}}
|
||||||
|
//
|
||||||
|
// Asm: KMOV{{.IntelSizeSuffix}}, CPU Feature: AVX512
|
||||||
|
func {{.Name}}FromBits(y uint{{.LanesContainer}}) {{.Name}}
|
||||||
|
|
||||||
|
// ToBits constructs a bitmap from a {{.Name}}, where 1 means set for the indexed element, 0 means unset.
|
||||||
|
{{- if ne .Lanes .LanesContainer}}
|
||||||
|
// Only the lower {{.Lanes}} bits of the result are used.
|
||||||
|
{{- end}}
|
||||||
|
//
|
||||||
|
// Asm: KMOV{{.IntelSizeSuffix}}, CPU Feature: AVX512
|
||||||
|
func (x {{.Name}}) ToBits() uint{{.LanesContainer}}
|
||||||
|
`
|
||||||
|
|
||||||
|
const simdMaskedLoadStoreTemplate = `
|
||||||
|
// LoadMasked{{.Name}} loads a {{.Name}} from an array,
|
||||||
|
// at those elements enabled by mask
|
||||||
|
//
|
||||||
|
{{.MaskedLoadDoc}}
|
||||||
|
//
|
||||||
|
//go:noescape
|
||||||
|
func LoadMasked{{.Name}}(y *[{{.Lanes}}]{{.Base}}, mask Mask{{.ElemBits}}x{{.Lanes}}) {{.Name}}
|
||||||
|
|
||||||
|
// StoreMasked stores a {{.Name}} to an array,
|
||||||
|
// at those elements enabled by mask
|
||||||
|
//
|
||||||
|
{{.MaskedStoreDoc}}
|
||||||
|
//
|
||||||
|
//go:noescape
|
||||||
|
func (x {{.Name}}) StoreMasked(y *[{{.Lanes}}]{{.Base}}, mask Mask{{.ElemBits}}x{{.Lanes}})
|
||||||
|
`
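The masked variants pair with the FromBits constructors from the previous template. A hedged usage sketch, again assuming the generated simd package; per MaskedLoadStoreFilter above, a 256-bit vector with 32-bit elements takes the AVX2 VMASKMOV path:

//go:build goexperiment.simd

package example

import "simd"

// copyLowLanes copies only the low four of eight int32 lanes from src to dst.
func copyLowLanes(src, dst *[8]int32) {
	m := simd.Mask32x8FromBits(0b0000_1111) // lanes 0-3 enabled
	v := simd.LoadMaskedInt32x8(src, m)     // disabled lanes are not read from src
	v.StoreMasked(dst, m)                   // disabled lanes in dst are left untouched
}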
|
||||||
|
|
||||||
|
const simdStubsTmpl = `
|
||||||
|
{{define "op1"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}() {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2_21"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2_21Type1"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3_31Zero3"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op2NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3_21"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3_21Type1"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op0NameAndType "y"}}, {{.Op2NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3_231Type1"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.Op0NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2VecAsScalar"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}(y uint{{(index .In 1).TreatLikeAScalarOfSize}}) {{(index .Out 0).Go}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3VecAsScalar"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}(y uint{{(index .In 1).TreatLikeAScalarOfSize}}, {{.Op2NameAndType "z"}}) {{(index .Out 0).Go}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op4"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op0NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op2NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op4_231Type1"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.Op0NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op4_31"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op2NameAndType "x"}}) {{.Go}}({{.Op1NameAndType "y"}}, {{.Op0NameAndType "z"}}, {{.Op3NameAndType "u"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op1Imm8"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2Imm8"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2Imm8_2I"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.ImmName}} uint8) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2Imm8_II"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} result in better performance when they are constants; non-constant values will be translated into a jump table.
|
||||||
|
// {{.ImmName}} should be between 0 and 3, inclusive; other values may result in a runtime panic.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op2Imm8_SHA1RNDS4"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3Imm8"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}, {{.Op3NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "op3Imm8_2I"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.Op2NameAndType "y"}}, {{.ImmName}} uint8, {{.Op3NameAndType "z"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
|
||||||
|
{{define "op4Imm8"}}
|
||||||
|
{{if .Documentation}}{{.Documentation}}
|
||||||
|
//{{end}}
|
||||||
|
// {{.ImmName}} results in better performance when it's a constant; a non-constant value will be translated into a jump table.
|
||||||
|
//
|
||||||
|
// Asm: {{.Asm}}, CPU Feature: {{.CPUFeature}}
|
||||||
|
func ({{.Op1NameAndType "x"}}) {{.Go}}({{.ImmName}} uint8, {{.Op2NameAndType "y"}}, {{.Op3NameAndType "z"}}, {{.Op4NameAndType "u"}}) {{.GoType}}
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "vectorConversion"}}
|
||||||
|
// As{{.Tdst.Name}} converts from {{.Tsrc.Name}} to {{.Tdst.Name}}
|
||||||
|
func (from {{.Tsrc.Name}}) As{{.Tdst.Name}}() (to {{.Tdst.Name}})
|
||||||
|
{{end}}
|
||||||
|
|
||||||
|
{{define "mask"}}
|
||||||
|
// As{{.VectorCounterpart}} converts from {{.Name}} to {{.VectorCounterpart}}
|
||||||
|
func (from {{.Name}}) As{{.VectorCounterpart}}() (to {{.VectorCounterpart}})
|
||||||
|
|
||||||
|
// asMask converts from {{.VectorCounterpart}} to {{.Name}}
|
||||||
|
func (from {{.VectorCounterpart}}) asMask() (to {{.Name}})
|
||||||
|
|
||||||
|
func (x {{.Name}}) And(y {{.Name}}) {{.Name}}
|
||||||
|
|
||||||
|
func (x {{.Name}}) Or(y {{.Name}}) {{.Name}}
|
||||||
|
{{end}}
|
||||||
|
`
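As a concrete illustration of the stub shapes, a hypothetical two-operand method (the op name, instruction, and feature here are made up; the real ones come from the op metadata) would be emitted by the op2 template above as:

// FooBar is a hypothetical example of an op2 expansion.
//
// Asm: VFOOBAR, CPU Feature: AVX2
func (x Int32x4) FooBar(y Int32x4) Int32x4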
|
||||||
|
|
||||||
|
// parseSIMDTypes groups Go simd types by their vector sizes, and
// returns a map whose key is the vector size and whose value is the list of simd types of that size.
|
||||||
|
func parseSIMDTypes(ops []Operation) simdTypeMap {
|
||||||
|
// TODO: maybe instead of going over ops, try going over types.yaml.
|
||||||
|
ret := map[int][]simdType{}
|
||||||
|
seen := map[string]struct{}{}
|
||||||
|
processArg := func(arg Operand) {
|
||||||
|
if arg.Class == "immediate" || arg.Class == "greg" {
|
||||||
|
// Immediates are not encoded as vector types.
|
||||||
|
return
|
||||||
|
}
|
||||||
|
if _, ok := seen[*arg.Go]; ok {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
seen[*arg.Go] = struct{}{}
|
||||||
|
|
||||||
|
lanes := *arg.Lanes
|
||||||
|
base := fmt.Sprintf("%s%d", *arg.Base, *arg.ElemBits)
|
||||||
|
tagFieldNameS := fmt.Sprintf("%sx%d", base, lanes)
|
||||||
|
tagFieldS := fmt.Sprintf("%s v%d", tagFieldNameS, *arg.Bits)
|
||||||
|
valFieldS := fmt.Sprintf("vals%s[%d]%s", strings.Repeat(" ", len(tagFieldNameS)-3), lanes, base)
|
||||||
|
fields := fmt.Sprintf("\t%s\n\t%s", tagFieldS, valFieldS)
|
||||||
|
if arg.Class == "mask" {
|
||||||
|
vectorCounterpart := strings.ReplaceAll(*arg.Go, "Mask", "Int")
|
||||||
|
reshapedVectorWithAndOr := fmt.Sprintf("Int32x%d", *arg.Bits/32)
|
||||||
|
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{*arg.Go, lanes, base, fields, arg.Class, vectorCounterpart, reshapedVectorWithAndOr, *arg.Bits})
|
||||||
|
// In case the vector counterpart of a mask is not present, put its vector counterpart typedef into the map as well.
|
||||||
|
if _, ok := seen[vectorCounterpart]; !ok {
|
||||||
|
seen[vectorCounterpart] = struct{}{}
|
||||||
|
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{vectorCounterpart, lanes, base, fields, "vreg", "", "", *arg.Bits})
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
ret[*arg.Bits] = append(ret[*arg.Bits], simdType{*arg.Go, lanes, base, fields, arg.Class, "", "", *arg.Bits})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for _, op := range ops {
|
||||||
|
for _, arg := range op.In {
|
||||||
|
processArg(arg)
|
||||||
|
}
|
||||||
|
for _, arg := range op.Out {
|
||||||
|
processArg(arg)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return ret
|
||||||
|
}
|
||||||
|
|
||||||
|
func vConvertFromTypeMap(typeMap simdTypeMap) []simdTypePair {
|
||||||
|
v := []simdTypePair{}
|
||||||
|
for _, ts := range typeMap {
|
||||||
|
for i, tsrc := range ts {
|
||||||
|
for j, tdst := range ts {
|
||||||
|
if i != j && tsrc.Type == tdst.Type && tsrc.Type == "vreg" &&
|
||||||
|
tsrc.Lanes > 1 && tdst.Lanes > 1 {
|
||||||
|
v = append(v, simdTypePair{tsrc, tdst})
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
slices.SortFunc(v, compareSimdTypePairs)
|
||||||
|
return v
|
||||||
|
}
|
||||||
|
|
||||||
|
func masksFromTypeMap(typeMap simdTypeMap) []simdType {
|
||||||
|
m := []simdType{}
|
||||||
|
for _, ts := range typeMap {
|
||||||
|
for _, tsrc := range ts {
|
||||||
|
if tsrc.Type == "mask" {
|
||||||
|
m = append(m, tsrc)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
slices.SortFunc(m, compareSimdTypes)
|
||||||
|
return m
|
||||||
|
}
|
||||||
|
|
||||||
|
func typesFromTypeMap(typeMap simdTypeMap) []simdType {
|
||||||
|
m := []simdType{}
|
||||||
|
for _, ts := range typeMap {
|
||||||
|
for _, tsrc := range ts {
|
||||||
|
if tsrc.Lanes > 1 {
|
||||||
|
m = append(m, tsrc)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
slices.SortFunc(m, compareSimdTypes)
|
||||||
|
return m
|
||||||
|
}
|
||||||
|
|
||||||
|
// writeSIMDTypes generates the simd vector types into a bytes.Buffer
|
||||||
|
func writeSIMDTypes(typeMap simdTypeMap) *bytes.Buffer {
|
||||||
|
t := templateOf(simdTypesTemplates, "types_amd64")
|
||||||
|
loadStore := templateOf(simdLoadStoreTemplate, "loadstore_amd64")
|
||||||
|
maskedLoadStore := templateOf(simdMaskedLoadStoreTemplate, "maskedloadstore_amd64")
|
||||||
|
maskFromVal := templateOf(simdMaskFromValTemplate, "maskFromVal_amd64")
|
||||||
|
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
buffer.WriteString(simdPackageHeader)
|
||||||
|
|
||||||
|
sizes := make([]int, 0, len(typeMap))
|
||||||
|
for size, types := range typeMap {
|
||||||
|
slices.SortFunc(types, compareSimdTypes)
|
||||||
|
sizes = append(sizes, size)
|
||||||
|
}
|
||||||
|
sort.Ints(sizes)
|
||||||
|
|
||||||
|
for _, size := range sizes {
|
||||||
|
if size <= 64 {
|
||||||
|
// these are scalar
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if err := t.ExecuteTemplate(buffer, "sizeTmpl", size); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute size template for size %d: %w", size, err))
|
||||||
|
}
|
||||||
|
for _, typeDef := range typeMap[size] {
|
||||||
|
if typeDef.Lanes == 1 {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if err := t.ExecuteTemplate(buffer, "typeTmpl", typeDef); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute type template for type %s: %w", typeDef.Name, err))
|
||||||
|
}
|
||||||
|
if typeDef.Type != "mask" {
|
||||||
|
if err := loadStore.ExecuteTemplate(buffer, "loadstore_amd64", typeDef); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute loadstore template for type %s: %w", typeDef.Name, err))
|
||||||
|
}
|
||||||
|
// restrict to AVX2 masked loads/stores first.
|
||||||
|
if typeDef.MaskedLoadStoreFilter() {
|
||||||
|
if err := maskedLoadStore.ExecuteTemplate(buffer, "maskedloadstore_amd64", typeDef); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute maskedloadstore template for type %s: %w", typeDef.Name, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
if err := maskFromVal.ExecuteTemplate(buffer, "maskFromVal_amd64", typeDef); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute maskFromVal template for type %s: %w", typeDef.Name, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
|
||||||
|
|
||||||
|
func writeSIMDFeatures(ops []Operation) *bytes.Buffer {
|
||||||
|
// Gather all features
|
||||||
|
type featureKey struct {
|
||||||
|
GoArch string
|
||||||
|
Feature string
|
||||||
|
}
|
||||||
|
featureSet := make(map[featureKey]struct{})
|
||||||
|
for _, op := range ops {
|
||||||
|
// Generate a feature check for each independent feature in a
|
||||||
|
// composite feature.
|
||||||
|
for feature := range strings.SplitSeq(op.CPUFeature, ",") {
|
||||||
|
feature = strings.TrimSpace(feature)
|
||||||
|
featureSet[featureKey{op.GoArch, feature}] = struct{}{}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
features := slices.SortedFunc(maps.Keys(featureSet), func(a, b featureKey) int {
|
||||||
|
if c := cmp.Compare(a.GoArch, b.GoArch); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
return compareNatural(a.Feature, b.Feature)
|
||||||
|
})
|
||||||
|
|
||||||
|
// If we ever have the same feature name on more than one GOARCH, we'll have
|
||||||
|
// to be more careful about this.
|
||||||
|
t := templateOf(simdFeaturesTemplate, "features")
|
||||||
|
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
buffer.WriteString(simdPackageHeader)
|
||||||
|
|
||||||
|
if err := t.Execute(buffer, features); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute features template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
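On the consumer side the generated accessors support ordinary runtime dispatch; a minimal sketch assuming the generated simd package and that AVX512 and AVX2 appear in the ops metadata (both are referenced elsewhere in this generator):

//go:build goexperiment.simd

package example

import "simd"

// pickKernel chooses an implementation based on the generated CPU feature checks.
func pickKernel() string {
	if simd.X86.AVX512() { // bundles AVX512F+CD+BW+DQ+VL
		return "avx512"
	}
	if simd.X86.AVX2() {
		return "avx2"
	}
	return "scalar"
}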
|
||||||
|
|
||||||
|
// writeSIMDStubs returns two bytes.Buffers containing the declarations for the public
|
||||||
|
// and internal-use vector intrinsics.
|
||||||
|
func writeSIMDStubs(ops []Operation, typeMap simdTypeMap) (f, fI *bytes.Buffer) {
|
||||||
|
t := templateOf(simdStubsTmpl, "simdStubs")
|
||||||
|
f = new(bytes.Buffer)
|
||||||
|
fI = new(bytes.Buffer)
|
||||||
|
f.WriteString(simdPackageHeader)
|
||||||
|
fI.WriteString(simdPackageHeader)
|
||||||
|
|
||||||
|
slices.SortFunc(ops, compareOperations)
|
||||||
|
|
||||||
|
for i, op := range ops {
|
||||||
|
if op.NoTypes != nil && *op.NoTypes == "true" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if op.SkipMaskedMethod() {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
idxVecAsScalar, err := checkVecAsScalar(op)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
if s, op, err := classifyOp(op); err == nil {
|
||||||
|
if idxVecAsScalar != -1 {
|
||||||
|
if s == "op2" || s == "op3" {
|
||||||
|
s += "VecAsScalar"
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen only supports op2 or op3 with TreatLikeAScalarOfSize"))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if i == 0 || op.Go != ops[i-1].Go {
|
||||||
|
if unicode.IsUpper([]rune(op.Go)[0]) {
|
||||||
|
fmt.Fprintf(f, "\n/* %s */\n", op.Go)
|
||||||
|
} else {
|
||||||
|
fmt.Fprintf(fI, "\n/* %s */\n", op.Go)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if unicode.IsUpper([]rune(op.Go)[0]) {
|
||||||
|
if err := t.ExecuteTemplate(f, s, op); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template %s for op %v: %w", s, op, err))
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
if err := t.ExecuteTemplate(fI, s, op); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template %s for op %v: %w", s, op, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("failed to classify op %v: %w", op.Go, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
vectorConversions := vConvertFromTypeMap(typeMap)
|
||||||
|
for _, conv := range vectorConversions {
|
||||||
|
if err := t.ExecuteTemplate(f, "vectorConversion", conv); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute vectorConversion template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
masks := masksFromTypeMap(typeMap)
|
||||||
|
for _, mask := range masks {
|
||||||
|
if err := t.ExecuteTemplate(f, "mask", mask); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute mask template for mask %s: %w", mask.Name, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return
|
||||||
|
}
src/simd/_gen/simdgen/gen_simdrules.go (new file, 397 lines)
@@ -0,0 +1,397 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"slices"
|
||||||
|
"strings"
|
||||||
|
"text/template"
|
||||||
|
)
|
||||||
|
|
||||||
|
type tplRuleData struct {
|
||||||
|
tplName string // e.g. "sftimm"
|
||||||
|
GoOp string // e.g. "ShiftAllLeft"
|
||||||
|
GoType string // e.g. "Uint32x8"
|
||||||
|
Args string // e.g. "x y"
|
||||||
|
Asm string // e.g. "VPSLLD256"
|
||||||
|
ArgsOut string // e.g. "x y"
|
||||||
|
MaskInConvert string // e.g. "VPMOVVec32x8ToM"
|
||||||
|
MaskOutConvert string // e.g. "VPMOVMToVec32x8"
|
||||||
|
ElementSize int // e.g. 32
|
||||||
|
Size int // e.g. 128
|
||||||
|
ArgsLoadAddr string // [Args] with its last vreg arg being a concrete "(VMOVDQUload* ptr mem)", and might contain mask.
|
||||||
|
ArgsAddr string // [Args] with its last vreg arg being replaced by "ptr", and might contain mask, and with a "mem" at the end.
|
||||||
|
FeatCheck string // e.g. "v.Block.CPUfeatures.hasFeature(CPUavx512)" -- for a ssa/_gen rules file.
|
||||||
|
}
|
||||||
|
|
||||||
|
var (
|
||||||
|
ruleTemplates = template.Must(template.New("simdRules").Parse(`
|
||||||
|
{{define "pureVreg"}}({{.GoOp}}{{.GoType}} {{.Args}}) => ({{.Asm}} {{.ArgsOut}})
|
||||||
|
{{end}}
|
||||||
|
{{define "maskIn"}}({{.GoOp}}{{.GoType}} {{.Args}} mask) => ({{.Asm}} {{.ArgsOut}} ({{.MaskInConvert}} <types.TypeMask> mask))
|
||||||
|
{{end}}
|
||||||
|
{{define "maskOut"}}({{.GoOp}}{{.GoType}} {{.Args}}) => ({{.MaskOutConvert}} ({{.Asm}} {{.ArgsOut}}))
|
||||||
|
{{end}}
|
||||||
|
{{define "maskInMaskOut"}}({{.GoOp}}{{.GoType}} {{.Args}} mask) => ({{.MaskOutConvert}} ({{.Asm}} {{.ArgsOut}} ({{.MaskInConvert}} <types.TypeMask> mask)))
|
||||||
|
{{end}}
|
||||||
|
{{define "sftimm"}}({{.Asm}} x (MOVQconst [c])) => ({{.Asm}}const [uint8(c)] x)
|
||||||
|
{{end}}
|
||||||
|
{{define "masksftimm"}}({{.Asm}} x (MOVQconst [c]) mask) => ({{.Asm}}const [uint8(c)] x mask)
|
||||||
|
{{end}}
|
||||||
|
{{define "vregMem"}}({{.Asm}} {{.ArgsLoadAddr}}) && canMergeLoad(v, l) && clobber(l) => ({{.Asm}}load {{.ArgsAddr}})
|
||||||
|
{{end}}
|
||||||
|
{{define "vregMemFeatCheck"}}({{.Asm}} {{.ArgsLoadAddr}}) && {{.FeatCheck}} && canMergeLoad(v, l) && clobber(l)=> ({{.Asm}}load {{.ArgsAddr}})
|
||||||
|
{{end}}
|
||||||
|
`))
|
||||||
|
)
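To make the rule shapes concrete: with hypothetical op and instruction names (only the VPMOVVec32x4ToM mask-conversion form is taken from the code below), the pureVreg and maskIn templates expand to rules like:

(FooInt32x4 x y) => (VFOO128 x y)
(FooMaskedInt32x4 x y mask) => (VFOOMasked128 x y (VPMOVVec32x4ToM <types.TypeMask> mask))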
|
||||||
|
|
||||||
|
func (d tplRuleData) MaskOptimization(asmCheck map[string]bool) string {
|
||||||
|
asmNoMask := d.Asm
|
||||||
|
if i := strings.Index(asmNoMask, "Masked"); i == -1 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
asmNoMask = strings.ReplaceAll(asmNoMask, "Masked", "")
|
||||||
|
if asmCheck[asmNoMask] == false {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, nope := range []string{"VMOVDQU", "VPCOMPRESS", "VCOMPRESS", "VPEXPAND", "VEXPAND", "VPBLENDM", "VMOVUP"} {
|
||||||
|
if strings.HasPrefix(asmNoMask, nope) {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
size := asmNoMask[len(asmNoMask)-3:]
|
||||||
|
if strings.HasSuffix(asmNoMask, "const") {
|
||||||
|
sufLen := len("128const")
|
||||||
|
size = asmNoMask[len(asmNoMask)-sufLen:][:3]
|
||||||
|
}
|
||||||
|
switch size {
|
||||||
|
case "128", "256", "512":
|
||||||
|
default:
|
||||||
|
panic("Unexpected operation size on " + d.Asm)
|
||||||
|
}
|
||||||
|
|
||||||
|
switch d.ElementSize {
|
||||||
|
case 8, 16, 32, 64:
|
||||||
|
default:
|
||||||
|
panic(fmt.Errorf("Unexpected operation width %d on %v", d.ElementSize, d.Asm))
|
||||||
|
}
|
||||||
|
|
||||||
|
return fmt.Sprintf("(VMOVDQU%dMasked%s (%s %s) mask) => (%s %s mask)\n", d.ElementSize, size, asmNoMask, d.Args, d.Asm, d.Args)
|
||||||
|
}
|
||||||
|
|
||||||
|
// SSA rewrite rules need to appear in most-to-least-specific order; tmplOrder encodes that ordering.
|
||||||
|
var tmplOrder = map[string]int{
|
||||||
|
"masksftimm": 0,
|
||||||
|
"sftimm": 1,
|
||||||
|
"maskInMaskOut": 2,
|
||||||
|
"maskOut": 3,
|
||||||
|
"maskIn": 4,
|
||||||
|
"pureVreg": 5,
|
||||||
|
"vregMem": 6,
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareTplRuleData(x, y tplRuleData) int {
|
||||||
|
if c := compareNatural(x.GoOp, y.GoOp); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if c := compareNatural(x.GoType, y.GoType); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if c := compareNatural(x.Args, y.Args); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if x.tplName == y.tplName {
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
xo, xok := tmplOrder[x.tplName]
|
||||||
|
yo, yok := tmplOrder[y.tplName]
|
||||||
|
if !xok {
|
||||||
|
panic(fmt.Errorf("Unexpected template name %s, please add to tmplOrder", x.tplName))
|
||||||
|
}
|
||||||
|
if !yok {
|
||||||
|
panic(fmt.Errorf("Unexpected template name %s, please add to tmplOrder", y.tplName))
|
||||||
|
}
|
||||||
|
return xo - yo
|
||||||
|
}
|
||||||
|
|
||||||
|
// writeSIMDRules generates the lowering and rewrite rules for ssa and writes them to simdAMD64.rules
// within the specified directory.
|
||||||
|
func writeSIMDRules(ops []Operation) *bytes.Buffer {
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
buffer.WriteString(generatedHeader + "\n")
|
||||||
|
|
||||||
|
// asm -> masked merging rules
|
||||||
|
maskedMergeOpts := make(map[string]string)
|
||||||
|
s2n := map[int]string{8: "B", 16: "W", 32: "D", 64: "Q"}
|
||||||
|
asmCheck := map[string]bool{}
|
||||||
|
var allData []tplRuleData
|
||||||
|
var optData []tplRuleData // for mask peephole optimizations, and other misc
|
||||||
|
var memOptData []tplRuleData // for memory peephole optimizations
|
||||||
|
memOpSeen := make(map[string]bool)
|
||||||
|
|
||||||
|
for _, opr := range ops {
|
||||||
|
opInShape, opOutShape, maskType, immType, gOp := opr.shape()
|
||||||
|
asm := machineOpName(maskType, gOp)
|
||||||
|
vregInCnt := len(gOp.In)
|
||||||
|
if maskType == OneMask {
|
||||||
|
vregInCnt--
|
||||||
|
}
|
||||||
|
|
||||||
|
data := tplRuleData{
|
||||||
|
GoOp: gOp.Go,
|
||||||
|
Asm: asm,
|
||||||
|
}
|
||||||
|
|
||||||
|
if vregInCnt == 1 {
|
||||||
|
data.Args = "x"
|
||||||
|
data.ArgsOut = data.Args
|
||||||
|
} else if vregInCnt == 2 {
|
||||||
|
data.Args = "x y"
|
||||||
|
data.ArgsOut = data.Args
|
||||||
|
} else if vregInCnt == 3 {
|
||||||
|
data.Args = "x y z"
|
||||||
|
data.ArgsOut = data.Args
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen does not support more than 3 vreg in inputs"))
|
||||||
|
}
|
||||||
|
if immType == ConstImm {
|
||||||
|
data.ArgsOut = fmt.Sprintf("[%s] %s", *opr.In[0].Const, data.ArgsOut)
|
||||||
|
} else if immType == VarImm {
|
||||||
|
data.Args = fmt.Sprintf("[a] %s", data.Args)
|
||||||
|
data.ArgsOut = fmt.Sprintf("[a] %s", data.ArgsOut)
|
||||||
|
} else if immType == ConstVarImm {
|
||||||
|
data.Args = fmt.Sprintf("[a] %s", data.Args)
|
||||||
|
data.ArgsOut = fmt.Sprintf("[a+%s] %s", *opr.In[0].Const, data.ArgsOut)
|
||||||
|
}
|
||||||
|
|
||||||
|
goType := func(op Operation) string {
|
||||||
|
if op.OperandOrder != nil {
|
||||||
|
switch *op.OperandOrder {
|
||||||
|
case "21Type1", "231Type1":
|
||||||
|
// Permute uses operand[1] for method receiver.
|
||||||
|
return *op.In[1].Go
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return *op.In[0].Go
|
||||||
|
}
|
||||||
|
var tplName string
|
||||||
|
// If class overwrite is happening, that's not really a mask but a vreg.
|
||||||
|
if opOutShape == OneVregOut || opOutShape == OneVregOutAtIn || gOp.Out[0].OverwriteClass != nil {
|
||||||
|
switch opInShape {
|
||||||
|
case OneImmIn:
|
||||||
|
tplName = "pureVreg"
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
case PureVregIn:
|
||||||
|
tplName = "pureVreg"
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
case OneKmaskImmIn:
|
||||||
|
fallthrough
|
||||||
|
case OneKmaskIn:
|
||||||
|
tplName = "maskIn"
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
rearIdx := len(gOp.In) - 1
|
||||||
|
// Mask is at the end.
|
||||||
|
width := *gOp.In[rearIdx].ElemBits
|
||||||
|
data.MaskInConvert = fmt.Sprintf("VPMOVVec%dx%dToM", width, *gOp.In[rearIdx].Lanes)
|
||||||
|
data.ElementSize = width
|
||||||
|
case PureKmaskIn:
|
||||||
|
panic(fmt.Errorf("simdgen does not support pure k mask instructions, they should be generated by compiler optimizations"))
|
||||||
|
}
|
||||||
|
} else if opOutShape == OneGregOut {
|
||||||
|
tplName = "pureVreg" // TODO this will be wrong
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
} else {
|
||||||
|
// OneKmaskOut case
|
||||||
|
data.MaskOutConvert = fmt.Sprintf("VPMOVMToVec%dx%d", *gOp.Out[0].ElemBits, *gOp.In[0].Lanes)
|
||||||
|
switch opInShape {
|
||||||
|
case OneImmIn:
|
||||||
|
fallthrough
|
||||||
|
case PureVregIn:
|
||||||
|
tplName = "maskOut"
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
case OneKmaskImmIn:
|
||||||
|
fallthrough
|
||||||
|
case OneKmaskIn:
|
||||||
|
tplName = "maskInMaskOut"
|
||||||
|
data.GoType = goType(gOp)
|
||||||
|
rearIdx := len(gOp.In) - 1
|
||||||
|
data.MaskInConvert = fmt.Sprintf("VPMOVVec%dx%dToM", *gOp.In[rearIdx].ElemBits, *gOp.In[rearIdx].Lanes)
|
||||||
|
case PureKmaskIn:
|
||||||
|
panic(fmt.Errorf("simdgen does not support pure k mask instructions, they should be generated by compiler optimizations"))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if gOp.SpecialLower != nil {
|
||||||
|
if *gOp.SpecialLower == "sftimm" {
|
||||||
|
if data.GoType[0] == 'I' {
|
||||||
|
// only do these for signed types; it would be a duplicate rewrite for unsigned
|
||||||
|
sftImmData := data
|
||||||
|
if tplName == "maskIn" {
|
||||||
|
sftImmData.tplName = "masksftimm"
|
||||||
|
} else {
|
||||||
|
sftImmData.tplName = "sftimm"
|
||||||
|
}
|
||||||
|
allData = append(allData, sftImmData)
|
||||||
|
asmCheck[sftImmData.Asm+"const"] = true
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
panic("simdgen sees unknwon special lower " + *gOp.SpecialLower + ", maybe implement it?")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if gOp.MemFeatures != nil && *gOp.MemFeatures == "vbcst" {
|
||||||
|
// sanity check
|
||||||
|
selected := true
|
||||||
|
for _, a := range gOp.In {
|
||||||
|
if a.TreatLikeAScalarOfSize != nil {
|
||||||
|
selected = false
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if _, ok := memOpSeen[data.Asm]; ok {
|
||||||
|
selected = false
|
||||||
|
}
|
||||||
|
if selected {
|
||||||
|
memOpSeen[data.Asm] = true
|
||||||
|
lastVreg := gOp.In[vregInCnt-1]
|
||||||
|
// sanity check
|
||||||
|
if lastVreg.Class != "vreg" {
|
||||||
|
panic(fmt.Errorf("simdgen expects vbcst replaced operand to be a vreg, but %v found", lastVreg))
|
||||||
|
}
|
||||||
|
memOpData := data
|
||||||
|
// Remove the last vreg from the arg and change it to a load.
|
||||||
|
origArgs := data.Args[:len(data.Args)-1]
|
||||||
|
// Prepare imm args.
|
||||||
|
immArg := ""
|
||||||
|
immArgCombineOff := " [off] "
|
||||||
|
if immType != NoImm && immType != InvalidImm {
|
||||||
|
_, after, found := strings.Cut(origArgs, "]")
|
||||||
|
if found {
|
||||||
|
origArgs = after
|
||||||
|
}
|
||||||
|
immArg = "[c] "
|
||||||
|
immArgCombineOff = " [makeValAndOff(int32(int8(c)),off)] "
|
||||||
|
}
|
||||||
|
memOpData.ArgsLoadAddr = immArg + origArgs + fmt.Sprintf("l:(VMOVDQUload%d {sym} [off] ptr mem)", *lastVreg.Bits)
|
||||||
|
// Remove the last vreg from the arg and change it to "ptr".
|
||||||
|
memOpData.ArgsAddr = "{sym}" + immArgCombineOff + origArgs + "ptr"
|
||||||
|
if maskType == OneMask {
|
||||||
|
memOpData.ArgsAddr += " mask"
|
||||||
|
memOpData.ArgsLoadAddr += " mask"
|
||||||
|
}
|
||||||
|
memOpData.ArgsAddr += " mem"
|
||||||
|
if gOp.MemFeaturesData != nil {
|
||||||
|
_, feat2 := getVbcstData(*gOp.MemFeaturesData)
|
||||||
|
knownFeatChecks := map[string]string{
|
||||||
|
"AVX": "v.Block.CPUfeatures.hasFeature(CPUavx)",
|
||||||
|
"AVX2": "v.Block.CPUfeatures.hasFeature(CPUavx2)",
|
||||||
|
"AVX512": "v.Block.CPUfeatures.hasFeature(CPUavx512)",
|
||||||
|
}
|
||||||
|
memOpData.FeatCheck = knownFeatChecks[feat2]
|
||||||
|
memOpData.tplName = "vregMemFeatCheck"
|
||||||
|
} else {
|
||||||
|
memOpData.tplName = "vregMem"
|
||||||
|
}
|
||||||
|
memOptData = append(memOptData, memOpData)
|
||||||
|
asmCheck[memOpData.Asm+"load"] = true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Generate the masked merging optimization rules
|
||||||
|
if gOp.hasMaskedMerging(maskType, opOutShape) {
|
||||||
|
// TODO: handle customized operand order and special lower.
|
||||||
|
maskElem := gOp.In[len(gOp.In)-1]
|
||||||
|
if maskElem.Bits == nil {
|
||||||
|
panic("mask has no bits")
|
||||||
|
}
|
||||||
|
if maskElem.ElemBits == nil {
|
||||||
|
panic("mask has no elemBits")
|
||||||
|
}
|
||||||
|
if maskElem.Lanes == nil {
|
||||||
|
panic("mask has no lanes")
|
||||||
|
}
|
||||||
|
switch *maskElem.Bits {
|
||||||
|
case 128, 256:
|
||||||
|
// VPBLENDVB cases.
|
||||||
|
noMaskName := machineOpName(NoMask, gOp)
|
||||||
|
ruleExisting, ok := maskedMergeOpts[noMaskName]
|
||||||
|
rule := fmt.Sprintf("(VPBLENDVB%d dst (%s %s) mask) && v.Block.CPUfeatures.hasFeature(CPUavx512) => (%sMerging dst %s (VPMOVVec%dx%dToM <types.TypeMask> mask))\n",
|
||||||
|
*maskElem.Bits, noMaskName, data.Args, data.Asm, data.Args, *maskElem.ElemBits, *maskElem.Lanes)
|
||||||
|
if ok && ruleExisting != rule {
|
||||||
|
panic("multiple masked merge rules for one op")
|
||||||
|
} else {
|
||||||
|
maskedMergeOpts[noMaskName] = rule
|
||||||
|
}
|
||||||
|
case 512:
|
||||||
|
// VPBLENDM[BWDQ] cases.
|
||||||
|
noMaskName := machineOpName(NoMask, gOp)
|
||||||
|
ruleExisting, ok := maskedMergeOpts[noMaskName]
|
||||||
|
rule := fmt.Sprintf("(VPBLENDM%sMasked%d dst (%s %s) mask) => (%sMerging dst %s mask)\n",
|
||||||
|
s2n[*maskElem.ElemBits], *maskElem.Bits, noMaskName, data.Args, data.Asm, data.Args)
|
||||||
|
if ok && ruleExisting != rule {
|
||||||
|
panic("multiple masked merge rules for one op")
|
||||||
|
} else {
|
||||||
|
maskedMergeOpts[noMaskName] = rule
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if tplName == "pureVreg" && data.Args == data.ArgsOut {
|
||||||
|
data.Args = "..."
|
||||||
|
data.ArgsOut = "..."
|
||||||
|
}
|
||||||
|
data.tplName = tplName
|
||||||
|
if opr.NoGenericOps != nil && *opr.NoGenericOps == "true" ||
|
||||||
|
opr.SkipMaskedMethod() {
|
||||||
|
optData = append(optData, data)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
allData = append(allData, data)
|
||||||
|
asmCheck[data.Asm] = true
|
||||||
|
}
|
||||||
|
|
||||||
|
slices.SortFunc(allData, compareTplRuleData)
|
||||||
|
|
||||||
|
for _, data := range allData {
|
||||||
|
if err := ruleTemplates.ExecuteTemplate(buffer, data.tplName, data); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template %s for %s: %w", data.tplName, data.GoOp+data.GoType, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
seen := make(map[string]bool)
|
||||||
|
|
||||||
|
for _, data := range optData {
|
||||||
|
if data.tplName == "maskIn" {
|
||||||
|
rule := data.MaskOptimization(asmCheck)
|
||||||
|
if seen[rule] {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
seen[rule] = true
|
||||||
|
buffer.WriteString(rule)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
maskedMergeOptsRules := []string{}
|
||||||
|
for asm, rule := range maskedMergeOpts {
|
||||||
|
if !asmCheck[asm] {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
maskedMergeOptsRules = append(maskedMergeOptsRules, rule)
|
||||||
|
}
|
||||||
|
slices.Sort(maskedMergeOptsRules)
|
||||||
|
for _, rule := range maskedMergeOptsRules {
|
||||||
|
buffer.WriteString(rule)
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, data := range memOptData {
|
||||||
|
if err := ruleTemplates.ExecuteTemplate(buffer, data.tplName, data); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute template %s for %s: %w", data.tplName, data.Asm, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
src/simd/_gen/simdgen/gen_simdssa.go (new file, 236 lines)
@@ -0,0 +1,236 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"strings"
|
||||||
|
"text/template"
|
||||||
|
)
|
||||||
|
|
||||||
|
var (
|
||||||
|
ssaTemplates = template.Must(template.New("simdSSA").Parse(`
|
||||||
|
{{define "header"}}// Code generated by x/arch/internal/simdgen using 'go run . -xedPath $XED_PATH -o godefs -goroot $GOROOT go.yaml types.yaml categories.yaml'; DO NOT EDIT.
|
||||||
|
|
||||||
|
package amd64
|
||||||
|
|
||||||
|
import (
|
||||||
|
"cmd/compile/internal/ssa"
|
||||||
|
"cmd/compile/internal/ssagen"
|
||||||
|
"cmd/internal/obj"
|
||||||
|
"cmd/internal/obj/x86"
|
||||||
|
)
|
||||||
|
|
||||||
|
func ssaGenSIMDValue(s *ssagen.State, v *ssa.Value) bool {
|
||||||
|
var p *obj.Prog
|
||||||
|
switch v.Op {{"{"}}{{end}}
|
||||||
|
{{define "case"}}
|
||||||
|
case {{.Cases}}:
|
||||||
|
p = {{.Helper}}(s, v)
|
||||||
|
{{end}}
|
||||||
|
{{define "footer"}}
|
||||||
|
default:
|
||||||
|
// Unknown reg shape
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
{{end}}
|
||||||
|
{{define "zeroing"}}
|
||||||
|
// Masked operations are always compiled with zeroing.
|
||||||
|
switch v.Op {
|
||||||
|
case {{.}}:
|
||||||
|
x86.ParseSuffix(p, "Z")
|
||||||
|
}
|
||||||
|
{{end}}
|
||||||
|
{{define "ending"}}
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
{{end}}`))
|
||||||
|
)
|
||||||
|
|
||||||
|
type tplSSAData struct {
|
||||||
|
Cases string
|
||||||
|
Helper string
|
||||||
|
}
|
||||||
|
|
||||||
|
// writeSIMDSSA generates the ssa-to-prog lowering code for simdssa.go
// and returns it as a buffer.
|
||||||
|
func writeSIMDSSA(ops []Operation) *bytes.Buffer {
|
||||||
|
var ZeroingMask []string
|
||||||
|
regInfoKeys := []string{
|
||||||
|
"v11",
|
||||||
|
"v21",
|
||||||
|
"v2k",
|
||||||
|
"v2kv",
|
||||||
|
"v2kk",
|
||||||
|
"vkv",
|
||||||
|
"v31",
|
||||||
|
"v3kv",
|
||||||
|
"v11Imm8",
|
||||||
|
"vkvImm8",
|
||||||
|
"v21Imm8",
|
||||||
|
"v2kImm8",
|
||||||
|
"v2kkImm8",
|
||||||
|
"v31ResultInArg0",
|
||||||
|
"v3kvResultInArg0",
|
||||||
|
"vfpv",
|
||||||
|
"vfpkv",
|
||||||
|
"vgpvImm8",
|
||||||
|
"vgpImm8",
|
||||||
|
"v2kvImm8",
|
||||||
|
"vkvload",
|
||||||
|
"v21load",
|
||||||
|
"v31loadResultInArg0",
|
||||||
|
"v3kvloadResultInArg0",
|
||||||
|
"v2kvload",
|
||||||
|
"v2kload",
|
||||||
|
"v11load",
|
||||||
|
"v11loadImm8",
|
||||||
|
"vkvloadImm8",
|
||||||
|
"v21loadImm8",
|
||||||
|
"v2kloadImm8",
|
||||||
|
"v2kkloadImm8",
|
||||||
|
"v2kvloadImm8",
|
||||||
|
"v31ResultInArg0Imm8",
|
||||||
|
"v31loadResultInArg0Imm8",
|
||||||
|
"v21ResultInArg0",
|
||||||
|
"v21ResultInArg0Imm8",
|
||||||
|
"v31x0AtIn2ResultInArg0",
|
||||||
|
"v2kvResultInArg0",
|
||||||
|
}
|
||||||
|
regInfoSet := map[string][]string{}
|
||||||
|
for _, key := range regInfoKeys {
|
||||||
|
regInfoSet[key] = []string{}
|
||||||
|
}
|
||||||
|
|
||||||
|
seen := map[string]struct{}{}
|
||||||
|
allUnseen := make(map[string][]Operation)
|
||||||
|
allUnseenCaseStr := make(map[string][]string)
|
||||||
|
classifyOp := func(op Operation, maskType maskShape, shapeIn inShape, shapeOut outShape, caseStr string, mem memShape) error {
|
||||||
|
regShape, err := op.regShape(mem)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if regShape == "v01load" {
|
||||||
|
regShape = "vload"
|
||||||
|
}
|
||||||
|
if shapeOut == OneVregOutAtIn {
|
||||||
|
regShape += "ResultInArg0"
|
||||||
|
}
|
||||||
|
if shapeIn == OneImmIn || shapeIn == OneKmaskImmIn {
|
||||||
|
regShape += "Imm8"
|
||||||
|
}
|
||||||
|
regShape, err = rewriteVecAsScalarRegInfo(op, regShape)
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if _, ok := regInfoSet[regShape]; !ok {
|
||||||
|
allUnseen[regShape] = append(allUnseen[regShape], op)
|
||||||
|
allUnseenCaseStr[regShape] = append(allUnseenCaseStr[regShape], caseStr)
|
||||||
|
}
|
||||||
|
regInfoSet[regShape] = append(regInfoSet[regShape], caseStr)
|
||||||
|
if mem == NoMem && op.hasMaskedMerging(maskType, shapeOut) {
|
||||||
|
regShapeMerging := regShape
|
||||||
|
if shapeOut != OneVregOutAtIn {
|
||||||
|
// We have to copy the slice here because the sort will be visible from other
|
||||||
|
// aliases when no reslicing is happening.
|
||||||
|
newIn := make([]Operand, len(op.In), len(op.In)+1)
|
||||||
|
copy(newIn, op.In)
|
||||||
|
op.In = newIn
|
||||||
|
op.In = append(op.In, op.Out[0])
|
||||||
|
op.sortOperand()
|
||||||
|
regShapeMerging, err = op.regShape(mem)
|
||||||
|
regShapeMerging += "ResultInArg0"
|
||||||
|
}
|
||||||
|
if err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if _, ok := regInfoSet[regShapeMerging]; !ok {
|
||||||
|
allUnseen[regShapeMerging] = append(allUnseen[regShapeMerging], op)
|
||||||
|
allUnseenCaseStr[regShapeMerging] = append(allUnseenCaseStr[regShapeMerging], caseStr+"Merging")
|
||||||
|
}
|
||||||
|
regInfoSet[regShapeMerging] = append(regInfoSet[regShapeMerging], caseStr+"Merging")
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
for _, op := range ops {
|
||||||
|
shapeIn, shapeOut, maskType, _, gOp := op.shape()
|
||||||
|
asm := machineOpName(maskType, gOp)
|
||||||
|
if _, ok := seen[asm]; ok {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
seen[asm] = struct{}{}
|
||||||
|
caseStr := fmt.Sprintf("ssa.OpAMD64%s", asm)
|
||||||
|
isZeroMasking := false
|
||||||
|
if shapeIn == OneKmaskIn || shapeIn == OneKmaskImmIn {
|
||||||
|
if gOp.Zeroing == nil || *gOp.Zeroing {
|
||||||
|
ZeroingMask = append(ZeroingMask, caseStr)
|
||||||
|
isZeroMasking = true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if err := classifyOp(op, maskType, shapeIn, shapeOut, caseStr, NoMem); err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
if op.MemFeatures != nil && *op.MemFeatures == "vbcst" {
|
||||||
|
// Make a full vec memory variant
|
||||||
|
op = rewriteLastVregToMem(op)
|
||||||
|
// Ignore the error
|
||||||
|
// an error could be triggered by [checkVecAsScalar].
|
||||||
|
// TODO: make [checkVecAsScalar] aware of mem ops.
|
||||||
|
if err := classifyOp(op, maskType, shapeIn, shapeOut, caseStr+"load", VregMemIn); err != nil {
|
||||||
|
if *Verbose {
|
||||||
|
log.Printf("Seen error: %e", err)
|
||||||
|
}
|
||||||
|
} else if isZeroMasking {
|
||||||
|
ZeroingMask = append(ZeroingMask, caseStr+"load")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if len(allUnseen) != 0 {
|
||||||
|
allKeys := make([]string, 0)
|
||||||
|
for k := range allUnseen {
|
||||||
|
allKeys = append(allKeys, k)
|
||||||
|
}
|
||||||
|
panic(fmt.Errorf("unsupported register constraint for prog, please update gen_simdssa.go and amd64/ssa.go: %+v\nAll keys: %v\n, cases: %v\n", allUnseen, allKeys, allUnseenCaseStr))
|
||||||
|
}
|
||||||
|
|
||||||
|
buffer := new(bytes.Buffer)
|
||||||
|
|
||||||
|
if err := ssaTemplates.ExecuteTemplate(buffer, "header", nil); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute header template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, regShape := range regInfoKeys {
|
||||||
|
// Stable traversal of regInfoSet
|
||||||
|
cases := regInfoSet[regShape]
|
||||||
|
if len(cases) == 0 {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
data := tplSSAData{
|
||||||
|
Cases: strings.Join(cases, ",\n\t\t"),
|
||||||
|
Helper: "simd" + capitalizeFirst(regShape),
|
||||||
|
}
|
||||||
|
if err := ssaTemplates.ExecuteTemplate(buffer, "case", data); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute case template for %s: %w", regShape, err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := ssaTemplates.ExecuteTemplate(buffer, "footer", nil); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute footer template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
if len(ZeroingMask) != 0 {
|
||||||
|
if err := ssaTemplates.ExecuteTemplate(buffer, "zeroing", strings.Join(ZeroingMask, ",\n\t\t")); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute footer template: %w", err))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := ssaTemplates.ExecuteTemplate(buffer, "ending", nil); err != nil {
|
||||||
|
panic(fmt.Errorf("failed to execute footer template: %w", err))
|
||||||
|
}
|
||||||
|
|
||||||
|
return buffer
|
||||||
|
}
|
||||||
new file: src/simd/_gen/simdgen/gen_utility.go (830 lines)
@@ -0,0 +1,830 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"bufio"
|
||||||
|
"bytes"
|
||||||
|
"fmt"
|
||||||
|
"go/format"
|
||||||
|
"log"
|
||||||
|
"os"
|
||||||
|
"path/filepath"
|
||||||
|
"reflect"
|
||||||
|
"slices"
|
||||||
|
"sort"
|
||||||
|
"strings"
|
||||||
|
"text/template"
|
||||||
|
"unicode"
|
||||||
|
)
|
||||||
|
|
||||||
|
func templateOf(temp, name string) *template.Template {
|
||||||
|
t, err := template.New(name).Parse(temp)
|
||||||
|
if err != nil {
|
||||||
|
panic(fmt.Errorf("failed to parse template %s: %w", name, err))
|
||||||
|
}
|
||||||
|
return t
|
||||||
|
}
|
||||||
|
|
||||||
|
func createPath(goroot string, file string) (*os.File, error) {
|
||||||
|
fp := filepath.Join(goroot, file)
|
||||||
|
dir := filepath.Dir(fp)
|
||||||
|
err := os.MkdirAll(dir, 0755)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to create directory %s: %w", dir, err)
|
||||||
|
}
|
||||||
|
f, err := os.Create(fp)
|
||||||
|
if err != nil {
|
||||||
|
return nil, fmt.Errorf("failed to create file %s: %w", fp, err)
|
||||||
|
}
|
||||||
|
return f, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func formatWriteAndClose(out *bytes.Buffer, goroot string, file string) {
|
||||||
|
b, err := format.Source(out.Bytes())
|
||||||
|
if err != nil {
|
||||||
|
fmt.Fprintf(os.Stderr, "%v\n", err)
|
||||||
|
fmt.Fprintf(os.Stderr, "%s\n", numberLines(out.Bytes()))
|
||||||
|
fmt.Fprintf(os.Stderr, "%v\n", err)
|
||||||
|
panic(err)
|
||||||
|
} else {
|
||||||
|
writeAndClose(b, goroot, file)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func writeAndClose(b []byte, goroot string, file string) {
|
||||||
|
ofile, err := createPath(goroot, file)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
ofile.Write(b)
|
||||||
|
ofile.Close()
|
||||||
|
}
|
||||||
|
|
||||||
|
// numberLines takes a slice of bytes, and returns a string where each line
|
||||||
|
// is numbered, starting from 1.
|
||||||
|
func numberLines(data []byte) string {
|
||||||
|
var buf bytes.Buffer
|
||||||
|
r := bytes.NewReader(data)
|
||||||
|
s := bufio.NewScanner(r)
|
||||||
|
for i := 1; s.Scan(); i++ {
|
||||||
|
fmt.Fprintf(&buf, "%d: %s\n", i, s.Text())
|
||||||
|
}
|
||||||
|
return buf.String()
|
||||||
|
}
|
||||||
|
|
||||||
|
type inShape uint8
|
||||||
|
type outShape uint8
|
||||||
|
type maskShape uint8
|
||||||
|
type immShape uint8
|
||||||
|
type memShape uint8
|
||||||
|
|
||||||
|
const (
|
||||||
|
InvalidIn inShape = iota
|
||||||
|
PureVregIn // vector register input only
|
||||||
|
OneKmaskIn // vector and kmask input
|
||||||
|
OneImmIn // vector and immediate input
|
||||||
|
OneKmaskImmIn // vector, kmask, and immediate inputs
|
||||||
|
PureKmaskIn // only mask inputs.
|
||||||
|
)
|
||||||
|
|
||||||
|
const (
|
||||||
|
InvalidOut outShape = iota
|
||||||
|
NoOut // no output
|
||||||
|
OneVregOut // (one) vector register output
|
||||||
|
OneGregOut // (one) general register output
|
||||||
|
OneKmaskOut // mask output
|
||||||
|
OneVregOutAtIn // the first input is also the output
|
||||||
|
)
|
||||||
|
|
||||||
|
const (
|
||||||
|
InvalidMask maskShape = iota
|
||||||
|
NoMask // no mask
|
||||||
|
OneMask // with mask (K1 to K7)
|
||||||
|
AllMasks // a K mask instruction (K0-K7)
|
||||||
|
)
|
||||||
|
|
||||||
|
const (
|
||||||
|
InvalidImm immShape = iota
|
||||||
|
NoImm // no immediate
|
||||||
|
ConstImm // const only immediate
|
||||||
|
VarImm // pure imm argument provided by the users
|
||||||
|
ConstVarImm // a combination of user arg and const
|
||||||
|
)
|
||||||
|
|
||||||
|
const (
|
||||||
|
InvalidMem memShape = iota
|
||||||
|
NoMem
|
||||||
|
VregMemIn // The instruction contains a mem input which is loading a vreg.
|
||||||
|
)
|
||||||
|
|
||||||
|
// shape returns several values describing the shape of the operation,
|
||||||
|
// and modified versions of the op:
|
||||||
|
//
|
||||||
|
// opNoImm is op with its inputs excluding the const imm.
|
||||||
|
//
|
||||||
|
// This function does not modify op.
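//
// Illustrative example (not part of the original comment): a masked operation taking
// two vregs and a kmask and producing one vreg has shapeIn=OneKmaskIn,
// shapeOut=OneVregOut, maskType=OneMask, and immType=NoImm.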
|
||||||
|
func (op *Operation) shape() (shapeIn inShape, shapeOut outShape, maskType maskShape, immType immShape,
|
||||||
|
opNoImm Operation) {
|
||||||
|
if len(op.Out) > 1 {
|
||||||
|
panic(fmt.Errorf("simdgen only supports 1 output: %s", op))
|
||||||
|
}
|
||||||
|
var outputReg int
|
||||||
|
if len(op.Out) == 1 {
|
||||||
|
outputReg = op.Out[0].AsmPos
|
||||||
|
if op.Out[0].Class == "vreg" {
|
||||||
|
shapeOut = OneVregOut
|
||||||
|
} else if op.Out[0].Class == "greg" {
|
||||||
|
shapeOut = OneGregOut
|
||||||
|
} else if op.Out[0].Class == "mask" {
|
||||||
|
shapeOut = OneKmaskOut
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen only supports output of class vreg or mask: %s", op))
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
shapeOut = NoOut
|
||||||
|
// TODO: are these only Load/Stores?
|
||||||
|
// We manually supported two Load and Store, are those enough?
|
||||||
|
panic(fmt.Errorf("simdgen only supports 1 output: %s", op))
|
||||||
|
}
|
||||||
|
hasImm := false
|
||||||
|
maskCount := 0
|
||||||
|
hasVreg := false
|
||||||
|
for _, in := range op.In {
|
||||||
|
if in.AsmPos == outputReg {
|
||||||
|
if shapeOut != OneVregOutAtIn && in.AsmPos == 0 && in.Class == "vreg" {
|
||||||
|
shapeOut = OneVregOutAtIn
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen only support output and input sharing the same position case of \"the first input is vreg and the only output\": %s", op))
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if in.Class == "immediate" {
|
||||||
|
// A manual check on XED data found that AMD64 SIMD instructions at most
|
||||||
|
// have 1 immediate. So we don't need to check this here.
|
||||||
|
if *in.Bits != 8 {
|
||||||
|
panic(fmt.Errorf("simdgen only supports immediates of 8 bits: %s", op))
|
||||||
|
}
|
||||||
|
hasImm = true
|
||||||
|
} else if in.Class == "mask" {
|
||||||
|
maskCount++
|
||||||
|
} else {
|
||||||
|
hasVreg = true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
opNoImm = *op
|
||||||
|
|
||||||
|
removeImm := func(o *Operation) {
|
||||||
|
o.In = o.In[1:]
|
||||||
|
}
|
||||||
|
if hasImm {
|
||||||
|
removeImm(&opNoImm)
|
||||||
|
if op.In[0].Const != nil {
|
||||||
|
if op.In[0].ImmOffset != nil {
|
||||||
|
immType = ConstVarImm
|
||||||
|
} else {
|
||||||
|
immType = ConstImm
|
||||||
|
}
|
||||||
|
} else if op.In[0].ImmOffset != nil {
|
||||||
|
immType = VarImm
|
||||||
|
} else {
|
||||||
|
panic(fmt.Errorf("simdgen requires imm to have at least one of ImmOffset or Const set: %s", op))
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
immType = NoImm
|
||||||
|
}
|
||||||
|
if maskCount == 0 {
|
||||||
|
maskType = NoMask
|
||||||
|
} else {
|
||||||
|
maskType = OneMask
|
||||||
|
}
|
||||||
|
checkPureMask := func() bool {
|
||||||
|
if hasImm {
|
||||||
|
panic(fmt.Errorf("simdgen does not support immediates in pure mask operations: %s", op))
|
||||||
|
}
|
||||||
|
if hasVreg {
|
||||||
|
panic(fmt.Errorf("simdgen does not support more than 1 masks in non-pure mask operations: %s", op))
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if !hasImm && maskCount == 0 {
|
||||||
|
shapeIn = PureVregIn
|
||||||
|
} else if !hasImm && maskCount > 0 {
|
||||||
|
if maskCount == 1 {
|
||||||
|
shapeIn = OneKmaskIn
|
||||||
|
} else {
|
||||||
|
if checkPureMask() {
|
||||||
|
return
|
||||||
|
}
|
||||||
|
shapeIn = PureKmaskIn
|
||||||
|
maskType = AllMasks
|
||||||
|
}
|
||||||
|
} else if hasImm && maskCount == 0 {
|
||||||
|
shapeIn = OneImmIn
|
||||||
|
} else {
|
||||||
|
if maskCount == 1 {
|
||||||
|
shapeIn = OneKmaskImmIn
|
||||||
|
} else {
|
||||||
|
checkPureMask()
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
// regShape returns a string representation of the register shape.
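// Illustrative example (not part of the original comment): an op with two vreg inputs
// and one vreg output yields "v21"; adding a kmask input yields "v2kv".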
|
||||||
|
func (op *Operation) regShape(mem memShape) (string, error) {
|
||||||
|
_, _, _, _, gOp := op.shape()
|
||||||
|
var regInfo, fixedName string
|
||||||
|
var vRegInCnt, gRegInCnt, kMaskInCnt, vRegOutCnt, gRegOutCnt, kMaskOutCnt, memInCnt, memOutCnt int
|
||||||
|
for i, in := range gOp.In {
|
||||||
|
switch in.Class {
|
||||||
|
case "vreg":
|
||||||
|
vRegInCnt++
|
||||||
|
case "greg":
|
||||||
|
gRegInCnt++
|
||||||
|
case "mask":
|
||||||
|
kMaskInCnt++
|
||||||
|
case "memory":
|
||||||
|
if mem != VregMemIn {
|
||||||
|
panic("simdgen only knows VregMemIn in regShape")
|
||||||
|
}
|
||||||
|
memInCnt++
|
||||||
|
vRegInCnt++
|
||||||
|
}
|
||||||
|
if in.FixedReg != nil {
|
||||||
|
fixedName = fmt.Sprintf("%sAtIn%d", *in.FixedReg, i)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for i, out := range gOp.Out {
|
||||||
|
// If class overwrite is happening, that's not really a mask but a vreg.
|
||||||
|
if out.Class == "vreg" || out.OverwriteClass != nil {
|
||||||
|
vRegOutCnt++
|
||||||
|
} else if out.Class == "greg" {
|
||||||
|
gRegOutCnt++
|
||||||
|
} else if out.Class == "mask" {
|
||||||
|
kMaskOutCnt++
|
||||||
|
} else if out.Class == "memory" {
|
||||||
|
if mem != VregMemIn {
|
||||||
|
panic("simdgen only knows VregMemIn in regShape")
|
||||||
|
}
|
||||||
|
vRegOutCnt++
|
||||||
|
memOutCnt++
|
||||||
|
}
|
||||||
|
if out.FixedReg != nil {
|
||||||
|
fixedName = fmt.Sprintf("%sAtIn%d", *out.FixedReg, i)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
var inRegs, inMasks, outRegs, outMasks string
|
||||||
|
|
||||||
|
rmAbbrev := func(s string, i int) string {
|
||||||
|
if i == 0 {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
if i == 1 {
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("%s%d", s, i)
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
inRegs = rmAbbrev("v", vRegInCnt)
|
||||||
|
inRegs += rmAbbrev("gp", gRegInCnt)
|
||||||
|
inMasks = rmAbbrev("k", kMaskInCnt)
|
||||||
|
|
||||||
|
outRegs = rmAbbrev("v", vRegOutCnt)
|
||||||
|
outRegs += rmAbbrev("gp", gRegOutCnt)
|
||||||
|
outMasks = rmAbbrev("k", kMaskOutCnt)
|
||||||
|
|
||||||
|
if kMaskInCnt == 0 && kMaskOutCnt == 0 && gRegInCnt == 0 && gRegOutCnt == 0 {
|
||||||
|
// For pure v we can abbreviate it as v%d%d.
|
||||||
|
regInfo = fmt.Sprintf("v%d%d", vRegInCnt, vRegOutCnt)
|
||||||
|
} else if kMaskInCnt == 0 && kMaskOutCnt == 0 {
|
||||||
|
regInfo = fmt.Sprintf("%s%s", inRegs, outRegs)
|
||||||
|
} else {
|
||||||
|
regInfo = fmt.Sprintf("%s%s%s%s", inRegs, inMasks, outRegs, outMasks)
|
||||||
|
}
|
||||||
|
if memInCnt > 0 {
|
||||||
|
if memInCnt == 1 {
|
||||||
|
regInfo += "load"
|
||||||
|
} else {
|
||||||
|
panic("simdgen does not understand more than 1 mem op as of now")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if memOutCnt > 0 {
|
||||||
|
panic("simdgen does not understand memory as output as of now")
|
||||||
|
}
|
||||||
|
regInfo += fixedName
|
||||||
|
return regInfo, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// sortOperand sorts op.In by putting immediates first, then vregs, and masks last.
// TODO: verify that this is a safe assumption about the prog structure.
// From observation, in asm the immediates always come first and the masks always
// come last, with the vregs in between.
|
||||||
|
func (op *Operation) sortOperand() {
|
||||||
|
priority := map[string]int{"immediate": 0, "vreg": 1, "greg": 1, "mask": 2}
|
||||||
|
sort.SliceStable(op.In, func(i, j int) bool {
|
||||||
|
pi := priority[op.In[i].Class]
|
||||||
|
pj := priority[op.In[j].Class]
|
||||||
|
if pi != pj {
|
||||||
|
return pi < pj
|
||||||
|
}
|
||||||
|
return op.In[i].AsmPos < op.In[j].AsmPos
|
||||||
|
})
|
||||||
|
}
|
||||||
|
|
||||||
|
// goNormalType returns the Go type name for the result of an Op that
|
||||||
|
// does not return a vector, i.e., that returns a result in a general
|
||||||
|
// register. Currently there's only one family of Ops in Go's simd library
|
||||||
|
// that does this (GetElem), and so this is specialized to work for that,
|
||||||
|
// but the problem (mismatch between hardware register width and Go type
|
||||||
|
// width) seems likely to recur if there are any other cases.
|
||||||
|
func (op Operation) goNormalType() string {
|
||||||
|
if op.Go == "GetElem" {
|
||||||
|
// GetElem returns an element of the vector into a general register
|
||||||
|
// but as far as the hardware is concerned, that result is either 32
|
||||||
|
// or 64 bits wide, no matter what the vector element width is.
|
||||||
|
// This is not "wrong" but it is not the right answer for Go source code.
|
||||||
|
// To get the Go type right, combine the base type ("int", "uint", "float"),
|
||||||
|
// with the input vector element width in bits (8,16,32,64).
|
||||||
|
|
||||||
|
at := 0 // proper value of at depends on whether immediate was stripped or not
|
||||||
|
if op.In[at].Class == "immediate" {
|
||||||
|
at++
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("%s%d", *op.Out[0].Base, *op.In[at].ElemBits)
|
||||||
|
}
|
||||||
|
panic(fmt.Errorf("Implement goNormalType for %v", op))
|
||||||
|
}
|
||||||
|
|
||||||
|
// SSAType returns the string for the type reference in SSA generation,
|
||||||
|
// for example in the intrinsics generating template.
|
||||||
|
func (op Operation) SSAType() string {
|
||||||
|
if op.Out[0].Class == "greg" {
|
||||||
|
return fmt.Sprintf("types.Types[types.T%s]", strings.ToUpper(op.goNormalType()))
|
||||||
|
}
|
||||||
|
return fmt.Sprintf("types.TypeVec%d", *op.Out[0].Bits)
|
||||||
|
}
|
||||||
|
|
||||||
|
// GoType returns the Go type returned by this operation (relative to the simd package),
|
||||||
|
// for example "int32" or "Int8x16". This is used in a template.
|
||||||
|
func (op Operation) GoType() string {
|
||||||
|
if op.Out[0].Class == "greg" {
|
||||||
|
return op.goNormalType()
|
||||||
|
}
|
||||||
|
return *op.Out[0].Go
|
||||||
|
}
|
||||||
|
|
||||||
|
// ImmName returns the name to use for an operation's immediate operand.
|
||||||
|
// This can be overridden in the yaml with "name" on an operand,
|
||||||
|
// otherwise, for now, "constant"
|
||||||
|
func (op Operation) ImmName() string {
|
||||||
|
return op.Op0Name("constant")
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o Operand) OpName(s string) string {
|
||||||
|
if n := o.Name; n != nil {
|
||||||
|
return *n
|
||||||
|
}
|
||||||
|
if o.Class == "mask" {
|
||||||
|
return "mask"
|
||||||
|
}
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o Operand) OpNameAndType(s string) string {
|
||||||
|
return o.OpName(s) + " " + *o.Go
|
||||||
|
}
|
||||||
|
|
||||||
|
// GoExported returns [Go] with first character capitalized.
|
||||||
|
func (op Operation) GoExported() string {
|
||||||
|
return capitalizeFirst(op.Go)
|
||||||
|
}
|
||||||
|
|
||||||
|
// DocumentationExported returns [Documentation] with method name capitalized.
|
||||||
|
func (op Operation) DocumentationExported() string {
|
||||||
|
return strings.ReplaceAll(op.Documentation, op.Go, op.GoExported())
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op0Name returns the name to use for the 0 operand,
|
||||||
|
// if any is present, otherwise the parameter is used.
|
||||||
|
func (op Operation) Op0Name(s string) string {
|
||||||
|
return op.In[0].OpName(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op1Name returns the name to use for the 1 operand,
|
||||||
|
// if any is present, otherwise the parameter is used.
|
||||||
|
func (op Operation) Op1Name(s string) string {
|
||||||
|
return op.In[1].OpName(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op2Name returns the name to use for the 2 operand,
|
||||||
|
// if any is present, otherwise the parameter is used.
|
||||||
|
func (op Operation) Op2Name(s string) string {
|
||||||
|
return op.In[2].OpName(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op3Name returns the name to use for the 3 operand,
|
||||||
|
// if any is present, otherwise the parameter is used.
|
||||||
|
func (op Operation) Op3Name(s string) string {
|
||||||
|
return op.In[3].OpName(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op0NameAndType returns the name and type to use for
|
||||||
|
// the 0 operand, if a name is provided, otherwise
|
||||||
|
// the parameter value is used as the default.
|
||||||
|
func (op Operation) Op0NameAndType(s string) string {
|
||||||
|
return op.In[0].OpNameAndType(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op1NameAndType returns the name and type to use for
|
||||||
|
// the 1 operand, if a name is provided, otherwise
|
||||||
|
// the parameter value is used as the default.
|
||||||
|
func (op Operation) Op1NameAndType(s string) string {
|
||||||
|
return op.In[1].OpNameAndType(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op2NameAndType returns the name and type to use for
|
||||||
|
// the 2 operand, if a name is provided, otherwise
|
||||||
|
// the parameter value is used as the default.
|
||||||
|
func (op Operation) Op2NameAndType(s string) string {
|
||||||
|
return op.In[2].OpNameAndType(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op3NameAndType returns the name and type to use for
|
||||||
|
// the 3 operand, if a name is provided, otherwise
|
||||||
|
// the parameter value is used as the default.
|
||||||
|
func (op Operation) Op3NameAndType(s string) string {
|
||||||
|
return op.In[3].OpNameAndType(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Op4NameAndType returns the name and type to use for
|
||||||
|
// the 4 operand, if a name is provided, otherwise
|
||||||
|
// the parameter value is used as the default.
|
||||||
|
func (op Operation) Op4NameAndType(s string) string {
|
||||||
|
return op.In[4].OpNameAndType(s)
|
||||||
|
}
|
||||||
|
|
||||||
|
var immClasses []string = []string{"BAD0Imm", "BAD1Imm", "op1Imm8", "op2Imm8", "op3Imm8", "op4Imm8"}
|
||||||
|
var classes []string = []string{"BAD0", "op1", "op2", "op3", "op4"}
|
||||||
|
|
||||||
|
// classifyOp returns a classification string, modified operation, and perhaps error based
|
||||||
|
// on the stub and intrinsic shape for the operation.
|
||||||
|
// The classification string is in the regular expression set "op[1234](Imm8)?(_<order>)?"
|
||||||
|
// where the "<order>" suffix is optionally attached to the Operation in its input yaml.
|
||||||
|
// The classification string is used to select a template or a clause of a template
|
||||||
|
// for the intrinsics declaration and the ssagen intrinsics glue code in the compiler.
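// Illustrative example (not part of the original comment): a two-vreg operation with a
// variable imm8 (three inputs total) classifies as "op2Imm8"; the same operation
// without the immediate classifies as "op2".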
|
||||||
|
func classifyOp(op Operation) (string, Operation, error) {
|
||||||
|
_, _, _, immType, gOp := op.shape()
|
||||||
|
|
||||||
|
var class string
|
||||||
|
|
||||||
|
if immType == VarImm || immType == ConstVarImm {
|
||||||
|
switch l := len(op.In); l {
|
||||||
|
case 1:
|
||||||
|
return "", op, fmt.Errorf("simdgen does not recognize this operation of only immediate input: %s", op)
|
||||||
|
case 2, 3, 4, 5:
|
||||||
|
class = immClasses[l]
|
||||||
|
default:
|
||||||
|
return "", op, fmt.Errorf("simdgen does not recognize this operation of input length %d: %s", len(op.In), op)
|
||||||
|
}
|
||||||
|
if order := op.OperandOrder; order != nil {
|
||||||
|
class += "_" + *order
|
||||||
|
}
|
||||||
|
return class, op, nil
|
||||||
|
} else {
|
||||||
|
switch l := len(gOp.In); l {
|
||||||
|
case 1, 2, 3, 4:
|
||||||
|
class = classes[l]
|
||||||
|
default:
|
||||||
|
return "", op, fmt.Errorf("simdgen does not recognize this operation of input length %d: %s", len(op.In), op)
|
||||||
|
}
|
||||||
|
if order := op.OperandOrder; order != nil {
|
||||||
|
class += "_" + *order
|
||||||
|
}
|
||||||
|
return class, gOp, nil
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
func checkVecAsScalar(op Operation) (idx int, err error) {
|
||||||
|
idx = -1
|
||||||
|
sSize := 0
|
||||||
|
for i, o := range op.In {
|
||||||
|
if o.TreatLikeAScalarOfSize != nil {
|
||||||
|
if idx == -1 {
|
||||||
|
idx = i
|
||||||
|
sSize = *o.TreatLikeAScalarOfSize
|
||||||
|
} else {
|
||||||
|
err = fmt.Errorf("simdgen only supports one TreatLikeAScalarOfSize in the arg list: %s", op)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if idx >= 0 {
|
||||||
|
if sSize != 8 && sSize != 16 && sSize != 32 && sSize != 64 {
|
||||||
|
err = fmt.Errorf("simdgen does not recognize this uint size: %d, %s", sSize, op)
|
||||||
|
return
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
func rewriteVecAsScalarRegInfo(op Operation, regInfo string) (string, error) {
|
||||||
|
idx, err := checkVecAsScalar(op)
|
||||||
|
if err != nil {
|
||||||
|
return "", err
|
||||||
|
}
|
||||||
|
if idx != -1 {
|
||||||
|
if regInfo == "v21" {
|
||||||
|
regInfo = "vfpv"
|
||||||
|
} else if regInfo == "v2kv" {
|
||||||
|
regInfo = "vfpkv"
|
||||||
|
} else if regInfo == "v31" {
|
||||||
|
regInfo = "v2fpv"
|
||||||
|
} else if regInfo == "v3kv" {
|
||||||
|
regInfo = "v2fpkv"
|
||||||
|
} else {
|
||||||
|
return "", fmt.Errorf("simdgen does not recognize uses of treatLikeAScalarOfSize with op regShape %s in op: %s", regInfo, op)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return regInfo, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func rewriteLastVregToMem(op Operation) Operation {
|
||||||
|
newIn := make([]Operand, len(op.In))
|
||||||
|
lastVregIdx := -1
|
||||||
|
for i := range len(op.In) {
|
||||||
|
newIn[i] = op.In[i]
|
||||||
|
if op.In[i].Class == "vreg" {
|
||||||
|
lastVregIdx = i
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// vbcst operations always put their mem operand in place of the last vreg.
|
||||||
|
if lastVregIdx == -1 {
|
||||||
|
panic("simdgen cannot find one vreg in the mem op vreg original")
|
||||||
|
}
|
||||||
|
newIn[lastVregIdx].Class = "memory"
|
||||||
|
op.In = newIn
|
||||||
|
|
||||||
|
return op
|
||||||
|
}
|
||||||
|
|
||||||
|
// dedup removes duplicate operations by comparing the full structure.
|
||||||
|
func dedup(ops []Operation) (deduped []Operation) {
|
||||||
|
for _, op := range ops {
|
||||||
|
seen := false
|
||||||
|
for _, dop := range deduped {
|
||||||
|
if reflect.DeepEqual(op, dop) {
|
||||||
|
seen = true
|
||||||
|
break
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if !seen {
|
||||||
|
deduped = append(deduped, op)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
func (op Operation) GenericName() string {
|
||||||
|
if op.OperandOrder != nil {
|
||||||
|
switch *op.OperandOrder {
|
||||||
|
case "21Type1", "231Type1":
|
||||||
|
// Permute uses operand[1] for method receiver.
|
||||||
|
return op.Go + *op.In[1].Go
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if op.In[0].Class == "immediate" {
|
||||||
|
return op.Go + *op.In[1].Go
|
||||||
|
}
|
||||||
|
return op.Go + *op.In[0].Go
|
||||||
|
}
|
||||||
|
|
||||||
|
// dedupGodef dedups operations at the [Op.Go]+[*Op.In[0].Go] level.
// Deduping means picking the least advanced architecture that satisfies the requirement:
// AVX512 will be least preferred.
// If FlagReportDup is set, it reports the duplicates to the console instead.
|
||||||
|
func dedupGodef(ops []Operation) ([]Operation, error) {
|
||||||
|
seen := map[string][]Operation{}
|
||||||
|
for _, op := range ops {
|
||||||
|
_, _, _, _, gOp := op.shape()
|
||||||
|
|
||||||
|
gN := gOp.GenericName()
|
||||||
|
seen[gN] = append(seen[gN], op)
|
||||||
|
}
|
||||||
|
if *FlagReportDup {
|
||||||
|
for gName, dup := range seen {
|
||||||
|
if len(dup) > 1 {
|
||||||
|
log.Printf("Duplicate for %s:\n", gName)
|
||||||
|
for _, op := range dup {
|
||||||
|
log.Printf("%s\n", op)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return ops, nil
|
||||||
|
}
|
||||||
|
isAVX512 := func(op Operation) bool {
|
||||||
|
return strings.Contains(op.CPUFeature, "AVX512")
|
||||||
|
}
|
||||||
|
deduped := []Operation{}
|
||||||
|
for _, dup := range seen {
|
||||||
|
if len(dup) > 1 {
|
||||||
|
slices.SortFunc(dup, func(i, j Operation) int {
|
||||||
|
// Put non-AVX512 candidates at the beginning
|
||||||
|
if !isAVX512(i) && isAVX512(j) {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if isAVX512(i) && !isAVX512(j) {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
if i.CPUFeature != j.CPUFeature {
|
||||||
|
return strings.Compare(i.CPUFeature, j.CPUFeature)
|
||||||
|
}
|
||||||
|
// Weirdly, Intel sometimes has duplicated definitions for the same instruction.
// This confuses the XED mem-op merge logic: [MemFeature] will only be attached to an
// instruction once, which means that for essentially duplicated instructions only one
// will have the proper [MemFeature] set. We have to make this sort deterministic for [MemFeature].
|
||||||
|
if i.MemFeatures != nil && j.MemFeatures == nil {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if i.MemFeatures == nil && j.MemFeatures != nil {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
// Their order does not matter anymore, at least for now.
|
||||||
|
return 0
|
||||||
|
})
|
||||||
|
}
|
||||||
|
deduped = append(deduped, dup[0])
|
||||||
|
}
|
||||||
|
slices.SortFunc(deduped, compareOperations)
|
||||||
|
return deduped, nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// Copy op.ConstImm to op.In[0].Const
|
||||||
|
// This is a hack to reduce the size of defs we need for const imm operations.
|
||||||
|
func copyConstImm(ops []Operation) error {
|
||||||
|
for _, op := range ops {
|
||||||
|
if op.ConstImm == nil {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
_, _, _, immType, _ := op.shape()
|
||||||
|
|
||||||
|
if immType == ConstImm || immType == ConstVarImm {
|
||||||
|
op.In[0].Const = op.ConstImm
|
||||||
|
}
|
||||||
|
// Otherwise, just don't port it - e.g. {VPCMP[BWDQ] imm=0} and {VPCMPEQ[BWDQ]} are
// the same operation "Equal"; [dedupGodef] should be able to distinguish them.
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func capitalizeFirst(s string) string {
|
||||||
|
if s == "" {
|
||||||
|
return ""
|
||||||
|
}
|
||||||
|
// Convert the string to a slice of runes to handle multi-byte characters correctly.
|
||||||
|
r := []rune(s)
|
||||||
|
r[0] = unicode.ToUpper(r[0])
|
||||||
|
return string(r)
|
||||||
|
}
|
||||||
|
|
||||||
|
// overwrite corrects some errors due to:
|
||||||
|
// - The XED data is wrong
|
||||||
|
// - Go's SIMD API requirement, for example AVX2 compares should also produce masks.
|
||||||
|
// This rewrite has strict constraints, please see the error message.
|
||||||
|
// These constraints are also exploited in [writeSIMDRules], [writeSIMDMachineOps]
|
||||||
|
// and [writeSIMDSSA], please be careful when updating these constraints.
|
||||||
|
func overwrite(ops []Operation) error {
|
||||||
|
hasClassOverwrite := false
|
||||||
|
overwrite := func(op []Operand, idx int, o Operation) error {
|
||||||
|
if op[idx].OverwriteElementBits != nil {
|
||||||
|
if op[idx].ElemBits == nil {
|
||||||
|
panic(fmt.Errorf("ElemBits is nil at operand %d of %v", idx, o))
|
||||||
|
}
|
||||||
|
*op[idx].ElemBits = *op[idx].OverwriteElementBits
|
||||||
|
*op[idx].Lanes = *op[idx].Bits / *op[idx].ElemBits
|
||||||
|
*op[idx].Go = fmt.Sprintf("%s%dx%d", capitalizeFirst(*op[idx].Base), *op[idx].ElemBits, *op[idx].Lanes)
|
||||||
|
}
|
||||||
|
if op[idx].OverwriteClass != nil {
|
||||||
|
if op[idx].OverwriteBase == nil {
|
||||||
|
panic(fmt.Errorf("simdgen: [OverwriteClass] must be set together with [OverwriteBase]: %s", op[idx]))
|
||||||
|
}
|
||||||
|
oBase := *op[idx].OverwriteBase
|
||||||
|
oClass := *op[idx].OverwriteClass
|
||||||
|
if oClass != "mask" {
|
||||||
|
panic(fmt.Errorf("simdgen: [Class] overwrite only supports overwritting to mask: %s", op[idx]))
|
||||||
|
}
|
||||||
|
if oBase != "int" {
|
||||||
|
panic(fmt.Errorf("simdgen: [Class] overwrite must set [OverwriteBase] to int: %s", op[idx]))
|
||||||
|
}
|
||||||
|
if op[idx].Class != "vreg" {
|
||||||
|
panic(fmt.Errorf("simdgen: [Class] overwrite must be overwriting [Class] from vreg: %s", op[idx]))
|
||||||
|
}
|
||||||
|
hasClassOverwrite = true
|
||||||
|
*op[idx].Base = oBase
|
||||||
|
op[idx].Class = oClass
|
||||||
|
*op[idx].Go = fmt.Sprintf("Mask%dx%d", *op[idx].ElemBits, *op[idx].Lanes)
|
||||||
|
} else if op[idx].OverwriteBase != nil {
|
||||||
|
oBase := *op[idx].OverwriteBase
|
||||||
|
*op[idx].Go = strings.ReplaceAll(*op[idx].Go, capitalizeFirst(*op[idx].Base), capitalizeFirst(oBase))
|
||||||
|
if op[idx].Class == "greg" {
|
||||||
|
*op[idx].Go = strings.ReplaceAll(*op[idx].Go, *op[idx].Base, oBase)
|
||||||
|
}
|
||||||
|
*op[idx].Base = oBase
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
for i, o := range ops {
|
||||||
|
hasClassOverwrite = false
|
||||||
|
for j := range ops[i].In {
|
||||||
|
if err := overwrite(ops[i].In, j, o); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if hasClassOverwrite {
|
||||||
|
return fmt.Errorf("simdgen does not support [OverwriteClass] in inputs: %s", ops[i])
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for j := range ops[i].Out {
|
||||||
|
if err := overwrite(ops[i].Out, j, o); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if hasClassOverwrite {
|
||||||
|
for _, in := range ops[i].In {
|
||||||
|
if in.Class == "mask" {
|
||||||
|
return fmt.Errorf("simdgen only supports [OverwriteClass] for operations without mask inputs")
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// reportXEDInconsistency reports potential XED inconsistencies.
|
||||||
|
// We can add more fields to [Operation] to enable more checks and implement it here.
|
||||||
|
// Supported checks:
|
||||||
|
// [NameAndSizeCheck]: NAME[BWDQ] should set the elemBits accordingly.
|
||||||
|
// This check is useful to find inconsistencies, then we can add overwrite fields to
|
||||||
|
// those defs to correct them manually.
|
||||||
|
func reportXEDInconsistency(ops []Operation) error {
|
||||||
|
for _, o := range ops {
|
||||||
|
if o.NameAndSizeCheck != nil {
|
||||||
|
suffixSizeMap := map[byte]int{'B': 8, 'W': 16, 'D': 32, 'Q': 64}
|
||||||
|
checkOperand := func(opr Operand) error {
|
||||||
|
if opr.ElemBits == nil {
|
||||||
|
return fmt.Errorf("simdgen expects elemBits to be set when performing NameAndSizeCheck")
|
||||||
|
}
|
||||||
|
if v, ok := suffixSizeMap[o.Asm[len(o.Asm)-1]]; !ok {
|
||||||
|
return fmt.Errorf("simdgen expects asm to end with [BWDQ] when performing NameAndSizeCheck")
|
||||||
|
} else {
|
||||||
|
if v != *opr.ElemBits {
|
||||||
|
return fmt.Errorf("simdgen finds NameAndSizeCheck inconsistency in def: %s", o)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
for _, in := range o.In {
|
||||||
|
if in.Class != "vreg" && in.Class != "mask" {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if in.TreatLikeAScalarOfSize != nil {
|
||||||
|
// This is an irregular operand, don't check it.
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if err := checkOperand(in); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
for _, out := range o.Out {
|
||||||
|
if err := checkOperand(out); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o *Operation) hasMaskedMerging(maskType maskShape, outType outShape) bool {
|
||||||
|
// BLEND and VMOVDQU are not user-facing ops so we should filter them out.
|
||||||
|
return o.OperandOrder == nil && o.SpecialLower == nil && maskType == OneMask && outType == OneVregOut &&
|
||||||
|
len(o.InVariant) == 1 && !strings.Contains(o.Asm, "BLEND") && !strings.Contains(o.Asm, "VMOVDQU")
|
||||||
|
}
|
||||||
|
|
||||||
|
func getVbcstData(s string) (feat1Match, feat2Match string) {
|
||||||
|
_, err := fmt.Sscanf(s, "feat1=%[^;];feat2=%s", &feat1Match, &feat2Match)
|
||||||
|
if err != nil {
|
||||||
|
panic(err)
|
||||||
|
}
|
||||||
|
return
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o Operation) String() string {
|
||||||
|
return pprints(o)
|
||||||
|
}
|
||||||
|
|
||||||
|
func (op Operand) String() string {
|
||||||
|
return pprints(op)
|
||||||
|
}
|
||||||
new file: src/simd/_gen/simdgen/go.yaml (1 line)
@@ -0,0 +1 @@
|
||||||
|
!import ops/*/go.yaml
|
||||||
new file: src/simd/_gen/simdgen/godefs.go (438 lines)
@@ -0,0 +1,438 @@
|
||||||
|
// Copyright 2025 The Go Authors. All rights reserved.
|
||||||
|
// Use of this source code is governed by a BSD-style
|
||||||
|
// license that can be found in the LICENSE file.
|
||||||
|
|
||||||
|
package main
|
||||||
|
|
||||||
|
import (
|
||||||
|
"fmt"
|
||||||
|
"log"
|
||||||
|
"regexp"
|
||||||
|
"slices"
|
||||||
|
"strconv"
|
||||||
|
"strings"
|
||||||
|
"unicode"
|
||||||
|
|
||||||
|
"simd/_gen/unify"
|
||||||
|
)
|
||||||
|
|
||||||
|
type Operation struct {
|
||||||
|
rawOperation
|
||||||
|
|
||||||
|
// Go is the Go method name of this operation.
|
||||||
|
//
|
||||||
|
// It is derived from the raw Go method name by adding optional suffixes.
|
||||||
|
// Currently, "Masked" is the only suffix.
|
||||||
|
Go string
|
||||||
|
|
||||||
|
// Documentation is the doc string for this API.
|
||||||
|
//
|
||||||
|
// It is computed from the raw documentation:
|
||||||
|
//
|
||||||
|
// - "NAME" is replaced by the Go method name.
|
||||||
|
//
|
||||||
|
// - For masked operation, a sentence about masking is added.
|
||||||
|
Documentation string
|
||||||
|
|
||||||
|
// In is the sequence of parameters to the Go method.
|
||||||
|
//
|
||||||
|
// For masked operations, this will have the mask operand appended.
|
||||||
|
In []Operand
|
||||||
|
}
|
||||||
|
|
||||||
|
// rawOperation is the unifier representation of an [Operation]. It is
|
||||||
|
// translated into a more parsed form after unifier decoding.
|
||||||
|
type rawOperation struct {
|
||||||
|
Go string // Base Go method name
|
||||||
|
|
||||||
|
GoArch string // GOARCH for this definition
|
||||||
|
Asm string // Assembly mnemonic
|
||||||
|
OperandOrder *string // optional Operand order for better Go declarations
|
||||||
|
// Optional tag to indicate this operation is paired with special generic->machine ssa lowering rules.
|
||||||
|
// Should be paired with special templates in gen_simdrules.go
|
||||||
|
SpecialLower *string
|
||||||
|
|
||||||
|
In []Operand // Parameters
|
||||||
|
InVariant []Operand // Optional parameters
|
||||||
|
Out []Operand // Results
|
||||||
|
MemFeatures *string // The memory operand feature this operation supports
|
||||||
|
MemFeaturesData *string // Additional data associated with MemFeatures
|
||||||
|
Commutative bool // Commutativity
|
||||||
|
CPUFeature string // CPUID/Has* feature name
|
||||||
|
Zeroing *bool // nil => use asm suffix ".Z"; false => do not use asm suffix ".Z"
|
||||||
|
Documentation *string // Documentation will be appended to the stubs comments.
|
||||||
|
AddDoc *string // Additional doc to be appended.
|
||||||
|
// ConstImm is a hack to reduce the size of defs the user writes for const-immediate
|
||||||
|
// If present, it will be copied to [In[0].Const].
|
||||||
|
ConstImm *string
|
||||||
|
// NameAndSizeCheck is used to check [BWDQ] maps to (8|16|32|64) elemBits.
|
||||||
|
NameAndSizeCheck *bool
|
||||||
|
// If non-nil, all generation in gen_simdTypes.go and gen_intrinsics will be skipped.
|
||||||
|
NoTypes *string
|
||||||
|
// If non-nil, all generation in gen_simdGenericOps and gen_simdrules will be skipped.
|
||||||
|
NoGenericOps *string
|
||||||
|
// If non-nil, this string will be attached to the machine ssa op name. E.g. "const"
|
||||||
|
SSAVariant *string
|
||||||
|
// If true, do not emit method declarations, generic ops, or intrinsics for masked variants;
// DO emit the architecture-specific opcodes and optimizations.
|
||||||
|
HideMaskMethods *bool
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o *Operation) IsMasked() bool {
|
||||||
|
if len(o.InVariant) == 0 {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if len(o.InVariant) == 1 && o.InVariant[0].Class == "mask" {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
panic(fmt.Errorf("unknown inVariant"))
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o *Operation) SkipMaskedMethod() bool {
|
||||||
|
if o.HideMaskMethods == nil {
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
if *o.HideMaskMethods && o.IsMasked() {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
var reForName = regexp.MustCompile(`\bNAME\b`)
|
||||||
|
|
||||||
|
func (o *Operation) DecodeUnified(v *unify.Value) error {
|
||||||
|
if err := v.Decode(&o.rawOperation); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
|
||||||
|
isMasked := o.IsMasked()
|
||||||
|
|
||||||
|
// Compute full Go method name.
|
||||||
|
o.Go = o.rawOperation.Go
|
||||||
|
if isMasked {
|
||||||
|
o.Go += "Masked"
|
||||||
|
}
|
||||||
|
|
||||||
|
// Compute doc string.
|
||||||
|
if o.rawOperation.Documentation != nil {
|
||||||
|
o.Documentation = *o.rawOperation.Documentation
|
||||||
|
} else {
|
||||||
|
o.Documentation = "// UNDOCUMENTED"
|
||||||
|
}
|
||||||
|
o.Documentation = reForName.ReplaceAllString(o.Documentation, o.Go)
|
||||||
|
if isMasked {
|
||||||
|
o.Documentation += "\n//\n// This operation is applied selectively under a write mask."
|
||||||
|
// Suppress generic op and method declaration for exported methods, if a mask is present.
|
||||||
|
if unicode.IsUpper([]rune(o.Go)[0]) {
|
||||||
|
trueVal := "true"
|
||||||
|
o.NoGenericOps = &trueVal
|
||||||
|
o.NoTypes = &trueVal
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if o.rawOperation.AddDoc != nil {
|
||||||
|
o.Documentation += "\n" + reForName.ReplaceAllString(*o.rawOperation.AddDoc, o.Go)
|
||||||
|
}
|
||||||
|
|
||||||
|
o.In = append(o.rawOperation.In, o.rawOperation.InVariant...)
|
||||||
|
|
||||||
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
func (o *Operation) VectorWidth() int {
|
||||||
|
out := o.Out[0]
|
||||||
|
if out.Class == "vreg" {
|
||||||
|
return *out.Bits
|
||||||
|
} else if out.Class == "greg" || out.Class == "mask" {
|
||||||
|
for i := range o.In {
|
||||||
|
if o.In[i].Class == "vreg" {
|
||||||
|
return *o.In[i].Bits
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
panic(fmt.Errorf("Figure out what the vector width is for %v and implement it", *o))
|
||||||
|
}
|
||||||
|
|
||||||
|
// Right now simdgen computes the machine op name for most instructions
|
||||||
|
// as $Name$OutputSize, by this denotation, these instructions are "overloaded".
|
||||||
|
// for example:
|
||||||
|
// (Uint16x8) ConvertToInt8
|
||||||
|
// (Uint16x16) ConvertToInt8
|
||||||
|
// are both VPMOVWB128.
|
||||||
|
// To make them distinguishable we need to append the input size to them as well.
|
||||||
|
// TODO: document them well in the generated code.
|
||||||
|
var demotingConvertOps = map[string]bool{
|
||||||
|
"VPMOVQD128": true, "VPMOVSQD128": true, "VPMOVUSQD128": true, "VPMOVQW128": true, "VPMOVSQW128": true,
|
||||||
|
"VPMOVUSQW128": true, "VPMOVDW128": true, "VPMOVSDW128": true, "VPMOVUSDW128": true, "VPMOVQB128": true,
|
||||||
|
"VPMOVSQB128": true, "VPMOVUSQB128": true, "VPMOVDB128": true, "VPMOVSDB128": true, "VPMOVUSDB128": true,
|
||||||
|
"VPMOVWB128": true, "VPMOVSWB128": true, "VPMOVUSWB128": true,
|
||||||
|
"VPMOVQDMasked128": true, "VPMOVSQDMasked128": true, "VPMOVUSQDMasked128": true, "VPMOVQWMasked128": true, "VPMOVSQWMasked128": true,
|
||||||
|
"VPMOVUSQWMasked128": true, "VPMOVDWMasked128": true, "VPMOVSDWMasked128": true, "VPMOVUSDWMasked128": true, "VPMOVQBMasked128": true,
|
||||||
|
"VPMOVSQBMasked128": true, "VPMOVUSQBMasked128": true, "VPMOVDBMasked128": true, "VPMOVSDBMasked128": true, "VPMOVUSDBMasked128": true,
|
||||||
|
"VPMOVWBMasked128": true, "VPMOVSWBMasked128": true, "VPMOVUSWBMasked128": true,
|
||||||
|
}
|
||||||
|
|
||||||
|
func machineOpName(maskType maskShape, gOp Operation) string {
|
||||||
|
asm := gOp.Asm
|
||||||
|
if maskType == OneMask {
|
||||||
|
asm += "Masked"
|
||||||
|
}
|
||||||
|
asm = fmt.Sprintf("%s%d", asm, gOp.VectorWidth())
|
||||||
|
if gOp.SSAVariant != nil {
|
||||||
|
asm += *gOp.SSAVariant
|
||||||
|
}
|
||||||
|
if demotingConvertOps[asm] {
|
||||||
|
// Need to append the size of the source as well.
|
||||||
|
// TODO: should be "%sto%d".
|
||||||
|
asm = fmt.Sprintf("%s_%d", asm, *gOp.In[0].Bits)
|
||||||
|
}
|
||||||
|
return asm
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareStringPointers(x, y *string) int {
|
||||||
|
if x != nil && y != nil {
|
||||||
|
return compareNatural(*x, *y)
|
||||||
|
}
|
||||||
|
if x == nil && y == nil {
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
if x == nil {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareIntPointers(x, y *int) int {
|
||||||
|
if x != nil && y != nil {
|
||||||
|
return *x - *y
|
||||||
|
}
|
||||||
|
if x == nil && y == nil {
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
if x == nil {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareOperations(x, y Operation) int {
|
||||||
|
if c := compareNatural(x.Go, y.Go); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
xIn, yIn := x.In, y.In
|
||||||
|
|
||||||
|
if len(xIn) > len(yIn) && xIn[len(xIn)-1].Class == "mask" {
|
||||||
|
xIn = xIn[:len(xIn)-1]
|
||||||
|
} else if len(xIn) < len(yIn) && yIn[len(yIn)-1].Class == "mask" {
|
||||||
|
yIn = yIn[:len(yIn)-1]
|
||||||
|
}
|
||||||
|
|
||||||
|
if len(xIn) < len(yIn) {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if len(xIn) > len(yIn) {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
if len(x.Out) < len(y.Out) {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if len(x.Out) > len(y.Out) {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
for i := range xIn {
|
||||||
|
ox, oy := &xIn[i], &yIn[i]
|
||||||
|
if c := compareOperands(ox, oy); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
func compareOperands(x, y *Operand) int {
|
||||||
|
if c := compareNatural(x.Class, y.Class); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if x.Class == "immediate" {
|
||||||
|
return compareStringPointers(x.ImmOffset, y.ImmOffset)
|
||||||
|
} else {
|
||||||
|
if c := compareStringPointers(x.Base, y.Base); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if c := compareIntPointers(x.ElemBits, y.ElemBits); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
if c := compareIntPointers(x.Bits, y.Bits); c != 0 {
|
||||||
|
return c
|
||||||
|
}
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
type Operand struct {
|
||||||
|
Class string // One of "mask", "immediate", "vreg", "greg", and "mem"
|
||||||
|
|
||||||
|
Go *string // Go type of this operand
|
||||||
|
AsmPos int // Position of this operand in the assembly instruction
|
||||||
|
|
||||||
|
Base *string // Base Go type ("int", "uint", "float")
|
||||||
|
ElemBits *int // Element bit width
|
||||||
|
Bits *int // Total vector bit width
|
||||||
|
|
||||||
|
Const *string // Optional constant value for immediates.
|
||||||
|
// Optional immediate arg offsets. If this field is non-nil,
|
||||||
|
// This operand will be an immediate operand:
|
||||||
|
// The compiler will right-shift the user-passed value by ImmOffset and set it as the AuxInt
|
||||||
|
// field of the operation.
|
||||||
|
ImmOffset *string
|
||||||
|
Name *string // optional name in the Go intrinsic declaration
|
||||||
|
Lanes *int // *Lanes equals Bits/ElemBits except for scalars, when *Lanes == 1
|
||||||
|
// TreatLikeAScalarOfSize means only the lower $TreatLikeAScalarOfSize bits of the vector
|
||||||
|
// is used, so at the API level we can make it just a scalar value of this size; Then we
|
||||||
|
// can overwrite it to a vector of the right size during intrinsics stage.
|
||||||
|
TreatLikeAScalarOfSize *int
|
||||||
|
// If non-nil, it means the [Class] field is overwritten here, right now this is used to
|
||||||
|
// overwrite the results of AVX2 compares to masks.
|
||||||
|
OverwriteClass *string
|
||||||
|
// If non-nil, it means the [Base] field is overwritten here. This field exists solely
|
||||||
|
// because Intel's XED data is inconsistent. e.g. VANDNP[SD] marks its operand int.
|
||||||
|
OverwriteBase *string
|
||||||
|
// If non-nil, it means the [ElementBits] field is overwritten. This field exists solely
|
||||||
|
// because Intel's XED data is inconsistent. e.g. AVX512 VPMADDUBSW marks its operand
|
||||||
|
// elemBits 16, which should be 8.
|
||||||
|
OverwriteElementBits *int
|
||||||
|
// FixedReg is the name of the fixed registers
|
||||||
|
FixedReg *string
|
||||||
|
}
|
||||||
|
|
||||||
|
// isDigit returns true if the byte is an ASCII digit.
|
||||||
|
func isDigit(b byte) bool {
|
||||||
|
return b >= '0' && b <= '9'
|
||||||
|
}
|
||||||
|
|
||||||
|
// compareNatural performs a "natural sort" comparison of two strings.
|
||||||
|
// It compares non-digit sections lexicographically and digit sections
|
||||||
|
// numerically. In the case of string-unequal "equal" strings like
|
||||||
|
// "a01b" and "a1b", strings.Compare breaks the tie.
|
||||||
|
//
|
||||||
|
// It returns:
|
||||||
|
//
|
||||||
|
// -1 if s1 < s2
|
||||||
|
// 0 if s1 == s2
|
||||||
|
// +1 if s1 > s2
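//
// Illustrative example (not part of the original comment): compareNatural("v2", "v10")
// returns -1, because the digit runs are compared numerically (2 < 10).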
|
||||||
|
func compareNatural(s1, s2 string) int {
|
||||||
|
i, j := 0, 0
|
||||||
|
len1, len2 := len(s1), len(s2)
|
||||||
|
|
||||||
|
for i < len1 && j < len2 {
|
||||||
|
// Find a non-digit segment or a number segment in both strings.
|
||||||
|
if isDigit(s1[i]) && isDigit(s2[j]) {
|
||||||
|
// Number segment comparison.
|
||||||
|
numStart1 := i
|
||||||
|
for i < len1 && isDigit(s1[i]) {
|
||||||
|
i++
|
||||||
|
}
|
||||||
|
num1, _ := strconv.Atoi(s1[numStart1:i])
|
||||||
|
|
||||||
|
numStart2 := j
|
||||||
|
for j < len2 && isDigit(s2[j]) {
|
||||||
|
j++
|
||||||
|
}
|
||||||
|
num2, _ := strconv.Atoi(s2[numStart2:j])
|
||||||
|
|
||||||
|
if num1 < num2 {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if num1 > num2 {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
// If numbers are equal, continue to the next segment.
|
||||||
|
} else {
|
||||||
|
// Non-digit comparison.
|
||||||
|
if s1[i] < s2[j] {
|
||||||
|
return -1
|
||||||
|
}
|
||||||
|
if s1[i] > s2[j] {
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
i++
|
||||||
|
j++
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// deal with a01b vs a1b; there needs to be an order.
|
||||||
|
return strings.Compare(s1, s2)
|
||||||
|
}
|
||||||
|
|
||||||
|
const generatedHeader = `// Code generated by x/arch/internal/simdgen using 'go run . -xedPath $XED_PATH -o godefs -goroot $GOROOT go.yaml types.yaml categories.yaml'; DO NOT EDIT.
|
||||||
|
`
|
||||||
|
|
||||||
|
func writeGoDefs(path string, cl unify.Closure) error {
|
||||||
|
// TODO: Merge operations with the same signature but multiple
|
||||||
|
// implementations (e.g., SSE vs AVX)
|
||||||
|
var ops []Operation
|
||||||
|
for def := range cl.All() {
|
||||||
|
var op Operation
|
||||||
|
if !def.Exact() {
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
if err := def.Decode(&op); err != nil {
|
||||||
|
log.Println(err.Error())
|
||||||
|
log.Println(def)
|
||||||
|
continue
|
||||||
|
}
|
||||||
|
// TODO: verify that this is safe.
|
||||||
|
op.sortOperand()
|
||||||
|
ops = append(ops, op)
|
||||||
|
}
|
||||||
|
slices.SortFunc(ops, compareOperations)
|
||||||
|
// The parsed XED data might contain duplicates, like
|
||||||
|
// 512-bit VPADDP.
|
||||||
|
deduped := dedup(ops)
|
||||||
|
slices.SortFunc(deduped, compareOperations)
|
||||||
|
|
||||||
|
if *Verbose {
|
||||||
|
log.Printf("dedup len: %d\n", len(ops))
|
||||||
|
}
|
||||||
|
var err error
|
||||||
|
if err = overwrite(deduped); err != nil {
|
||||||
|
return err
|
||||||
|
}
|
||||||
|
if *Verbose {
|
||||||
|
log.Printf("dedup len: %d\n", len(deduped))
|
||||||
|
}
|
||||||
|
if *Verbose {
|
||||||
|
log.Printf("dedup len: %d\n", len(deduped))
|
||||||
|
}
	if !*FlagNoDedup {
		// TODO: This can hide mistakes in the API definitions, especially when
		// multiple patterns result in the same API unintentionally. Make it stricter.
		if deduped, err = dedupGodef(deduped); err != nil {
			return err
		}
	}
	if *Verbose {
		log.Printf("dedup len: %d\n", len(deduped))
	}
	if !*FlagNoConstImmPorting {
		if err = copyConstImm(deduped); err != nil {
			return err
		}
	}
	if *Verbose {
		log.Printf("dedup len: %d\n", len(deduped))
	}
	reportXEDInconsistency(deduped)
	typeMap := parseSIMDTypes(deduped)

	formatWriteAndClose(writeSIMDTypes(typeMap), path, "src/"+simdPackage+"/types_amd64.go")
	formatWriteAndClose(writeSIMDFeatures(deduped), path, "src/"+simdPackage+"/cpu.go")
	f, fI := writeSIMDStubs(deduped, typeMap)
	formatWriteAndClose(f, path, "src/"+simdPackage+"/ops_amd64.go")
	formatWriteAndClose(fI, path, "src/"+simdPackage+"/ops_internal_amd64.go")
	formatWriteAndClose(writeSIMDIntrinsics(deduped, typeMap), path, "src/cmd/compile/internal/ssagen/simdintrinsics.go")
	formatWriteAndClose(writeSIMDGenericOps(deduped), path, "src/cmd/compile/internal/ssa/_gen/simdgenericOps.go")
	formatWriteAndClose(writeSIMDMachineOps(deduped), path, "src/cmd/compile/internal/ssa/_gen/simdAMD64ops.go")
	formatWriteAndClose(writeSIMDSSA(deduped), path, "src/cmd/compile/internal/amd64/simdssa.go")
	writeAndClose(writeSIMDRules(deduped).Bytes(), path, "src/cmd/compile/internal/ssa/_gen/simdAMD64.rules")

	return nil
}
281 src/simd/_gen/simdgen/main.go Normal file
@@ -0,0 +1,281 @@
// Copyright 2025 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// simdgen is an experiment in generating Go <-> asm SIMD mappings.
//
// Usage: simdgen [-xedPath=path] [-q=query] input.yaml...
//
// If -xedPath is provided, one of the inputs is a sum of op-code definitions
// generated from the Intel XED data at path.
//
// If input YAML files are provided, each file is read as an input value. See
// [unify.Closure.UnmarshalYAML] or "go doc unify.Closure.UnmarshalYAML" for the
// format of these files.
//
// TODO: Example definitions and values.
//
// The command unifies across all of the inputs and prints all possible results
// of this unification.
//
// If the -q flag is provided, its string value is parsed as a value and treated
// as another input to unification. This is intended as a way to "query" the
// result, typically by narrowing it down to a small subset of results.
//
// Typical usage:
//
//	go run . -xedPath $XEDPATH *.yaml
//
// To see just the definitions generated from XED, run:
//
//	go run . -xedPath $XEDPATH
//
// (This works because if there's only one input, there's nothing to unify it
// with, so the result is simply itself.)
//
// To see just the definitions for VPADDQ:
//
//	go run . -xedPath $XEDPATH -q '{asm: VPADDQ}'
//
// simdgen can also generate Go definitions of SIMD mappings.
// To generate Go files into the Go root, run:
//
//	go run . -xedPath $XEDPATH -o godefs -goroot $PATH/TO/go go.yaml categories.yaml types.yaml
//
// types.yaml is already written; it specifies the shapes of vectors.
// categories.yaml and go.yaml contain definitions that unify with types.yaml and the XED
// data; you can find an example in ops/AddSub/.
//
// When generating Go definitions, simdgen performs three "magics":
//   - It splits masked operations (those with the op's [Masked] field set) into
//     const-mask and non-const-mask variants:
//       - One is a normal masked operation, the original.
//       - The other has its mask operand's [Const] field set to "K0".
//       - This way the user does not need to provide a separate "K0"-masked operation def.
//
//   - It deduplicates intrinsic names that have duplicates:
//       - If two operations share the same signature, one AVX512 and the other
//         pre-AVX512, the pre-AVX512 one is selected.
//       - This happens often when some operations are defined both before AVX512 and after.
//         This way the user does not need to provide a separate "K0" operation for the
//         AVX512 counterpart.
//
//   - It copies the op's [ConstImm] field to its immediate operand's [Const] field.
//       - This way the user does not need to provide a verbose op definition when only
//         the const immediate field differs. This is useful to reduce the verbosity of
//         compares with imm control predicates.
//
// These three magics can be disabled with the -nosplitmask, -nodedup, and
// -noconstimmporting flags.
//
// simdgen currently supports only amd64; -arch=$OTHERARCH will trigger a fatal error.
package main

// Big TODOs:
//
// - This can produce duplicates, which can also lead to less efficient
// environment merging. Add hashing and use it for deduplication. Be careful
// about how this shows up in debug traces, since it could make things
// confusing if we don't show it happening.
//
// - Do I need Closure, Value, and Domain? It feels like I should only need two
// types.

import (
	"cmp"
	"flag"
	"fmt"
	"log"
	"maps"
	"os"
	"path/filepath"
	"runtime/pprof"
	"slices"
	"strings"

	"simd/_gen/unify"

	"gopkg.in/yaml.v3"
)

var (
	xedPath               = flag.String("xedPath", "", "load XED datafiles from `path`")
	flagQ                 = flag.String("q", "", "query: read `def` as another input (skips final validation)")
	flagO                 = flag.String("o", "yaml", "output type: yaml, godefs (generate definitions into a Go source tree)")
	flagGoDefRoot         = flag.String("goroot", ".", "the path to the Go dev directory that will receive the generated files")
	FlagNoDedup           = flag.Bool("nodedup", false, "disable deduplicating godefs of 2 qualifying operations from different extensions")
	FlagNoConstImmPorting = flag.Bool("noconstimmporting", false, "disable const immediate porting from op to imm operand")
	FlagArch              = flag.String("arch", "amd64", "the target architecture")

	Verbose = flag.Bool("v", false, "verbose")

	flagDebugXED   = flag.Bool("debug-xed", false, "show XED instructions")
	flagDebugUnify = flag.Bool("debug-unify", false, "print unification trace")
	flagDebugHTML  = flag.String("debug-html", "", "write unification trace to `file.html`")
	FlagReportDup  = flag.Bool("reportdup", false, "report the duplicate godefs")

	flagCPUProfile = flag.String("cpuprofile", "", "write CPU profile to `file`")
	flagMemProfile = flag.String("memprofile", "", "write memory profile to `file`")
)

const simdPackage = "simd"

func main() {
	flag.Parse()

	if *flagCPUProfile != "" {
		f, err := os.Create(*flagCPUProfile)
		if err != nil {
			log.Fatalf("-cpuprofile: %s", err)
		}
		defer f.Close()
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}
	if *flagMemProfile != "" {
		f, err := os.Create(*flagMemProfile)
		if err != nil {
			log.Fatalf("-memprofile: %s", err)
		}
		defer func() {
			pprof.WriteHeapProfile(f)
			f.Close()
		}()
	}

	var inputs []unify.Closure

	if *FlagArch != "amd64" {
		log.Fatalf("simdgen only supports amd64")
	}

	// Load XED into a defs set.
	if *xedPath != "" {
		xedDefs := loadXED(*xedPath)
		inputs = append(inputs, unify.NewSum(xedDefs...))
	}

	// Load query.
	if *flagQ != "" {
		r := strings.NewReader(*flagQ)
		def, err := unify.Read(r, "<query>", unify.ReadOpts{})
		if err != nil {
			log.Fatalf("parsing -q: %s", err)
		}
		inputs = append(inputs, def)
	}

	// Load defs files.
	must := make(map[*unify.Value]struct{})
	for _, path := range flag.Args() {
		defs, err := unify.ReadFile(path, unify.ReadOpts{})
		if err != nil {
			log.Fatal(err)
		}
		inputs = append(inputs, defs)

		if filepath.Base(path) == "go.yaml" {
			// These must all be used in the final result
			for def := range defs.Summands() {
				must[def] = struct{}{}
			}
		}
	}

	// Prepare for unification
	if *flagDebugUnify {
		unify.Debug.UnifyLog = os.Stderr
	}
	if *flagDebugHTML != "" {
		f, err := os.Create(*flagDebugHTML)
		if err != nil {
			log.Fatal(err)
		}
		unify.Debug.HTML = f
		defer f.Close()
	}

	// Unify!
	unified, err := unify.Unify(inputs...)
	if err != nil {
		log.Fatal(err)
	}

	// Validate results.
	//
	// Don't validate if this is a command-line query because that tends to
	// eliminate lots of required defs and is used in cases where maybe defs
	// aren't enumerable anyway.
	if *flagQ == "" && len(must) > 0 {
		validate(unified, must)
	}

	// Print results.
	switch *flagO {
	case "yaml":
		// Produce a result that looks like encoding a slice, but stream it.
		fmt.Println("!sum")
		var val1 [1]*unify.Value
		for val := range unified.All() {
			val1[0] = val
			// We have to make a new encoder each time or it'll print a document
			// separator between each object.
			enc := yaml.NewEncoder(os.Stdout)
			if err := enc.Encode(val1); err != nil {
				log.Fatal(err)
			}
			enc.Close()
		}
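		// Illustration (not part of the generator): with gopkg.in/yaml.v3, a
		// single encoder reused across Encode calls emits a "---" document
		// separator between values, which is what the per-value encoder above
		// avoids. A minimal sketch of the behavior being avoided:
		//
		//	enc := yaml.NewEncoder(os.Stdout)
		//	_ = enc.Encode(map[string]int{"a": 1}) // first document
		//	_ = enc.Encode(map[string]int{"b": 2}) // preceded by "---"
		//	enc.Close()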
	case "godefs":
		if err := writeGoDefs(*flagGoDefRoot, unified); err != nil {
			log.Fatalf("Failed writing godefs: %+v", err)
		}
	}

	if !*Verbose && *xedPath != "" {
		if operandRemarks == 0 {
			fmt.Fprintf(os.Stderr, "XED decoding generated no errors, which is unusual.\n")
		} else {
			fmt.Fprintf(os.Stderr, "XED decoding generated %d \"errors\" which is not cause for alarm, use -v for details.\n", operandRemarks)
		}
	}
}

func validate(cl unify.Closure, required map[*unify.Value]struct{}) {
	// Validate that:
	// 1. All final defs are exact
	// 2. All required defs are used
	for def := range cl.All() {
		if _, ok := def.Domain.(unify.Def); !ok {
			fmt.Fprintf(os.Stderr, "%s: expected Def, got %T\n", def.PosString(), def.Domain)
			continue
		}

		if !def.Exact() {
			fmt.Fprintf(os.Stderr, "%s: def not reduced to an exact value, why is %s:\n", def.PosString(), def.WhyNotExact())
			fmt.Fprintf(os.Stderr, "\t%s\n", strings.ReplaceAll(def.String(), "\n", "\n\t"))
		}

		for root := range def.Provenance() {
			delete(required, root)
		}
	}
	// Report unused defs
	unused := slices.SortedFunc(maps.Keys(required),
		func(a, b *unify.Value) int {
			return cmp.Or(
				cmp.Compare(a.Pos().Path, b.Pos().Path),
				cmp.Compare(a.Pos().Line, b.Pos().Line),
			)
		})
	for _, def := range unused {
		// TODO: Can we say anything more actionable? This is always a problem
		// with unification: if it fails, it's very hard to point a finger at
		// any particular reason. We could go back and try unifying this again
		// with each subset of the inputs (starting with individual inputs) to
		// at least say "it doesn't unify with anything in x.yaml". That's a lot
		// of work, but if we have trouble debugging unification failure it may
		// be worth it.
		fmt.Fprintf(os.Stderr, "%s: def required, but did not unify (%v)\n",
			def.PosString(), def)
	}
}
37 src/simd/_gen/simdgen/ops/AddSub/categories.yaml Normal file
@@ -0,0 +1,37 @@
!sum
- go: Add
  commutative: true
  documentation: !string |-
    // NAME adds corresponding elements of two vectors.
- go: AddSaturated
  commutative: true
  documentation: !string |-
    // NAME adds corresponding elements of two vectors with saturation.
- go: Sub
  commutative: false
  documentation: !string |-
    // NAME subtracts corresponding elements of two vectors.
- go: SubSaturated
  commutative: false
  documentation: !string |-
    // NAME subtracts corresponding elements of two vectors with saturation.
- go: AddPairs
  commutative: false
  documentation: !string |-
    // NAME horizontally adds adjacent pairs of elements.
    // For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
- go: SubPairs
  commutative: false
  documentation: !string |-
    // NAME horizontally subtracts adjacent pairs of elements.
    // For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
- go: AddPairsSaturated
  commutative: false
  documentation: !string |-
    // NAME horizontally adds adjacent pairs of elements with saturation.
    // For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0+y1, y2+y3, ..., x0+x1, x2+x3, ...].
- go: SubPairsSaturated
  commutative: false
  documentation: !string |-
    // NAME horizontally subtracts adjacent pairs of elements with saturation.
    // For x = [x0, x1, x2, x3, ...] and y = [y0, y1, y2, y3, ...], the result is [y0-y1, y2-y3, ..., x0-x1, x2-x3, ...].
Some files were not shown because too many files have changed in this diff.