UNPCKLPD/UNPCKHPD are 64-bit element operations, so their masked versions
use one mask bit per 64-bit element. SHUFPS works on 32-bit elements, so the
masked versions don't match in lanes.
This reverts part of D144763.
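To illustrate the mismatch, here is a minimal scalar sketch in C++ (a model of the instruction semantics, not LLVM code). It assumes merge-masking on a single 128-bit vector and uses the SHUFPS immediate 0x44 as the unmasked equivalent of UNPCKLPD. The type `Vec128` and all helper names are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cstdint>

// A 128-bit vector modelled as four 32-bit elements (hypothetical helper type).
using Vec128 = std::array<std::uint32_t, 4>;

// UNPCKLPD: result = { low 64 bits of a, low 64 bits of b }.
Vec128 unpcklpd(const Vec128 &a, const Vec128 &b) {
  return {a[0], a[1], b[0], b[1]};
}

// SHUFPS: the two low result elements come from a, the two high ones from b,
// each selected by a 2-bit field of the immediate.
Vec128 shufps(const Vec128 &a, const Vec128 &b, unsigned imm) {
  return {a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]};
}

// Merge-masked UNPCKLPD: mask bit q covers the whole 64-bit element q.
Vec128 maskedUnpcklpd(const Vec128 &passthru, unsigned k, const Vec128 &a,
                      const Vec128 &b) {
  Vec128 r = unpcklpd(a, b);
  Vec128 out = passthru;
  for (unsigned q = 0; q < 2; ++q) {
    if (k & (1u << q)) {
      out[2 * q] = r[2 * q];
      out[2 * q + 1] = r[2 * q + 1];
    }
  }
  return out;
}

// Merge-masked SHUFPS: mask bit i covers only the 32-bit element i.
Vec128 maskedShufps(const Vec128 &passthru, unsigned k, const Vec128 &a,
                    const Vec128 &b, unsigned imm) {
  Vec128 r = shufps(a, b, imm);
  Vec128 out = passthru;
  for (unsigned i = 0; i < 4; ++i) {
    if (k & (1u << i))
      out[i] = r[i];
  }
  return out;
}
```

In this model the unmasked results agree, but with the same mask value (for example k = 0x1) the masked UNPCKLPD writes both 32-bit halves of the low 64-bit element while the masked SHUFPS writes only the lowest 32-bit element, so the two are not interchangeable.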
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60,060 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
Comment Actions
Good catch! LGTM.
llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll | ||
---|---|---|
167 | Add comments to the changed test cases saying that these are negative tests because the predicate masks don't match. |
Comment Actions
@pengfei and @RKSimon, what about using {VP}UNPCK{L|H}QDQ{...}? I tested on ICL and didn't see any domain penalty. I wasn't able to find hardware to test hsw/skl/.... and I'm not sure whether it falls under the no-shuffle hasNoDomainDelayShuffle case or something else, but it is the ideal replacement from both a perf and a code-size perspective.
Comment Actions
Why not both? We can try VSHUFPD first to see if it has better scheduling, and otherwise try the integer unpack if there is no domain penalty.