Force insert zero-idiom and break false dependency of dest register for several instructions.
The related instructions are:
VPERMD/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANTPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/PH
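To illustrate the idea in the summary, here is a minimal C++ sketch. It is not LLVM's implementation. It models instructions with a hypothetical `Instr` struct and inserts a dependency-breaking zero idiom (`VXORPS dst, dst, dst`) ahead of any listed instruction whose destination is not also one of its sources. The names `Instr`, `kFalseDepOpcodes`, `needsDepBreak`, and `breakFalseDeps` are assumptions made for this example, and the opcode set is only a sample drawn from the list above. A real compiler pass would typically also consider how recently the register was last written, which this sketch ignores.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy instruction model (hypothetical, not an LLVM type):
// one destination register and a list of source registers.
struct Instr {
    std::string opcode;
    std::string dest;
    std::vector<std::string> srcs;
};

// Sample of opcodes assumed to carry a false dependency on their destination.
// Illustrative only; this is not the table used by the patch.
static const std::set<std::string> kFalseDepOpcodes = {
    "VPERMD", "VPMULLQ", "VRANGEPD", "VGETMANTSS", "VFMULCPH"};

// True when the opcode is in the sample set and the destination is
// write-only, i.e. it does not also appear among the sources.
bool needsDepBreak(const Instr &I) {
    if (kFalseDepOpcodes.count(I.opcode) == 0)
        return false;
    for (const std::string &Src : I.srcs)
        if (Src == I.dest)
            return false; // a true dependency: zeroing would change the result
    return true;
}

// Returns a copy of the instruction stream with a zero idiom inserted
// immediately before every instruction that needs its dependency broken.
std::vector<Instr> breakFalseDeps(const std::vector<Instr> &In) {
    std::vector<Instr> Out;
    for (const Instr &I : In) {
        if (needsDepBreak(I))
            Out.push_back({"VXORPS", I.dest, {I.dest, I.dest}});
        Out.push_back(I);
    }
    return Out;
}
```

Given a stream containing `VPMULLQ ymm0 <- ymm1, ymm2`, the sketch emits a `VXORPS` on `ymm0` first, so the multiply no longer has to wait for whatever wrote `ymm0` earlier. An instruction that reads its own destination is left alone.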
Differential D116072
[X86] GLC: Break false dependency for dest register for several instructions.
Closed. Public. Authored by gpei on Dec 20 2021, 7:27 PM.
Event Timeline
Some of these are quite surprising to me, especially PMULLQ. Can you provide any more explanation about why these have false dependencies?
The Intel® 64 and IA-32 Architectures Optimization Reference Manual has been updated; section 2.2.1.4, "Avoiding Destination False Dependency", gives the details.
gpei added inline comments.
This revision is now accepted and ready to land. Apr 16 2022, 8:39 PM
A few minor points to address.
This revision now requires changes to proceed. Apr 18 2022, 9:48 AM
gpei marked 2 inline comments as done.
gpei marked an inline comment as done.
gpei marked 2 inline comments as done.
Done, thanks for pointing this out!
This revision is now accepted and ready to land. Apr 20 2022, 1:34 AM
This revision now requires changes to proceed. Apr 20 2022, 9:39 AM
We don't need to change X86Subtarget.h anymore. This code is now generated automatically from the *.td files.
This revision is now accepted and ready to land. Apr 20 2022, 2:14 PM
This revision was landed with ongoing or failed builds. Apr 21 2022, 1:47 AM
Closed by commit rG3e6b904f0a50: Force insert zero-idiom and break false dependency of dest register for several… (authored by gpei).
This revision was automatically updated to reflect the committed changes.
This is being discussed on https://github.com/llvm/llvm-project/issues/55130. If you can prove that there is a performance regression since 13.x without the patch, then it should probably go in. Otherwise it's really a new feature, and I'm not sure whether we are accepting those for point releases.
Revision Contents
Diff 424127
llvm/lib/Target/X86/X86.td
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86TargetTransformInfo.h
llvm/test/CodeGen/X86/getmant-false-deps.ll
llvm/test/CodeGen/X86/mulc-false-deps.ll
llvm/test/CodeGen/X86/perm.avx2-false-deps.ll
llvm/test/CodeGen/X86/perm.avx512-false-deps.ll
llvm/test/CodeGen/X86/pmullq-false-deps.ll
llvm/test/CodeGen/X86/range-false-deps.ll
This should probably be MULC since the instruction names are VFMULC and VFCMULC. The consistent C is after MUL not before.