Force insertion of a zero idiom to break the false dependency on the destination register for several instructions.
The affected instructions are:
VPERMD/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANTPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/PH
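As a rough illustration of the idea in the summary, the sketch below models a dependency-breaking step on a toy instruction list. Everything in it is hypothetical: `Inst`, `hasFalseDestDependency`, and `breakFalseDependencies` are illustrative names, not LLVM APIs, and the opcode table is only a subset of the list above. It also assumes the zero idiom is a `vxorps` of the destination with itself and always inserts it, whereas a real pass would likely also consider how recently the register was written.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified instruction record; not an LLVM data structure.
struct Inst {
    std::string opcode;
    std::string dest;
    std::vector<std::string> sources;
};

// Assumption: a small subset of the opcodes named in the summary, treated
// here as having a false dependency on their destination register.
bool hasFalseDestDependency(const std::string& opcode) {
    static const std::vector<std::string> kAffected = {
        "vpermd", "vpermq", "vpermps", "vpermpd", "vpmullq"};
    return std::find(kAffected.begin(), kAffected.end(), opcode) !=
           kAffected.end();
}

// Returns a copy of `input` in which every affected instruction whose
// destination is not one of its real sources is preceded by a zero idiom
// (xor of the destination with itself) that cuts the dependency chain.
std::vector<Inst> breakFalseDependencies(const std::vector<Inst>& input) {
    std::vector<Inst> output;
    for (const Inst& inst : input) {
        const bool destIsRealInput =
            std::find(inst.sources.begin(), inst.sources.end(), inst.dest) !=
            inst.sources.end();
        if (hasFalseDestDependency(inst.opcode) && !destIsRealInput) {
            // Zero idiom: the result does not depend on the old value.
            output.push_back(Inst{"vxorps", inst.dest, {inst.dest, inst.dest}});
        }
        output.push_back(inst);
    }
    return output;
}
```

In this model, a `vpmullq` writing `xmm0` from `xmm1` and `xmm2` gets a `vxorps xmm0, xmm0, xmm0` placed in front of it, while a `vpmullq` that also reads its own destination is left alone, because that dependency is real.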
Differential D116072
[X86] GLC: Break false dependency for dest register for several instructions.
Closed, Public. Authored by gpei on Dec 20 2021, 7:27 PM.
Diff Detail
Unit Tests: Failed
Event Timeline
Some of these are quite surprising to me, especially PMULLQ. Can you provide any more explanation about why these have false dependencies?
The Intel® 64 and IA-32 Architectures Optimization Reference Manual has been updated; section 2.2.1.4, "Avoiding Destination False Dependency", gives the details.
gpei added inline comments.
This revision is now accepted and ready to land. Apr 16 2022, 8:39 PM
A few minor issues to address.
This revision now requires changes to proceed. Apr 18 2022, 9:48 AM
gpei marked 2 inline comments as done.
gpei marked an inline comment as done.
gpei marked 2 inline comments as done.
Done, thanks for pointing this out!
This revision is now accepted and ready to land. Apr 20 2022, 1:34 AM
This revision now requires changes to proceed. Apr 20 2022, 9:39 AM
We don't need to change X86Subtarget.h anymore. This code is now generated automatically from the *.td files.
This revision is now accepted and ready to land. Apr 20 2022, 2:14 PM
This revision was landed with ongoing or failed builds. Apr 21 2022, 1:47 AM
Closed by commit rG3e6b904f0a50: Force insert zero-idiom and break false dependency of dest register for several… (authored by gpei).
This revision was automatically updated to reflect the committed changes.
This is being discussed on https://github.com/llvm/llvm-project/issues/55130 - if you can prove that there is a performance regression since 13.x without the patch, then it should probably go in. Otherwise it's really a new feature, and I'm not sure whether we are accepting those for point releases.
Revision Contents
Diff 423802
llvm/lib/Target/X86/X86.td
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86TargetTransformInfo.h
llvm/test/CodeGen/X86/getmant-false-deps.ll
llvm/test/CodeGen/X86/mulc-false-deps.ll
llvm/test/CodeGen/X86/perm.avx2-false-deps.ll
llvm/test/CodeGen/X86/perm.avx512-false-deps.ll
llvm/test/CodeGen/X86/pmullq-false-deps.ll
llvm/test/CodeGen/X86/range-false-deps.ll
This should probably be MULC since the instruction names are VFMULC and VFCMULC. The consistent C is after MUL not before.