As shown in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141854.html, widen load can also cause stall. Add a flag to guard the widening code, so users can disable it and evaluate its performance impact.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
1135 | Why not do this in loadi16 as well? | |
llvm/lib/Target/X86/X86Subtarget.cpp | ||
48 ↗ | (On Diff #267681) | Can we put this in X86ISelDAGToDAG.cpp and access it directly? The .td should generate an include that gets included there. |
48 ↗ | (On Diff #267681) | enalbe->enable x86-promote-anyext-load might be a better name? "wide" might get mixed up with vectors since we have WidenVector as a type legalization action. |
See https://reviews.llvm.org/rL51019 for the original commit.
The benefits of matching loadi32 like this are:
- mov is one byte shorter than movzx
- We can fold the load into a subsequent 32-bit instruction in some cases.
And the downside, of course, is that if we're unlucky and create a hazard, there's a performance hit.
Thanks for the background information. The new flag is on by default, so it doesn't change anything if it is not explicitly disabled.
And if the wide load hit a narrow store, the penalty is much larger than a separate alu instruction, because it must wait until all instructions before (and including) the narrow store retired.
llvm/lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
1135 | LLVM IR doesn't have anyext, I failed to write a test case to trigger it. :( |
LGTM
llvm/lib/Target/X86/X86InstrInfo.td | ||
---|---|---|
1135 | yeah it is probably hard to trigger with so much promotion of i16 ops. |
Why not do this in loadi16 as well?