As discussed in PR38938, we fail to emit BEXTR if the mask is shifted.
We can't handle that in X86DAGToDAGISel before the address mode for the `inc` is selected,
and we can't really do it in the generic DAGCombine either, because there is no generic ISD::BitFieldExtract node,
and if we simply turn the shifted mask into a normal mask plus a shift-left, that will just be folded back.
So X86ISelLowering seems to be the place to handle this.
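For concreteness, a minimal scalar model of the pattern in question (illustrative constants, not taken from the PR): an AND with a shifted mask is a BEXTR followed by a shift-left, and it is that trailing shift-left which can later fold into an address computation.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of BEXTR: extract Len bits of X starting at bit Start.
static uint64_t bextr(uint64_t X, unsigned Start, unsigned Len) {
  uint64_t Mask = Len >= 64 ? ~0ULL : (1ULL << Len) - 1;
  return (X >> Start) & Mask;
}

int main() {
  uint64_t X = 0xDEADBEEFCAFEBABEULL;
  // "Shifted mask": keep bits [10, 18) of X in place.
  uint64_t ShiftedMask = X & (0xFFULL << 10);
  // Equivalent form: extract the field, then shift it back up.
  assert((bextr(X, 10, 8) << 10) == ShiftedMask);
  return 0;
}
```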
Diff Detail: Repository rL LLVM
Event Timeline
| test/CodeGen/X86/tbm_patterns.ll | | |
|---|---|---|
| 52–64 | (On Diff #166697) | This seems to be a miscompile; BEXTR does not touch EFLAGS? |
| test/CodeGen/X86/tbm_patterns.ll | | |
|---|---|---|
| 52–64 | (On Diff #166697) | Aha, it is not; I just can't read. |
| lib/Target/X86/X86ISelLowering.cpp | | |
|---|---|---|
| 35252 | (On Diff #166697) | Remove the call to dump(). |
| 35308 | (On Diff #166697) | Remove the call to dump(). |
| test/CodeGen/X86/extract-bits.ll | | |
|---|---|---|
| 5797 | (On Diff #166697) | This is not an improvement. We traded a shift-right plus an AND for a move-immediate, a 2-uop BEXTR, and a shift-left, so we went from 2 uops to 4, at least on Haswell. |
| test/CodeGen/X86/extract-bits.ll | | |
|---|---|---|
| 5797 | (On Diff #166697) | It's still an increase in instruction count even on AMD in the BMI1 case; we went from 2 uops to 3. We'd only be OK with the TBM BEXTRI instruction. In the case from PR38938 we were able to fold the SHL into an addressing calculation, which made it beneficial. |
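To make the trade-off concrete, here is the kind of before/after the uop counts refer to, modeled in C++ with illustrative constants (this is not the exact test case from extract-bits.ll):

```cpp
#include <cstdint>

// Before: the shifted mask is matched directly. Roughly:
//   shrq $2, %rdi; andq $0x3FC, %rdi             -> 2 uops
uint64_t viaShrAnd(uint64_t X) {
  return (X >> 2) & (0xFFULL << 2);
}

// After: BEXTR plus a rebuilt shift. Roughly:
//   movl $0x804, %eax        (control: start=4, len=8) -> 1 uop
//   bextrq %rax, %rdi, %rax                            -> 2 uops on Haswell
//   shlq $2, %rax                                      -> 1 uop
// 4 uops on Haswell; 3 on AMD, where BEXTR is a single uop.
uint64_t viaBextrShl(uint64_t X) {
  return ((X >> 4) & 0xFF) << 2; // same bits [4, 12) of X, kept at [2, 10)
}
```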
I'm not sure matchBEXTRFromAnd() for BMI was ever a good idea for Intel CPUs. I took it out and got performance improvements on several benchmarks in our internal list, with only a couple of regressions, one of which was on a test that's really sensitive to code layout.
Okay, so these simple cases are fine if there is TBM.
But the target from PR38938, AMD Jaguar, does not have TBM.
There really isn't any fundamental difference in the IR between this and D52293,
so I would have guessed they should use the *same* profitability check, correct?
So what would it be? "Has TBM, or is shifting a newly loaded value"?
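One possible shape for that combined check, assuming it lives next to the BEXTR matching code (isBEXTRProfitable is a hypothetical name; this is the heuristic being floated here, not existing code):

```cpp
// Sketch against LLVM's SelectionDAG types, not actual code.
static bool isBEXTRProfitable(SDValue Shifted, const X86Subtarget &Subtarget) {
  // TBM's BEXTRI takes the control word as an immediate: no extra
  // move-immediate uop, so it is always at least break-even.
  if (Subtarget.hasTBM())
    return true;
  // Otherwise only profitable when the shifted value comes straight from
  // a load, so the rebuilt SHL can fold into the address computation.
  return ISD::isNormalLoad(Shifted.getNode());
}
```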
@RKSimon Can you remove the D52293 dependency and just move the current code to the DAG combine so we can fix PR38938?
| lib/Target/X86/X86ISelLowering.cpp | | |
|---|---|---|
| 35407 | (On Diff #167788) | This should probably be below the LegalizeOps check. We should give ample opportunity for AND-based DAG combines to optimize this. |
Move to after isBeforeLegalizeOps().
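For clarity, the placement being asked for, reduced to just this one combine (a sketch only: combineAndIntoBEXTR is a hypothetical helper standing in for the moved matchBEXTRFromAnd logic, and the real combineAnd in X86ISelLowering.cpp does much more than this):

```cpp
// Context: lib/Target/X86/X86ISelLowering.cpp (sketch, not actual code).
static SDValue combineAnd(SDNode *N, SelectionDAG &DAG,
                          TargetLowering::DAGCombinerInfo &DCI,
                          const X86Subtarget &Subtarget) {
  // Give the generic AND-based DAG combines ample opportunity first:
  // bail while we are still before LegalizeOps and retry later.
  if (DCI.isBeforeLegalizeOps())
    return SDValue();

  // Only now try to form BEXTR from the AND.
  return combineAndIntoBEXTR(N, DAG, Subtarget); // hypothetical helper
}
```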
| lib/Target/X86/X86ISelLowering.cpp | | |
|---|---|---|
| 35407 | (On Diff #167788) | I'm not sure about the test for this. |
| test/CodeGen/X86/bmi-x86_64.ll | | |
|---|---|---|
| 105 | (On Diff #167790) | That is how you named it when adding it in rL232580 / https://github.com/llvm-mirror/llvm/commit/cbaefea0c0c1792390375b20c31b7c1fe8d0d2c7. |
| 105 | (On Diff #167790) | Whoops, typo: s/you//. |
I'm a bit worried about these test changes; I thought this patch was about moving the existing code, not altering the pattern-matching features.
I guess I mislabelled it, then; please see the description of this differential for the context.
Would it be better to split this into an NFC code move plus a non-NFC "support shifted mask" change?
For the PR we're trying to fix, perhaps we should be looking at these three similar functions in X86ISelDAGToDAG.cpp that are called when an address computation comes from a SHL+AND:
```cpp
// Try to fold the mask and shift into an extract and scale.
if (!foldMaskAndShiftToExtract(*CurDAG, N, Mask, Shift, X, AM))
  return false;

// Try to fold the mask and shift directly into the scale.
if (!foldMaskAndShiftToScale(*CurDAG, N, Mask, Shift, X, AM))
  return false;

// Try to swap the mask and shift to place shifts which can be done as
// a scale on the outside of the mask.
if (!foldMaskedShiftToScaledMask(*CurDAG, N, Mask, Shift, X, AM))
  return false;
```
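As a sketch of why those folds matter for PR38938 (hypothetical constants, not the code from the PR): an index formed by an AND with a shifted mask is a BEXTR whose trailing shift-left can become the addressing-mode scale, which is what makes the transform profitable there.

```cpp
#include <cstdint>

// X & (0xFF << 3) keeps bits [3, 11) of X in place; that is
// bextr(X, 3, 8) << 3, and the shift-left by 3 maps onto an
// addressing-mode scale of 8, e.g. mov (%base, %idx, 8)-style codegen.
uint8_t loadElement(const uint8_t *Base, uint64_t X) {
  uint64_t Idx = (X >> 3) & 0xFF; // BEXTR: start = 3, len = 8
  return Base[Idx << 3];          // SHL folded into the scale
}
```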