This moves some logic from combineShiftLeft and generalizes it to
work for shl and and. It also improves non-mask constant
generation (reducing based on getSignificantBits instead of
countTrailingOnes) so that we can catch cases where imm64 ->
sign-extended imm32/imm8 or imm32 -> sign-extended imm8.
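As a standalone illustration of the getSignificantBits idea described above (a hedged sketch with hypothetical helper names, not the patch's code): a constant fits a narrower sign-extended immediate exactly when its signed significant-bit count fits, which catches non-mask constants that countTrailingOnes would miss.

```cpp
#include <cassert>
#include <cstdint>

// Minimum number of bits needed to represent V as a signed value; a
// standalone 64-bit approximation of APInt::getSignificantBits().
static unsigned significantBits(int64_t V) {
  uint64_t U = static_cast<uint64_t>(V);
  if (V < 0)
    U = ~U;                              // leading sign bits become leading zeros
  if (U == 0)
    return 1;                            // 0 and -1 both need a single bit
  return 64 - __builtin_clzll(U) + 1;    // +1 for the sign bit
}

// An imm64 can be encoded as a sign-extended imm32/imm8 iff its
// significant-bit count fits. countTrailingOnes() misses non-mask
// constants such as -128 (0xFFFFFFFFFFFFFF80), which has zero trailing
// ones but still fits in a sign-extended imm8.
static bool fitsSExtImm32(int64_t V) { return significantBits(V) <= 32; }
static bool fitsSExtImm8(int64_t V) { return significantBits(V) <= 8; }
```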
Details
- Reviewers
pengfei RKSimon t.p.northover
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/X86/bitreverse.ll | ||
---|---|---|
538 | IIRC, LEA is expensive, so this looks like a good deal. | |
llvm/test/CodeGen/X86/bmi-x86_64.ll | ||
131–132 | Remove BEXTR-SLOW to eliminate the message. | |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
988 ↗ | (On Diff #488861) | One more andb |
1011 ↗ | (On Diff #488861) | ditto. |
llvm/test/CodeGen/X86/combine-bitreverse.ll | ||
241 | One more shrl? | |
300 | ditto. | |
325 | ditto. | |
llvm/test/CodeGen/X86/const-shift-of-constmasked.ll | ||
657 | You can add option --no_x86_scrub_sp when updating test. | |
1194 | ditto. | |
llvm/test/CodeGen/X86/const-shift-with-and.ll | ||
5 ↗ | (On Diff #488861) | What are these tests used for? They are not changed in this patch. |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
---|---|---|
988 ↗ | (On Diff #488861) |
Sorry, didn't see this earlier. I think this breaks a few combine patterns that probably relied on the semi-canonicalization of (and (shift x, y), z) instead of the other way around. Fixing the cases caught here is doable, but is it possible to move this later in the process? Currently it's guarded behind AfterLegalize (for the exact reason of not breaking other patterns), but is there a higher-level or different pass this might be better placed in? |
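For a constant shift amount, the two orderings being discussed compute the same value, since the mask bits dropped by the rewrite are bits the shift zeroes out anyway. A quick standalone check (hypothetical helper names, not patch code):

```cpp
#include <cassert>
#include <cstdint>

// (and (shl x, c), m): shift first, then mask.
static uint64_t shlThenAnd(uint64_t X, unsigned C, uint64_t M) {
  return (X << C) & M;
}

// (shl (and x, m >> c), c): the semi-canonical form. Equivalent because
// (M >> C) << C only drops bits of M below bit C, and those bits of
// (X << C) are zero regardless.
static uint64_t andThenShl(uint64_t X, unsigned C, uint64_t M) {
  return (X & (M >> C)) << C;
}
```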
llvm/test/CodeGen/X86/bitreverse.ll | ||
---|---|---|
538 | I thought LEA was only expensive if it uses 3 sources. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47679 | Add one space after if. | |
49129 | ditto. | |
llvm/test/CodeGen/X86/bitreverse.ll | ||
538 | You are right. The SOM says only 3-operand LEA is slow, but I found MCA cannot reflect the difference. |
llvm/test/CodeGen/X86/bitreverse.ll | ||
---|---|---|
538 | Looks like the scheduler models make the distinction for AMD CPUs but not Intel. Small snippet of the predicate check: X86SchedPredicates.td:61 (def IsThreeOperandsLEAFn), X86ScheduleBdVer2.td:560, X86ScheduleBtVer2.td:945, X86ScheduleZnver3.td:590 (IsThreeOperandsLEAFn). |
llvm/test/CodeGen/X86/bitreverse.ll | ||
---|---|---|
538 | Thanks for the information! We may need to do the same thing for Intel. Filed #60043 for it. |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
---|---|---|
988 ↗ | (On Diff #488861) | Do you know if there is a later stage I can move this transform to so that it won't break any other patterns? |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
---|---|---|
988 ↗ | (On Diff #488861) | No, I don't. Maybe do it in a peephole pass after ISel? |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
---|---|---|
988 ↗ | (On Diff #488861) | We have something similar implemented during isel in X86DAGToDAGISel::tryShrinkShlLogicImm. |
llvm/test/CodeGen/X86/btc_bts_btr.ll | ||
---|---|---|
988 ↗ | (On Diff #488861) |
I did in fact try moving it there, and it did clean up some of the missed optimizations but added some other (more severe) missed optimizations. |
988 ↗ | (On Diff #488861) |
Is there an example file / pass you could point to for me to emulate? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47444 | A lot of the if-else complexity comes from trying to handle the and(shift(x,c1),c2) and shift(and(x,c2),c3) permutations in the same code. Can you not have 2 variants in separate wrapper functions and then have a simpler core function that both call? | |
47454 | (style) assert message | |
47480 | (style) assert messages |
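The two-wrappers-plus-core refactor suggested at 47444 might look roughly like this (a schematic with plain integers instead of SDValues; all names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Core routine: operates on one normalized order of (x, shift, mask).
struct ShiftMask { uint64_t X; unsigned ShAmt; uint64_t Mask; };

static uint64_t foldCore(const ShiftMask &SM) {
  // Canonical computation: (x & (mask >> sh)) << sh == (x << sh) & mask.
  return (SM.X & (SM.Mask >> SM.ShAmt)) << SM.ShAmt;
}

// Wrapper 1: matches the and(shl(x, c1), c2) permutation directly.
static uint64_t foldAndOfShl(uint64_t X, unsigned C1, uint64_t C2) {
  return foldCore({X, C1, C2});
}

// Wrapper 2: matches the shl(and(x, c2), c3) permutation; the mask is
// pre-shifted so the core sees the same normalized form.
static uint64_t foldShlOfAnd(uint64_t X, uint64_t C2, unsigned C3) {
  return foldCore({X, C3, C2 << C3});
}
```

Each wrapper only unpacks its operand order, so the matching logic lives in one place.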
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47444 |
It seemed to me there was enough duplication / the code was not "too" complex to justify one function, but I can of course split it. |
@pengfei fixed the added and/shift in btc_bts_btr.ll and combine-bitreverse.ll respectively. The diffs are gone, so your comments don't show up anymore.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9329 | So it is only profitable when MASK is a constant? Why not use c3 in the comments? | |
9337 | Duplicated define. | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
47459–47460 | You could use ISD::isExtOpcode(Opc). | |
47479 | I'd say the implementation is too complicated for me to understand, and the overall improvement doesn't look worth the complexity. Not to mention there's still a regression in bitreverse.ll |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9329 |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
47479 |
I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).
Okay, I will split it into 2 functions as Simon suggested, which I think will decrease the complexity. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47479 |
I think you fixed combine-bitreverse.ll rather than bitreverse.ll |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47479 |
As Simon suggested, I split the two implementations. I think it's a lot more readable now. | |
47479 |
I see. I think this is because of the reordering (and (shl X, ...)) -> (shl (and X, ...)) when X persists (then we need a mov). Also note other places in the patch get the inverse behavior, e.g. llvm/test/CodeGen/X86/combine-rotates.ll:rotl_merge_i5.
Not directly, no. Maybe by accident. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
47479 |
Fixed. |
The new code is a bit easier to understand now, at least to me. The idea makes sense to me, though I still think the logic is a bit verbose.
I'm not going to block this patch, but please wait some days for other reviewers.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9414 | Unintended change. | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
47663 | I found we prefer to const APInt. | |
47674 | countTrailingOnes? | |
47676 | Should it just check MaskCnt == 8/16/32? Or even AndMask == 0xff/0xffff/0xffffffff? | |
47706 | countTrailingOnes? | |
47735 | Duplicated. | |
47735 | it's | |
47847 | is | |
56683 | Add a comment to explain this is for combineLogicalShiftWithAnd. |
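The mask-width suggestion at 47676 above could be expressed as a direct comparison rather than a bit count (a hypothetical helper for illustration, not patch code):

```cpp
#include <cassert>
#include <cstdint>

// A mask qualifies for a movzx-style zero extension only at these exact
// widths; comparing the mask directly avoids counting trailing ones and
// then checking the count against 8/16/32.
static bool isZExtMask(uint64_t AndMask) {
  return AndMask == 0xff || AndMask == 0xffff || AndMask == 0xffffffff;
}
```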
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
56683 |
this is only called for shift ops (i.e. visitSHL and visitShiftByConstant). |
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll | ||
---|---|---|
942 ↗ | (On Diff #491035) | @t.p.northover the changes to the DAGCombiner (allowing shl/shr to combine through an and) seem to cause regressions on AArch64. Going to update with a TargetLowering flag unless this is in fact preferable. |
Put the DAGCombiner changes behind a TLI flag as they seem to cause
minor regressions on AArch64
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll | ||
---|---|---|
942 ↗ | (On Diff #491035) |
I put it behind TLI.isDesirableToCombineShiftsAcross, which is disabled by default, so there should be no issue. |
Made some changes (TLI flag) since this was accepted. Can someone do a quick pass of the most recent changes to re-verify?
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4013–4015 ↗ | (On Diff #491045) | Is Shift1 inner and Shift2 outer? |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
9414 | This is still not changed. | |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
56685–56686 | The arguments are not used. Leave it void for now? |
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4013–4015 ↗ | (On Diff #491045) |
Yes, renamed to OuterShift and InnerShift to be clearer. |
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4020 ↗ | (On Diff #493177) | Is the comment still incorrect? |
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4020 ↗ | (On Diff #493177) |
Oof, fixed. |
@goldstein.w.n What is happening with this patch? After they leave my "Ready to Review" list I tend to lose track...
When I tested on a bootstrap build it caused an infinite loop. I think this approach is inherently brittle and susceptible
to that sort of bug. I haven't quite abandoned it, as I think some of the shl/shr improvements can be salvaged, but
the imm improvements I want to move to a new pass that runs at the very end of DAG lowering.