This moves some logic from combineShiftLeft and generalizes it to
work for both shl and and. It also improves the non-mask constant
generation (reducing based on getSignificantBits instead of
countTrailingOnes) so that we can catch some cases where an imm64
becomes a sign-extended imm32/imm8, or an imm32 becomes a sign-extended imm8.
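As a rough illustration of that last point (a minimal sketch, not the patch code, and the helper names are made up): getSignificantBits() counts the bits needed to hold a value as a signed integer, so a negative mask like 0xFFFFFFFFFFFFFFF0 (-16) reports 5 bits and can be encoded as a sign-extended imm8, which a countTrailingOnes()-based check cannot see.

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Hypothetical helpers: can constant C be encoded as a sign-extended 8- or
// 32-bit immediate?  getSignificantBits() returns the number of bits needed
// to represent C as a signed value, so it also catches negative masks such
// as 0xFFFFFFFFFFFFFFF0, which have zero trailing ones.
static bool fitsSExtImm8(const APInt &C)  { return C.getSignificantBits() <= 8; }
static bool fitsSExtImm32(const APInt &C) { return C.getSignificantBits() <= 32; }
```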
Details
- Reviewers: pengfei, RKSimon, t.p.northover

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
llvm/test/CodeGen/X86/bitreverse.ll
538: Looks like the scheduler models make the distinction for AMD CPUs but not Intel. Small snippet of the predicate check:
X86SchedPredicates.td 61: def IsThreeOperandsLEAFn :
X86ScheduleBdVer2.td 560: IsThreeOperandsLEAFn,
X86ScheduleBtVer2.td 945: IsThreeOperandsLEAFn,
X86ScheduleZnver3.td 590: IsThreeOperandsLEAFn,
llvm/test/CodeGen/X86/bitreverse.ll
538: Thanks for the information! We may need to do the same thing for Intel. Filed #60043 for it.
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗ (On Diff #488861): Do you know if there is a later stage I can move this transform to so that it won't break any other patterns?
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗ (On Diff #488861): No, I don't. Maybe do it in a peephole pass after ISel?
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗ (On Diff #488861): We have something similar implemented during isel in X86DAGToDAGISel::tryShrinkShlLogicImm.
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗ (On Diff #488861): I did in fact try moving it there, and it cleaned up some of the missed optimizations but added other (more severe) missed optimizations.
Is there an example file / pass you could point to for me to emulate?
llvm/lib/Target/X86/X86ISelLowering.cpp
47444: A lot of the if-else complexity is coming from trying to handle the and(shift(x,c1),c2) and shift(and(x,c2),c3) permutations in the same code - can you not have two variants in separate wrapper functions and then have a simpler core function that both call? (A sketch of that split follows below.)
47454: (style) assert message
47480: (style) assert messages
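A sketch of the split being suggested for 47444 (all function names and signatures here are hypothetical, just to show the shape): two thin wrappers, one per permutation, delegating to a single core routine so the core no longer has to branch on which node is outermost.

```cpp
// Hypothetical shape only, not the patch's actual functions.
// Core: shared logic once the callers have identified the shift/and pair.
static SDValue combineShiftAndCore(SDNode *Shift, SDNode *And,
                                   bool ShiftIsOuter, SelectionDAG &DAG) {
  // ... common constant-shrinking / reordering logic ...
  return SDValue();
}

// Wrapper 1: matches and(shift(x, c1), c2).
static SDValue combineAndOfShift(SDNode *N, SelectionDAG &DAG) {
  SDNode *Shift = N->getOperand(0).getNode();
  return combineShiftAndCore(Shift, N, /*ShiftIsOuter=*/false, DAG);
}

// Wrapper 2: matches shift(and(x, c2), c3).
static SDValue combineShiftOfAnd(SDNode *N, SelectionDAG &DAG) {
  SDNode *And = N->getOperand(0).getNode();
  return combineShiftAndCore(N, And, /*ShiftIsOuter=*/true, DAG);
}
```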
llvm/lib/Target/X86/X86ISelLowering.cpp
47444: It seemed to me there was enough duplication and the code was not "too" complex to justify a single function, but I can of course split it.
@pengfei fixed the added and / shift in btc_bts_btr.ll and combine-bitreverse.ll respectively. The diffs are gone, so your comments don't show up anymore.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9353: So it is only profitable when MASK is a constant? Why not use c3 in the comments?
9361: Duplicated define.

llvm/lib/Target/X86/X86ISelLowering.cpp
47459–47460: You could use ISD::isExtOpcode(Opc). (A small sketch follows below.)
47479: I'd say the implementation is too complicated for me to understand, and the overall improvement doesn't look worth the complexity. Not to mention there's still a regression in bitreverse.ll.
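For context on the 47459–47460 note, ISD::isExtOpcode folds the usual three extend-opcode checks into one call; a minimal before/after sketch (the surrounding helper names are invented):

```cpp
#include "llvm/CodeGen/ISDOpcodes.h"
using namespace llvm;

// Before: spell out every extension opcode.
static bool isAnyExtendVerbose(unsigned Opc) {
  return Opc == ISD::SIGN_EXTEND || Opc == ISD::ZERO_EXTEND ||
         Opc == ISD::ANY_EXTEND;
}

// After: the helper the reviewer points at covers the same three opcodes.
static bool isAnyExtendCompact(unsigned Opc) { return ISD::isExtOpcode(Opc); }
```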
llvm/lib/Target/X86/X86ISelLowering.cpp
47479: I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).
Okay, I will split it into two functions as Simon suggested, which I think will decrease the complexity.
llvm/lib/Target/X86/X86ISelLowering.cpp
47479: I think you fixed combine-bitreverse.ll rather than bitreverse.ll.
llvm/lib/Target/X86/X86ISelLowering.cpp
47479: As Simon suggested, split the two implementations. I think it's a lot more readable now.
47479: I see. I think this is because of the reordering (and (shl X..)) -> (shl (and X..)) when X persists (then we need a mov). Also note other places in the patch get the inverse behavior, i.e. llvm/test/CodeGen/X86/combine-rotates.ll:rotl_merge_i5.
Not directly, no. Maybe by accident.
llvm/lib/Target/X86/X86ISelLowering.cpp
47479: Fixed.
The new code is a bit easier to understand now, at least to me. The idea makes sense to me, though I still think the logic is a bit verbose.
I'm not going to block this patch, but please wait some days for other reviewers.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9440: Unintended change.

llvm/lib/Target/X86/X86ISelLowering.cpp
47663: I found we prefer to use const APInt.
47674: countTrailingOnes?
47676: Should it just check MaskCnt == 8/16/32? Or even AndMask == 0xff/0xffff/0xffffffff? (A sketch of this check follows below.)
47706: countTrailingOnes?
47735: Duplicated.
47735: it's
47847: is
56681: Add a comment to explain this is for combineLogicalShiftWithAnd.
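One way to read the 47676 suggestion, as a minimal sketch (the helper is hypothetical; MaskCnt/AndMask are just the names used in the review): query the mask value directly with APInt::isMask instead of comparing a trailing-ones count against 8/16/32.

```cpp
#include "llvm/ADT/APInt.h"
using namespace llvm;

// Hypothetical helper: accept only whole byte/word/dword masks, i.e. exactly
// 0xff, 0xffff or 0xffffffff.  isMask(N) is true iff the value is N trailing
// ones and nothing else, which matches the reviewer's "AndMask == ..." form.
// (Assumes AndMask has a bit width of at least 32.)
static bool isByteWordOrDwordMask(const APInt &AndMask) {
  return AndMask.isMask(8) || AndMask.isMask(16) || AndMask.isMask(32);
}
```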
llvm/lib/Target/X86/X86ISelLowering.cpp
56681: This is only called for shift ops (i.e. visitSHL and visitShiftByConstant).
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll
942 ↗ (On Diff #491035): @t.p.northover the changes to the DAGCombiner (allowing shl/shr to combine through an and) seem to cause a regression on AArch64. Going to update with a TargetLowering flag unless this is in fact preferable.
Put the DAGCombiner changes behind a TLI flag, as they seem to cause a
minor regression on AArch64.
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll
942 ↗ (On Diff #491035): I put it behind TLI.isDesirableToCombineShiftsAcross, which is disabled by default, so there should be no issue.
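For readers following along, this is the standard TargetLowering opt-in pattern: a virtual predicate that defaults to false, with an X86 override returning true. The parameter list below is an assumption (the review only mentions two shift operands, later renamed OuterShift/InnerShift); this is a sketch of the shape, not the patch itself.

```cpp
// llvm/include/llvm/CodeGen/TargetLowering.h (sketch): disabled by default,
// so targets that don't override it never see the new combines.
virtual bool isDesirableToCombineShiftsAcross(SDValue OuterShift,
                                              SDValue InnerShift) const {
  return false;
}

// llvm/lib/Target/X86/X86ISelLowering.cpp (sketch): x86 opts in, so only x86
// gets the shl/shr-through-and DAGCombiner patterns.
bool X86TargetLowering::isDesirableToCombineShiftsAcross(
    SDValue OuterShift, SDValue InnerShift) const {
  return true;
}
```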
Made some changes (TLI flag) since this was accepted. Can someone do a quick pass of the most recent changes to re-verify?
llvm/include/llvm/CodeGen/TargetLowering.h
4020–4022: Is Shift1 inner and Shift2 outer?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9440: This is still not changed.

llvm/lib/Target/X86/X86ISelLowering.cpp
56685–56686: The arguments are not used. Leave it void for now?
llvm/include/llvm/CodeGen/TargetLowering.h
4020–4022: Yes, renamed to OuterShift and InnerShift to be clearer.
llvm/include/llvm/CodeGen/TargetLowering.h
4020: Is the comment still incorrect?
llvm/include/llvm/CodeGen/TargetLowering.h
4020: Oof, fixed.
@goldstein.w.n What is happening with this patch? After they leave my "Ready to Review" list I tend to lose track......
When I tested it in a bootstrap build it caused an infinite loop. I think this approach is inherently brittle and susceptible
to that sort of bug. I haven't quite abandoned it, as I think some of the shl/shr improvements can be salvaged. But
the imm improvements I want to move to a new pass that runs at the very end of DAG lowering.