This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improve instruction ordering of constant `srl/shl` with `and` to get better and-masks
AcceptedPublic

Authored by goldstein.w.n on Jan 12 2023, 9:34 PM.

Details

Summary

This moves some logic from combineShiftLeft and generalizes is to
work for shl and and. It also improves the non-mask constant
generation (reducing based on getSignificantBits instead of
countTrailingOnes) so that we can catch some cases where imm64 ->
sign-extended imm32/imm8 or imm32 -> sign-extended imm8

Diff Detail

Event Timeline

goldstein.w.n created this revision.Jan 12 2023, 9:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 9:34 PM
goldstein.w.n requested review of this revision.Jan 12 2023, 9:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 9:34 PM
pengfei added inline comments.Jan 13 2023, 6:23 PM
llvm/test/CodeGen/X86/bitreverse.ll
538

IIRC, LEA is expensive, so this looks like a good deal.

llvm/test/CodeGen/X86/bmi-x86_64.ll
131–132

Remove BEXTR-SLOW to eliminate the message.

llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

One more andb

1011 ↗(On Diff #488861)

ditto.

llvm/test/CodeGen/X86/combine-bitreverse.ll
241

One more shrl?

300

ditto.

325

ditto.

llvm/test/CodeGen/X86/const-shift-of-constmasked.ll
657

You can add option --no_x86_scrub_sp when updating test.

1194

ditto.

llvm/test/CodeGen/X86/const-shift-with-and.ll
5 ↗(On Diff #488861)

What's these tests used for? They are not changed in this patch?
Besides, do you still need new test case for this patch? Are these covered by the other changes already?

goldstein.w.n added inline comments.Jan 13 2023, 10:04 PM
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

One more andb

Sorry, didn't see earlier. Think this breaks a few combine patterns that probably relied on semi-canonicalizarion of (and (shift x, y), z) instead of the other way around.

Fixing the cases caught here is doable, but is it possible to maybe move this to later in the process? Currently it's guarded behind AfterLegalize (for the exact reason of not breaking other patterns), but is there a higher level or different pass this might be better placed in?

craig.topper added inline comments.
llvm/test/CodeGen/X86/bitreverse.ll
538

I thought LEA was only expensive if it uses 3 sources.

craig.topper added inline comments.Jan 13 2023, 10:21 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
47494

You could check !VT.isScalarInteger() instead.

47554

Drop else after return.

56345

Why checking both operands of the shift? Isn't only the first one interesting? An AND on the shift amount is very different.

pengfei added inline comments.Jan 14 2023, 2:05 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47679

Add one space after if.

49129

ditto.

llvm/test/CodeGen/X86/bitreverse.ll
538

You are right. SOM says only 3 operand LEA is slow. But I found MCA cannot reflect the difference.
So this is yet another regression.
old: https://godbolt.org/z/dPW93v8zT
new: https://godbolt.org/z/hrEovKW4f

pengfei added inline comments.Jan 14 2023, 2:05 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47438

results

47438

One more space. You may run clang-format to help solvoing the format nits.

47533

Use curly braces for consistency.

craig.topper added inline comments.Jan 14 2023, 10:11 AM
llvm/test/CodeGen/X86/bitreverse.ll
538

Looks like the scheduler models make the distinction for AMD CPUs but not Intel

Small snippet of the predicate check:

X86SchedPredicates.td
61:def IsThreeOperandsLEAFn :

X86ScheduleBdVer2.td
560:    IsThreeOperandsLEAFn,

X86ScheduleBtVer2.td
945:    IsThreeOperandsLEAFn,

X86ScheduleZnver3.td
590:    IsThreeOperandsLEAFn,
pengfei added inline comments.Jan 14 2023, 11:00 PM
llvm/test/CodeGen/X86/bitreverse.ll
538

Thanks for the information! We may need to do the same thing for Intel. Filed #60043 for it.

Fix many a nit

goldstein.w.n marked 7 inline comments as done.Jan 15 2023, 3:10 PM
goldstein.w.n added inline comments.
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?

pengfei added inline comments.Jan 15 2023, 6:12 PM
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

No, I don't. Maybe do it in a peephole pass after ISel?

craig.topper added inline comments.Jan 15 2023, 6:57 PM
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

We have something similarish implemented during isel in X86DAGToDAGISel::tryShrinkShlLogicImm.

goldstein.w.n added inline comments.Jan 15 2023, 7:05 PM
llvm/test/CodeGen/X86/btc_bts_btr.ll
988 ↗(On Diff #488861)

Do you know if there is a later stage I can move this transform too so that it won't break any other patterns?

988 ↗(On Diff #488861)

We have something similarish implemented during isel in X86DAGToDAGISel::tryShrinkShlLogicImm.

I did in fact try moving it there and it did clean up some of the missed optimizations but added some (more severe) other missed optimizations.

988 ↗(On Diff #488861)

No, I don't. Maybe do it in a peephole pass after ISel?

Is there an example file / pass you could point to for me to emulate?

RKSimon added inline comments.Jan 16 2023, 6:29 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47444

a lot of the if-else complexity is coming from you trying to handle and(shift(x,c1),c2) and shift(and(x,c2),c3) permutations in the same code - can you not have 2 variants in separate wrapper functions and then have a simpler core function that both call?

47454

(style) assert message

47480

(style) assert messages

goldstein.w.n added inline comments.Jan 16 2023, 6:49 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47444

a lot of the if-else complexity is coming from you trying to handle and(shift(x,c1),c2) and shift(and(x,c2),c3) permutations in the same code - can you not have 2 variants in separate wrapper functions and then have a simpler core function that both call?

It seemed to me there was enough duplication / not "too" complex to justify one function, but can ofc split.

Fix regressions + more format issues

goldstein.w.n marked 3 inline comments as done and an inline comment as not done.Jan 16 2023, 3:26 PM

@pengfei fixed the added and and shift in btc_bts_btr.ll and combine-bitreverse.ll respectively. The diffs are gone so your comments don't show up anymore.

Fix andn + remove bextr + no-scrub-sp

goldstein.w.n marked 2 inline comments as done.Jan 16 2023, 5:07 PM
pengfei added inline comments.Jan 18 2023, 7:32 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9353

So it is only profitable when MASK is a constant? Why don't use c3 in the comments?

9361

Duplicated define.

llvm/lib/Target/X86/X86ISelLowering.cpp
47459–47460

You could use ISD::isExtOpcode(Opc).

47479

I'd say the implementation is too complicated to me to understand. And the overall improvement doesn't look worth the complexity. Not to mention there's still regression in bitreverse.ll
I may not try to understand the code here, leave it for other reviewers.

goldstein.w.n added inline comments.Jan 18 2023, 9:02 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9353

So it is only profitable when MASK is a constant? Why don't use c3 in the comments?

llvm/lib/Target/X86/X86ISelLowering.cpp
47479

I'd say the implementation is too complicated to me to understand. And the overall improvement doesn't look worth the complexity. Not to mention there's still regression in bitreverse.ll

I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).

I may not try to understand the code here, leave it for other reviewers.

Okay, I will split into 2-functions as Simon suggested which I think will decrease the complexity.

Split and/shift cases (readability). Fix some nits

pengfei added inline comments.Jan 18 2023, 6:31 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
47479

I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).

I think you fixed combine-bitreverse.ll rather than bitreverse.ll
The problem in bitreverse.ll is 2 lea is replaced by 2 mov + and + shl. I think it is a minor regression due to the increased 2 mov, so I didn't insist on that.
Is the TODO in the file to address this problem?

goldstein.w.n marked 3 inline comments as done.

More cleanup

goldstein.w.n added inline comments.Jan 18 2023, 8:05 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
47479

I'd say the implementation is too complicated to me to understand. And the overall improvement doesn't look worth the complexity. Not to mention there's still regression in bitreverse.ll
I may not try to understand the code here, leave it for other reviewers.

As simon suggested, split the two implementations. Think its a lot more readable now.

47479

I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).

I think you fixed combine-bitreverse.ll rather than bitreverse.ll
The problem in bitreverse.ll is 2 lea is replaced by 2 mov + and + shl. I think it is a minor regression due to the increased 2 mov, so I didn't insist on that.

I see. I think this is b.c of the reordering (and (shl X..)) -> (shl (and X..)) when X persists (then we need a mov).
We could prevent it with a guard in combineAndWithLogicalShift that only did the transform if this was the last use of X the shiftop was shl 1/2/3 (can become lea). Is there a way to query "isLastUse(X)" for an SDValue or do we just need to do hasOneUse?

Also note other places in the patch get the inverse behavior i.e llvm/test/CodeGen/X86/combine-rotates.ll:rotl_merge_i5.

Is the TODO in the file to address this problem?

Not directly, no. Maybe by accident.

Fix regression in bitreverse

goldstein.w.n marked an inline comment as done.Jan 18 2023, 9:20 PM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
47479

I think the bitreverse.ll regressions have been fixed (that was the point of the new patterns in DAGCombiner.cpp).

I think you fixed combine-bitreverse.ll rather than bitreverse.ll
The problem in bitreverse.ll is 2 lea is replaced by 2 mov + and + shl. I think it is a minor regression due to the increased 2 mov, so I didn't insist on that.
Is the TODO in the file to address this problem?

Fixed.

pengfei accepted this revision.Jan 19 2023, 1:20 AM

The new code is a bit easier to understand now, at least to me. The idea masks sense to me though I still think the logic is a bit verbose.
I'm not going to block this patch, but please wait some days for other reviewers.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9440

Unintended change.

llvm/lib/Target/X86/X86ISelLowering.cpp
47663

I found we prefer to const APInt.

47674

countTrailingOnes?

47676

Should it just need to check MaskCnt == 8/16/32. Or even AndMask == 0xff/0xffff/0xffffffff?

47706

countTrailingOnes?

47735

Duplicated.

47735

it's

47847

is

56681

Add a comment to explain this is for combineLogicalShiftWithAnd.
Why combineAndWithLogicalShift not need this?

This revision is now accepted and ready to land.Jan 19 2023, 1:20 AM
goldstein.w.n marked 6 inline comments as done.Jan 19 2023, 9:52 AM
goldstein.w.n added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
56681

Add a comment to explain this is for combineLogicalShiftWithAnd.
Why combineAndWithLogicalShift not need this?

this is only called for shift ops (i.e VisitSHL and visitShiftByConstant).

goldstein.w.n marked an inline comment as done.

Fix nits + improve comments

Update tests for all backends

goldstein.w.n added inline comments.Jan 20 2023, 10:12 PM
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll
942 ↗(On Diff #491035)

@t.p.northover the changes to the DagCombiner (allowing shl/shr to combine through an and) seems to cause regression on Aarch64. Going to update with TargetLowering flag unless this is in fact preferable.

Put the DAGCombiner changes behind TLI flag as it seems to cause
minor regression on aarch64

goldstein.w.n added inline comments.Jan 21 2023, 12:16 AM
llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll
942 ↗(On Diff #491035)

@t.p.northover the changes to the DagCombiner (allowing shl/shr to combine through an and) seems to cause regression on Aarch64. Going to update with TargetLowering flag unless this is in fact preferable.

I put it behind TLI.isDesirableToCombineShiftsAcross which is disabled by default so should be no issue.

Made some changes (TLI flag) since this was accepted. Can someone do a quick pass of the most recent changes to re-verify?

pengfei added inline comments.Jan 29 2023, 6:19 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4020–4022

Is Shift1 inner and Shift2 outer?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9440

This still no changed.

llvm/lib/Target/X86/X86ISelLowering.cpp
56685–56686

The arguments are not used. Leave it void for now?

goldstein.w.n marked 6 inline comments as done.Jan 29 2023, 6:42 PM
goldstein.w.n added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
4020–4022

Is Shift1 inner and Shift2 outer?

Yes, renamed to OuterShift and InnerShift to be clearer.

goldstein.w.n marked an inline comment as done.

Fix style, issue, rename TLI helper, void variables

pengfei added inline comments.Jan 29 2023, 7:01 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4020

Is the comment still incorrect?

goldstein.w.n marked an inline comment as done.

Fix comment

pengfei accepted this revision.Jan 29 2023, 7:39 PM
goldstein.w.n added inline comments.Jan 30 2023, 10:08 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4020

Is the comment still incorrect?

Oof, fixed.

@goldstein.w.n What is happening with this patch? After they leave my "Ready to Review" list I tend to lose track......

@goldstein.w.n What is happening with this patch? After they leave my "Ready to Review" list I tend to lose track......

When I tested on bootstrap build it caused on infinite loop. I think this approach is inherently brittle and susceptible
to that sort of bug. I haven't quite abandoned it as I think some of the shl/shr improvements can be salvaged. But
the imm improvedments I want to move to a new pass that runs at the way of end DAG lowering.