This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits
ClosedPublic

Authored by RKSimon on Mar 15 2023, 3:10 AM.

Details

Summary

Inspired by some of the cases from D145468

Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable, and the zext/trunc are free.

Diff Detail

Event Timeline

RKSimon created this revision.Mar 15 2023, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2023, 3:10 AM
RKSimon requested review of this revision.Mar 15 2023, 3:10 AM
RKSimon added inline comments.Mar 15 2023, 3:13 AM
llvm/test/CodeGen/X86/bswap.ll
171

Looks like an equivalent patch for shl narrowing would be useful

232

Not sure whether shl narrowing would solve this or whether we need better zext/trunc handling in the bswap matcher

llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505421)

Regression (WIP)

81 ↗(On Diff #505421)

Regression (WIP)

95 ↗(On Diff #505421)

Regression (WIP)

llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll
119 ↗(On Diff #505421)

Regression (WIP) - dead store folds might fail on load-ext / store-trunc mixtures?

RKSimon updated this revision to Diff 505474.Mar 15 2023, 6:47 AM

drop useless vector handling

RKSimon updated this revision to Diff 505570.Mar 15 2023, 11:01 AM

rebase again (and actually update the changed test this time....)

goldstein.w.n added inline comments.Mar 15 2023, 11:47 AM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505570)

I think this is okay here. We only get the right codegen by chance, and I don't think it's something we can reasonably control during DAG ISel.

I tried to improve this with D141653. It looked good for the tests but caused an infinite loop in the bootstrap build.

I think this (along with other imm-level optimizations) needs to be moved to a new pass (or a function in ISel) that runs at the very end.

kazu added inline comments.Mar 15 2023, 12:41 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505570)

I think this (along with other imm-level optimizations) needs to be moved to a new pass (or a function in ISel) that runs at the very end.

I was just thinking about something similar. Specifically, optimizations that achieve a smaller encoding with the same opcode should move to a new pass. Otherwise, we would have to see through ISD::ZERO_EXTEND and ISD::TRUNCATE everywhere, and that would be prone to missed optimizations.

Do we have known bits and demanded bits infrastructure at the x86 MIR level? (I'm guessing not.) Also, I am wondering whether a new pass would be more effective if we use information across basic blocks.

RKSimon added inline comments.Mar 16 2023, 3:20 AM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505570)

Adding value tracking at that level would be a huge amount of work - both X86 and AMDGPU currently use the DAG narrowing code, but ideally we'd be working to enable it on other targets as well.

goldstein.w.n added inline comments.Mar 26 2023, 3:27 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505570)

I was thinking it would be best saved for a pass between DAG narrowing and MIR, i.e. DAG narrowing -> Imm Fixup -> MIR. Imm Fixup could work on SDValue types.

pengfei added inline comments.Mar 26 2023, 5:59 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505570)

Any possibility it can be solved in a new ISel mechanism like GlobalISel?

RKSimon planned changes to this revision.Mar 29 2023, 5:51 AM

Update - I'm going to investigate splitting this between the regular SimplifyDemandedBits DAG combines and some specific narrowing in X86ISelDAGToDAG.cpp + ISel; that way we have the best chance of making use of the extensive value-tracking code we already have in SelectionDAG.

However, a lot of the DAG combines are already in place; we're just missing them due to poor combine ordering. Getting D127115 completed once and for all will likely help us the most, so I'm looking at the remaining regressions there first and will then revisit this for cleanup.

RKSimon updated this revision to Diff 537747.Jul 6 2023, 8:40 AM
RKSimon edited the summary of this revision. (Show Details)

rebasing this now that D127115 has landed, solving most of the issues

RKSimon marked an inline comment as not done.Jul 6 2023, 8:42 AM
RKSimon added inline comments.
llvm/test/CodeGen/X86/bswap.ll
232

This looks to be the last regression - MatchBSwapHWordLow is very pattern-specific and can't peek through zext (or ignore AND masks for known-zero bits).

RKSimon added inline comments.Jul 7 2023, 6:36 AM
llvm/test/CodeGen/X86/bswap.ll
232

I've confirmed this is fixed by adding an equivalent SHL narrowing fold in SimplifyDemandedBits, which I intend to do as a follow up.

RKSimon retitled this revision from [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits (WIP) to [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits.Jul 7 2023, 6:36 AM
RKSimon edited the summary of this revision. (Show Details)
pengfei added inline comments.Jul 7 2023, 7:09 AM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505421)

I saw you removed WIP from the title. Are you no longer considering this a regression, or will you address it as a follow-up?

RKSimon marked 3 inline comments as not done.Jul 7 2023, 9:11 AM
RKSimon added inline comments.
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67 ↗(On Diff #505421)

Let me take another look - it seems to be due to matchAddressRecursively not being very good at peeking through ZERO_EXTEND nodes.

goldstein.w.n added inline comments.Jul 9 2023, 11:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1871

Doesn't the InDemandedMask.countLeadingZeros() >= (BitWidth / 2) check the same thing as TLO.DAG.MaskedValueIsZero(Op0, APInt::getHighBitsSet(BitWidth, BitWidth / 2))? What is the rationale for having both?

RKSimon added inline comments.Jul 9 2023, 2:19 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1871

No. InDemandedMask checks whether we don't care about the upper bits at all (i.e. the SRL is used by a truncate or an AND mask, as in the original implementation), while MaskedValueIsZero checks whether the upper bits are already known to be zero, in which case the (free) zext/trunc preserves those bits correctly, using a more profitable narrower op and possibly allowing further folds to occur. Removing either check results in test changes.

any more thoughts/comments?

One more comment.

I noticed in D154805 that for scalars the (shl (add x, c1), c2) is not properly folding to (add (shl x, c2), c1 << c2). This is because before reaching visitSHL (and performing the fold), the shl gets shrunk and it's wrapped in an ANY_EXTEND.

This patch seems likely to cause similar issues. I'm still generally in favor, as we can always update the folds, but there is a slight concern that we may be adding a layer of indirection that not all folds work with.

I agree - a great deal of folds are poorly designed; we hit a lot of this with D127115 and are now seeing very similar things with D152928. If you find examples, PLEASE write up an issue ticket, as it speeds up triage a great deal.

foad added a comment.Jul 17 2023, 5:34 AM

AMDGPU changes seem good.

pengfei accepted this revision.Jul 17 2023, 6:13 AM

X86 changes look good to me except one nit.

llvm/test/CodeGen/X86/2009-05-30-ISelBug.ll
12

This results in the scale being 4, which may be bad for performance?

This revision is now accepted and ready to land.Jul 17 2023, 6:13 AM
This revision was landed with ongoing or failed builds.Jul 17 2023, 7:50 AM
This revision was automatically updated to reflect the committed changes.