Page MenuHomePhabricator

[DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits (WIP)
Changes PlannedPublic

Authored by RKSimon on Wed, Mar 15, 3:10 AM.

Details

Summary

Inspired by some of the cases from D145468

Let SimplifyDemandedBits to drive the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.

There's still a number of regressions that I'm looking at, but the general shape of the patch is ready for initial review.

Diff Detail

Event Timeline

RKSimon created this revision.Wed, Mar 15, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Mar 15, 3:10 AM
RKSimon requested review of this revision.Wed, Mar 15, 3:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Mar 15, 3:10 AM
RKSimon added inline comments.Wed, Mar 15, 3:13 AM
llvm/test/CodeGen/X86/bswap.ll
171

Looks like an equivalent patch for shl narrowing would be useful

232

Not sure if shl narrowing would solve this or we need better zext/trunc handling in the bswap matcher

llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

Regression (WIP)

81

Regression (WIP)

95

Regression (WIP)

llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll
119 ↗(On Diff #505421)

Regression (WIP) - dead store folds might fail on load-ext / store-trunc mixtures?

RKSimon updated this revision to Diff 505474.Wed, Mar 15, 6:47 AM

drop useless vector handling

RKSimon updated this revision to Diff 505570.Wed, Mar 15, 11:01 AM

rebase again (and actually update the changed test this time....)

goldstein.w.n added inline comments.Wed, Mar 15, 11:47 AM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

I think this is okay here. We only get the right codegen by chance here and I don't think its something we can reasonably control during DAG isel.

I tried to improve this with D141653. Looked good for the tests but caused infinite loop in bootstrap build.

I think this (along with other imm level optimizations), need to be moved to a new pass (or function in isel) that runs at the very end.

kazu added inline comments.Wed, Mar 15, 12:41 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

think this (along with other imm level optimizations), need to be moved to a new pass (or function in isel) that runs at the very end.

I was just thinking about something similar. Specifically, optimizations to achieve smaller encoding with the same opcode should move to a new pass. Otherwise, we would have to see through ISD::ZERO_EXTEND and ISD::TRUNCATE everywhere, and that would be prone to missed optimizations.

Do we have known bits and demanded bits infrastructure at the x86 MIR level? (I'm guessing not.) Also, I am wondering whether a new pass would be more effective if we use information across basic blocks.

RKSimon added inline comments.Thu, Mar 16, 3:20 AM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

Adding value tracking at that level would be a huge amount of work - both X86 and AMDGPU currently using the DAG narrowing code, but ideally we'd be working to enable it on other targets as well.

goldstein.w.n added inline comments.Sun, Mar 26, 3:27 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

I was thinking it would be best saved for a pass between DAG narrowing and MIR. I.e DAG narrowing -> Imm Fixup -> MIR. Imm Fixup could work on SDValue types.

pengfei added inline comments.Sun, Mar 26, 5:59 PM
llvm/test/CodeGen/X86/h-register-addressing-64.ll
67

Any possibility it can be solved in a new ISel mechanism like GlobalISel?

RKSimon planned changes to this revision.Wed, Mar 29, 5:51 AM

Update - I'm going to investigate splitting this between the regular SimplifyDemandedBits DAG combines and some specific narrowing in X86ISelDAGToDAG.cpp + ISel - that way we have the best chance of making use of the extensive value tracking code we already have in SelectionDAG.

However, a lot of the DAG combines are already in place, we're just missing them due to poor combine ordering - so getting D127115 completed once and for all will likely help us the most, so I'm looking at the remaining regressions there first, and then will revisit this for cleanup.