This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
ClosedPublic

Authored by RKSimon on May 3 2022, 4:19 AM.

Details

Summary

Pulled out of D77804 as its going to be easier to address the regressions individually.

This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

The lost RISCV gorc2 fold shouldn't be a problem - instcombine would have already destroyed that pattern - see https://github.com/llvm/llvm-project/issues/50553

Diff Detail

Event Timeline

RKSimon created this revision.May 3 2022, 4:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 4:19 AM
RKSimon requested review of this revision.May 3 2022, 4:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2022, 4:19 AM
foad added a comment.May 4 2022, 1:35 AM

AMDGPU - we're losing a v2i16 splat pattern

I'm not 100% sure but I think this might be fixed if we did a better job of optimizing this fragment of DAG:

Legalized selection DAG:

          t33: v2i32 = scalar_to_vector t36
        t66: i64 = bitcast t33
      t67: i64 = srl t66, Constant:i32<16>
    t68: i16 = truncate t67
  t69: i32 = zero_extend t68

Currently gets "optimized" to:

Optimized legalized selection DAG:

        t33: v2i32 = scalar_to_vector t36
      t66: i64 = bitcast t33
    t67: i64 = srl t66, Constant:i32<16>
  t73: i32 = truncate t67

But the whole thing could just be t73: i32 = srl t36, Constant:i32<16>.

dmgreen added inline comments.May 4 2022, 6:04 AM
llvm/test/CodeGen/ARM/ror.ll
24–31 ↗(On Diff #426640)

Is this still an issue if the IR is canonically using fshr? https://godbolt.org/z/EMnGsq9nT

RKSimon added inline comments.May 4 2022, 7:24 AM
llvm/test/CodeGen/ARM/ror.ll
24–31 ↗(On Diff #426640)

No, if we can assume the IR has come from the middle end then it will be 2 FSHLR funnel shift intrinsics and will fold correctly. If this codegen can appear in DAG then its a regression.

I'm happy to update the tests to use the fshl/fshr intrinsics?

dmgreen added inline comments.May 4 2022, 11:42 AM
llvm/test/CodeGen/ARM/ror.ll
24–31 ↗(On Diff #426640)

Yeah, adding the new test for the canonical form sounds useful, to show it's not getting worse. Whether optimizing something like this would be useful or not, I'm not sure. It sounds unlikely to come up.

      t19: v2i32 = BUILD_VECTOR Constant:i32<28>, Constant:i32<28>
    t52: v2i32 = rotl t17, t19
    t29: v2i32 = BUILD_VECTOR Constant:i32<6>, Constant:i32<6>
  t30: v2i32 = srl t52, t29
      t22: v2i32 = BUILD_VECTOR Constant:i32<4>, Constant:i32<4>
    t23: v2i32 = srl t17, t22
    t26: v2i32 = BUILD_VECTOR Constant:i32<26>, Constant:i32<26>
  t51: v2i32 = shl t23, t26
t31: v2i32 = or t30, t51
RKSimon updated this revision to Diff 427873.May 7 2022, 9:51 AM
RKSimon edited the summary of this revision. (Show Details)

rebase

RKSimon updated this revision to Diff 427890.May 7 2022, 12:48 PM
RKSimon edited the summary of this revision. (Show Details)

rebase

@craig.topper What do you think I should do about the lost RISCV gorc2 fold?

RKSimon updated this revision to Diff 427891.May 7 2022, 12:51 PM

rebase - missed test update

rebase

@craig.topper What do you think I should do about the lost RISCV gorc2 fold?

I’m not concerned about it. The spec that includes that instruction hasn’t been ratified and doesn’t appear to have anyone driving it towards being ratified.

RKSimon updated this revision to Diff 427894.May 7 2022, 1:34 PM
RKSimon edited the summary of this revision. (Show Details)

rebase

RKSimon retitled this revision from [DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling (WIP) to [DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling.May 8 2022, 2:49 AM
foad added a comment.May 13 2022, 2:28 AM

I'm confused because the description says SimplifyMultipleUseDemandedBits but the patch changes SimplifyDemandedBits and I don't know how they relate.

Having said that, the actual patch makes sense to me and the AMDGPU changes are fine and save a few instructions overall.

RKSimon retitled this revision from [DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling to [DAG] Enable ISD::SHL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits.May 13 2022, 2:44 AM
RKSimon edited the summary of this revision. (Show Details)

Cheers @foad I've tried to clear up the summary

RKSimon edited the summary of this revision. (Show Details)May 13 2022, 2:45 AM
foad accepted this revision.May 13 2022, 2:49 AM

LGTM to me once the other target maintainers are happy.

This revision is now accepted and ready to land.May 13 2022, 2:49 AM
jonpa edited reviewers, added: uweigand; removed: jonpa.May 13 2022, 6:16 AM

SystemZ changes LGTM.

RISC-V changes LGTM

This revision was landed with ongoing or failed builds.May 14 2022, 1:50 AM
This revision was automatically updated to reflect the committed changes.