This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering][AMDGPU] Remove the SimplifyDemandedBits function that takes a User and OpIdx. Stop using it in AMDGPU target for simplifyI24.
ClosedPublic

Authored by craig.topper on Dec 26 2018, 11:19 AM.

Details

Summary

As we saw in D56057 when we tried to use this function on X86, it's unsafe. It allows the operand node to have multiple users, but doesn't prevent recursing past the first node when it does have multiple users. This can cause other simplifications earlier in the graph without regard to what bits are needed by the other users of the first node. Ideally all we should do to the first node if it has multiple uses is bypass it when its not needed by the user we started from. Doing any other transformation that SimplifyDemandedBits can do like turning ZEXT/SEXT into AEXT would result in an increase in instructions.

Fortunately, we already have a function that can do just that, GetDemandedBits. It will only make transformations that involve bypassing a node.

This patch changes AMDGPU's simplifyI24, to use a combination of GetDemandedBits to handle the multiple use simpflications. And then uses the regular SimplifyDemandedBits on each operand to handle simplifications allowed when the operand only has a single use. Unfortunately, GetDemandedBits simplifies constants more aggressively than SimplifyDemandedBits. This caused the -7 constant in the changed test to be simplified to remove the upper bits. I had to modify computeKnownBits to account for this by ignoring the upper 8 bits of the input.

Diff Detail

Event Timeline

craig.topper created this revision.Dec 26 2018, 11:19 AM
RKSimon added a subscriber: tstellar.

Adding @tstellar as IIRC he originally added that version of SimplifyDemandedBits

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2723

Worth putting this after the LHS/RHS so we don't call getOperand more than necessary?

4325

This looks (mostly) like an NFC? Just commit the bits that you can?

craig.topper marked 2 inline comments as done.Dec 26 2018, 2:52 PM
craig.topper added inline comments.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2723

Will do

4325

The part I'm most concerned about is that MUL_U24 is now using countMinLeadingZeros. I couldn't figure out why it was using countMinSignBits for both signed and unsigned before

Reduce calls to getOperand

LGTM but we need a AMDGPU guru to check it over as well.

Any AMDGPU comments? Else I think Craig can go ahead and commit.

arsenm accepted this revision.Jan 7 2019, 6:03 AM

LGTM. The immediate change is a little worse though, since the -7 is free and 0xfffff9 is not

This revision is now accepted and ready to land.Jan 7 2019, 6:03 AM
arsenm added inline comments.Jan 7 2019, 6:05 AM
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4325

There's this comment that I've never fully understood:

// We need to use sext even for MUL_U24, because MUL_U24 is used
// for signed multiply of 8 and 16-bit types.
return DAG.getSExtOrTrunc(Mul, DL, VT);

LGTM. The immediate change is a little worse though, since the -7 is free and 0xfffff9 is not

I'm not certain whether we need the constant simplification in GetDemandedBits at all to be honest, but that is a separate issue.

This revision was automatically updated to reflect the committed changes.