This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] Add shift left instruction to `TruncInstCombine` DAG
ClosedPublic

Authored by anton-afanasyev on Aug 15 2021, 10:39 AM.

Details

Summary

Add shl instruction to the DAG post-dominated by trunc, allowing
TruncInstCombine to reduce bitwidth of expressions containing left shifts.

The only thing we need to check is that the target bitwidth
must be wider than the maximal shift amount: https://alive2.llvm.org/ce/z/AwArqu

Part of https://reviews.llvm.org/D107766

Diff Detail

Event Timeline

anton-afanasyev requested review of this revision.Aug 15 2021, 10:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2021, 10:39 AM
lebedev.ri edited the summary of this revision. (Show Details)Aug 15 2021, 10:50 AM
lebedev.ri added inline comments.Aug 15 2021, 10:58 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
112–114
385–387

This doesn't sound right?
https://alive2.llvm.org/ce/z/DMaieM

We drop no-wrap flags on all other instructions here, let's just not bother?

lebedev.ri added inline comments.Aug 15 2021, 11:06 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110

I think you might want to also pass DT, /*CxtI=*/CurrentTruncInst;
i guess we don't yet have AssumptionCache here in AIC..

anton-afanasyev marked 3 inline comments as done.Aug 15 2021, 12:05 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110

No, we haven't AC, only DT. Do you mean me to add it to AIC to use for computeKnownBits()?

112–114

Thanks, done

385–387

Changed, thanks! Also fixed test to tackle this case.

anton-afanasyev marked 3 inline comments as done.

Address comments

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
110

Let's just keep this as-is for now, and change later with a test case, i guess?

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

Hmm...I haven't looked at this code much before, so I'm not sure yet.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
38–39

If the left shift amount >= the truncated bit-width, the result must be 0.
Looks like -instcombine gets this, but not -instsimplify.
Can we adjust this test to something that does not completely fold away?

53–55

If the intent is to check for a shift amount with no known bits, it would be better to just add a function argument %y, so we're not mixing up that constraint with number of uses or some other factor.

Move to getBestTruncatedType() and add early return

anton-afanasyev marked an inline comment as done.Aug 15 2021, 10:32 PM

@spatel this appears correct, but is this the right place for this logic, or should it be in getMinBitWidth()?

I've moved code closer to getMinBitWidth(), to count known bits for shifts only if correct dag is built. Also added early exit.

anton-afanasyev marked 2 inline comments as done.

Fix test

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
38–39

Hmm, yes, you're right. But this could be adjusted only if shift amount is variable, this case is covered by @shl_var_not_commute() function below. I can only remove this test at all.

53–55

Ok, done

lebedev.ri accepted this revision.EditedAug 16 2021, 12:21 AM

I'm not sure how much will this catch, but LG, thank you.
@spatel ?

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
3–4

Pedantic nitpick: commute means mul %x, %y <=> mul %y, %x
Here you want to use negative_test in place of not_commute, and drop commute.

This revision is now accepted and ready to land.Aug 16 2021, 12:21 AM
anton-afanasyev marked an inline comment as done.Aug 16 2021, 12:50 AM
anton-afanasyev added inline comments.
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
3–4

"Commute" here means that trunc and shl commutes as operators: trunc ∘ shl = shl ∘ trunc, i.e. trunc(shl(*, *)) = shl(trunc(*), trunc(*))). But I'm to change it, thanks.

lebedev.ri added inline comments.Aug 16 2021, 12:54 AM
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
3–4

I have literally never seen such usage in all of LLVM. But that may be sample size issue.
Usually it's marked as fold.

spatel accepted this revision.Aug 16 2021, 9:27 AM

LGTM too - see inline for test suggestions.

llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
3–4

I was also confused by the use of "commute" here. The file name says we're truncating shifts, so I'd just make these all "shl..." or "narrow_shl..." (assuming we'll add the other shift ops in follow-up patches).

38–39

It's fine to leave here. It does suggest a possible optimization (or two) - we could be trying to call SimplifyInst or similar utility to make sure the IR is reduced coming in or going out.

anton-afanasyev marked 4 inline comments as done.Aug 17 2021, 2:15 AM
anton-afanasyev added inline comments.
llvm/test/Transforms/AggressiveInstCombine/trunc_shifts.ll
3–4

Ok, renamed tests.

anton-afanasyev marked an inline comment as done.

Rename tests

This revision was landed with ongoing or failed builds.Aug 17 2021, 3:17 AM
This revision was automatically updated to reflect the committed changes.
anton-afanasyev marked an inline comment as not done.

For future self-reference, i should have written these proofs originally:
https://godbolt.org/z/1f3aaYcjW
https://alive2.llvm.org/ce/z/fQEvdF

spatel added inline comments.Aug 17 2021, 11:52 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282

This logic is getting more complicated in D108201, so I want to step back to this patch to check my understanding:
Can SrcBitWidth be different than OrigBitWidth?
If so, can we write a test for that?
If not, then assert that I->getOperand(1)->getType()->getScalarSizeInBits == OrigBitWidth. Then simplify this code to something like:

KnownBits KnownRHS = computeKnownBits(I->getOperand(1), DL);
unsigned MinBitWidth = KnownRHS.getMaxValue()
                           .uadd_sat(APInt(OrigBitWidth, 1))
                           .getZExtValue();
if (MinBitWidth >= OrigBitWidth)
  return nullptr;
bjope added a subscriber: bjope.Aug 17 2021, 4:41 PM

Seen some failures and I'm suspecting this commit.

For input like this:

define i32 @foo(i32 %call62, i16 %call63) {
  %conv64142 = zext i32 %call62 to i64
  %conv65 = sext i16 %call63 to i64
  %sh_prom66 = and i64 %conv65, 4294967295
  %shl67 = shl i64 %conv64142, %sh_prom66
  %conv68 = trunc i64 %shl67 to i32
  ret i32 %conv68
}

we now get the following after aggressive instcombine

define i32 @foo(i32 %call62, i16 %call63) {
  %conv65 = sext i16 %call63 to i32
  %sh_prom66 = and i32 %conv65, -1
  %shl67 = shl i32 %call62, %sh_prom66
  ret i32 %shl67
}

which is more poisonous according to https://alive2.llvm.org/ce/z/88tNrs

In my original test case %call63 is 32, so we shift leftby 32, which is ok when shifting the i64 but not after having rewritten into a 32-bit shift.

Seen some failures and I'm suspecting this commit.

For input like this:

define i32 @foo(i32 %call62, i16 %call63) {
  %conv64142 = zext i32 %call62 to i64
  %conv65 = sext i16 %call63 to i64
  %sh_prom66 = and i64 %conv65, 4294967295
  %shl67 = shl i64 %conv64142, %sh_prom66
  %conv68 = trunc i64 %shl67 to i32
  ret i32 %conv68
}

we now get the following after aggressive instcombine

define i32 @foo(i32 %call62, i16 %call63) {
  %conv65 = sext i16 %call63 to i32
  %sh_prom66 = and i32 %conv65, -1
  %shl67 = shl i32 %call62, %sh_prom66
  ret i32 %shl67
}

which is more poisonous according to https://alive2.llvm.org/ce/z/88tNrs

In my original test case %call63 is 32, so we shift leftby 32, which is ok when shifting the i64 but not after having rewritten into a 32-bit shift.

Thanks @bjope, it was overflow of unsigned = uint64_t assignment, fixed it here: https://reviews.llvm.org/rG803270c0c691

anton-afanasyev marked an inline comment as done.Aug 17 2021, 11:32 PM
anton-afanasyev added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282

Yes, SrcBitWidth can be different than OrigBitWidth (smaller or larger). OrigBitWidth is bitwidth of original (dominating) trunc operand (before truncation) whereas SrcBitWidth is bitwidth of shift instruction. Added tests for such cases: https://reviews.llvm.org/rG0988488ed461

bjope added a comment.Aug 18 2021, 1:30 AM

Seen some failures and I'm suspecting this commit.

For input like this:

define i32 @foo(i32 %call62, i16 %call63) {
  %conv64142 = zext i32 %call62 to i64
  %conv65 = sext i16 %call63 to i64
  %sh_prom66 = and i64 %conv65, 4294967295
  %shl67 = shl i64 %conv64142, %sh_prom66
  %conv68 = trunc i64 %shl67 to i32
  ret i32 %conv68
}

we now get the following after aggressive instcombine

define i32 @foo(i32 %call62, i16 %call63) {
  %conv65 = sext i16 %call63 to i32
  %sh_prom66 = and i32 %conv65, -1
  %shl67 = shl i32 %call62, %sh_prom66
  ret i32 %shl67
}

which is more poisonous according to https://alive2.llvm.org/ce/z/88tNrs

In my original test case %call63 is 32, so we shift leftby 32, which is ok when shifting the i64 but not after having rewritten into a 32-bit shift.

Thanks @bjope, it was overflow of unsigned = uint64_t assignment, fixed it here: https://reviews.llvm.org/rG803270c0c691

I've cherry-picked you fix and the test case that failed now pass again. Thanks!

anton-afanasyev marked an inline comment as done.Aug 19 2021, 7:34 AM
anton-afanasyev added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
282

Btw, I was wrong here: actually SrcBitWidth == OrigBitWidth, since zext, sext and trunc can only be leaves of trunc-dominated DAG, so OrigBitWidth is bitwidth of all DAG's instructions. I've removed all appropriate tests since they all were positive and didn't checked anything. (fixed in the last patch here https://reviews.llvm.org/D108355)

fhahn added a subscriber: fhahn.Oct 26 2021, 2:51 PM

This seems to be causing a code-size regression (https://bugs.llvm.org/show_bug.cgi?id=52289). It would be great if you could take a look.

This seems to be causing a code-size regression (https://bugs.llvm.org/show_bug.cgi?id=52289). It would be great if you could take a look.

Sure, thanks, I've already assigned this to myself and referered to llvm.org/PR52253. Actually, this PR52289 was splitted out from PR52253 by Theodoros at my request since they have different causes. I'm to fix one after another.

fhahn added a comment.Oct 27 2021, 3:48 AM

This seems to be causing a code-size regression (https://bugs.llvm.org/show_bug.cgi?id=52289). It would be great if you could take a look.

Sure, thanks, I've already assigned this to myself and referered to llvm.org/PR52253. Actually, this PR52289 was splitted out from PR52253 by Theodoros at my request since they have different causes. I'm to fix one after another.

Sounds great, thanks!