Phabricator

[InstCombine] Combine lshr of add -> (a + b < a)
ClosedPublic

Authored by Pierre-vh on Nov 28 2022, 6:57 AM.

Details

Summary

Tries to perform

(lshr (add (zext X), (zext Y)), K)
->  (icmp ult (add X, Y), X)
where
  - The add's operands are zexts from a K-bit integer to a bigger type.
  - The add is only used by the shr, or by iK (or narrower) truncates.
  - The lshr type has more than 2 bits (other types are boolean math).
  - K > 1
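
The equivalence the summary describes can be sanity-checked in plain C++ for one concrete instance (zext i8 to i16, K = 8; function names here are illustrative, not from the patch):

```cpp
#include <cassert>
#include <cstdint>

// Wide form matched by the combine, specialized to zext i8 -> i16, K = 8:
//   (lshr (add (zext X), (zext Y)), 8)
uint16_t wideForm(uint8_t x, uint8_t y) {
  return (uint16_t)((uint16_t)x + (uint16_t)y) >> 8;
}

// Narrow replacement: zext (icmp ult (add X, Y), X).
// The wrapped 8-bit sum is below X exactly when the add carried out.
uint16_t narrowForm(uint8_t x, uint8_t y) {
  return (uint8_t)(x + y) < x ? 1 : 0;
}

// Exhaustive check over all i8 pairs that both forms agree.
bool formsAgree() {
  for (int x = 0; x < 256; ++x)
    for (int y = 0; y < 256; ++y)
      if (wideForm((uint8_t)x, (uint8_t)y) != narrowForm((uint8_t)x, (uint8_t)y))
        return false;
  return true;
}
```

The i8 space is small enough to verify exhaustively, which is why the sketch fixes K = 8.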

This seems to be a pattern that just comes from OpenCL front-ends, so adding DAG/GISel combines doesn't seem to be worth the complexity.

Original patch D107552 by @abinavpp - adapted to use (a + b < a) instead of uaddo following discussion on the review.
See this issue https://github.com/RadeonOpenCompute/ROCm/issues/488

Diff Detail

Event Timeline

Pierre-vh updated this revision to Diff 478910.Nov 30 2022, 5:26 AM
Pierre-vh marked an inline comment as done.

Comment

lebedev.ri added inline comments.Nov 30 2022, 5:31 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
853

I think it would be nicer to hoist the match and rename the variable.

Value *Add = I.getOperand(0);
Value *X = nullptr, *Y = nullptr;
if (!match(Add, m_Add(m_Value(X), m_Value(Y))))
  return nullptr;

> I don't think there is any requirement for the wider type to be exactly double the narrower type.

That's correct:
https://alive2.llvm.org/ce/z/iLVIgn

So this patch/tests are too narrow as-is. It should be checking something like "if we only demand the top N bits of an add, and the add operands are known zero in those top N bits, then fold the add into an overflow check."

Also, canonicalizing to the add intrinsic if we're not using the add part of the result seems like the wrong direction. I can't tell from the larger test what we're expecting to happen. Please pre-commit the baseline tests, so we can see the diffs.
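
The "not exactly double" point can also be checked outside alive2 with a small C++ sketch (illustrative names; here the wide type is i32, four times the i8 operand width):

```cpp
#include <cassert>
#include <cstdint>

// The wide type is i32, four times the i8 operand width, yet the
// lshr by 8 still isolates exactly the carry of the 8-bit add.
uint32_t wideForm32(uint8_t x, uint8_t y) {
  return ((uint32_t)x + (uint32_t)y) >> 8;
}

// Same (a + b < a) replacement as in the double-width case.
uint32_t overflowBit(uint8_t x, uint8_t y) {
  return (uint8_t)(x + y) < x ? 1u : 0u;
}

// Exhaustive agreement check over all i8 pairs.
bool agreeForAllBytes() {
  for (int x = 0; x < 256; ++x)
    for (int y = 0; y < 256; ++y)
      if (wideForm32((uint8_t)x, (uint8_t)y) != overflowBit((uint8_t)x, (uint8_t)y))
        return false;
  return true;
}
```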

Pierre-vh marked an inline comment as done.Nov 30 2022, 6:10 AM

> I don't think there is any requirement for the wider type to be exactly double the narrower type.
>
> That's correct:
> https://alive2.llvm.org/ce/z/iLVIgn
>
> So this patch/tests are too narrow as-is. It should be checking something like "if we only demand the top N bits of an add, and the add operands are known zero in those top N bits, then fold the add into an overflow check."
>
> Also, canonicalizing to the add intrinsic if we're not using the add part of the result seems like the wrong direction. I can't tell from the larger test what we're expecting to happen. Please pre-commit the baseline tests, so we can see the diffs.

Will update the combine & add a base test diff.

Do you mean we shouldn't do the combine if the Add has only one use?

Pierre-vh updated this revision to Diff 478932.Nov 30 2022, 6:24 AM

Since I relaxed the rules on the combine, another test changed.
Not sure if the new conditions are correct, what do you think?

> Will update the combine & add a base test diff.
>
> Do you mean we shouldn't do the combine if the Add has only one use?

I think it's the inverse - if the add has only one use, then fold to "not+icmp+zext":
https://github.com/llvm/llvm-project/issues/59232

If the add has >1 use, then I'm not sure what we want to happen. In the general form, we have something like this:
https://alive2.llvm.org/ce/z/sW5BME
...so what other pieces of the pattern need to be there to justify creating the add intrinsic? We're in target-independent InstCombine here, so we don't usually want to end up with more instructions than we started with.

> I think it's the inverse - if the add has only one use, then fold to "not+icmp+zext":
> https://github.com/llvm/llvm-project/issues/59232
>
> If the add has >1 use, then I'm not sure what we want to happen. In the general form, we have something like this:
> https://alive2.llvm.org/ce/z/sW5BME
> ...so what other pieces of the pattern need to be there to justify creating the add intrinsic? We're in target-independent InstCombine here, so we don't usually want to end up with more instructions than we started with.

I personally think we can create the add intrinsic if the add has more than one user and the users are either the a/lshr or truncs (like we check now).
In the end I'm not sure; to me it looks beneficial, but I'll leave the final decision to people with more experience (cc @foad / @arsenm, what do you think?)

There's a potentially missing/difficult optimization in the larger example. It boils down to this:
https://alive2.llvm.org/ce/z/qpCq-X

Ie, should we replace a value (the trunc) with a narrow math op that produces the identical value directly? That might be good because it removes a use of the wide add and increases parallelism, but it might be bad because it creates an independent math op that could impede analysis and be more expensive than a trunc in codegen. That's the problem shown in issue #59217 - with a mul, it's pretty clear that we don't want to create more math.

Given that there's no clear answer (and no way to invert the transform that I'm aware of), the direction of this patch is ok with me.
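
The identity behind the trunc-vs-narrow-math question above is that truncation distributes over addition (both sides compute the sum modulo 2^N). A quick sketch of that, with made-up names:

```cpp
#include <cassert>
#include <cstdint>

// A narrow trunc user of the wide add...
uint8_t truncOfWideAdd(uint16_t x, uint16_t y) {
  return (uint8_t)(x + y);
}

// ...produces the identical value to a narrow add of truncated operands,
// since both are the sum modulo 2^8.
uint8_t narrowAddOfTruncs(uint16_t x, uint16_t y) {
  return (uint8_t)((uint8_t)x + (uint8_t)y);
}

// Sample the 16-bit space with coprime strides to keep the check fast.
bool truncDistributes() {
  for (uint32_t x = 0; x < 65536; x += 257)
    for (uint32_t y = 0; y < 65536; y += 263)
      if (truncOfWideAdd((uint16_t)x, (uint16_t)y) !=
          narrowAddOfTruncs((uint16_t)x, (uint16_t)y))
        return false;
  return true;
}
```

Whether rewriting one form to the other is profitable is exactly the codegen/analysis trade-off discussed above; the identity itself is unconditional.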

llvm/test/Transforms/InstCombine/shift-add.ll
437–438

This is an awkward way to check if 2 bools are set.
We're missing the reduction of boolean math to logic either way:
https://alive2.llvm.org/ce/z/4dBQhx

648–649

For anything but the i1/i2 case, we should convert the ashr to lshr (as happened here and the next test)?
So we could just bail out of the transform if the type doesn't have at least 3 bits (ignore the possibility of ashr).

Pierre-vh updated this revision to Diff 479179.Dec 1 2022, 12:02 AM
Pierre-vh marked 2 inline comments as done.

Comments

llvm/test/Transforms/InstCombine/shift-add.ll
437–438

I removed support for types <3 bits; should I leave the test case in?

648–649

Ah I didn't know that, I'll simplify the combine to only check lshr then. Thanks!

Thanks.

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
847–848
880

Is there test coverage when the shift amt isn't the half of the original width?

902–905

alive2 proof please?

Pierre-vh updated this revision to Diff 479537.Dec 2 2022, 12:33 AM
Pierre-vh marked 3 inline comments as done.
  • Add new tests (rebased)
  • Fix the combine to check for the EXACT number of leading zeros to ensure the transform is correct. Otherwise, if there are too few/too many leading zeros, the shift could be checking something other than the OV bit, I think.
Pierre-vh added inline comments.Dec 2 2022, 12:57 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
880

lshr_16_to_64_add_zext_basic?
I also added a couple of test cases where the shift amount is lower/higher than what's desired.

902–905

At this stage, the add is only used by either ShAmt-sized truncs, or the shift.
We're removing the shift, and for the truncs, they will cancel out.
https://alive2.llvm.org/ce/z/TWQp29

lebedev.ri added inline comments.Dec 2 2022, 6:02 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
869–871

What we want to know here is whether both X and Y can be NUW-truncated to ShAmt-wide types.
This is awkward because we are tiptoeing around adding new QoL helper functions.
We already have this: (ValueTracking.h/cpp)

/// Get the upper bound on bit size for this Value \p Op as a signed integer.
/// i.e.  x == sext(trunc(x to MaxSignificantBits) to bitwidth(x)).
/// Similar to the APInt::getSignificantBits function.
unsigned ComputeMaxSignificantBits(const Value *Op, const DataLayout &DL,
                                   unsigned Depth = 0,
                                   AssumptionCache *AC = nullptr,
                                   const Instruction *CxtI = nullptr,
                                   const DominatorTree *DT = nullptr);

but that is for sign bits, while we want zero bits.
Can you just add an unsigned variant of it next to it, and use it here?
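
The scalar intuition for that unsigned variant: the smallest width W such that x == zext(trunc(x to W)). The real helper would be based on known zero bits rather than a concrete value; this standalone sketch (hypothetical names, not the ValueTracking API) only illustrates the defining property:

```cpp
#include <cassert>
#include <cstdint>

// Smallest W such that x round-trips through trunc-to-W then zext,
// i.e. the position of the highest set bit (0 for x == 0).
unsigned maxActiveBits(uint64_t x) {
  unsigned w = 0;
  while (x) { ++w; x >>= 1; }
  return w;
}

// trunc x to W bits, zext back, and check nothing was lost.
bool zextTruncRoundTrips(uint64_t x, unsigned w) {
  if (w >= 64) return true;
  return (x & ((1ULL << w) - 1)) == x;
}
```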

902–905

Right, I'm being slow. This is correct.

908

I think we should just emit it all next to the original add,
there isn't much point in sinking the final zext.

Pierre-vh updated this revision to Diff 480372.Dec 6 2022, 1:04 AM
Pierre-vh marked 2 inline comments as done.

Comments

nikic added a subscriber: nikic.Dec 6 2022, 1:30 AM

FWIW our historical stance has always been that uadd.with.overflow is non-canonical, and the canonical pattern is a + b < a (for non-constant b). uadd.with.overflow generally has worse optimization support, which is why we only form it during CGP for backend purposes.
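
As a concrete aside, the canonical pattern and the intrinsic form agree on every input; a standalone check (using the Clang/GCC builtin that mirrors llvm.uadd.with.overflow):

```cpp
#include <cassert>
#include <cstdint>

// Canonical form: for unsigned a and b, the wrapped sum is smaller
// than a exactly when the addition overflowed.
bool canonicalOverflow(uint32_t a, uint32_t b) { return a + b < a; }

// Intrinsic form, via the Clang/GCC builtin analogous to
// llvm.uadd.with.overflow.i32.
bool intrinsicOverflow(uint32_t a, uint32_t b) {
  uint32_t sum;
  return __builtin_add_overflow(a, b, &sum);
}
```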

> FWIW our historical stance has always been that uadd.with.overflow is non-canonical, and the canonical pattern is a + b < a (for non-constant b). uadd.with.overflow generally has worse optimization support, which is why we only form it during CGP for backend purposes.

Interesting, not sure what other reviewers think?
Maybe adding a TII hook so targets can enable/disable the combine is a good idea? e.g. something like allowUAddoCanonicalForm?

spatel added a comment.Dec 8 2022, 5:35 AM

> Interesting, not sure what other reviewers think?
> Maybe adding a TII hook so targets can enable/disable the combine is a good idea? e.g. something like allowUAddoCanonicalForm?

We don't want TTI/TLI hooks in InstCombine because it's supposed to be early/target-independent transforms only (although it has folds gated by the data-layout that are almost the same thing as using TTI).

We decided that there is a legitimate need for target-dependent canonicalization though, so that's now possible in AggressiveInstCombine. So moving this patch to that pass and gating the transform on a target hook seems like a non-controversial way forward.

Currently, CodeGenPrepare uses this hook:
https://github.com/llvm/llvm-project/blob/1fe65d866c5285261b7766f2d3930ae975b878ff/llvm/include/llvm/CodeGen/TargetLowering.h#L3086

If we're using that as an early predicate, then I think it should be moved to TargetTransformInfo. I don't see any uses outside of CGP currently.

arsenm added a comment.Dec 8 2022, 5:36 AM

> We don't want TTI/TLI hooks in InstCombine because it's supposed to be early/target-independent transforms only (although it has folds gated by the data-layout that are almost the same thing as using TTI).
>
> We decided that there is a legitimate need for target-dependent canonicalization though, so that's now possible in AggressiveInstCombine. So moving this patch to that pass and gating the transform on a target hook seems like a non-controversial way forward.
>
> Currently, CodeGenPrepare uses this hook:
> https://github.com/llvm/llvm-project/blob/1fe65d866c5285261b7766f2d3930ae975b878ff/llvm/include/llvm/CodeGen/TargetLowering.h#L3086
>
> If we're using that as an early predicate, then I think it should be moved to TargetTransformInfo. I don't see any uses outside of CGP currently.

I'm more inclined to do this in CGP than AggressiveInstCombine if the shift is the preferred canonical form

arsenm added a comment.Dec 8 2022, 5:37 AM

> I'm more inclined to do this in CGP than AggressiveInstCombine if the shift is the preferred canonical form

Rather, if we can produce the add and compare as a more canonical form and match that in the backend, that would be better

nikic added a comment.Dec 8 2022, 5:44 AM

> Interesting, not sure what other reviewers think?
> Maybe adding a TII hook so targets can enable/disable the combine is a good idea? e.g. something like allowUAddoCanonicalForm?

I don't think there's any need for target dependence here. You just need to produce a + b < a instead of extract(uaddo(a, b), 1). The uaddo will be formed by the backend.

spatel added a comment.Dec 8 2022, 5:46 AM

> Rather, if we can produce the add and compare as a more canonical form and match that in the backend, that would be better

Agreed - if we're just creating add+icmp rather than the intrinsic, then that seems fine to do here.

By a + b < a do you mean that the combine would:

  • Still reduce the add to the smaller type
  • Replace the overflow bit (lshr) with icmp lt (add a, b), a?

Does the backend already transform that to uaddo or will that need a separate patch in the target's CGP?

arsenm added a comment.Dec 8 2022, 6:03 AM

> By a + b < a do you mean that the combine would:
>
>   • Still reduce the add to the smaller type
>   • Replace the overflow bit (lshr) with icmp lt (add a, b), a?
>
> Does the backend already transform that to uaddo or will that need a separate patch in the target's CGP?

Looks like no: https://godbolt.org/z/nTrzqxeo9

nikic added a comment.Dec 8 2022, 6:05 AM

> Looks like no: https://godbolt.org/z/nTrzqxeo9

You are using slt rather than ult.
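
The slt/ult distinction matters here; a small standalone check of why the signed compare gives the wrong answer (illustrative names, assumes two's complement):

```cpp
#include <cassert>
#include <cstdint>

// Correct unsigned-overflow check (icmp ult).
bool ultCheck(uint32_t a, uint32_t b) { return a + b < a; }

// The slt version compares as signed; it disagrees whenever the sum
// crosses the sign-bit boundary without an unsigned carry, and misses
// real carries whose wrapped sum is still "greater" as signed.
bool sltCheck(uint32_t a, uint32_t b) {
  return (int32_t)(a + b) < (int32_t)a;
}
```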

Pierre-vh updated this revision to Diff 481580.Dec 9 2022, 2:21 AM

Change to (a + b < a); don't combine if ShAmt == 1.

Seems like CGP does the conversion to uaddo; or at least in AMDGPU's case codegen is the same for uaddo/(a + b < a): https://godbolt.org/z/zfnorbvzr

We seem to have more bitwise operations now; is it expected to see the add folded into other operations (xor/and)?
Should I restrict the combine to ShAmt > 2?

Pierre-vh retitled this revision from [InstCombine] Combine a/lshr of add -> uadd.with.overflow to [InstCombine] Combine a/lshr of add -> (a + b < a).Dec 9 2022, 2:22 AM
Pierre-vh edited the summary of this revision. (Show Details)
Pierre-vh updated this revision to Diff 486186.Wed, Jan 4, 12:36 AM

Rebase + ping

nikic added inline comments.Wed, Jan 4, 2:19 AM
llvm/test/Transforms/InstCombine/pr34349.ll
17 ↗(On Diff #486186)

This case doesn't look like a profitable transform. Do I understand correctly that the actual motivating case here is the case where the inputs are zext, and then this was later generalized based on reviewer feedback to use known bits instead?

For the zext case, this looks like an obviously desirable transform, but for the general case (where the truncs may not fold away) this is less clearly beneficial. I would personally restrict this to just zext unless we have specific motivation otherwise. (But I'm also not going to fight this if reviewers disagree.)

Pierre-vh added inline comments.Thu, Jan 5, 2:34 AM
llvm/test/Transforms/InstCombine/pr34349.ll
17 ↗(On Diff #486186)

I indeed did it using known bits after a comment from @spatel:

> In the earlier version(s) of this patch (is there a functional diff here from D107552?), there were questions about how it would interact with SCEV and/or vectorization.
>
> If we're focusing on the overflow-only/minimal pattern, then I think we should structure the matching as a known-bits/demanded-bits problem. Ie, instead of zexts, we might have and masks. Instead of a shift at the end, that might also be a mask op.

For me it's fine to do it with just zext, but then I'm afraid we'd miss some profitable cases.

spatel added inline comments.Thu, Jan 5, 7:25 AM
llvm/test/Transforms/InstCombine/pr34349.ll
17 ↗(On Diff #486186)

If we can get all of the motivating cases by matching zext directly, we can do that as a first step to reduce risk. Then, we can do the more general transform if needed as a second step.

IIUC, this means we could split off the ValueTracking part of the patch to an independent patch (add unit tests if there are no callers currently).

Also, please commit the baseline tests to main as a preliminary patch (D139011), so we can see current diffs (the bool math test diffs should be eliminated after 71df24dd39177ecfc440a0 ?)

Pierre-vh updated this revision to Diff 486815.Fri, Jan 6, 4:34 AM

Rebased tests for proper diff

Before I remove the known bits part and replace it with zext matching I would like @arsenm/@foad to give their opinions too

arsenm added a comment.Fri, Jan 6, 5:12 AM

> Rebased tests for proper diff
>
> Before I remove the known bits part and replace it with zext matching I would like @arsenm/@foad to give their opinions too

I think splitting the known bits part as a second step makes sense. The general case multiplies your potential bug surface so I think it's usually better to do this in 2 steps

Pierre-vh retitled this revision from [InstCombine] Combine a/lshr of add -> (a + b < a) to [InstCombine] Combine lshr of add -> (a + b < a).Fri, Jan 6, 5:28 AM
Pierre-vh edited the summary of this revision. (Show Details)
Pierre-vh updated this revision to Diff 486836.Fri, Jan 6, 5:36 AM

Remove knownbits, use zext matching

spatel added inline comments.Fri, Jan 6, 8:42 AM
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
866

We should put m_OneUse() limits on the m_ZExt matches. Also, add a test where the zexts have extra use(s).

That should avoid the +2 instruction count regression in the next patch. We are still potentially increasing instruction count with this transform, but the trade-off seems more reasonable if we can eliminate more of the intermediate instructions.

Pierre-vh updated this revision to Diff 487358.Mon, Jan 9, 2:50 AM
Pierre-vh marked an inline comment as done.

Add m_OneUse to m_ZExt in match

spatel accepted this revision.Mon, Jan 9, 7:58 AM

LGTM

This revision is now accepted and ready to land.Mon, Jan 9, 7:58 AM
This revision was automatically updated to reflect the committed changes.
foad added a comment.Tue, Jan 10, 2:06 AM

> FWIW our historical stance has always been that uadd.with.overflow is non-canonical, and the canonical pattern is a + b < a (for non-constant b). uadd.with.overflow generally has worse optimization support, which is why we only form it during CGP for backend purposes.

What's the historical stance on sadd.with.overflow?

llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
848

Likewise I don't think you need this line and the "boolean math" comment belongs on the "K > 1" line.

860

I don't think you need this check since you check below that K != 1 and it matches the width of X and Y, and the shift type must be strictly wider than that. But I guess it's harmless.

nikic added a comment.Tue, Jan 10, 2:10 AM

> What's the historical stance on sadd.with.overflow?

sadd.with.overflow (and generally, all signed overflow intrinsics) are considered canonical, and we do produce them in InstCombine.

Pierre-vh marked 2 inline comments as done.Wed, Jan 11, 3:53 AM
Pierre-vh added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
848

Will address in D141129