This is an archive of the discontinued LLVM Phabricator instance.

Differential D74484

[AggressiveInstCombine] Add support for ICmp instr that feeds a select instr's condition operand.
Needs Review · Public

Authored by aymanmus on Feb 12 2020, 6:02 AM.

Details

Reviewers

aaboud
delena
spatel
nikic
ctetreau
lebedev.ri

Summary

Teach Trunc AggressiveInstCombine how to include ICmp instructions in chains that can be type reduced.

Diff Detail

Event Timeline

aymanmus created this revision. Feb 12 2020, 6:02 AM

Herald added a project: Restricted Project. Feb 12 2020, 6:02 AM

Herald added a subscriber: hiraditya.

This seems not generic enough to me. Can't we drop the requirement that the icmp operands be constants/[sz]ext's,
and instead try to see if it can be evaluated in a smaller bitwidth (which is what the rest of the code here does)?

As long as the icmp can be shrunk to a bitwidth at least as small as we need to get rid of the cast,
i think we can always pick the actual bitwidth we'll use, which might be wider than the minimum?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
239	You want to operate on `APInt`, not assuming that it fits into 64-bits. This is a correctness issue.

lebedev.ri added inline comments. Feb 12 2020, 6:30 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
279	Hm, can't we already get a constant operand in one of the supported instructions? If yes, then i would semi-strongly suggest to split this up again..

lebedev.ri marked an inline comment as done. Feb 12 2020, 7:38 AM

lebedev.ri added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
279	Ah, i see what i'm missing: AIC does not perform DCE, so i was looking at https://godbolt.org/z/ZXpBui and it wasn't being transformed because dead `%cmp` was still present. If it's not there (https://godbolt.org/z/UEMDQN), then we deal with ordinary constants fine. So i see this is ICmp-specific, and should be kept here.

In D74484#1872137, @lebedev.ri wrote:

This seems not generic enough to me. Can't we drop the requirement that the icmp operands be constants/[sz]ext's,
and instead try to see if it can be evaluated in a smaller bitwidth (which is what the rest of the code here does)?

As long as the icmp can be shrunk to a bitwidth at least as small as we need to get rid of the cast,
i think we can always pick the actual bitwidth we'll use, which might be wider than the minimum?

After thinking about this, i'd be okay with not doing that straight away.
It should be doable, but would require substantial redesign to nicely model more than one dag and their connection/dependency.

In D74484#1873705, @lebedev.ri wrote:

After thinking about this, i'd be okay with not doing that straight away.
It should be doable, but would require substantial redesign to nicely model more than one dag and their connection/dependency.

Exactly, there are many improvements that can be added to this pass in many ways.
What I add here is a small and simple improvement that doesn't need serious redesign.
It has its added value (not a big one though) at a minimal cost.

I think this is missing a test-case where the select could be truncated if you ignored the icmp, but not if you require the icmp to also be truncated. That should show up as a regression in the test diffs.

Deal with APInt instead of 64-bit value in constants.

aymanmus updated this revision to Diff 244348. Feb 13 2020, 12:40 AM

lebedev.ri added inline comments. Feb 13 2020, 1:11 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

241	Why copy?
246	APInt::getActiveBits()
248	Correct me if i'm wrong, but won't this just work?

static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
  const APInt& Val = C->getValue();
  unsigned NonSignBits = Val.getBitWidth() - Val.getNumSignBits();

  return IsSigned ? 1 + NonSignBits : NonSignBits;
}

In D74484#1873745, @nikic wrote:

I think this is missing a test-case where the select could be truncated if you ignored the icmp, but not if you require the icmp to also be truncated. That should show up as a regression in the test diffs.

Not sure I understand what you mean exactly.
Shrinking the cmp is not *required* in order to shrink the select. If there is a cmp instruction that can be shrunk along with the select, we shrink it. If not, it does not affect the shrinking of the select.
Is the following test case relevant?

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
; CHECK-LABEL: @cmp_shrink_select_not_cmp(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[C:%.*]], [[D:%.*]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
; CHECK-NEXT: ret i16 [[COND]]
;
entry:
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %c, %d
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

@aymanmus I have something along these lines in mind...

%conv = sext i8 %a to i32
%conv2 = sext i8 %b to i32
%conv3 = sext i8 %c to i32
%cmp = icmp slt i32 %conv3, TOO_BIG_CONST
%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16
ret i16 %conv4

That is, where the icmp has a form that passes the isConstOrExt(C->getOperand(0)) && isConstOrExt(C->getOperand(1)) check, but later gets rejected due to constant bitwidth.

Not sure if that example does it, but I think there must be some case like this, unless I'm misunderstanding the patch.

In D74484#1873830, @nikic wrote:
@aymanmus I have something along these lines in mind...
%conv = sext i8 %a to i32
%conv2 = sext i8 %b to i32
%conv3 = sext i8 %c to i32
%cmp = icmp slt i32 %conv3, TOO_BIG_CONST
%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16
ret i16 %conv4
That is, where the icmp has a form that passes the isConstOrExt(C->getOperand(0)) && isConstOrExt(C->getOperand(1)) check, but later gets rejected due to constant bitwidth.

Not sure if that example does it, but I think there must be some case like this, unless I'm misunderstanding the patch.

Totally agree.
Fixed the code to exclude the ICmp instruction from the part where we make the decision whether to transform or not.
ICmp instruction can only be replaced after we're already applying the transformation, and only if its operands can fit in the type we are converting to.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
248	IsSigned indicates whether the use of the constant is signed or not. I guess the "sign bit" of the APInt is deduced simply from the MSB. But consider the case where IsSigned = false and the APInt has width 16 and value 0xff80. Val.getNumSignBits() would return 9 (as if the value were signed with 9 sign bits), so the function would return 7, whereas we expect 16. Added a test case (@cmp_select_unsigned_const_i16_MSB1) on which the suggested code above wrongly applies the transformation, while the current code refrains from it. However, the previous comments are 100% legit.
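
The 0xff80 case can be checked by hand with a small model. The sketch below is only an illustration: FixedInt and the helper names are hypothetical stand-ins limited to 64 bits, which is exactly the assumption the real code must not make (it should use APInt). It compares the sign-bit-based formula suggested earlier in the thread with a version that picks the query according to how the constant is used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed-width integer constant (width <= 64).
// Real code should use llvm::APInt, which has no 64-bit limit.
struct FixedInt {
  unsigned Width; // number of bits in the type, 1..64
  uint64_t Bits;  // value, stored in the low Width bits
};

// Bits needed when the value is used as unsigned:
// position of the highest set bit plus one (0 for the value 0).
unsigned activeBits(FixedInt V) {
  assert(V.Width >= 1 && V.Width <= 64);
  unsigned N = 0;
  for (unsigned I = 0; I < V.Width; ++I)
    if ((V.Bits >> I) & 1)
      N = I + 1;
  return N;
}

// Number of leading bits equal to the top bit, counting the top bit itself.
unsigned numSignBits(FixedInt V) {
  assert(V.Width >= 1 && V.Width <= 64);
  uint64_t Top = (V.Bits >> (V.Width - 1)) & 1;
  unsigned N = 0;
  for (unsigned I = V.Width; I-- > 0;) {
    if (((V.Bits >> I) & 1) != Top)
      break;
    ++N;
  }
  return N;
}

// Bits needed when the value is used as signed: keep exactly one sign bit.
unsigned minSignedBits(FixedInt V) { return V.Width - numSignBits(V) + 1; }

// Sign-bit-based formula: treats leading ones as sign bits even when the
// constant is used in an unsigned comparison.
unsigned minWidthFromSignBits(bool IsSigned, FixedInt V) {
  unsigned NonSignBits = V.Width - numSignBits(V);
  return IsSigned ? 1 + NonSignBits : NonSignBits;
}

// Use-based formula: choose the query that matches the use of the constant.
unsigned minWidth(bool IsSigned, FixedInt V) {
  return IsSigned ? minSignedBits(V) : activeBits(V);
}
```

For a 16-bit 0xff80 used as unsigned, the sign-bit formula reports 7 bits while the use-based one reports 16; for a signed use both report 8, since the value is -128.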

Addressed @nikic & @lebedev.ri comments.

lebedev.ri added inline comments. Feb 13 2020, 7:31 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
246	Please explicitly spell out `unsigned` here
248	Right. What about this then?

static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
  const APInt& Val = C->getValue();
  return IsSigned ? Val.getMinSignedBits() : Val.getActiveBits();
}

nikic added inline comments. Feb 13 2020, 9:19 AM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410	Doesn't the sext/zext case have to check the IsSigned flag as well? I don't think it's as simple as IsSigned => sext and !IsSigned => zext (e.g. it's fine if you have an equality comparison and both use same extension type), but same correctness checks are needed here.

lebedev.ri added inline comments. Feb 13 2020, 10:31 PM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
402–403	Wait, why are we giving up on non-scalars here? See how e.g. `Constant::isAllOnesValue()` handles different types, i think similar recursion should be done here.

Fix the constant aux functions to correctly handle vector constants.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410	The IsSigned is not relevant in this part of the code. IsSigned only indicates whether the use of the constant (in case the given Value is a constant) was in a signed or unsigned context. If the Value is not a constant, we don't care about IsSigned. I'll change the name of the operand to avoid such confusion.

nikic added inline comments. Feb 19 2020, 1:08 PM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410	Not sure I get it. Converting `(zext i16 %x to i32) ult (sext i16 %y to i32)` to `%x ult %y` is not legal (https://rise4fun.com/Alive/wVZy), and I don't think anything prevents that from happening with your current code.
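
A concrete counterexample for the mixed zext/sext case can be reproduced without LLVM. This is a minimal sketch that models i16 and i32 with uint16_t and uint32_t; the function names are made up for the illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: (zext i16 %x to i32) ult (sext i16 %y to i32).
bool wideCompare(uint16_t X, uint16_t Y) {
  uint32_t ZExtX = X; // zero-extension: upper 16 bits are 0
  // Sign-extension done by hand: copy bit 15 into the upper 16 bits.
  uint32_t SExtY = (Y & 0x8000u) ? (0xFFFF0000u | Y) : Y;
  return ZExtX < SExtY; // unsigned less-than on 32 bits
}

// Narrowed form: %x ult %y on i16.
bool narrowCompare(uint16_t X, uint16_t Y) { return X < Y; }
```

With %x = %y = 0xFFFF the wide compare is 0x0000FFFF ult 0xFFFFFFFF, which is true, while the narrowed compare is 0xFFFF ult 0xFFFF, which is false. The two forms agree whenever %y has a clear top bit, which is why the mismatch is easy to miss in tests that only use small values.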

aymanmus updated this revision to Diff 245631. Feb 20 2020, 6:01 AM

aymanmus marked an inline comment as done.

aymanmus added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410	No I understand. You're right. I gave it a thought and came up with a bit different approach to deal with all the limitations. I hope this new function (CmpCanBeShrinked) handles all the needed cases.

aymanmus marked an inline comment as done. Feb 20 2020, 6:03 AM

aymanmus added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410	Now I understand***

Ping.

Ping #2.

This is a lot more complicated than expected :(

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
41	I'd suggest Scl -> Scalar, we're not saving a lot of characters here :)
49	`return false` doesn't make sense here. I believe that in the case where it is not analyzable, you need to return the bitwidth of the type. It should be possible to test this by making the constant a constant expression.
81	nit: "fits in Ty in case"
109	Is it really sufficient that just one of them is a constant? Say we have `(zext i16 %x to i32) ult -1`, which is always true, converting it to `i16 %x ult -1` would no longer be always true.

aymanmus updated this revision to Diff 249376. Mar 10 2020, 7:36 AM

aymanmus marked 4 inline comments as done.

aymanmus added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
109	Making sure the const operand is valid for shrinking is done before we reach this point (line 82):

if (Ty && Ty->getScalarSizeInBits() < getConstMinBitWidth(IsSigned, C))
  return false;

If we reach this point, then the constant is valid. Added a test case that exactly checks the scenario you offered.
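
The effect of that guard on the `ult -1` scenario can be checked exhaustively for an i16 source type. The sketch below is an assumption-laden model, not the patch's code: it uses plain 32-bit and 16-bit unsigned integers, and constFitsUnsigned and narrowingPreservesResult are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Bits needed to represent C as an unsigned value (0 for C == 0).
unsigned activeBits32(uint32_t C) {
  unsigned N = 0;
  while (C != 0) {
    ++N;
    C >>= 1;
  }
  return N;
}

// Model of the guard: an unsigned compare may only be narrowed when the
// constant needs no more than NarrowBits bits.
bool constFitsUnsigned(uint32_t C, unsigned NarrowBits) {
  return activeBits32(C) <= NarrowBits;
}

// Brute force: does (zext i16 %x to i32) ult C give the same answer as
// %x ult (trunc C to i16) for every possible %x?
bool narrowingPreservesResult(uint32_t C) {
  uint16_t NarrowC = static_cast<uint16_t>(C); // unsigned truncation
  for (uint32_t X = 0; X <= 0xFFFFu; ++X) {
    bool Wide = X < C;
    bool Narrow = static_cast<uint16_t>(X) < NarrowC;
    if (Wide != Narrow)
      return false;
  }
  return true;
}
```

For the i32 constant -1 (0xFFFFFFFF) the guard rejects narrowing to 16 bits, and the brute-force check confirms that narrowing would change the result; for 109 the guard accepts and the results agree for every input.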

PING.

I have not looked at the code changes, but I noticed the first few tests are all min/max patterns. Can you check your motivating apps/benchmarks to see if rGf2fbdf76d8d0 helps?

Just checked the patch @spatel mentioned.
Doesn't really help with the cases provided here.

In D74484#1951878, @aymanmus wrote:

Just checked the patch @spatel mentioned.
Doesn't really help with the cases provided here.

Thanks for checking. In that case, it would be helpful to comment on which of the tests are the real motivation (and possibly remove the tests that are not canonical IR).
For example, -instcombine alone is enough to reduce these:

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %c, %d
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

define i16 @cmp_select_bigConst_cmp(i8 %a, i8 %b, i8 %c) {
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %conv3 = sext i8 %c to i32
  %cmp = icmp slt i32 %conv3, 70000
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

$ ./opt -instcombine cmpsel.ll -S

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
entry:
  %cmp = icmp slt i32 %c, %d
  %cond.v = select i1 %cmp, i8 %b, i8 %a
  %conv4 = sext i8 %cond.v to i16
  ret i16 %conv4
}

define i16 @cmp_select_bigConst_cmp(i8 %a, i8 %b, i8 %c) {
  %conv4 = sext i8 %b to i16
  ret i16 %conv4
}

spatel mentioned this in rGf2fbdf76d8d0: [InstCombine] do not exclude min/max from icmp with casted operand fold. Apr 2 2020, 5:54 AM

@spatel: Basically the test cases with the real motivation are the first 4.
Where most of the other cases are there to check various "edge" cases of the added code behavior.
I think, that even if some of the tests are not in a canonical form (and can be optimized by instcombine), we still have an added value having them here in order to check the behavior of this specific pass with similar cases.
Don't you agree?

In D74484#1963795, @aymanmus wrote:

@spatel: Basically the test cases with the real motivation are the first 4.

Do you mean the first 4 with diffs, or the first 4 that are being added as new tests?

Where most of the other cases are there to check various "edge" cases of the added code behavior.
I think, that even if some of the tests are not in a canonical form (and can be optimized by instcombine), we still have an added value having them here in order to check the behavior of this specific pass with similar cases.
Don't you agree?

Yes, I agree that we want to have tests for edge cases to make sure that the logic is correct here. But we also should have tests that show why this patch is necessary - functions that could not be solved in regular instcombine easily.

In D74484#1966744, @spatel wrote:

In D74484#1963795, @aymanmus wrote:

@spatel: Basically the test cases with the real motivation are the first 4.

Do you mean the first 4 with diffs, or the first 4 that are being added as new tests?

The first 4 with diff.

Where most of the other cases are there to check various "edge" cases of the added code behavior.
I think, that even if some of the tests are not in a canonical form (and can be optimized by instcombine), we still have an added value having them here in order to check the behavior of this specific pass with similar cases.
Don't you agree?

Yes, I agree that we want to have tests for edge cases to make sure that the logic is correct here. But we also should have tests that show why this patch is necessary - functions that could not be solved in regular instcombine easily.

I agree with you 100%. That's why we have several test cases.

Running the same test with -instcombine instead of -aggressive-instcombine, out of the 23 test cases in the file, the following get optimized:

cmp_select_zext_i8_noTransformation
cmp_select_zext_sext_i8_noTransformation
cmp_select_signed_const_i16Const_noTransformation
cmp_select_unsigned_const_i16Const
cmp_select_unsigned_const_i16Const_noTransformation
cmp_select_unsigned_const_i16_MSB1
cmp_select_bigConst_cmp
cmp_zext_and_minus1_noTransformation

Out of these cases, the ones with "_noTransformation" suffix are there to make sure our pass does not apply any transformations, and the others include some special immediate values.

Actually, I feel that I might have misunderstood the intention of your comment. Can you explain what you are suggesting that we do?

Thanks,
Ayman

In D74484#1967040, @aymanmus wrote:

Actually, I feel that I might have misunderstood the intention of your comment. Can you explain what you are suggesting that we do?

I'm trying to understand which patterns are not capable of being reduced to the optimal form by instcombine. If there is some artificial limitation within regular instcombine that we can overcome (like unnecessary bypassing of min/max patterns), then we might be able to avoid a specialized solution in this pass. We want to show that there's a good reason to add code here in aggressive-instcombine to handle these optimizations. So if you can add comments in the test file that explain why instcombine can't handle some transform, that will explain to future readers of the code/tests why we have this particular transform here. Also, if instcombine somehow gets smarter in the future, then we would know if this code became redundant.

As per @spatel's request,
Added a comment explaining the motive behind implementing the transformation in aggressive instcombine instead of normal instcombine.

I still haven't had a chance to actually look at the code here.
@nikic / @lebedev.ri - are your concerns handled?

llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll
4–5	This isn't entirely accurate for the tests as shown. The problem in several of these tests is not that there are extra uses; it's that we don't canonicalize icmp and min/max as well as possible. That was what rGf2fbdf76d8d0 was trying to solve, but I don't think there's a quick fix to restore that. Please add some extra uses in these first 4 tests, so we have a better representation of the motivating tests. One way to do that is to add something like: declare void @use(i32) and then: call void @use(i32 %conv)

Herald added a reviewer: ctetreau. Apr 13 2020, 12:52 PM

aymanmus marked an inline comment as done. Apr 14 2020, 1:02 AM

aymanmus added inline comments.

llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll
4–5	I must disagree with you this time. The infrastructure for shrinking such cases inside InstCombine cannot handle these cases. If we look, for example, at the first case in this test, notice that the "sext" instruction has multiple uses. Which will prevent the mechanism inside instcombine from shrinking the whole sequence, even though both uses are going to be shrunk together (as part of the same sequence) with "sext". This is one of the main additions in AggressiveInstCombine, where while we're checking the validity/possibility of shrinking sequences, we remember all the instructions we encountered until now. That enables us to check if all the uses of such instruction are included in these instructions to make sure we'll be replacing all the uses of the old sext instruction with the new one.
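
The "all uses are inside the visited sequence" idea described in this comment can be sketched on a toy graph. This is only an illustration under stated assumptions: Node and allUsersVisited are hypothetical names, instructions are reduced to integer ids, and nothing here is the pass's real data structure.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical miniature of an instruction: an id plus the ids of its users.
struct Node {
  int Id;
  std::vector<int> Users;
};

// An instruction with several uses can still be replaced by a narrower one
// when every user is itself part of the sequence being shrunk.
bool allUsersVisited(const Node &N, const std::set<int> &Visited) {
  for (int U : N.Users)
    if (Visited.count(U) == 0)
      return false;
  return true;
}
```

For example, a sext whose users are the icmp and the select of the same sequence passes the check, while a sext that is additionally passed to an unrelated call does not.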

Sorry, I did not explain my previous comment as well as possible. Let's take this minimal example:

define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

With instcombine today, we can reduce this as:

define i1 @f(i8 %a) {
  %cmp = icmp slt i8 %a, 109
  ret i1 %cmp
}

We can also see that with instcombine today, we can reduce this example even if the cast (sext in this example) has other uses:

declare void @use(i32)
define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

Narrow 'cmp' here even though the sext is not eliminated:

define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  ret i1 %cmp
}

So that's what I was trying to suggest - the limitation is not entirely about the multiple uses in these small examples. The 1st regression test example would have been altered by rGf2fbdf76d8d0 to become:

define i16 @cmp_select_sext_const(i8 %a) {
  %c = icmp sgt i8 %a, 109
  %narrow = select i1 %c, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

And even with an extra use of the sext, we would get the narrow icmp/select:

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

In fact, even when the select also has an extra use (you can reproduce this on current LLVM: https://godbolt.org/z/P6Kws_ ):

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  %cond = select i1 %cmp, i32 109, i32 %conv
  call void @use(i32 %cond)
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

Instcombine will produce this (which seems unintended because we increased the instruction count):

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %cond = zext i8 %narrow to i32
  call void @use(i32 %cond)
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

So I'm suggesting that we cover that last case in addition to the simpler test, so we will know what happens with multiple extra uses.

Also note that we have floated the idea again of adding min/max intrinsics to IR in D68408. Obviously, we need to post that as a proposal on llvm-dev for wider review...but if we had those intrinsics in IR, it probably affects at least some of the motivation for this patch, so I'd be interested in your thoughts on that.

ctetreau added inline comments. Apr 14 2020, 3:00 PM

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
58	if (auto *VT = dyn_cast<VectorType>(C->getType()))
59	getVectorNumElements is going away soon. Please do the cast. See: D77278 Also, is VT guaranteed to not be a scalable vector?

In D74484#1980606, @spatel wrote:
Sorry, I did not explain my previous comment as well as possible. Let's take this minimal example:
define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

Sorry, but I think the motivation of the patch is not yet clear.
The whole optimization (we're trying to improve) should be triggered by a truncate instruction that dominates a certain sequence.
This trunc can lead to shrinking a whole sequence of instructions to a new type and by that leave no need for the trunc instruction itself.
The example you put here has no trunc at all so it's not really relevant.

And even with an extra use of the sext, we would get the narrow icmp/select:

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

Same here.

In fact, even when the select also has an extra use (you can reproduce this on current LLVM: https://godbolt.org/z/P6Kws_ ):
define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  %cond = select i1 %cmp, i32 109, i32 %conv
  call void @use(i32 %cond)
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

This example indeed has a truncate instruction, but as we can see, the optimization applied to it did not originate from the truncate.
As a matter of fact, it applied despite it. We can see that shrinking the select's type actually increased the instruction count.
So this is not a relevant example either.
What we are trying to solve in this patch, are cases where a full sequence which is terminated with a truncate instruction can be shrunk to a new type, that will subsequently remove the need of a truncate instruction as a terminator.
A perfect example would be:

define dso_local i16 @cmp_select_sext(i8 %a, i8 %b) {
entry:
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %conv, %conv2
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

which will not be changed with inst-combine, but with the new optimization, the whole sequence can be shrunk to i16 type (according to the destination type of the truncate instruction), and the truncate can be thrown away.

Also note that we have floated the idea again of adding min/max intrinsics to IR in D68408. Obviously, we need to post that as a proposal on llvm-dev for wider review...but if we had those intrinsics in IR, it probably affects at least some of the motivation for this patch, so I'd be interested in your thoughts on that.

Actually, this might help in some cases where the icmp+select implement a max/min semantics. In that case simply add the new nodes' opcodes (min/max) to the list of opcodes we deal with in the optimization.
But, we still need to handle the general case of icmp+select as well.

aymanmus marked 2 inline comments as done. Apr 19 2020, 4:05 AM

aymanmus added inline comments.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
59	No, unfortunately I didn't take that into account. How do you suggest that I deal with that? How are scalable vectors of constants represented? Is the following a correct approach for such a case?

if (auto *VT = dyn_cast<VectorType>(C->getType())) {
  for (unsigned i = 0; i < VT->getNumElements(); i++) {

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. Jan 12 2023, 5:18 PM

Herald added a subscriber: StephenFan.

Revision Contents

Path	Size
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp	110 lines
llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll	294 lines

Diff 255981

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

Show All 30 Lines
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "aggressive-instcombine"		#define DEBUG_TYPE "aggressive-instcombine"

		// Get the minimum number of bits needed for representing the given constant.
		// This function assumes the constant is of a scalar type.
		static unsigned getConstMinBitWidthScalar(bool IsSigned, Constant *C) {
		nikicUnsubmitted Done Reply Inline Actions I'd suggest Scl -> Scalar, we're not saving a lot of characters here :) nikic: I'd suggest Scl -> Scalar, we're not saving a lot of characters here :)
		assert(!C->getType()->isVectorTy() &&
		"getConstMinBitWidthScalar accepts scalar constants only");

		if (ConstantInt *CI = dyn_cast<ConstantInt>(C)) {
		const APInt &Val = CI->getValue();
		return IsSigned ? Val.getMinSignedBits() : Val.getActiveBits();
		}
		return C->getType()->getScalarSizeInBits();
		nikicUnsubmitted Done Reply Inline Actions `return false` doesn't make sense here. I believe that in the case where it is not analyzable, you need to return the bitwidth of the type. It should be possible to test this by making the constant a constant expression. nikic: `return false` doesn't make sense here. I believe that in the case where it is not analyzable…
		}

		// Get the minimum number of bits needed for the given constant.
		static unsigned getConstMinBitWidth(bool IsSigned, Constant *C) {
		Type *VT = C->getType();
		unsigned Min = 0;
		// In case of a constant vector, make sure all constant elements can fit in
		// the given type.
		if (VT->isVectorTy()) {
		ctetreauUnsubmitted Done Reply Inline Actions if (auto VT = dyn_cast<VectorType>(C->getType())) ctetreau:* if (auto *VT = dyn_cast<VectorType>(C->getType()))
		for (unsigned i = 0; i < VT->getVectorNumElements(); i++) {
		ctetreauUnsubmitted Not Done Reply Inline Actions getVectorNumElements is going away soon. Please do the cast. See: D77278 Also, is VT guaranteed to not be a scalable vector? ctetreau: getVectorNumElements is going away soon. Please do the cast. See: D77278 Also, is VT…
		aymanmusAuthorUnsubmitted Done Reply Inline Actions No, unfortunately I didn't take that into account. How do you suggest that I deal with that? How do scalable vectors of constants are represented? Is: if (auto VT = dyn_cast<VectorType>(C->getType())) { for (unsigned i = 0; i < VT->getNumElements(); i++) { a correct approach to such case? aymanmus:* No, unfortunately I didn't take that into account. How do you suggest that I deal with that?
		unsigned MinCurr =
		getConstMinBitWidthScalar(IsSigned, C->getAggregateElement(i));
		Min = std::max(MinCurr, Min);
		}
		return Min;
		}
		return getConstMinBitWidthScalar(IsSigned, C);
		}

		// If @Ty is NULL, returns true if the provided ICmp instruction is a candidate
		// to be shrunk.
		// If @Ty is not NULL, returns true if the provided ICmp instruction can be
		// shrunk to the given type @Ty shrunk.
		static bool CmpCanBeShrunk(ICmpInst I, Type Ty = nullptr) {
		bool HasConst = false;
		unsigned MinScalarTypeInBits =
		I->getOperand(0)->getType()->getScalarSizeInBits();
		bool IsSigned = I->isSigned();

		for (auto &V : I->operands()) {
		if (Constant *C = dyn_cast<Constant>(V)) {
		// If Op is a constant, make sure it fits in Ty in case it was provided.
		nikicUnsubmitted Done Reply Inline Actions nit: "fits in Ty in case" nikic: nit: "fits in Ty in case"
		if (Ty && Ty->getScalarSizeInBits() < getConstMinBitWidth(IsSigned, C))
		return false;

		// Mark that one of the operands is a constant.
		HasConst = true;
		continue;
		}

		if (Instruction *I = dyn_cast<Instruction>(V)) {
		switch (I->getOpcode()) {
		case Instruction::ZExt:
		case Instruction::SExt: {
		// In case of sext/zext, make sure the original type can fit in @Ty if
		// provided.
		unsigned SclSize = cast<CastInst>(I)->getSrcTy()->getScalarSizeInBits();
		if (Ty && Ty->getScalarSizeInBits() < SclSize)
		return false;
		MinScalarTypeInBits = std::min(MinScalarTypeInBits, SclSize);
		continue;
		}
		}
		}
		return false;
		}

		// If both operands are valid, check that the combination is valid as well.
		// If one of them is a constant, return true.
		if (HasConst)
		nikic (Not Done): Is it really sufficient that just one of them is a constant? Say we have `(zext i16 %x to i32) ult -1`, which is always true; converting it to `i16 %x ult -1` would no longer be always true.
		aymanmus (Author, Done): The check that the constant operand is valid for shrinking happens before we reach this point (line 82): `if (Ty && Ty->getScalarSizeInBits() < getConstMinBitWidth(IsSigned, C)) return false;`. If we reach this point, the constant is valid. Added a test case that checks exactly the scenario you described.
		return true;

		// If the type to be shrunk to is larger than the src types of both extend
		// instructions, return true;
		if (Ty && Ty->getScalarSizeInBits() > MinScalarTypeInBits)
		return true;

		// If the type to be shrunk to is equal to the src type of at least one of
		// the extend instructions, make sure the opcodes are identical, except for
		// the case where the compare is signed and the extend operations are ZExt.
		unsigned Opc0 = cast<Instruction>(I->getOperand(0))->getOpcode();
		unsigned Opc1 = cast<Instruction>(I->getOperand(1))->getOpcode();
		return (!Ty || (Opc0 == Opc1 && !(IsSigned && Opc0 == Instruction::ZExt)));
		}
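
The final check in `CmpCanBeShrunk` (both operands use the same extend opcode, and the compare is not a signed compare of two `zext`s) can be sanity-checked by brute force. The sketch below is a standalone model, assuming `i8` sources extended to `i32` and a "less than" predicate only. `Ext`, `extend`, `wideLess`, `narrowLess` and `narrowingIsSafe` are hypothetical names for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

enum class Ext { Zero, Sign };

// Models `zext i8 -> i32` or `sext i8 -> i32` on an 8-bit pattern.
// The int8_t conversion assumes two's complement (guaranteed since C++20).
int32_t extend(uint8_t bits, Ext kind) {
  return kind == Ext::Zero ? static_cast<int32_t>(bits)
                           : static_cast<int32_t>(static_cast<int8_t>(bits));
}

// Models `icmp slt` / `icmp ult` on i32.
bool wideLess(int32_t a, int32_t b, bool isSigned) {
  return isSigned ? a < b
                  : static_cast<uint32_t>(a) < static_cast<uint32_t>(b);
}

// Models `icmp slt` / `icmp ult` on i8, i.e. the compare after shrinking
// all the way down to the source type of the extends.
bool narrowLess(uint8_t a, uint8_t b, bool isSigned) {
  return isSigned ? static_cast<int8_t>(a) < static_cast<int8_t>(b) : a < b;
}

// True if the narrowed compare agrees with the wide one for all 65536 inputs.
bool narrowingIsSafe(Ext e0, Ext e1, bool isSigned) {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      const uint8_t x = static_cast<uint8_t>(a);
      const uint8_t y = static_cast<uint8_t>(b);
      if (wideLess(extend(x, e0), extend(y, e1), isSigned) !=
          narrowLess(x, y, isSigned))
        return false;
    }
  }
  return true;
}
```

In this model the safe combinations are `sext`/`sext` (signed or unsigned) and `zext`/`zext` with an unsigned compare. Mixed `zext`/`sext` and a signed compare of two `zext`s both fail, which is the shape of the condition on the return statement above.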

/// Given an instruction and a container, it fills all the relevant operands of		/// Given an instruction and a container, it fills all the relevant operands of
/// that instruction, with respect to the Trunc expression dag optimization.		/// that instruction, with respect to the Trunc expression dag optimization.
static void getRelevantOperands(Instruction *I, SmallVectorImpl<Value *> &Ops) {		static void getRelevantOperands(Instruction *I, SmallVectorImpl<Value *> &Ops) {
unsigned Opc = I->getOpcode();		unsigned Opc = I->getOpcode();
switch (Opc) {		switch (Opc) {
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
// These CastInst are considered leaves of the evaluated expression, thus,		// These CastInst are considered leaves of the evaluated expression, thus,
// their operands are not relevant.		// their operands are not relevant.
break;		break;
case Instruction::Add:		case Instruction::Add:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
		case Instruction::ICmp:
Ops.push_back(I->getOperand(0));		Ops.push_back(I->getOperand(0));
Ops.push_back(I->getOperand(1));		Ops.push_back(I->getOperand(1));
break;		break;
case Instruction::Select:		case Instruction::Select: {
		Value *Op0 = I->getOperand(0);
Ops.push_back(I->getOperand(1));		Ops.push_back(I->getOperand(1));
Ops.push_back(I->getOperand(2));		Ops.push_back(I->getOperand(2));
		// In case the condition is a compare instruction whose operands are both
		// a type extension/truncate or a constant that can be shrunk without
		// losing information in the compare instruction, add it as well.
		if (ICmpInst *C = dyn_cast<ICmpInst>(Op0))
		if (CmpCanBeShrunk(C))
		Ops.push_back(Op0);
break;		break;
		}
default:		default:
llvm_unreachable("Unreachable!");		llvm_unreachable("Unreachable!");
}		}
}		}

bool TruncInstCombine::buildTruncExpressionDag() {		bool TruncInstCombine::buildTruncExpressionDag() {
SmallVector<Value *, 8> Worklist;		SmallVector<Value *, 8> Worklist;
SmallVector<Instruction *, 8> Stack;		SmallVector<Instruction *, 8> Stack;
[... 43 lines not shown ...]	case Instruction::SExt:
// dest		// dest
break;		break;
case Instruction::Add:		case Instruction::Add:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::Select: {		case Instruction::Select:
		case Instruction::ICmp: {
SmallVector<Value *, 2> Operands;		SmallVector<Value *, 2> Operands;
getRelevantOperands(I, Operands);		getRelevantOperands(I, Operands);
for (Value *Operand : Operands)		for (Value *Operand : Operands)
Worklist.push_back(Operand);		Worklist.push_back(Operand);
break;		break;
}		}
default:		default:
// TODO: Can handle more cases here:		// TODO: Can handle more cases here:
// 1. shufflevector, extractelement, insertelement		// 1. shufflevector, extractelement, insertelement
// 2. udiv, urem		// 2. udiv, urem
// 3. shl, lshr, ashr		// 3. shl, lshr, ashr
// 4. phi node(and loop handling)		// 4. phi node(and loop handling)
// ...		// ...
return false;		return false;
}		}
}		}
return true;		return true;
}		}

unsigned TruncInstCombine::getMinBitWidth() {		unsigned TruncInstCombine::getMinBitWidth() {
SmallVector<Value *, 8> Worklist;		SmallVector<Value *, 8> Worklist;
		lebedev.ri (Done): You want to operate on `APInt`, not assuming that it fits into 64-bits. This is a correctness issue.
SmallVector<Instruction *, 8> Stack;		SmallVector<Instruction *, 8> Stack;

		lebedev.ri (Not Done): Why copy?
Value *Src = CurrentTruncInst->getOperand(0);		Value *Src = CurrentTruncInst->getOperand(0);
Type *DstTy = CurrentTruncInst->getType();		Type *DstTy = CurrentTruncInst->getType();
unsigned TruncBitWidth = DstTy->getScalarSizeInBits();		unsigned TruncBitWidth = DstTy->getScalarSizeInBits();
unsigned OrigBitWidth =		unsigned OrigBitWidth =
CurrentTruncInst->getOperand(0)->getType()->getScalarSizeInBits();		CurrentTruncInst->getOperand(0)->getType()->getScalarSizeInBits();
		lebedev.ri (Not Done): APInt::getActiveBits()
		lebedev.ri (Done): Please explicitly spell out `unsigned` here

if (isa<Constant>(Src))		if (isa<Constant>(Src))
		lebedev.ri (Not Done): Correct me if i'm wrong, but won't this just work?
		    static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
		      const APInt &Val = C->getValue();
		      unsigned NonSignBits = Val.getBitWidth() - Val.getNumSignBits();
		      return IsSigned ? 1 + NonSignBits : NonSignBits;
		    }
		aymanmus (Author, Done): IsSigned indicates whether the use of the constant is signed or not. I guess the "sign bit" of the APInt is deduced simply from the content of the MSB. But think of the case where IsSigned = false and the APInt has width 16 and value 0xff80. Val.getNumSignBits() would return 9 (as if the value were signed and had 9 sign bits) and the function's return value would be 7, whereas we expect to get 16. Added a test case (@cmp_select_unsigned_const_i16_MSB1) on which the suggested code above would wrongly apply the transformation, while the current code refrains from it. The previous comments, however, are 100% legit.
		lebedev.ri (Done): Right. What about this then:
		    static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
		      const APInt &Val = C->getValue();
		      return IsSigned ? Val.getMinSignedBits() : Val.getActiveBits();
		    }
		?
return TruncBitWidth;		return TruncBitWidth;

Worklist.push_back(Src);		Worklist.push_back(Src);
InstInfoMap[cast<Instruction>(Src)].ValidBitWidth = TruncBitWidth;		InstInfoMap[cast<Instruction>(Src)].ValidBitWidth = TruncBitWidth;

while (!Worklist.empty()) {		while (!Worklist.empty()) {
Value *Curr = Worklist.back();		Value *Curr = Worklist.back();

[... 14 lines not shown ...]	if (!Stack.empty() && Stack.back() == I) {
// Already handled all instruction operands, can remove it from both, the		// Already handled all instruction operands, can remove it from both, the
// Worklist and the Stack, and update MinBitWidth.		// Worklist and the Stack, and update MinBitWidth.
Worklist.pop_back();		Worklist.pop_back();
Stack.pop_back();		Stack.pop_back();
for (auto *Operand : Operands)		for (auto *Operand : Operands)
if (auto *IOp = dyn_cast<Instruction>(Operand))		if (auto *IOp = dyn_cast<Instruction>(Operand))
Info.MinBitWidth =		Info.MinBitWidth =
std::max(Info.MinBitWidth, InstInfoMap[IOp].MinBitWidth);		std::max(Info.MinBitWidth, InstInfoMap[IOp].MinBitWidth);
continue;		continue;
		lebedev.ri (Not Done): Hm, can't we already get a constant operand in one of the supported instructions? If yes, then i would semi-strongly suggest to split this up again..
		lebedev.ri (Done): Ah, i see what i'm missing: AIC does not perform DCE, so i was looking at https://godbolt.org/z/ZXpBui and it wasn't being transformed because dead `%cmp` was still present. If it's not there (https://godbolt.org/z/UEMDQN), then we deal with ordinary constants fine. So i see this is ICmp-specific, and should be kept here.
}		}

// Add the instruction to the stack before start handling its operands.		// Add the instruction to the stack before start handling its operands.
Stack.push_back(I);		Stack.push_back(I);
unsigned ValidBitWidth = Info.ValidBitWidth;		unsigned ValidBitWidth = Info.ValidBitWidth;

// Update minimum bit-width before handling its operands. This is required		// Update minimum bit-width before handling its operands. This is required
// when the instruction is part of a loop.		// when the instruction is part of a loop.
[... 106 lines not shown ...]	Value *TruncInstCombine::getReducedOperand(Value *V, Type *SclTy) {
Info Entry = InstInfoMap.lookup(I);		Info Entry = InstInfoMap.lookup(I);
assert(Entry.NewValue);		assert(Entry.NewValue);
return Entry.NewValue;		return Entry.NewValue;
}		}

void TruncInstCombine::ReduceExpressionDag(Type *SclTy) {		void TruncInstCombine::ReduceExpressionDag(Type *SclTy) {
for (auto &Itr : InstInfoMap) { // Forward		for (auto &Itr : InstInfoMap) { // Forward
Instruction *I = Itr.first;		Instruction *I = Itr.first;
TruncInstCombine::Info &NodeInfo = Itr.second;		TruncInstCombine::Info &NodeInfo = Itr.second;

		lebedev.ri (Done): Wait, why are we giving up on non-scalars here? See how e.g. `Constant::isAllOnesValue()` handles different types, i think similar recursion should be done here.
assert(!NodeInfo.NewValue && "Instruction has been evaluated");		assert(!NodeInfo.NewValue && "Instruction has been evaluated");

IRBuilder<> Builder(I);		IRBuilder<> Builder(I);
Value *Res = nullptr;		Value *Res = nullptr;
unsigned Opc = I->getOpcode();		unsigned Opc = I->getOpcode();
switch (Opc) {		switch (Opc) {
case Instruction::Trunc:		case Instruction::Trunc:
		nikic (Not Done): Doesn't the sext/zext case have to check the IsSigned flag as well? I don't think it's as simple as IsSigned => sext and !IsSigned => zext (e.g. it's fine if you have an equality comparison and both use same extension type), but same correctness checks are needed here.
		aymanmus (Author, Done): IsSigned is not relevant in this part of the code. It only indicates whether the use of the constant (in case the given Value is a constant) was in a signed or unsigned context. If the Value is not a constant, we don't care about IsSigned. I'll change the name of the operand to avoid such confusion.
		nikic (Not Done): Not sure I get it. Converting `(zext i16 %x to i32) ult (sext i16 %y to i32)` to `%x ult %y` is not legal (https://rise4fun.com/Alive/wVZy), and I don't think anything prevents that from happening with your current code.
		aymanmus (Author, Done): No I understand. You're right. I gave it some thought and came up with a slightly different approach to deal with all the limitations. I hope this new function (CmpCanBeShrinked) handles all the needed cases.
		aymanmus (Author, Done): Now I understand*
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt: {		case Instruction::SExt: {
Type *Ty = getReducedType(I, SclTy);		Type *Ty = getReducedType(I, SclTy);
// If the source type of the cast is the type we're trying for then we can		// If the source type of the cast is the type we're trying for then we can
// just return the source. There's no need to insert it because it is not		// just return the source. There's no need to insert it because it is not
// new.		// new.
if (I->getOperand(0)->getType() == Ty) {		if (I->getOperand(0)->getType() == Ty) {
assert(!isa<TruncInst>(I) && "Cannot reach here with TruncInst");		assert(!isa<TruncInst>(I) && "Cannot reach here with TruncInst");
[... 28 lines not shown ...]	for (auto &Itr : InstInfoMap) { // Forward
case Instruction::Xor: {		case Instruction::Xor: {
Value *LHS = getReducedOperand(I->getOperand(0), SclTy);		Value *LHS = getReducedOperand(I->getOperand(0), SclTy);
Value *RHS = getReducedOperand(I->getOperand(1), SclTy);		Value *RHS = getReducedOperand(I->getOperand(1), SclTy);
Res = Builder.CreateBinOp((Instruction::BinaryOps)Opc, LHS, RHS);		Res = Builder.CreateBinOp((Instruction::BinaryOps)Opc, LHS, RHS);
break;		break;
}		}
case Instruction::Select: {		case Instruction::Select: {
Value *Op0 = I->getOperand(0);		Value *Op0 = I->getOperand(0);
		if (ICmpInst *C = dyn_cast<ICmpInst>(Op0))
		if (CmpCanBeShrunk(C, SclTy))
		Op0 = getReducedOperand(Op0, SclTy);
Value *LHS = getReducedOperand(I->getOperand(1), SclTy);		Value *LHS = getReducedOperand(I->getOperand(1), SclTy);
Value *RHS = getReducedOperand(I->getOperand(2), SclTy);		Value *RHS = getReducedOperand(I->getOperand(2), SclTy);
Res = Builder.CreateSelect(Op0, LHS, RHS);		Res = Builder.CreateSelect(Op0, LHS, RHS);
break;		break;
}		}
		case Instruction::ICmp: {
		auto ICmp = cast<ICmpInst>(I);
		Value *LHS = getReducedOperand(ICmp->getOperand(0), SclTy);
		Value *RHS = getReducedOperand(ICmp->getOperand(1), SclTy);
		Res = Builder.CreateICmp(ICmp->getPredicate(), LHS, RHS);
		break;
		}
default:		default:
llvm_unreachable("Unhandled instruction");		llvm_unreachable("Unhandled instruction");
}		}

NodeInfo.NewValue = Res;		NodeInfo.NewValue = Res;
if (auto *ResI = dyn_cast<Instruction>(Res))		if (auto *ResI = dyn_cast<Instruction>(Res))
ResI->takeName(I);		ResI->takeName(I);
}		}
[... 56 lines not shown ...]

llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -aggressive-instcombine -S | FileCheck %s		; RUN: opt < %s -aggressive-instcombine -dce -S | FileCheck %s
; RUN: opt < %s -passes=aggressive-instcombine -S | FileCheck %s
		; Today, InstCombine cannot handle the following cases since it doesn't
		; allow - in any way - an instruction with multiple uses to be shrunk.
		spatel (Not Done): This isn't entirely accurate for the tests as shown. The problem in several of these tests is not that there are extra uses; it's that we don't canonicalize icmp and min/max as well as possible. That was what rGf2fbdf76d8d0 was trying to solve, but I don't think there's a quick fix to restore that. Please add some extra uses in these first 4 tests, so we have a better representation of the motivating tests. One way to do that is to add something like `declare void @use(i32)` and then `call void @use(i32 %conv)`.
		aymanmus (Author, Done): I must disagree with you this time. The infrastructure for shrinking inside InstCombine cannot handle these cases. If we look, for example, at the first case in this test, notice that the "sext" instruction has multiple uses. That prevents the mechanism inside InstCombine from shrinking the whole sequence, even though both uses are going to be shrunk together with the "sext" (as part of the same sequence). This is one of the main additions in AggressiveInstCombine: while checking the validity/possibility of shrinking a sequence, we remember all the instructions encountered so far. That enables us to check whether all the uses of such an instruction are included in these instructions, to make sure we'll be replacing all the uses of the old sext instruction with the new one.
		; Aggressive InstCombine, on the other hand, remembers all the nodes we
		; want to shrink, and in case of an instruction with multiple uses, it
		; makes sure that all uses are going to be shrunk as well, thus allowing
		; the transformation to happen.
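
The "all uses are shrunk together" condition described in this comment and in the thread above can be pictured with a small standalone model. This is only a sketch under simplifying assumptions: values are identified by name, use lists are given explicitly, and `UserMap`, `allUsesCovered` and `cmpSelectSextConstUsers` are hypothetical names, not part of AggressiveInstCombine.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each value name to the names of the instructions that use it.
using UserMap = std::map<std::string, std::vector<std::string>>;

// Returns true if every user of every node in `dag` is either another node
// in `dag` or the root truncate that the whole expression feeds into.
bool allUsesCovered(const std::set<std::string> &dag,
                    const std::string &truncRoot, const UserMap &users) {
  for (const std::string &node : dag) {
    auto it = users.find(node);
    if (it == users.end())
      continue; // value has no recorded users
    for (const std::string &user : it->second)
      if (user != truncRoot && dag.count(user) == 0)
        return false; // a use outside the DAG would still need the wide value
  }
  return true;
}

// Use lists of the first test, @cmp_select_sext_const: %conv has two uses,
// but both of them (%cmp and %cond) are part of the same expression.
UserMap cmpSelectSextConstUsers() {
  return {{"%conv", {"%cmp", "%cond"}},
          {"%cmp", {"%cond"}},
          {"%cond", {"%conv4"}}};
}
```

With the DAG `{%conv, %cmp, %cond}` and root `%conv4` the check passes even though `%conv` has two uses. Adding an outside use such as `call void @use(i32 %conv)` makes it fail in this model.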

define dso_local i16 @cmp_select_sext_const(i8 %a) {		define dso_local i16 @cmp_select_sext_const(i8 %a) {
; CHECK-LABEL: @cmp_select_sext_const(		; CHECK-LABEL: @cmp_select_sext_const(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], 109		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], 109
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 109, i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 109, i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%cmp = icmp slt i32 %conv, 109		%cmp = icmp slt i32 %conv, 109
%cond = select i1 %cmp, i32 109, i32 %conv		%cond = select i1 %cmp, i32 109, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define dso_local i16 @cmp_select_sext(i8 %a, i8 %b) {		define dso_local i16 @cmp_select_sext(i8 %a, i8 %b) {
; CHECK-LABEL: @cmp_select_sext(		; CHECK-LABEL: @cmp_select_sext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i32		; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], [[CONV2]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[CONV2]], i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%conv2 = sext i8 %b to i32		%conv2 = sext i8 %b to i32
%cmp = icmp slt i32 %conv, %conv2		%cmp = icmp slt i32 %conv, %conv2
%cond = select i1 %cmp, i32 %conv2, i32 %conv		%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define dso_local i16 @cmp_select_zext(i8 %a, i8 %b) {		define dso_local i16 @cmp_select_zext(i8 %a, i8 %b) {
; CHECK-LABEL: @cmp_select_zext(		; CHECK-LABEL: @cmp_select_zext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = zext i8 [[B:%.*]] to i32		; CHECK-NEXT: [[CONV2:%.*]] = zext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], [[CONV2]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[CONV2]], i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%conv2 = zext i8 %b to i32		%conv2 = zext i8 %b to i32
%cmp = icmp slt i32 %conv, %conv2		%cmp = icmp slt i32 %conv, %conv2
%cond = select i1 %cmp, i32 %conv2, i32 %conv		%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

		define dso_local i8 @cmp_select_zext_i8_noTransformation(i8 %a, i8 %b) {
		; CHECK-LABEL: @cmp_select_zext_i8_noTransformation(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32
		; CHECK-NEXT: [[CONV2:%.*]] = zext i8 [[B:%.*]] to i32
		; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP0]], i8 [[B]], i8 [[A]]
		; CHECK-NEXT: ret i8 [[COND]]
		;
		entry:
		%conv = zext i8 %a to i32
		%conv2 = zext i8 %b to i32
		%cmp = icmp slt i32 %conv, %conv2
		%cond = select i1 %cmp, i32 %conv2, i32 %conv
		%conv4 = trunc i32 %cond to i8
		ret i8 %conv4
		}

define dso_local i16 @cmp_select_zext_sext(i8 %a, i8 %b) {		define dso_local i16 @cmp_select_zext_sext(i8 %a, i8 %b) {
; CHECK-LABEL: @cmp_select_zext_sext(		; CHECK-LABEL: @cmp_select_zext_sext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i32		; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], [[CONV2]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[CONV2]], i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%conv2 = sext i8 %b to i32		%conv2 = sext i8 %b to i32
%cmp = icmp slt i32 %conv, %conv2		%cmp = icmp slt i32 %conv, %conv2
%cond = select i1 %cmp, i32 %conv2, i32 %conv		%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

		define dso_local i8 @cmp_select_zext_sext_i8_noTransformation(i8 %a, i8 %b) {
		; CHECK-LABEL: @cmp_select_zext_sext_i8_noTransformation(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32
		; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i32
		; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP0]], i8 [[B]], i8 [[A]]
		; CHECK-NEXT: ret i8 [[COND]]
		;
		entry:
		%conv = zext i8 %a to i32
		%conv2 = sext i8 %b to i32
		%cmp = icmp slt i32 %conv, %conv2
		%cond = select i1 %cmp, i32 %conv2, i32 %conv
		%conv4 = trunc i32 %cond to i8
		ret i8 %conv4
		}

define dso_local i16 @cmp_select_zext_sext_diffOrigTy(i8 %a, i16 %b) {		define dso_local i16 @cmp_select_zext_sext_diffOrigTy(i8 %a, i16 %b) {
; CHECK-LABEL: @cmp_select_zext_sext_diffOrigTy(		; CHECK-LABEL: @cmp_select_zext_sext_diffOrigTy(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = sext i16 [[B:%.*]] to i32		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], [[B:%.*]]
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], [[CONV2]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[B]], i16 [[CONV]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[CONV2]], i32 [[CONV]]		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%conv2 = sext i16 %b to i32		%conv2 = sext i16 %b to i32
%cmp = icmp slt i32 %conv, %conv2		%cmp = icmp slt i32 %conv, %conv2
%cond = select i1 %cmp, i32 %conv2, i32 %conv		%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define dso_local i16 @my_abs_sext(i8 %a) {		define dso_local i16 @my_abs_sext(i8 %a) {
; CHECK-LABEL: @my_abs_sext(		; CHECK-LABEL: @my_abs_sext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], 0
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 0, [[CONV]]		; CHECK-NEXT: [[SUB:%.*]] = sub i16 0, [[CONV]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[SUB]], i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%cmp = icmp slt i32 %conv, 0		%cmp = icmp slt i32 %conv, 0
%sub = sub nsw i32 0, %conv		%sub = sub nsw i32 0, %conv
%cond = select i1 %cmp, i32 %sub, i32 %conv		%cond = select i1 %cmp, i32 %sub, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define dso_local i16 @my_abs_zext(i8 %a) {		define dso_local i16 @my_abs_zext(i8 %a) {
; CHECK-LABEL: @my_abs_zext(		; CHECK-LABEL: @my_abs_zext(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i16 [[CONV]], 0
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 0, [[CONV]]		; CHECK-NEXT: [[SUB:%.*]] = sub i16 0, [[CONV]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[SUB]], i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
entry:		entry:
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%cmp = icmp slt i32 %conv, 0		%cmp = icmp slt i32 %conv, 0
%sub = sub nsw i32 0, %conv		%sub = sub nsw i32 0, %conv
%cond = select i1 %cmp, i32 %sub, i32 %conv		%cond = select i1 %cmp, i32 %sub, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
[... 28 lines not shown ...]	entry:
%sub = sub nsw i32 0, %conv		%sub = sub nsw i32 0, %conv
%sel = select i1 %cond, i32 %sub, i32 %conv		%sel = select i1 %cond, i32 %sub, i32 %conv
%conv4 = trunc i32 %sel to i16		%conv4 = trunc i32 %sel to i16
ret i16 %conv4		ret i16 %conv4
}		}

define i16 @cmp_select_signed_const_i16Const_noTransformation(i8 %a) {		define i16 @cmp_select_signed_const_i16Const_noTransformation(i8 %a) {
; CHECK-LABEL: @cmp_select_signed_const_i16Const_noTransformation(		; CHECK-LABEL: @cmp_select_signed_const_i16Const_noTransformation(
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[CONV]], 32768		; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[A]] to i32
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 32768, i32 [[CONV]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 32768
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP2]], i16 -32768, i16 [[CONV]]
; CHECK-NEXT: ret i16 [[CONV4]]		; CHECK-NEXT: ret i16 [[COND]]
;		;
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%cmp = icmp slt i32 %conv, 32768		%cmp = icmp slt i32 %conv, 32768
%cond = select i1 %cmp, i32 32768, i32 %conv		%cond = select i1 %cmp, i32 32768, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define i16 @cmp_select_unsigned_const_i16Const(i8 %a) {		define i16 @cmp_select_unsigned_const_i16Const(i8 %a) {
; CHECK-LABEL: @cmp_select_unsigned_const_i16Const(		; CHECK-LABEL: @cmp_select_unsigned_const_i16Const(
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[CONV]], 32768		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i16 [[CONV]], -32768
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 32768, i32 [[CONV]]		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 -32768, i16 [[CONV]]
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: ret i16 [[COND]]
; CHECK-NEXT: ret i16 [[CONV4]]
;		;
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%cmp = icmp ult i32 %conv, 32768		%cmp = icmp ult i32 %conv, 32768
%cond = select i1 %cmp, i32 32768, i32 %conv		%cond = select i1 %cmp, i32 32768, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

define i16 @cmp_select_unsigned_const_i16Const_noTransformation(i8 %a) {		define i16 @cmp_select_unsigned_const_i16Const_noTransformation(i8 %a) {
; CHECK-LABEL: @cmp_select_unsigned_const_i16Const_noTransformation(		; CHECK-LABEL: @cmp_select_unsigned_const_i16Const_noTransformation(
; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i32		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[CONV]], 65536		; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[A]] to i32
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i32 65536, i32 [[CONV]]		; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[TMP1]], 65536
; CHECK-NEXT: [[CONV4:%.*]] = trunc i32 [[COND]] to i16		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP2]], i16 0, i16 [[CONV]]
; CHECK-NEXT: ret i16 [[CONV4]]		; CHECK-NEXT: ret i16 [[COND]]
;		;
%conv = zext i8 %a to i32		%conv = zext i8 %a to i32
%cmp = icmp ult i32 %conv, 65536		%cmp = icmp ult i32 %conv, 65536
%cond = select i1 %cmp, i32 65536, i32 %conv		%cond = select i1 %cmp, i32 65536, i32 %conv
%conv4 = trunc i32 %cond to i16		%conv4 = trunc i32 %cond to i16
ret i16 %conv4		ret i16 %conv4
}		}

		define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
		; CHECK-LABEL: @cmp_shrink_select_not_cmp(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
		; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[C:%.*]], [[D:%.*]]
		; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
		; CHECK-NEXT: ret i16 [[COND]]
		;
		entry:
		%conv = sext i8 %a to i32
		%conv2 = sext i8 %b to i32
		%cmp = icmp slt i32 %c, %d
		%cond = select i1 %cmp, i32 %conv2, i32 %conv
		%conv4 = trunc i32 %cond to i16
		ret i16 %conv4
		}

		define i8 @cmp_select_unsigned_const_i16_MSB1(i8 %a) {
		; CHECK-LABEL: @cmp_select_unsigned_const_i16_MSB1(
		; CHECK-NEXT: [[CONV:%.*]] = zext i8 [[A:%.*]] to i16
		; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i16 [[CONV]], -128
		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP1]], i8 55, i8 [[A]]
		; CHECK-NEXT: ret i8 [[COND]]
		;
		%conv = zext i8 %a to i16
		%cmp = icmp ult i16 %conv, 65408
		%cond = select i1 %cmp, i16 55, i16 %conv
		%conv4 = trunc i16 %cond to i8
		ret i8 %conv4
		}

		define i16 @cmp_select_bigConst_cmp(i8 %a, i8 %b, i8 %c) {
		; CHECK-LABEL: @cmp_select_bigConst_cmp(
		; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
		; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
		; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[C:%.*]] to i32
		; CHECK-NEXT: [[TMP2:%.*]] = icmp slt i32 [[TMP1]], 70000
		; CHECK-NEXT: [[COND:%.*]] = select i1 [[TMP2]], i16 [[CONV2]], i16 [[CONV]]
		; CHECK-NEXT: ret i16 [[COND]]
		;
		%conv = sext i8 %a to i32
		%conv2 = sext i8 %b to i32
		%conv3 = sext i8 %c to i32
		%cmp = icmp slt i32 %conv3, 70000
		%cond = select i1 %cmp, i32 %conv2, i32 %conv
		%conv4 = trunc i32 %cond to i16
		ret i16 %conv4
		}

		define dso_local <2 x i16> @cmp_select_vec_sext(<2 x i8> %a, <2 x i8> %b) {
		; CHECK-LABEL: @cmp_select_vec_sext(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = sext <2 x i8> [[A:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CONV2:%.*]] = sext <2 x i8> [[B:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i16> [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[COND:%.*]] = select <2 x i1> [[CMP]], <2 x i16> [[CONV2]], <2 x i16> [[CONV]]
		; CHECK-NEXT: ret <2 x i16> [[COND]]
		;
		entry:
		%conv = sext <2 x i8> %a to <2 x i32>
		%conv2 = sext <2 x i8> %b to <2 x i32>
		%cmp = icmp slt <2 x i32> %conv, %conv2
		%cond = select <2 x i1> %cmp, <2 x i32> %conv2, <2 x i32> %conv
		%conv4 = trunc <2 x i32> %cond to <2 x i16>
		ret <2 x i16> %conv4
		}

		define dso_local <2 x i16> @cmp_select_vec_zext(<2 x i8> %a, <2 x i8> %b) {
		; CHECK-LABEL: @cmp_select_vec_zext(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = zext <2 x i8> [[A:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CONV2:%.*]] = zext <2 x i8> [[B:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i16> [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[COND:%.*]] = select <2 x i1> [[CMP]], <2 x i16> [[CONV2]], <2 x i16> [[CONV]]
		; CHECK-NEXT: ret <2 x i16> [[COND]]
		;
		entry:
		%conv = zext <2 x i8> %a to <2 x i32>
		%conv2 = zext <2 x i8> %b to <2 x i32>
		%cmp = icmp slt <2 x i32> %conv, %conv2
		%cond = select <2 x i1> %cmp, <2 x i32> %conv2, <2 x i32> %conv
		%conv4 = trunc <2 x i32> %cond to <2 x i16>
		ret <2 x i16> %conv4
		}

		define dso_local <2 x i16> @cmp_select_vec_sext_zext(<2 x i8> %a, <2 x i8> %b) {
		; CHECK-LABEL: @cmp_select_vec_sext_zext(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = sext <2 x i8> [[A:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CONV2:%.*]] = zext <2 x i8> [[B:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i16> [[CONV]], [[CONV2]]
		; CHECK-NEXT: [[COND:%.*]] = select <2 x i1> [[CMP]], <2 x i16> [[CONV2]], <2 x i16> [[CONV]]
		; CHECK-NEXT: ret <2 x i16> [[COND]]
		;
		entry:
		%conv = sext <2 x i8> %a to <2 x i32>
		%conv2 = zext <2 x i8> %b to <2 x i32>
		%cmp = icmp slt <2 x i32> %conv, %conv2
		%cond = select <2 x i1> %cmp, <2 x i32> %conv2, <2 x i32> %conv
		%conv4 = trunc <2 x i32> %cond to <2 x i16>
		ret <2 x i16> %conv4
		}

		define dso_local <2 x i16> @cmp_select_vec_sext_const(<2 x i8> %a) {
		; CHECK-LABEL: @cmp_select_vec_sext_const(
		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[CONV:%.*]] = sext <2 x i8> [[A:%.*]] to <2 x i16>
		; CHECK-NEXT: [[CMP:%.*]] = icmp slt <2 x i16> [[CONV]], <i16 109, i16 28>
		; CHECK-NEXT: [[COND:%.*]] = select <2 x i1> [[CMP]], <2 x i16> <i16 109, i16 28>, <2 x i16> [[CONV]]
		; CHECK-NEXT: ret <2 x i16> [[COND]]
		;
		entry:
		%conv = sext <2 x i8> %a to <2 x i32>
		%cmp = icmp slt <2 x i32> %conv, <i32 109, i32 28>
		%cond = select <2 x i1> %cmp, <2 x i32> <i32 109, i32 28>, <2 x i32> %conv
		%conv4 = trunc <2 x i32> %cond to <2 x i16>
		ret <2 x i16> %conv4
		}

		define <2 x i8> @cmp_select_unsigned_const_vec_i8_noTransformation(<2 x i8> %a) {
		; CHECK-LABEL: @cmp_select_unsigned_const_vec_i8_noTransformation(
		; CHECK-NEXT: [[CONV:%.*]] = zext <2 x i8> [[A:%.*]] to <2 x i32>
		; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> [[CONV]], <i32 256, i32 127>
		; CHECK-NEXT: [[COND:%.*]] = select <2 x i1> [[TMP1]], <2 x i8> <i8 -33, i8 -46>, <2 x i8> [[A]]
		; CHECK-NEXT: ret <2 x i8> [[COND]]
		;
		%conv = zext <2 x i8> %a to <2 x i32>
		%cmp = icmp ult <2 x i32> %conv, <i32 256, i32 127>
		%cond = select <2 x i1> %cmp, <2 x i32> <i32 223, i32 1234>, <2 x i32> %conv
		%conv4 = trunc <2 x i32> %cond to <2 x i8>
		ret <2 x i8> %conv4
		}

		define i1 @cmp_zext_and_minus1_noTransformation(i16 %a) {
		; CHECK-LABEL: @cmp_zext_and_minus1_noTransformation(
		; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[A:%.*]] to i32
		; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[CONV]], -1
		; CHECK-NEXT: ret i1 [[CMP]]
		;
		%conv = zext i16 %a to i32
		%cmp = icmp ult i32 %conv, -1
		ret i1 %cmp
		}
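
Reading note on the constants in the CHECK lines above: LLVM prints integer constants as signed values of their type, so a narrowed constant can look different from the one in the input IR. The sketch below is only an illustration of that arithmetic, not the logic in `TruncInstCombine.cpp`; the helper names `to_signed` and `fits_unsigned` are hypothetical and do not exist in LLVM.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits (truncation)
    if value >> (bits - 1):           # sign bit set: value is negative at this width
        return value - (1 << bits)
    return value


def fits_unsigned(value: int, bits: int) -> bool:
    """True if `value` is representable as an unsigned `bits`-bit integer."""
    return 0 <= value < (1 << bits)


# `icmp ult i16 %conv, 65408` is printed as `icmp ult i16 [[CONV]], -128`.
print(to_signed(65408, 16))       # -128
# `<i32 223, i32 1234>` truncated to i8 is printed as `<i8 -33, i8 -46>`.
print(to_signed(223, 8), to_signed(1234, 8))   # -33 -46
# 65536 needs 17 bits, so it has no exact i16 representation.
print(fits_unsigned(65536, 16))   # False
```

Under this reading, `-128` in `@cmp_select_unsigned_const_i16_MSB1` is the same bit pattern as `65408`, and the `i8 -33` / `i8 -46` select operands are the truncated forms of `223` and `1234`.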

Revision Contents

Diff 255981

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp

llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll
