
[AggressiveInstCombine] Add support for ICmp instr that feeds a select instr's condition operand.
Needs Review · Public

Authored by aymanmus on Feb 12 2020, 6:02 AM.

Details

Summary

Teach AggressiveInstCombine's TruncInstCombine how to include ICmp instructions in chains that can be type-reduced.

Diff Detail

Event Timeline

aymanmus created this revision.Feb 12 2020, 6:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 6:02 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

This seems not generic enough to me. Can't we drop the requirement that the icmp operands be constants/[sz]ext's, and instead try to see whether the icmp can be evaluated in a smaller bitwidth (which is what the rest of the code here does)?

As long as the icmp can be shrunk to a bitwidth at least as small as the one we need to get rid of the cast, I think we can always pick the actual bitwidth we'll use, which might be wider than minimal?

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
239

You want to operate on APInt, not assuming that it fits into 64-bits.
This is a correctness issue.

lebedev.ri added inline comments.Feb 12 2020, 6:30 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
279

Hm, can't we already get a constant operand in one of the supported instructions?
If yes, then I would semi-strongly suggest splitting this up again.

lebedev.ri marked an inline comment as done.Feb 12 2020, 7:38 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
279

Ah, I see what I was missing: AIC does not perform DCE,
so I was looking at https://godbolt.org/z/ZXpBui
and it wasn't being transformed because the dead %cmp was still present.
If it's not there (https://godbolt.org/z/UEMDQN), then we deal with ordinary constants fine.
So I see this is ICmp-specific, and it should be kept here.

This seems not generic enough to me. Can't we drop the requirement that the icmp operands be constants/[sz]ext's, and instead try to see whether the icmp can be evaluated in a smaller bitwidth (which is what the rest of the code here does)?

As long as the icmp can be shrunk to a bitwidth at least as small as the one we need to get rid of the cast, I think we can always pick the actual bitwidth we'll use, which might be wider than minimal?

After thinking about this, I'd be okay with not doing that straight away.
It should be doable, but it would require a substantial redesign to nicely model more than one DAG and their connections/dependencies.


Exactly. There are many improvements that can be added to this pass in many ways.
What I add here is a small and simple improvement that doesn't need a serious redesign.
It has its added value (not a big one, though) at minimal cost.

I think this is missing a test-case where the select could be truncated if you ignored the icmp, but not if you require the icmp to also be truncated. That should show up as a regression in the test diffs.

aymanmus updated this revision to Diff 244347.Feb 13 2020, 12:34 AM
aymanmus marked an inline comment as done.

Deal with APInt instead of a 64-bit value in constants.

aymanmus updated this revision to Diff 244348.Feb 13 2020, 12:40 AM
lebedev.ri added inline comments.Feb 13 2020, 1:11 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
241

Why copy?

246

APInt::getActiveBits()

248

Correct me if I'm wrong, but won't this just work?

static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
  const APInt& Val = C->getValue();
  unsigned NonSignBits = Val.getBitWidth() - Val.getNumSignBits();

  return IsSigned ? 1 + NonSignBits : NonSignBits;
}

I think this is missing a test-case where the select could be truncated if you ignored the icmp, but not if you require the icmp to also be truncated. That should show up as a regression in the test diffs.

Not sure I understand what you mean exactly.
Shrinking the cmp is not *required* in order to shrink the select. If there is a cmp instruction that can be shrunk along with the select, we shrink it; if not, it does not affect the shrinking of the select.
Is the following test case relevant?

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
; CHECK-LABEL: @cmp_shrink_select_not_cmp(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sext i8 [[A:%.*]] to i16
; CHECK-NEXT: [[CONV2:%.*]] = sext i8 [[B:%.*]] to i16
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[C:%.*]], [[D:%.*]]
; CHECK-NEXT: [[COND:%.*]] = select i1 [[CMP]], i16 [[CONV2]], i16 [[CONV]]
; CHECK-NEXT: ret i16 [[COND]]
;
entry:
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %c, %d
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

nikic added a comment.Feb 13 2020, 1:32 AM

@aymanmus I have something along these lines in mind...

%conv = sext i8 %a to i32
%conv2 = sext i8 %b to i32
%conv3 = sext i8 %c to i32
%cmp = icmp slt i32 %conv3, TOO_BIG_CONST
%cond = select i1 %cmp, i32 %conv2, i32 %conv
%conv4 = trunc i32 %cond to i16
ret i16 %conv4

That is, where the icmp has a form that passes the isConstOrExt(C->getOperand(0)) && isConstOrExt(C->getOperand(1)) check, but later gets rejected due to constant bitwidth.

Not sure if that example does it, but I think there must be some case like this, unless I'm misunderstanding the patch.

aymanmus marked an inline comment as done.Feb 13 2020, 7:13 AM


Totally agree.
I fixed the code to exclude the ICmp instruction from the part where we decide whether to transform or not.
The ICmp instruction is only replaced once we are already applying the transformation, and only if its operands fit in the type we are converting to.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
248

IsSigned indicates whether the use of the constant is signed or not.
The "sign bit" of the APInt is deduced simply from the content of the MSB.
But consider the case where IsSigned = false and the APInt has width 16 and value 0xff80:
Val.getNumSignBits() would return 9 (as if the value were signed with 9 sign bits), and the function's return value would be 7,
while we actually expect 16 as the return value.
Added a test case (@cmp_select_unsigned_const_i16_MSB1) that unjustifiably applies the transformation with the suggested code above but refrains from it with the current code.

However, the previous comments are 100% legit.

aymanmus updated this revision to Diff 244427.Feb 13 2020, 7:14 AM

Addressed @nikic & @lebedev.ri comments.

lebedev.ri added inline comments.Feb 13 2020, 7:31 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
246

Please explicitly spell out unsigned here

248

Right. What about this then:

static unsigned getConstMinBitWidth(bool IsSigned, ConstantInt *C) {
  const APInt& Val = C->getValue();
  return IsSigned ? Val.getMinSignedBits() : Val.getActiveBits();
}

?

nikic added inline comments.Feb 13 2020, 9:19 AM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410

Doesn't the sext/zext case have to check the IsSigned flag as well? I don't think it's as simple as IsSigned => sext and !IsSigned => zext (e.g. it's fine if you have an equality comparison and both operands use the same extension type), but some correctness checks are needed here.

lebedev.ri added inline comments.Feb 13 2020, 10:31 PM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
402–403

Wait, why are we giving up on non-scalars here?
See how e.g. Constant::isAllOnesValue() handles different types,
i think similar recursion should be done here.

aymanmus updated this revision to Diff 244869.Feb 16 2020, 5:35 AM
aymanmus marked 4 inline comments as done.

Fix the constant aux functions to correctly handle vector constants.

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410

IsSigned is not relevant in this part of the code.
It only indicates whether the use of the constant (in case the given Value is a constant) was in a signed or unsigned context. If the Value is not a constant, we don't care about IsSigned.
I'll rename the operand to avoid such confusion.

nikic added inline comments.Feb 19 2020, 1:08 PM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410

Not sure I get it. Converting (zext i16 %x to i32) ult (sext i16 %y to i32) to %x ult %y is not legal (https://rise4fun.com/Alive/wVZy), and I don't think anything prevents that from happening with your current code.

aymanmus updated this revision to Diff 245631.Feb 20 2020, 6:01 AM
aymanmus marked an inline comment as done.
aymanmus added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
410

Now I understand, and you're right.
I gave it some thought and came up with a somewhat different approach to deal with all the limitations.
I hope this new function (CmpCanBeShrinked) handles all the needed cases.

nikic added a comment.Mar 8 2020, 11:13 AM

This is a lot more complicated than expected :(

llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
41

I'd suggest Scl -> Scalar, we're not saving a lot of characters here :)

49

return false doesn't make sense here. I believe that in the case where it is not analyzable, you need to return the bitwidth of the type. It should be possible to test this by making the constant a constant expression.

81

nit: "fits in Ty in case"

109

Is it really sufficient that just one of them is a constant? Say we have (zext i16 %x to i32) ult -1, which is always true, converting it to i16 %x ult -1 would no longer be always true.

aymanmus updated this revision to Diff 249376.Mar 10 2020, 7:36 AM
aymanmus marked 4 inline comments as done.
aymanmus added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
109

Making sure the const operand is valid for shrinking is done before we reach this point (line 82):

if (Ty && Ty->getScalarSizeInBits() < getConstMinBitWidth(IsSigned, C))
  return false;

If we reach this point, then the constant is valid.
I added a test case that checks exactly the scenario you described.

I have not looked at the code changes, but I noticed the first few tests are all min/max patterns. Can you check your motivating apps/benchmarks to see if rGf2fbdf76d8d0 helps?

Just checked the patch @spatel mentioned.
Doesn't really help with the cases provided here.

spatel added a comment.EditedMar 31 2020, 11:07 AM

Just checked the patch @spatel mentioned.
Doesn't really help with the cases provided here.

Thanks for checking. In that case, it would be helpful to comment on which of the tests are the real motivation (and possibly remove the tests that are not canonical IR).
For example, -instcombine alone is enough to reduce these:

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %c, %d
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

define i16 @cmp_select_bigConst_cmp(i8 %a, i8 %b, i8 %c) {
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %conv3 = sext i8 %c to i32
  %cmp = icmp slt i32 %conv3, 70000
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}
$ ./opt -instcombine cmpsel.ll -S

define dso_local i16 @cmp_shrink_select_not_cmp(i8 %a, i8 %b, i32 %c, i32 %d) {
entry:
  %cmp = icmp slt i32 %c, %d
  %cond.v = select i1 %cmp, i8 %b, i8 %a
  %conv4 = sext i8 %cond.v to i16
  ret i16 %conv4
}

define i16 @cmp_select_bigConst_cmp(i8 %a, i8 %b, i8 %c) {
  %conv4 = sext i8 %b to i16
  ret i16 %conv4
}

@spatel: Basically the test cases with the real motivation are the first 4.
Most of the other cases are there to check various edge cases of the added code's behavior.
I think that even if some of the tests are not in canonical form (and can be optimized by instcombine), there is still added value in having them here to check the behavior of this specific pass on similar cases.
Don't you agree?

spatel added a comment.Apr 7 2020, 5:18 AM

@spatel: Basically the test cases with the real motivation are the first 4.

Do you mean the first 4 with diffs, or the first 4 that are being added as new tests?

Most of the other cases are there to check various edge cases of the added code's behavior.
I think that even if some of the tests are not in canonical form (and can be optimized by instcombine), there is still added value in having them here to check the behavior of this specific pass on similar cases.
Don't you agree?

Yes, I agree that we want to have tests for edge cases to make sure that the logic is correct here. But we also should have tests that show why this patch is necessary - functions that could not be solved in regular instcombine easily.

@spatel: Basically the test cases with the real motivation are the first 4.

Do you mean the first 4 with diffs, or the first 4 that are being added as new tests?

The first 4 with diff.

Most of the other cases are there to check various edge cases of the added code's behavior.
I think that even if some of the tests are not in canonical form (and can be optimized by instcombine), there is still added value in having them here to check the behavior of this specific pass on similar cases.
Don't you agree?

Yes, I agree that we want to have tests for edge cases to make sure that the logic is correct here. But we also should have tests that show why this patch is necessary - functions that could not be solved in regular instcombine easily.

I agree with you 100%. That's why we have several test cases.

Running the same test with -instcombine instead of -aggressive-instcombine, out of the 23 test cases in the file, the following get optimized:

  • cmp_select_zext_i8_noTransformation
  • cmp_select_zext_sext_i8_noTransformation
  • cmp_select_signed_const_i16Const_noTransformation
  • cmp_select_unsigned_const_i16Const
  • cmp_select_unsigned_const_i16Const_noTransformation
  • cmp_select_unsigned_const_i16_MSB1
  • cmp_select_bigConst_cmp
  • cmp_zext_and_minus1_noTransformation

Out of these cases, the ones with the "_noTransformation" suffix are there to make sure our pass does not apply any transformation, and the others include some special immediate values.

Actually, I feel that I might have misunderstood your comment's intention. Can you explain what you are suggesting we do?

Thanks,
Ayman

spatel added a comment.Apr 7 2020, 9:54 AM


I'm trying to understand which patterns are not capable of being reduced to the optimal form by instcombine. If there is some artificial limitation within regular instcombine that we can overcome (like unnecessary bypassing of min/max patterns), then we might be able to avoid a specialized solution in this pass. We want to show that there's a good reason to add code here in aggressive-instcombine to handle these optimizations. So if you can add comments in the test file that explain why instcombine can't handle some transform, that will explain to future readers of the code/tests why we have this particular transform here. Also, if instcombine somehow gets smarter in the future, then we would know if this code became redundant.

aymanmus updated this revision to Diff 255981.Apr 8 2020, 5:21 AM

As per @spatel's request, added a comment explaining the motivation behind implementing the transformation in AggressiveInstCombine instead of regular InstCombine.

I still haven't had a chance to actually look at the code here.
@nikic / @lebedev.ri - are your concerns handled?

llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll
4–5

This isn't entirely accurate for the tests as shown. The problem in several of these tests is not that there are extra uses; it's that we don't canonicalize icmp and min/max as well as possible. That was what rGf2fbdf76d8d0 was trying to solve, but I don't think there's a quick fix to restore that.

Please add some extra uses in these first 4 tests, so we have a better representation of the motivating tests. One way to do that is to add something like:
declare void @use(i32)
and then:
call void @use(i32 %conv)

aymanmus marked an inline comment as done.Apr 14 2020, 1:02 AM
aymanmus added inline comments.
llvm/test/Transforms/AggressiveInstCombine/trunc_select_cmp.ll
4–5

I must disagree with you this time.
The infrastructure for shrinking such cases inside InstCombine cannot handle these cases.
If we look, for example, at the first case in this test, notice that the "sext" instruction has multiple uses,
which will prevent the mechanism inside InstCombine from shrinking the whole sequence, even though both uses are going to be shrunk together (as part of the same sequence) with the "sext".
This is one of the main additions in AggressiveInstCombine: while we check the validity/possibility of shrinking a sequence, we remember all the instructions encountered so far. That enables us to check whether all uses of such an instruction are included in that set, to make sure we will replace all uses of the old sext instruction with the new one.

Sorry, I did not explain my previous comment as well as possible. Let's take this minimal example:

define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

With instcombine today, we can reduce this as:

define i1 @f(i8 %a) {
  %cmp = icmp slt i8 %a, 109
  ret i1 %cmp
}

We can also see that with instcombine today, we can reduce this example even if the cast (sext in this example) has other uses:

declare void @use(i32)
define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

Narrow 'cmp' here even though the sext is not eliminated:

define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  ret i1 %cmp
}

So that's what I was trying to suggest - the limitation is not entirely about the multiple uses in these small examples. The 1st regression test example would have been altered by rGf2fbdf76d8d0 to become:

define i16 @cmp_select_sext_const(i8 %a) {
  %c = icmp sgt i8 %a, 109
  %narrow = select i1 %c, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

And even with an extra use of the sext, we would get the narrow icmp/select:

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

In fact, even when the select also has an extra use (you can reproduce this on current LLVM: https://godbolt.org/z/P6Kws_ ):

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  %cond = select i1 %cmp, i32 109, i32 %conv
  call void @use(i32 %cond)
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

Instcombine will produce this (which seems unintended because we increased the instruction count):

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %cond = zext i8 %narrow to i32
  call void @use(i32 %cond)
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

So I'm suggesting that we cover that last case in addition to the simpler test, so we will know what happens with multiple extra uses.

Also note that we have floated the idea again of adding min/max intrinsics to IR in D68408. Obviously, we need to post that as a proposal on llvm-dev for wider review...but if we had those intrinsics in IR, it probably affects at least some of the motivation for this patch, so I'd be interested in your thoughts on that.

ctetreau added inline comments.Apr 14 2020, 3:00 PM
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
58

if (auto *VT = dyn_cast<VectorType>(C->getType()))

59

getVectorNumElements is going away soon. Please do the cast.

See: D77278

Also, is VT guaranteed to not be a scalable vector?

Sorry, I did not explain my previous comment as well as possible. Let's take this minimal example:

define i1 @f(i8 %a) {
  %conv = sext i8 %a to i32
  %cmp = icmp slt i32 %conv, 109
  ret i1 %cmp
}

Sorry, but I think the motivation of the patch is not yet clear.
The whole optimization (which we're trying to improve) is triggered by a truncate instruction that dominates a certain sequence.
This trunc can lead to shrinking a whole sequence of instructions to a new type, thereby removing the need for the trunc instruction itself.
The example you give here has no trunc at all, so it's not really relevant.

And even with an extra use of the sext, we would get the narrow icmp/select:

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %1 = icmp sgt i8 %a, 109
  %narrow = select i1 %1, i8 %a, i8 109
  %conv4 = zext i8 %narrow to i16
  ret i16 %conv4
}

Same here.

In fact, even when the select also has an extra use (you can reproduce this on current LLVM: https://godbolt.org/z/P6Kws_ ):

define i16 @cmp_select_sext_const(i8 %a) {
  %conv = sext i8 %a to i32
  call void @use(i32 %conv)
  %cmp = icmp slt i8 %a, 109
  %cond = select i1 %cmp, i32 109, i32 %conv
  call void @use(i32 %cond)
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

This example indeed has a truncate instruction, but as we can see, the optimization applied to it did not originate from the truncate.
As a matter of fact, it was applied despite it. We can see that shrinking the select's type actually increased the instruction count.
So this is not a relevant example either.
What we are trying to solve in this patch are cases where a full sequence terminated by a truncate instruction can be shrunk to a new type, subsequently removing the need for the truncate instruction.
A perfect example would be:

define dso_local i16 @cmp_select_sext(i8 %a, i8 %b) {
entry:
  %conv = sext i8 %a to i32
  %conv2 = sext i8 %b to i32
  %cmp = icmp slt i32 %conv, %conv2
  %cond = select i1 %cmp, i32 %conv2, i32 %conv
  %conv4 = trunc i32 %cond to i16
  ret i16 %conv4
}

which will not be changed by instcombine, but with the new optimization the whole sequence can be shrunk to i16 (according to the destination type of the truncate instruction), and the truncate can be thrown away.

Also note that we have floated the idea again of adding min/max intrinsics to IR in D68408. Obviously, we need to post that as a proposal on llvm-dev for wider review...but if we had those intrinsics in IR, it probably affects at least some of the motivation for this patch, so I'd be interested in your thoughts on that.

Actually, this might help in cases where the icmp+select implements min/max semantics. In that case we could simply add the new nodes' opcodes (min/max) to the list of opcodes the optimization handles.
But we still need to handle the general icmp+select case as well.

aymanmus marked 2 inline comments as done.Apr 19 2020, 4:05 AM
aymanmus added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
59

No, unfortunately I didn't take that into account.
How do you suggest I deal with it?
How are scalable vectors of constants represented?
Is:

if (auto *VT = dyn_cast<VectorType>(C->getType())) {
  for (unsigned i = 0; i < VT->getNumElements(); i++) {

a correct approach for such a case?