This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
InstCombineInternal.h
14/18
InstCombineSelect.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
sadd_sat.ll

Differential D68651

[InstCombine] Signed saturation patterns
ClosedPublic

Authored by dmgreen on Oct 8 2019, 9:00 AM.

Download Raw Diff

Details

Reviewers

spatel
nikic
lebedev.ri
efriedma

Commits

rG186155b89c2d: [InstCombine] Signed saturation patterns
rL375505: [InstCombine] Signed saturation patterns

Summary

This adds a instcombine match for code that attempts to perform signed saturating arithmetic by casting to a higher type. For example:
https://godbolt.org/z/9knBnP
As can be seen the unsigned cases are already matched, but signed are not. Adding them for these cases involves matching the min(max(add a b)) nodes, with proper truncs and extends.

There is some work in D68643 to make the default lowering for these better. With that I believe that the intrinsic is a better choice than extending into a larger type (although it's obviously hard to tell for all architectures). This also helps with vectorization, especially in dsp routines that often want to use saturating arithmetic.

This also adds a m_SpecificAPInt matcher which seemed to be useful, similar to m_SpecificInt but taking an APInt for cases like this where using an APInt make sense.

  %a_wide = sext i4 %a to i8
  %b_wide = sext i4 %b to i8
  %add = add i8 %b_wide, %a_wide
  %should_not_clamp_high = icmp slt i8 %add, 7
  %clamped_high = select i1 %should_not_clamp_high, i8 %add, i8 7
  %should_not_clamp_low = icmp sgt i8 %clamped_high, -8
  %clamped_low = select i1 %should_not_clamp_low, i8 %clamped_high, i8 -8
  %res = trunc i8 %clamped_low to i4
  ret i4 %res
=>
  %res = sadd_sat i4 %a, %b
  ret i4 %res
  %b_wide = sext i4 %b to i8
  %a_wide = sext i4 %a to i8
  %add = add i8 %b_wide, %a_wide
  %should_not_clamp_high = icmp slt i8 %add, 7
  %clamped_high = select i1 %should_not_clamp_high, i8 %add, i8 7
  %should_not_clamp_low = icmp sgt i8 %clamped_high, -8
  %clamped_low = select i1 %should_not_clamp_low, i8 %clamped_high, i8 -8

Done: 1
Optimization is correct!

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision.Oct 8 2019, 9:00 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2019, 9:00 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

Do we already form these saturating intrinsics in instcombine, if they are in their native format?
Can that code be extended to also use this knowledge about truncation?

Hello.

Can you explain what you mean by "native format"? Do you mean without the extends/truncs, as a different way of specifying them? (I think the problem at least from C is dealing with overflowing arithmetic being undefined. If you extend at least one bit then the arithmetic can't overflow, so you can do the min/max like it's done here).

I don't think there is anywhere in instcombine that currently forms a sadd_sat or ssub_sat (as opposed to uadd_sat or usub_sat), unless it's from an existing sadd_sat. We do form uadd_sat as in rL357012 and usub_sat from selects.

I really just need some way to generate sadd_sats for vectorisation. If there's a better way than this, I'm all ears :)

In D68651#1699976, @dmgreen wrote:

Hello.

Can you explain what you mean by "native format"? Do you mean without the extends/truncs, as a different way of specifying them?

Err, i meant native bitwidth, so yes, without any extends/truncs.

(I think the problem at least from C is dealing with overflowing arithmetic being undefined. If you extend at least one bit then the arithmetic can't overflow, so you can do the min/max like it's done here).

True.
This highlights that we also might want to form it from .with.overflow() intrinsics.

I don't think there is anywhere in instcombine that currently forms a sadd_sat or ssub_sat (as opposed to uadd_sat or usub_sat), unless it's from an existing sadd_sat. We do form uadd_sat as in rL357012 and usub_sat from selects.

I really just need some way to generate sadd_sats for vectorisation. If there's a better way than this, I'm all ears :)

In D68651#1699976, @dmgreen wrote:

We do form uadd_sat as in rL357012 and usub_sat from selects.

I really just need some way to generate sadd_sats for vectorisation. If there's a better way than this, I'm all ears :)

If we just need vectorization with saturating add to happen, we just need a better pattern matcher utility. There should be benefits for going to intrinsic that compensates for the possible loss of optimizations that we might get with a sequence of Instructions. Is this (i.e., saturating add/sub intrinsic as the canonical form) something already discussed in llvm-dev that I missed? When I was involved in the "vector idioms" discussion few years ago, general consensus (among 10+ people interested in that topic) was that we need better pattern matchers than pushing for vector idiom specific intrinsics. That's why I'm asking.

The same question also applies to rL357012.

@hsaito We've put quite a bit of effort into making sure that saturating intrinsics optimize as well or better than expanded IR sequences, which is why they are indeed considered canonical IR. Whether intrinsics are canonical needs to be decided on a case by case basis, there is no general rule about it. For example the unsigned mul overflow intrinsic is considered canonical (for obvious reasons -- the alternatives tend to be much more expensive) and is formed in InstCombine. Unsigned add/sub overflow on the other hand are not, because they tends to optimize much worse than expanded IR. Those are formed in CGP instead.

It should be noted that the saturating add/sub intrinsics are not intended primarily as vector intrinsics -- even though that's the case where they will most typically select to hardware instructions rather than be expanded in DAG. The intrinsic representation is simply generally useful for optimization.

Yes. I was going off the prior art for adding sadd_sat and ssub_sat to instcombine. And from the cases this patch is matching, where we are otherwise extending to a higher type, the intrinsic seems to produce equal or better code in most of the cases I've tried now. So this wasn't just for vectorisation, although it does make things a lot simpler there. (On an arm specific note, we have a scalar qadd instruction that can be used, if we can sort out the "q" flag otherwise being visible from C).

If the canonical form for one of these signed saturating adds/subs wasn't an intrinsic, what would it be? Going into a higher type is awkward for us because the i64 add is not legal, and so doesn't look like the kind of instruction that should be vectorised, plus in ISel we'd have to catch the lowering fairly early and do something special.

In D68651#1701189, @dmgreen wrote:

Yes. I was going off the prior art for adding sadd_sat and ssub_sat to instcombine. And from the cases this patch is matching, where we are otherwise extending to a higher type, the intrinsic seems to produce equal or better code in most of the cases I've tried now. So this wasn't just for vectorisation, although it does make things a lot simpler there. (On an arm specific note, we have a scalar qadd instruction that can be used, if we can sort out the "q" flag otherwise being visible from C).

If the canonical form for one of these signed saturating adds/subs wasn't an intrinsic, what would it be? Going into a higher type is awkward for us because the i64 add is not legal, and so doesn't look like the kind of instruction that should be vectorised, plus in ISel we'd have to catch the lowering fairly early and do something special.

I haven't looked at the patch in detail, but as author of at least part of the prior art cited here, I agree with the direction*. I also participated in some of the vector idioms discussions from a few years ago. There's overlap with the vector idiom problems, but as noted, these are generic (scalar too) math ops, so it's not exactly the same. We invested significantly in IR analysis and codegen for the math intrinsics, so that may have changed the thinking. I don't remember the sequence of events or if there was a dedicated llvm-dev thread for this, but the general idea is that if we have a generic intrinsic for the math and can easily invert the transform in the backend for targets/types that are not supported, try to canonicalize to the intrinsic.

https://bugs.llvm.org/show_bug.cgi?id=43580 may show another example where transforming to an intrinsic would help. cc @craig.topper @xbolva00
*We should have codegen tests in place for multiple targets/types, and it appears that is added with D68643.

I haven't looked at the patch in detail, but as author of at least part of the prior art cited here, I agree with the direction*. I also participated in some of the vector idioms discussions from a few years ago. There's overlap with the vector idiom problems, but as noted, these are generic (scalar too) math ops, so it's not exactly the same. We invested significantly in IR analysis and codegen for the math intrinsics, so that may have changed the thinking. I don't remember the sequence of events or if there was a dedicated llvm-dev thread for this, but the general idea is that if we have a generic intrinsic for the math and can easily invert the transform in the backend for targets/types that are not supported, try to canonicalize to the intrinsic.

Don't get me wrong. I'm not against the direction. I just want to make sure we have a general agreement with the rest of the community on where and how to draw the line ---- and get it documented (and the document to be kept up-to-date). Then, we can go back to the idiom list we created and determine which ones should be canonicalized, and which ones should go to better pattern matchers --- reflect that into the document. That'll make it easier for other interested people to actually work on those, w/o having to make a point on individual cases. That's the main purpose of raising this question.

I just want to make sure we have a general agreement with the rest of the community..

That's easy enough for this patch. (Well, IT issues aside..)

.. on where and how to draw the line ..

Not sure about in general though!

and get it documented (and the document to be kept up-to-date).

Any ideas what it would look like? And where this would live? At some point when we get into enough of the details the code becomes the best documentation, but I'm guessing you will disagree with that...

In D68651#1703888, @dmgreen wrote:

.. on where and how to draw the line ..

Not sure about in general though!

and get it documented (and the document to be kept up-to-date).

Any ideas what it would look like? And where this would live? At some point when we get into enough of the details the code becomes the best documentation, but I'm guessing you will disagree with that...

I'm not sure this can really be documented and enforced.

I'm in favor of treating signed saturation as canonical. The issue in delaying detection of such cases to instruction selection is the volatility of the IR: there is no guarantee that the IR will remain in the same form (expected by isel) from one day to the next. For example, some optimization may decide to just promote the operations to the wider type and only do the extension/truncate once, depending on how many saturating operations may be near one another. Handling this variability in isel is just not feasible.

fhahn added a subscriber: fhahn.Oct 10 2019, 11:30 AM

In D68651#1704375, @kparzysz wrote:

I'm in favor of treating signed saturation as canonical. The issue in delaying detection of such cases to instruction selection is the volatility of the IR: there is no guarantee that the IR will remain in the same form (expected by isel) from one day to the next. For example, some optimization may decide to just promote the operations to the wider type and only do the extension/truncate once, depending on how many saturating operations may be near one another. Handling this variability in isel is just not feasible.

I don't want to hijack this review. Let me have the rest of the discussion in the form of RFC on llvm-dev. Vector idiom discussions resulted in ~20 idioms. It would be nice if we can come up with a basic guideline on how to think and how to make a case for the new canonical form. For example, TI said they are interested in saturating mul. If saturating add/sub have a canonical form, I don't see why saturating mul should not.

Generally looks good to me, I'm only wondering whether the trunc is the right place to start the match. Starting from the min/max we could match a larger set of patterns, in particular those where the result of the saturation is still extended to a larger type -- for example doing a 16-bit saturating add but continuing with a 32-bit result.

llvm/include/llvm/IR/PatternMatch.h
663 ↗	(On Diff #223875)	nit: `///`
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
729 ↗	(On Diff #223875)	Rather than exact type equality, we could require that the original type is <= the trunc type and sext to the trunc type. This would allow also matching a saturating add between 16-bit and 8-bit number, for example. Not sure how practically relevant that would be though.

In D68651#1703888, @dmgreen wrote:

At some point when we get into enough of the details the code becomes the best documentation, but I'm guessing you will disagree with that...

First up, sorry about this line. I was in a bit of a mood over emails not working and someone distracted me, I pressed submit before I had properly constructed what I really wanted to say. Sorry if it came across as accusatory or passive aggressive. I was really wanting to say that; strong specifications, whilst they can be very useful, can also be stifling to innovation and bog down forward progress. We should try to keep a balance between writing things down and getting things done :)

Secondly, I had to revert D68643 because it wasn't sign extending the values as it should. Simple cases were working, but more complex examples were not. This will mostly effect illegal types (like i4's or i8/i16 on arm), but will make the results look less impressive. I will put up another patch together showing the correct results, which will contain more extends in places. I don't think it changes the decision of whether to use the intrinsics, just moves a few cases from looking like "an improvement", to just being "the same". And still better for many cases like vectorisation.

In D68651#1706810, @hsaito wrote:

I don't want to hijack this review. Let me have the rest of the discussion in the form of RFC on llvm-dev. Vector idiom discussions resulted in ~20 idioms. It would be nice if we can come up with a basic guideline on how to think and how to make a case for the new canonical form. For example, TI said they are interested in saturating mul. If saturating add/sub have a canonical form, I don't see why saturating mul should not.

I was going to say yeah, sounds great to me, let me look into adding it.. But I took another look at the MVE spec and we apparently don't have that instruction. There are some that do multiplying + doubling and saturate the high half. Apparently that's useful to someone! I don't believe it would be a detriment to have a saturating mul though, so sounds good to me if someone wants to add it.

Do you happen to know if that discussion you spoke of was written down? It sounds very interesting. One thing I would like to try is to kick llvm's use of reductions up to 11, and it might provide useful insights on the best way to make that work.

In D68651#1707017, @nikic wrote:

I'm only wondering whether the trunc is the right place to start the match. Starting from the min/max we could match a larger set of patterns, in particular those where the result of the saturation is still extended to a larger type -- for example doing a 16-bit saturating add but continuing with a 32-bit result.

Yeah I was wondering if that would be better. So long as it doesn't create strange types, I think it sounds like a good idea. I'll give it a try and let you know how it looks.

Update to start from a select, not the trunc.

lebedev.ri added inline comments.Oct 14 2019, 3:09 AM

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2032–2035	Can you tell which test is not folded because of this check?
2043–2046	This is ugly. There should be a way to do that by passing an old type and a new bitwidth.
2048–2057	This should be closer to something like Value A, B; if(!match(AddSub, m_BinOp(m_SExt(m_Value(A)), m_SExt(m_Value(B))))) return nullptr' Function *F; if (AddSub.getOpcode() == Instruction::Add) F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::sadd_sat, NewTy); else if (AddSub.getOpcode() == Instruction::Sub) F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::ssub_sat, NewTy); else return nullptr;
2058–2060	I don't understand why this test is here. You know you matched `sext` on the way to `A` and `B`.

dmgreen marked 6 inline comments as done.Oct 14 2019, 7:07 AM

dmgreen added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2032–2035	Ah. That was meant to be sadd_sat31, but then I re-added back in some extra checks. I'll change it to a sadd_sat4 which should test this. I presume it's a good idea to prevent the creation i4's?
2043–2046	Do you mean the whole "if (Ty->isVectorTy()).. " or just the "VectorType::get(..)" part? The second one looks simple, by using getElementCount. Let me know if you meant the first.
2058–2060	I originally left this out of the new patch, but think it is still needed. We could be SExtending from something larger than the NewBitWidth, and if it does contain higher bits the results of the add might be different to what we expect. Something like https://rise4fun.com/Alive/WZvW, if I have that correct (I don't know if there is a way to say "sadd_sat" exactly. Let me know if you know of a way, I thought I'd seen something like that before but can't find it now. The idea is to try and replicate what that does in the i9 part).

Address comments.

lebedev.ri added inline comments.Oct 14 2019, 7:49 AM

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2031	/// In what bitwidth can this be treated as saturating arithmetics?
2032–2035	Eh, good point. Not sure honestly, i don't remember ever seeing that check in any of the transforms recently-ish added.
2043–2046	The entirety of the block i highlighted.
2048–2057	Now that i think about it, the only really differing part is the new opcode, so ZZZ NewOpcode; if (AddSub->getOpcode() == Instruction::Add) NewOpcode = Intrinsic::sadd_sat; else if (AddSub->getOpcode() == Instruction::Sub) NewOpcode = Intrinsic::ssub_sat; else return nullptr; auto F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::ssub_sat, NewTy); even.
2058–2060	Please move to right after `if(!match(AddSub, m_BinOp(m_SExt(`. I guess i see what you mean, please ensure there is a test for that.
2061	This needs more meaningful comment.

Added a changeBitWidth to Type, and other cleanup.

lebedev.ri edited the summary of this revision. (Show Details)Oct 16 2019, 12:20 PM

Even though we don't care about trunc, in all the tests there is a trunc in the end.
Can you either drop it from tests, or at least add a test without trunc?

Looks good to me in principle.

I'm pretty sure there are two more patterns that can be handled - you are using extension
because in original code [signed] overflow is undefined.
But that is only the case in IR if nsw is specified.
You can write it without UB, e.g.:

----------------------------------------
  %agg = uadd_overflow {i4, i1} %a, %b
  %add = extractvalue {i4, i1} %agg, 0
  %ov = extractvalue {i4, i1} %agg, 1
  %res = select %ov, i4 -1, %add
  ret i4 %res
=>
  %res = uadd_sat i4 %a, %b
  ret i4 %res
  %agg = uadd_overflow {i4, i1} %a, %b
  %ov = extractvalue {i4, i1} %agg, 1
  %add = extractvalue {i4, i1} %agg, 0

Done: 1
Optimization is correct!

yet we don't handle it:
https://godbolt.org/z/6VAnoL

That can also be written without @llvm.uadd.with.overflow(), and the same for signed.

llvm/include/llvm/IR/Type.h
388 ↗	(On Diff #224899)	Can you use it somewhere in existing code and land this change separately?
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2032–2035	@spatel @nikic thoughts? I'm almost thinking this should not be here.

@lebedev.ri I believe that pattern is principally handled in canonicalizeSaturatedAdd(), it just depends on the way in which the overflow check is expressed. Apparently it checks for the ~X u< Y ? -1 : X + Y pattern, but not for (X + Y) < X ? -1 : X + Y, which is our canonical non-constant add overflow check form, I think.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2032–2035	I think this check makes sense in this case, because we are creating a type from integer constants. E.g. if those happen to be `-256` and `255`, we'd create a `sadd.sat.i9` call, which will introduce the use of an "unusual" type that was not present in the source program. I think that's something we should generally avoid, as it will likely result in an unfavorable lowering in the backend.

dmgreen mentioned this in D69139: [Types] Define a getWithNewBitWidth for Types and make use of it..Oct 17 2019, 2:26 PM

Thanks for the alive proofs. I had written a patch to do the same thing (for sat's at least), but it looks like you already have something similar that handles overflows too.

In D68651#1711777, @nikic wrote:

@lebedev.ri I believe that pattern is principally handled in canonicalizeSaturatedAdd(), it just depends on the way in which the overflow check is expressed. Apparently it checks for the ~X u< Y ? -1 : X + Y pattern, but not for (X + Y) < X ? -1 : X + Y, which is our canonical non-constant add overflow check form, I think.

I will look into this. Thanks for the pointers!

llvm/include/llvm/IR/Type.h
388 ↗	(On Diff #224899)	D69139, as you already saw.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2032–2035	InstCombine used to be more strict about changing to illegal types before shouldChangeType was relaxed to at least include common types. I've not ran performance numbers either way over whether it's better to be less strict about this. It likely doesn't come up a lot in this case. The checks we currently have here will only allow generate sats if there was an existing value of that type, from the sext (I think). sext from a larger type will not be valid to transform, and if both inputs are smaller the min/max will be superfluous (so we just use a standard add). So at least one of the inputs needs to be the same size as the final sat. Whether we should turn a sext from an illegal type into a sadd_sat with an illegal type is the question then. I'd still go with no, but that's mostly because it feel's simpler (and it's likely does not come up very often).

Mostly rebased out D69139, but also added a notrunc test.

dmgreen mentioned this in D69244: [InstCombine] Extra combine for uadd_sat.Oct 21 2019, 2:18 AM

Okay, LG.
@nikic ?

This revision is now accepted and ready to land.Oct 21 2019, 3:08 AM

dmgreen mentioned this in rL375417: [Types] Define a getWithNewBitWidth for Types and make use of it.Oct 21 2019, 7:53 AM

dmgreen mentioned this in rG3edb41654650: [Types] Define a getWithNewBitWidth for Types and make use of it.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2053	I'd use `Ty->getScalarSizeInBits()` rather than `Ty->getScalarType()->getIntegerBitWidth()` here and above. I believe that's idiomatic even if we're dealing specifically in integers.

lebedev.ri marked an inline comment as done.Oct 21 2019, 11:54 AM

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
2053	Oh, thanks for spotting this, i thought i posted about this, but i guess it was in an inline comment that i deleted before posting.

Closed by commit rG186155b89c2d: [InstCombine] Signed saturation patterns (authored by dmgreen). · Explain WhyOct 22 2019, 8:39 AM

This revision was automatically updated to reflect the committed changes.

dmgreen marked an inline comment as done.

LNT reports 7% slowdown for SingleSource/UnitTests/Vectorizer/gcc-loops. Hard to say if noise, since this was the last LNT build.

https://lnt.llvm.org/db_default/v4/nts/130142

Hello, Looks like noise to me: http://lnt.llvm.org/db_default/v4/nts/graph?plot.0=1344.1605916.3&highlight_run=130142

I don't think that benchmark contains anything that would hit this code. If you compare gcc-loops in these two runs, the hashes of the binaries are the same:
https://lnt.llvm.org/api/db_default/v4/nts/runs/130141
https://lnt.llvm.org/api/db_default/v4/nts/runs/130142
Why the first was repeatedly 1.3 and the later repeatedly 1.4, i'm not sure, but it looks like it's been bouncing around like that for a while.

(Also I feel that the hashes not changing should really appear in the run results somewhere! It's useful to filter out the noise)

Thanks!

Yes, it would be great to show that binaries are same.

dmgreen mentioned this in rGbf21f0d489fb: [InstCombine] Extra combine for uadd_sat.Oct 28 2019, 8:28 AM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstCombineInternal.h

1 line

InstCombineSelect.cpp

67 lines

test/

Transforms/

InstCombine/

sadd_sat.ll

154 lines

Diff 226052

llvm/lib/Transforms/InstCombine/InstCombineInternal.h

Show First 20 Lines • Show All 595 Lines • ▼ Show 20 Lines	private:
Value EmitGEPOffset(User GEP);		Value EmitGEPOffset(User GEP);
Instruction scalarizePHI(ExtractElementInst &EI, PHINode PN);		Instruction scalarizePHI(ExtractElementInst &EI, PHINode PN);
Instruction *foldCastedBitwiseLogic(BinaryOperator &I);		Instruction *foldCastedBitwiseLogic(BinaryOperator &I);
Instruction *narrowBinOp(TruncInst &Trunc);		Instruction *narrowBinOp(TruncInst &Trunc);
Instruction *narrowMaskedBinOp(BinaryOperator &And);		Instruction *narrowMaskedBinOp(BinaryOperator &And);
Instruction *narrowMathIfNoOverflow(BinaryOperator &I);		Instruction *narrowMathIfNoOverflow(BinaryOperator &I);
Instruction *narrowRotate(TruncInst &Trunc);		Instruction *narrowRotate(TruncInst &Trunc);
Instruction optimizeBitCastFromPhi(CastInst &CI, PHINode PN);		Instruction optimizeBitCastFromPhi(CastInst &CI, PHINode PN);
		Instruction *matchSAddSubSat(SelectInst &MinMax1);

/// Determine if a pair of casts can be replaced by a single cast.		/// Determine if a pair of casts can be replaced by a single cast.
///		///
/// \param CI1 The first of a pair of casts.		/// \param CI1 The first of a pair of casts.
/// \param CI2 The second of a pair of casts.		/// \param CI2 The second of a pair of casts.
///		///
/// \return 0 if the cast pair cannot be eliminated, otherwise returns an		/// \return 0 if the cast pair cannot be eliminated, otherwise returns an
/// Instruction::CastOps value for a cast that can replace the pair, casting		/// Instruction::CastOps value for a cast that can replace the pair, casting
▲ Show 20 Lines • Show All 394 Lines • Show Last 20 Lines

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

Show First 20 Lines • Show All 1,998 Lines • ▼ Show 20 Lines	if (!Overflow) {
return BinaryOperator::CreateNSW(BinaryOperator::Add, NewMinMax,		return BinaryOperator::CreateNSW(BinaryOperator::Add, NewMinMax,
ConstantInt::get(X->getType(), *C1));		ConstantInt::get(X->getType(), *C1));
}		}
}		}

return nullptr;		return nullptr;
}		}

		/// Match a sadd_sat or ssub_sat which is using min/max to clamp the value.
		Instruction *InstCombiner::matchSAddSubSat(SelectInst &MinMax1) {
		Type *Ty = MinMax1.getType();

		// We are looking for a tree of:
		// max(INT_MIN, min(INT_MAX, add(sext(A), sext(B))))
		// Where the min and max could be reversed
		Instruction *MinMax2;
		BinaryOperator *AddSub;
		const APInt MinValue, MaxValue;
		if (match(&MinMax1, m_SMin(m_Instruction(MinMax2), m_APInt(MaxValue)))) {
		if (!match(MinMax2, m_SMax(m_BinOp(AddSub), m_APInt(MinValue))))
		return nullptr;
		} else if (match(&MinMax1,
		m_SMax(m_Instruction(MinMax2), m_APInt(MinValue)))) {
		if (!match(MinMax2, m_SMin(m_BinOp(AddSub), m_APInt(MaxValue))))
		return nullptr;
		} else
		return nullptr;

		// Check that the constants clamp a saturate, and that the new type would be
		// sensible to convert to.
		if (!(MaxValue + 1).isPowerOf2() \|\| -MinValue != *MaxValue + 1)
		return nullptr;
		// In what bitwidth can this be treated as saturating arithmetics?
		lebedev.riUnsubmitted Done Reply Inline Actions /// In what bitwidth can this be treated as saturating arithmetics? lebedev.ri: ``` /// In what bitwidth can this be treated as saturating arithmetics? ```
		unsigned NewBitWidth = (*MaxValue + 1).logBase2() + 1;
		// FIXME: This isn't quite right for vectors, but using the scalar type is a
		// good first approximation for what should be done there.
		if (!shouldChangeType(Ty->getScalarType()->getIntegerBitWidth(), NewBitWidth))
		lebedev.riUnsubmitted Done Reply Inline Actions Can you tell which test is not folded because of this check? lebedev.ri: Can you tell which test is not folded because of this check?
		dmgreenAuthorUnsubmitted Done Reply Inline Actions Ah. That was meant to be sadd_sat31, but then I re-added back in some extra checks. I'll change it to a sadd_sat4 which should test this. I presume it's a good idea to prevent the creation i4's? dmgreen: Ah. That was meant to be sadd_sat31, but then I re-added back in some extra checks. I'll change…
		lebedev.riUnsubmitted Not Done Reply Inline Actions Eh, good point. Not sure honestly, i don't remember ever seeing that check in any of the transforms recently-ish added. lebedev.ri: Eh, good point. Not sure honestly, i don't remember ever seeing that check in any of the…
		lebedev.riUnsubmitted Not Done Reply Inline Actions @spatel @nikic thoughts? I'm almost thinking this should not be here. lebedev.ri: @spatel @nikic thoughts? I'm almost thinking this should not be here.
		nikicUnsubmitted Not Done Reply Inline Actions I think this check makes sense in this case, because we are creating a type from integer constants. E.g. if those happen to be `-256` and `255`, we'd create a `sadd.sat.i9` call, which will introduce the use of an "unusual" type that was not present in the source program. I think that's something we should generally avoid, as it will likely result in an unfavorable lowering in the backend. nikic: I think this check makes sense in this case, because we are creating a type from integer…
		dmgreenAuthorUnsubmitted Done Reply Inline Actions InstCombine used to be more strict about changing to illegal types before shouldChangeType was relaxed to at least include common types. I've not ran performance numbers either way over whether it's better to be less strict about this. It likely doesn't come up a lot in this case. The checks we currently have here will only allow generate sats if there was an existing value of that type, from the sext (I think). sext from a larger type will not be valid to transform, and if both inputs are smaller the min/max will be superfluous (so we just use a standard add). So at least one of the inputs needs to be the same size as the final sat. Whether we should turn a sext from an illegal type into a sadd_sat with an illegal type is the question then. I'd still go with no, but that's mostly because it feel's simpler (and it's likely does not come up very often). dmgreen: InstCombine used to be more strict about changing to illegal types before shouldChangeType was…
		return nullptr;

		// Also make sure that the number of uses is as expected. The "3"s are for the
		// the two items of min/max (the compare and the select).
		if (MinMax2->hasNUsesOrMore(3) \|\| AddSub->hasNUsesOrMore(3))
		return nullptr;

		// Create the new type (which can be a vector type)
		Type *NewTy = Ty->getWithNewBitWidth(NewBitWidth);
		// Match the two extends from the add/sub
		Value A, B;
		lebedev.riUnsubmitted Done Reply Inline Actions This is ugly. There should be a way to do that by passing an old type and a new bitwidth. lebedev.ri: This is ugly. There should be a way to do that by passing an old type and a new bitwidth.
		dmgreenAuthorUnsubmitted Done Reply Inline Actions Do you mean the whole "if (Ty->isVectorTy()).. " or just the "VectorType::get(..)" part? The second one looks simple, by using getElementCount. Let me know if you meant the first. dmgreen: Do you mean the whole "if (Ty->isVectorTy()).. " or just the "VectorType::get(..)" part? The…
		lebedev.riUnsubmitted Done Reply Inline Actions The entirety of the block i highlighted. lebedev.ri: The entirety of the block i highlighted.
		if(!match(AddSub, m_BinOp(m_SExt(m_Value(A)), m_SExt(m_Value(B)))))
		return nullptr;
		// And check the incoming values are of a type smaller than or equal to the
		// size of the saturation. Otherwise the higher bits can cause different
		// results.
		if (A->getType()->getScalarSizeInBits() > NewBitWidth \|\|
		B->getType()->getScalarSizeInBits() > NewBitWidth)
		nikicUnsubmitted Done Reply Inline Actions I'd use `Ty->getScalarSizeInBits()` rather than `Ty->getScalarType()->getIntegerBitWidth()` here and above. I believe that's idiomatic even if we're dealing specifically in integers. nikic: I'd use `Ty->getScalarSizeInBits()` rather than `Ty->getScalarType()->getIntegerBitWidth()`…
		lebedev.riUnsubmitted Done Reply Inline Actions Oh, thanks for spotting this, i thought i posted about this, but i guess it was in an inline comment that i deleted before posting. lebedev.ri: Oh, thanks for spotting this, i thought i posted about this, but i guess it was in an inline…
		return nullptr;

		Intrinsic::ID IntrinsicID;
		if (AddSub->getOpcode() == Instruction::Add)
		lebedev.riUnsubmitted Done Reply Inline Actions This should be closer to something like Value A, B; if(!match(AddSub, m_BinOp(m_SExt(m_Value(A)), m_SExt(m_Value(B))))) return nullptr' Function F; if (AddSub.getOpcode() == Instruction::Add) F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::sadd_sat, NewTy); else if (AddSub.getOpcode() == Instruction::Sub) F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::ssub_sat, NewTy); else return nullptr; lebedev.ri:* This should be closer to something like ``` Value A, B; if(!match(AddSub, m_BinOp(m_SExt…
		lebedev.riUnsubmitted Done Reply Inline Actions Now that i think about it, the only really differing part is the new opcode, so ZZZ NewOpcode; if (AddSub->getOpcode() == Instruction::Add) NewOpcode = Intrinsic::sadd_sat; else if (AddSub->getOpcode() == Instruction::Sub) NewOpcode = Intrinsic::ssub_sat; else return nullptr; auto F = Intrinsic::getDeclaration(MinMax1.getModule(), Intrinsic::ssub_sat, NewTy); even. lebedev.ri: Now that i think about it, the only really differing part is the new opcode, so ``` ZZZ…
		IntrinsicID = Intrinsic::sadd_sat;
		else if (AddSub->getOpcode() == Instruction::Sub)
		IntrinsicID = Intrinsic::ssub_sat;
		lebedev.riUnsubmitted Not Done Reply Inline Actions I don't understand why this test is here. You know you matched `sext` on the way to `A` and `B`. lebedev.ri: I don't understand why this test is here. You know you matched `sext` on the way to `A` and `B`.
		dmgreenAuthorUnsubmitted Done Reply Inline Actions I originally left this out of the new patch, but think it is still needed. We could be SExtending from something larger than the NewBitWidth, and if it does contain higher bits the results of the add might be different to what we expect. Something like https://rise4fun.com/Alive/WZvW, if I have that correct (I don't know if there is a way to say "sadd_sat" exactly. Let me know if you know of a way, I thought I'd seen something like that before but can't find it now. The idea is to try and replicate what that does in the i9 part). dmgreen: I originally left this out of the new patch, but think it is still needed. We could be…
		lebedev.riUnsubmitted Done Reply Inline Actions Please move to right after `if(!match(AddSub, m_BinOp(m_SExt(`. I guess i see what you mean, please ensure there is a test for that. lebedev.ri: Please move to right after `if(!match(AddSub, m_BinOp(m_SExt(`. I guess i see what you mean…
		else
		lebedev.riUnsubmitted Done Reply Inline Actions This needs more meaningful comment. lebedev.ri: This needs more meaningful comment.
		return nullptr;

		// Finally create and return the sat intrinsic, truncated to the new type
		Function *F = Intrinsic::getDeclaration(MinMax1.getModule(), IntrinsicID, NewTy);
		Value *AT = Builder.CreateSExt(A, NewTy);
		Value *BT = Builder.CreateSExt(B, NewTy);
		Value *Sat = Builder.CreateCall(F, {AT, BT});
		return CastInst::Create(Instruction::SExt, Sat, Ty);
		}

/// Reduce a sequence of min/max with a common operand.		/// Reduce a sequence of min/max with a common operand.
static Instruction factorizeMinMaxTree(SelectPatternFlavor SPF, Value LHS,		static Instruction factorizeMinMaxTree(SelectPatternFlavor SPF, Value LHS,
Value *RHS,		Value *RHS,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
assert(SelectPatternResult::isMinOrMax(SPF) && "Expected a min/max");		assert(SelectPatternResult::isMinOrMax(SPF) && "Expected a min/max");
// TODO: Allow FP min/max with nnan/nsz.		// TODO: Allow FP min/max with nnan/nsz.
if (!LHS->getType()->isIntOrIntVectorTy())		if (!LHS->getType()->isIntOrIntVectorTy())
return nullptr;		return nullptr;
▲ Show 20 Lines • Show All 410 Lines • ▼ Show 20 Lines	if (SelectPatternResult::isMinOrMax(SPF)) {
if (Instruction *I = moveNotAfterMinMax(RHS, LHS))		if (Instruction *I = moveNotAfterMinMax(RHS, LHS))
return I;		return I;

if (Instruction *I = moveAddAfterMinMax(SPF, LHS, RHS, Builder))		if (Instruction *I = moveAddAfterMinMax(SPF, LHS, RHS, Builder))
return I;		return I;

if (Instruction *I = factorizeMinMaxTree(SPF, LHS, RHS, Builder))		if (Instruction *I = factorizeMinMaxTree(SPF, LHS, RHS, Builder))
return I;		return I;
		if (Instruction *I = matchSAddSubSat(SI))
		return I;
}		}
}		}

// Canonicalize select of FP values where NaN and -0.0 are not valid as		// Canonicalize select of FP values where NaN and -0.0 are not valid as
// minnum/maxnum intrinsics.		// minnum/maxnum intrinsics.
if (isa<FPMathOperator>(SI) && SI.hasNoNaNs() && SI.hasNoSignedZeros()) {		if (isa<FPMathOperator>(SI) && SI.hasNoNaNs() && SI.hasNoSignedZeros()) {
Value X, Y;		Value X, Y;
if (match(&SI, m_OrdFMax(m_Value(X), m_Value(Y))))		if (match(&SI, m_OrdFMax(m_Value(X), m_Value(Y))))
▲ Show 20 Lines • Show All 159 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/sadd_sat.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -instcombine -S \| FileCheck %s		; RUN: opt < %s -instcombine -S \| FileCheck %s

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"		target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"

define i32 @sadd_sat32(i32 %a, i32 %b) {		define i32 @sadd_sat32(i32 %a, i32 %b) {
; CHECK-LABEL: @sadd_sat32(		; CHECK-LABEL: @sadd_sat32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = call i32 @llvm.sadd.sat.i32(i32 [[B:%.]], i32 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i32 [[B:%.]] to i64		; CHECK-NEXT: ret i32 [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i64 [[ADD]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[ADD]], i64 2147483647
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i64 [[SPEC_STORE_SELECT]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 -2147483648
; CHECK-NEXT: [[CONV7:%.*]] = trunc i64 [[SPEC_STORE_SELECT8]] to i32
; CHECK-NEXT: ret i32 [[CONV7]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i32 %b to i64		%conv1 = sext i32 %b to i64
%add = add i64 %conv1, %conv		%add = add i64 %conv1, %conv
%0 = icmp slt i64 %add, 2147483647		%0 = icmp slt i64 %add, 2147483647
%spec.store.select = select i1 %0, i64 %add, i64 2147483647		%spec.store.select = select i1 %0, i64 %add, i64 2147483647
%1 = icmp sgt i64 %spec.store.select, -2147483648		%1 = icmp sgt i64 %spec.store.select, -2147483648
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648
%conv7 = trunc i64 %spec.store.select8 to i32		%conv7 = trunc i64 %spec.store.select8 to i32
ret i32 %conv7		ret i32 %conv7
}		}

define i32 @ssub_sat32(i32 %a, i32 %b) {		define i32 @ssub_sat32(i32 %a, i32 %b) {
; CHECK-LABEL: @ssub_sat32(		; CHECK-LABEL: @ssub_sat32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = call i32 @llvm.ssub.sat.i32(i32 [[A:%.]], i32 [[B:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i32 [[B:%.]] to i64		; CHECK-NEXT: ret i32 [[TMP0]]
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i64 [[CONV]], [[CONV1]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i64 [[SUB]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[SUB]], i64 2147483647
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i64 [[SPEC_STORE_SELECT]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 -2147483648
; CHECK-NEXT: [[CONV7:%.*]] = trunc i64 [[SPEC_STORE_SELECT8]] to i32
; CHECK-NEXT: ret i32 [[CONV7]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i32 %b to i64		%conv1 = sext i32 %b to i64
%sub = sub i64 %conv, %conv1		%sub = sub i64 %conv, %conv1
%0 = icmp slt i64 %sub, 2147483647		%0 = icmp slt i64 %sub, 2147483647
%spec.store.select = select i1 %0, i64 %sub, i64 2147483647		%spec.store.select = select i1 %0, i64 %sub, i64 2147483647
%1 = icmp sgt i64 %spec.store.select, -2147483648		%1 = icmp sgt i64 %spec.store.select, -2147483648
Show All 25 Lines	entry:
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648
%conv7 = trunc i64 %spec.store.select8 to i32		%conv7 = trunc i64 %spec.store.select8 to i32
ret i32 %conv7		ret i32 %conv7
}		}

define signext i16 @sadd_sat16(i16 signext %a, i16 signext %b) {		define signext i16 @sadd_sat16(i16 signext %a, i16 signext %b) {
; CHECK-LABEL: @sadd_sat16(		; CHECK-LABEL: @sadd_sat16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i16 [[A:%.]] to i32		; CHECK-NEXT: [[TMP0:%.]] = call i16 @llvm.sadd.sat.i16(i16 [[B:%.]], i16 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i16 [[B:%.]] to i32		; CHECK-NEXT: ret i16 [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[ADD]], 32767
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i32 [[ADD]], i32 32767
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[SPEC_STORE_SELECT]], -32768
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i32 [[SPEC_STORE_SELECT]], i32 -32768
; CHECK-NEXT: [[CONV9:%.*]] = trunc i32 [[SPEC_STORE_SELECT10]] to i16
; CHECK-NEXT: ret i16 [[CONV9]]
;		;
entry:		entry:
%conv = sext i16 %a to i32		%conv = sext i16 %a to i32
%conv1 = sext i16 %b to i32		%conv1 = sext i16 %b to i32
%add = add i32 %conv1, %conv		%add = add i32 %conv1, %conv
%0 = icmp slt i32 %add, 32767		%0 = icmp slt i32 %add, 32767
%spec.store.select = select i1 %0, i32 %add, i32 32767		%spec.store.select = select i1 %0, i32 %add, i32 32767
%1 = icmp sgt i32 %spec.store.select, -32768		%1 = icmp sgt i32 %spec.store.select, -32768
%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -32768		%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -32768
%conv9 = trunc i32 %spec.store.select10 to i16		%conv9 = trunc i32 %spec.store.select10 to i16
ret i16 %conv9		ret i16 %conv9
}		}

define signext i16 @ssub_sat16(i16 signext %a, i16 signext %b) {		define signext i16 @ssub_sat16(i16 signext %a, i16 signext %b) {
; CHECK-LABEL: @ssub_sat16(		; CHECK-LABEL: @ssub_sat16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i16 [[A:%.]] to i32		; CHECK-NEXT: [[TMP0:%.]] = call i16 @llvm.ssub.sat.i16(i16 [[A:%.]], i16 [[B:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i16 [[B:%.]] to i32		; CHECK-NEXT: ret i16 [[TMP0]]
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[CONV]], [[CONV1]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[SUB]], 32767
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i32 [[SUB]], i32 32767
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[SPEC_STORE_SELECT]], -32768
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i32 [[SPEC_STORE_SELECT]], i32 -32768
; CHECK-NEXT: [[CONV9:%.*]] = trunc i32 [[SPEC_STORE_SELECT10]] to i16
; CHECK-NEXT: ret i16 [[CONV9]]
;		;
entry:		entry:
%conv = sext i16 %a to i32		%conv = sext i16 %a to i32
%conv1 = sext i16 %b to i32		%conv1 = sext i16 %b to i32
%sub = sub i32 %conv, %conv1		%sub = sub i32 %conv, %conv1
%0 = icmp slt i32 %sub, 32767		%0 = icmp slt i32 %sub, 32767
%spec.store.select = select i1 %0, i32 %sub, i32 32767		%spec.store.select = select i1 %0, i32 %sub, i32 32767
%1 = icmp sgt i32 %spec.store.select, -32768		%1 = icmp sgt i32 %spec.store.select, -32768
%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -32768		%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -32768
%conv9 = trunc i32 %spec.store.select10 to i16		%conv9 = trunc i32 %spec.store.select10 to i16
ret i16 %conv9		ret i16 %conv9
}		}

define signext i8 @sadd_sat8(i8 signext %a, i8 signext %b) {		define signext i8 @sadd_sat8(i8 signext %a, i8 signext %b) {
; CHECK-LABEL: @sadd_sat8(		; CHECK-LABEL: @sadd_sat8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i8 [[A:%.]] to i32		; CHECK-NEXT: [[TMP0:%.]] = call i8 @llvm.sadd.sat.i8(i8 [[B:%.]], i8 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i8 [[B:%.]] to i32		; CHECK-NEXT: ret i8 [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[ADD]], 127
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i32 [[ADD]], i32 127
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[SPEC_STORE_SELECT]], -128
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i32 [[SPEC_STORE_SELECT]], i32 -128
; CHECK-NEXT: [[CONV9:%.*]] = trunc i32 [[SPEC_STORE_SELECT10]] to i8
; CHECK-NEXT: ret i8 [[CONV9]]
;		;
entry:		entry:
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%conv1 = sext i8 %b to i32		%conv1 = sext i8 %b to i32
%add = add i32 %conv1, %conv		%add = add i32 %conv1, %conv
%0 = icmp slt i32 %add, 127		%0 = icmp slt i32 %add, 127
%spec.store.select = select i1 %0, i32 %add, i32 127		%spec.store.select = select i1 %0, i32 %add, i32 127
%1 = icmp sgt i32 %spec.store.select, -128		%1 = icmp sgt i32 %spec.store.select, -128
%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -128		%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -128
%conv9 = trunc i32 %spec.store.select10 to i8		%conv9 = trunc i32 %spec.store.select10 to i8
ret i8 %conv9		ret i8 %conv9
}		}

define signext i8 @ssub_sat8(i8 signext %a, i8 signext %b) {		define signext i8 @ssub_sat8(i8 signext %a, i8 signext %b) {
; CHECK-LABEL: @ssub_sat8(		; CHECK-LABEL: @ssub_sat8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i8 [[A:%.]] to i32		; CHECK-NEXT: [[TMP0:%.]] = call i8 @llvm.ssub.sat.i8(i8 [[A:%.]], i8 [[B:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i8 [[B:%.]] to i32		; CHECK-NEXT: ret i8 [[TMP0]]
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[CONV]], [[CONV1]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i32 [[SUB]], 127
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i32 [[SUB]], i32 127
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 [[SPEC_STORE_SELECT]], -128
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i32 [[SPEC_STORE_SELECT]], i32 -128
; CHECK-NEXT: [[CONV9:%.*]] = trunc i32 [[SPEC_STORE_SELECT10]] to i8
; CHECK-NEXT: ret i8 [[CONV9]]
;		;
entry:		entry:
%conv = sext i8 %a to i32		%conv = sext i8 %a to i32
%conv1 = sext i8 %b to i32		%conv1 = sext i8 %b to i32
%sub = sub i32 %conv, %conv1		%sub = sub i32 %conv, %conv1
%0 = icmp slt i32 %sub, 127		%0 = icmp slt i32 %sub, 127
%spec.store.select = select i1 %0, i32 %sub, i32 127		%spec.store.select = select i1 %0, i32 %sub, i32 127
%1 = icmp sgt i32 %spec.store.select, -128		%1 = icmp sgt i32 %spec.store.select, -128
%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -128		%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -128
%conv9 = trunc i32 %spec.store.select10 to i8		%conv9 = trunc i32 %spec.store.select10 to i8
ret i8 %conv9		ret i8 %conv9
}		}

define signext i64 @sadd_sat64(i64 signext %a, i64 signext %b) {		define signext i64 @sadd_sat64(i64 signext %a, i64 signext %b) {
; CHECK-LABEL: @sadd_sat64(		; CHECK-LABEL: @sadd_sat64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i64 [[A:%.]] to i65		; CHECK-NEXT: [[TMP0:%.]] = call i64 @llvm.sadd.sat.i64(i64 [[B:%.]], i64 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i64 [[B:%.]] to i65		; CHECK-NEXT: ret i64 [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i65 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i65 [[ADD]], 9223372036854775807
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i65 [[ADD]], i65 9223372036854775807
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i65 [[SPEC_STORE_SELECT]], -9223372036854775808
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i65 [[SPEC_STORE_SELECT]], i65 -9223372036854775808
; CHECK-NEXT: [[CONV9:%.*]] = trunc i65 [[SPEC_STORE_SELECT10]] to i64
; CHECK-NEXT: ret i64 [[CONV9]]
;		;
entry:		entry:
%conv = sext i64 %a to i65		%conv = sext i64 %a to i65
%conv1 = sext i64 %b to i65		%conv1 = sext i64 %b to i65
%add = add i65 %conv1, %conv		%add = add i65 %conv1, %conv
%0 = icmp slt i65 %add, 9223372036854775807		%0 = icmp slt i65 %add, 9223372036854775807
%spec.store.select = select i1 %0, i65 %add, i65 9223372036854775807		%spec.store.select = select i1 %0, i65 %add, i65 9223372036854775807
%1 = icmp sgt i65 %spec.store.select, -9223372036854775808		%1 = icmp sgt i65 %spec.store.select, -9223372036854775808
%spec.store.select10 = select i1 %1, i65 %spec.store.select, i65 -9223372036854775808		%spec.store.select10 = select i1 %1, i65 %spec.store.select, i65 -9223372036854775808
%conv9 = trunc i65 %spec.store.select10 to i64		%conv9 = trunc i65 %spec.store.select10 to i64
ret i64 %conv9		ret i64 %conv9
}		}

define signext i64 @ssub_sat64(i64 signext %a, i64 signext %b) {		define signext i64 @ssub_sat64(i64 signext %a, i64 signext %b) {
; CHECK-LABEL: @ssub_sat64(		; CHECK-LABEL: @ssub_sat64(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i64 [[A:%.]] to i65		; CHECK-NEXT: [[TMP0:%.]] = call i64 @llvm.ssub.sat.i64(i64 [[A:%.]], i64 [[B:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i64 [[B:%.]] to i65		; CHECK-NEXT: ret i64 [[TMP0]]
; CHECK-NEXT: [[SUB:%.*]] = sub nsw i65 [[CONV]], [[CONV1]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i65 [[SUB]], 9223372036854775807
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i65 [[SUB]], i65 9223372036854775807
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i65 [[SPEC_STORE_SELECT]], -9223372036854775808
; CHECK-NEXT: [[SPEC_STORE_SELECT10:%.*]] = select i1 [[TMP1]], i65 [[SPEC_STORE_SELECT]], i65 -9223372036854775808
; CHECK-NEXT: [[CONV9:%.*]] = trunc i65 [[SPEC_STORE_SELECT10]] to i64
; CHECK-NEXT: ret i64 [[CONV9]]
;		;
entry:		entry:
%conv = sext i64 %a to i65		%conv = sext i64 %a to i65
%conv1 = sext i64 %b to i65		%conv1 = sext i64 %b to i65
%sub = sub i65 %conv, %conv1		%sub = sub i65 %conv, %conv1
%0 = icmp slt i65 %sub, 9223372036854775807		%0 = icmp slt i65 %sub, 9223372036854775807
%spec.store.select = select i1 %0, i65 %sub, i65 9223372036854775807		%spec.store.select = select i1 %0, i65 %sub, i65 9223372036854775807
%1 = icmp sgt i65 %spec.store.select, -9223372036854775808		%1 = icmp sgt i65 %spec.store.select, -9223372036854775808
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	entry:
%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -8		%spec.store.select10 = select i1 %1, i32 %spec.store.select, i32 -8
%conv9 = trunc i32 %spec.store.select10 to i4		%conv9 = trunc i32 %spec.store.select10 to i4
ret i4 %conv9		ret i4 %conv9
}		}

define <4 x i32> @sadd_satv4i32(<4 x i32> %a, <4 x i32> %b) {		define <4 x i32> @sadd_satv4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @sadd_satv4i32(		; CHECK-LABEL: @sadd_satv4i32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext <4 x i32> [[A:%.]] to <4 x i64>		; CHECK-NEXT: [[TMP0:%.]] = call <4 x i32> @llvm.sadd.sat.v4i32(<4 x i32> [[B:%.]], <4 x i32> [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext <4 x i32> [[B:%.]] to <4 x i64>		; CHECK-NEXT: ret <4 x i32> [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw <4 x i64> [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i64> [[ADD]], <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select <4 x i1> [[TMP0]], <4 x i64> [[ADD]], <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i64> [[SPEC_STORE_SELECT]], <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select <4 x i1> [[TMP1]], <4 x i64> [[SPEC_STORE_SELECT]], <4 x i64> <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
; CHECK-NEXT: [[CONV7:%.*]] = trunc <4 x i64> [[SPEC_STORE_SELECT8]] to <4 x i32>
; CHECK-NEXT: ret <4 x i32> [[CONV7]]
;		;
entry:		entry:
%conv = sext <4 x i32> %a to <4 x i64>		%conv = sext <4 x i32> %a to <4 x i64>
%conv1 = sext <4 x i32> %b to <4 x i64>		%conv1 = sext <4 x i32> %b to <4 x i64>
%add = add <4 x i64> %conv1, %conv		%add = add <4 x i64> %conv1, %conv
%0 = icmp slt <4 x i64> %add, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>		%0 = icmp slt <4 x i64> %add, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
%spec.store.select = select <4 x i1> %0, <4 x i64> %add, <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>		%spec.store.select = select <4 x i1> %0, <4 x i64> %add, <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
%1 = icmp sgt <4 x i64> %spec.store.select, <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>		%1 = icmp sgt <4 x i64> %spec.store.select, <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
%spec.store.select8 = select <4 x i1> %1, <4 x i64> %spec.store.select, <4 x i64> <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>		%spec.store.select8 = select <4 x i1> %1, <4 x i64> %spec.store.select, <4 x i64> <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
%conv7 = trunc <4 x i64> %spec.store.select8 to <4 x i32>		%conv7 = trunc <4 x i64> %spec.store.select8 to <4 x i32>
ret <4 x i32> %conv7		ret <4 x i32> %conv7
}		}

define <4 x i32> @ssub_satv4i32(<4 x i32> %a, <4 x i32> %b) {		define <4 x i32> @ssub_satv4i32(<4 x i32> %a, <4 x i32> %b) {
; CHECK-LABEL: @ssub_satv4i32(		; CHECK-LABEL: @ssub_satv4i32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext <4 x i32> [[A:%.]] to <4 x i64>		; CHECK-NEXT: [[TMP0:%.]] = call <4 x i32> @llvm.ssub.sat.v4i32(<4 x i32> [[B:%.]], <4 x i32> [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext <4 x i32> [[B:%.]] to <4 x i64>		; CHECK-NEXT: ret <4 x i32> [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = sub nsw <4 x i64> [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt <4 x i64> [[ADD]], <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select <4 x i1> [[TMP0]], <4 x i64> [[ADD]], <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i64> [[SPEC_STORE_SELECT]], <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select <4 x i1> [[TMP1]], <4 x i64> [[SPEC_STORE_SELECT]], <4 x i64> <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
; CHECK-NEXT: [[CONV7:%.*]] = trunc <4 x i64> [[SPEC_STORE_SELECT8]] to <4 x i32>
; CHECK-NEXT: ret <4 x i32> [[CONV7]]
;		;
entry:		entry:
%conv = sext <4 x i32> %a to <4 x i64>		%conv = sext <4 x i32> %a to <4 x i64>
%conv1 = sext <4 x i32> %b to <4 x i64>		%conv1 = sext <4 x i32> %b to <4 x i64>
%add = sub <4 x i64> %conv1, %conv		%add = sub <4 x i64> %conv1, %conv
%0 = icmp slt <4 x i64> %add, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>		%0 = icmp slt <4 x i64> %add, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
%spec.store.select = select <4 x i1> %0, <4 x i64> %add, <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>		%spec.store.select = select <4 x i1> %0, <4 x i64> %add, <4 x i64> <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483647>
%1 = icmp sgt <4 x i64> %spec.store.select, <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>		%1 = icmp sgt <4 x i64> %spec.store.select, <i64 -2147483648, i64 -2147483648, i64 -2147483648, i64 -2147483648>
Show All 39 Lines	entry:
%spec.store.select8 = select <4 x i1> %1, <4 x i32> %spec.store.select, <4 x i32> <i32 -16, i32 -16, i32 -16, i32 -16>		%spec.store.select8 = select <4 x i1> %1, <4 x i32> %spec.store.select, <4 x i32> <i32 -16, i32 -16, i32 -16, i32 -16>
ret <4 x i32> %spec.store.select8		ret <4 x i32> %spec.store.select8
}		}


define i32 @sadd_sat32_extrause_1(i32 %a, i32 %b) {		define i32 @sadd_sat32_extrause_1(i32 %a, i32 %b) {
; CHECK-LABEL: @sadd_sat32_extrause_1(		; CHECK-LABEL: @sadd_sat32_extrause_1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = call i32 @llvm.sadd.sat.i32(i32 [[B:%.]], i32 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i32 [[B:%.]] to i64		; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = sext i32 [[TMP0]] to i64
; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i64 [[ADD]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[ADD]], i64 2147483647
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i64 [[SPEC_STORE_SELECT]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 -2147483648
; CHECK-NEXT: [[CONV7:%.*]] = trunc i64 [[SPEC_STORE_SELECT8]] to i32
; CHECK-NEXT: call void @use64(i64 [[SPEC_STORE_SELECT8]])		; CHECK-NEXT: call void @use64(i64 [[SPEC_STORE_SELECT8]])
; CHECK-NEXT: ret i32 [[CONV7]]		; CHECK-NEXT: ret i32 [[TMP0]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i32 %b to i64		%conv1 = sext i32 %b to i64
%add = add i64 %conv1, %conv		%add = add i64 %conv1, %conv
%0 = icmp slt i64 %add, 2147483647		%0 = icmp slt i64 %add, 2147483647
%spec.store.select = select i1 %0, i64 %add, i64 2147483647		%spec.store.select = select i1 %0, i64 %add, i64 2147483647
%1 = icmp sgt i64 %spec.store.select, -2147483648		%1 = icmp sgt i64 %spec.store.select, -2147483648
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	entry:
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -32768		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -32768
%conv7 = trunc i64 %spec.store.select8 to i32		%conv7 = trunc i64 %spec.store.select8 to i32
ret i32 %conv7		ret i32 %conv7
}		}

define i32 @sadd_sat32_ext16(i32 %a, i16 %b) {		define i32 @sadd_sat32_ext16(i32 %a, i16 %b) {
; CHECK-LABEL: @sadd_sat32_ext16(		; CHECK-LABEL: @sadd_sat32_ext16(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = sext i16 [[B:%.]] to i32
; CHECK-NEXT: [[CONV1:%.]] = sext i16 [[B:%.]] to i64		; CHECK-NEXT: [[TMP1:%.]] = call i32 @llvm.sadd.sat.i32(i32 [[TMP0]], i32 [[A:%.]])
; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[CONV1]], [[CONV]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i64 [[ADD]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[ADD]], i64 2147483647
; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i64 [[SPEC_STORE_SELECT]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 -2147483648
; CHECK-NEXT: [[CONV7:%.*]] = trunc i64 [[SPEC_STORE_SELECT8]] to i32
; CHECK-NEXT: ret i32 [[CONV7]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i16 %b to i64		%conv1 = sext i16 %b to i64
%add = add i64 %conv1, %conv		%add = add i64 %conv1, %conv
%0 = icmp slt i64 %add, 2147483647		%0 = icmp slt i64 %add, 2147483647
%spec.store.select = select i1 %0, i64 %add, i64 2147483647		%spec.store.select = select i1 %0, i64 %add, i64 2147483647
%1 = icmp sgt i64 %spec.store.select, -2147483648		%1 = icmp sgt i64 %spec.store.select, -2147483648
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	entry:
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 -2147483648
%conv7 = trunc i64 %spec.store.select8 to i32		%conv7 = trunc i64 %spec.store.select8 to i32
ret i32 %conv7		ret i32 %conv7
}		}

define i32 @sadd_sat32_maxmin(i32 %a, i32 %b) {		define i32 @sadd_sat32_maxmin(i32 %a, i32 %b) {
; CHECK-LABEL: @sadd_sat32_maxmin(		; CHECK-LABEL: @sadd_sat32_maxmin(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = call i32 @llvm.sadd.sat.i32(i32 [[B:%.]], i32 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i32 [[B:%.]] to i64		; CHECK-NEXT: ret i32 [[TMP0]]
; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i64 [[ADD]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[ADD]], i64 -2147483648
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i64 [[SPEC_STORE_SELECT]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 2147483647
; CHECK-NEXT: [[CONV7:%.*]] = trunc i64 [[SPEC_STORE_SELECT8]] to i32
; CHECK-NEXT: ret i32 [[CONV7]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i32 %b to i64		%conv1 = sext i32 %b to i64
%add = add i64 %conv1, %conv		%add = add i64 %conv1, %conv
%0 = icmp sgt i64 %add, -2147483648		%0 = icmp sgt i64 %add, -2147483648
%spec.store.select = select i1 %0, i64 %add, i64 -2147483648		%spec.store.select = select i1 %0, i64 %add, i64 -2147483648
%1 = icmp slt i64 %spec.store.select, 2147483647		%1 = icmp slt i64 %spec.store.select, 2147483647
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 2147483647		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 2147483647
%conv7 = trunc i64 %spec.store.select8 to i32		%conv7 = trunc i64 %spec.store.select8 to i32
ret i32 %conv7		ret i32 %conv7
}		}

define i64 @sadd_sat32_notrunc(i32 %a, i32 %b) {		define i64 @sadd_sat32_notrunc(i32 %a, i32 %b) {
; CHECK-LABEL: @sadd_sat32_notrunc(		; CHECK-LABEL: @sadd_sat32_notrunc(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.]] = sext i32 [[A:%.]] to i64		; CHECK-NEXT: [[TMP0:%.]] = call i32 @llvm.sadd.sat.i32(i32 [[B:%.]], i32 [[A:%.*]])
; CHECK-NEXT: [[CONV1:%.]] = sext i32 [[B:%.]] to i64		; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = sext i32 [[TMP0]] to i64
; CHECK-NEXT: [[ADD:%.*]] = add nsw i64 [[CONV1]], [[CONV]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt i64 [[ADD]], -2147483648
; CHECK-NEXT: [[SPEC_STORE_SELECT:%.*]] = select i1 [[TMP0]], i64 [[ADD]], i64 -2147483648
; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i64 [[SPEC_STORE_SELECT]], 2147483647
; CHECK-NEXT: [[SPEC_STORE_SELECT8:%.*]] = select i1 [[TMP1]], i64 [[SPEC_STORE_SELECT]], i64 2147483647
; CHECK-NEXT: ret i64 [[SPEC_STORE_SELECT8]]		; CHECK-NEXT: ret i64 [[SPEC_STORE_SELECT8]]
;		;
entry:		entry:
%conv = sext i32 %a to i64		%conv = sext i32 %a to i64
%conv1 = sext i32 %b to i64		%conv1 = sext i32 %b to i64
%add = add i64 %conv1, %conv		%add = add i64 %conv1, %conv
%0 = icmp sgt i64 %add, -2147483648		%0 = icmp sgt i64 %add, -2147483648
%spec.store.select = select i1 %0, i64 %add, i64 -2147483648		%spec.store.select = select i1 %0, i64 %add, i64 -2147483648
%1 = icmp slt i64 %spec.store.select, 2147483647		%1 = icmp slt i64 %spec.store.select, 2147483647
%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 2147483647		%spec.store.select8 = select i1 %1, i64 %spec.store.select, i64 2147483647
ret i64 %spec.store.select8		ret i64 %spec.store.select8
}		}

declare void @use64(i64)		declare void @use64(i64)