This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstcombine] Conditionally fold saturated fptosi to llvm.fptosi.sat
ClosedPublic

Authored by dmgreen on May 17 2022, 3:03 AM.

Details

Summary

This adds a fold to AggressiveInstCombine that converts smin(smax(fptosi(x))) into a llvm.fptosi.sat, provided that the saturation constants are correct and the cost of the llvm.fptosi.sat is lower.

Unfortunately, a llvm.fptosi.sat cannot always be converted back to a smin/smax/fptosi. The llvm.fptosi.sat intrinsic is more defined than the original, which produces poison if the original fptosi was out of range. The llvm.fptosi.sat will saturate any value, so it needs to be expanded to a fptosi(fpmin(fpmax(x))), which can be worse for code generation depending on the target.
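To make the semantic gap concrete, here is a small standalone C++ model of the two forms. This is a sketch of the semantics only, not LLVM code, and the function names are invented for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Models llvm.fptosi.sat.i8 on a float input: out-of-range values
// saturate to the type bounds, and NaN maps to 0. Defined for every input.
int8_t fptosi_sat_i8(float x) {
  if (std::isnan(x)) return 0;
  if (x <= -128.0f) return -128;
  if (x >= 127.0f) return 127;
  return static_cast<int8_t>(x);
}

// Models the pattern the fold matches: smin(smax(fptosi(x), -128), 127).
// This is only defined when x fits in i32; an out-of-range input makes
// the fptosi poison, which the saturating form above never produces.
int8_t clamp_after_fptosi(float x) {
  int32_t i = static_cast<int32_t>(x);  // models fptosi float -> i32
  return static_cast<int8_t>(std::min(std::max(i, -128), 127));
}
```

For in-range inputs the two agree; only the saturating form is defined on NaN or values outside the i32 range, which is why the intrinsic cannot simply be converted back.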

So this is an RFC change that is conditional on the backend reporting that the llvm.fptosi.sat is cheaper than the original smin+smax+fptosi. This is a change to the way that AggressiveInstCombine has worked in the past. Instead of just being a canonicalization pass, that canonicalization can be dependent on the target in certain specific cases. This concept can also be useful in other cases such as the table-based cttz from D113291 and possibly funnel shifts (although I know the details there less).

Diff Detail

Event Timeline

dmgreen created this revision.May 17 2022, 3:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 3:03 AM
dmgreen requested review of this revision.May 17 2022, 3:03 AM
dmgreen edited the summary of this revision. (Show Details)May 17 2022, 3:03 AM

Do you need the fp sat intrinsic to be visible in IR (better vectorization?) - otherwise might this be better off in CGP?

Yeah - it needs to be earlier than that, before vectorization and preferably unrolling to get the costs correct.

OK - speaking to @spatel offline there was a concern that AIC was becoming a bit of a dumping ground for combines that didn't really fit anywhere else.

It might be that we need to consider a CostDrivenInstCombine pass or something instead? But then I'm not sure how much that will overlap with VectorCombine etc.

spatel added a subscriber: lattner.May 18 2022, 7:53 AM

OK - speaking to @spatel offline there was a concern that AIC was becoming a bit of a dumping ground for combines that didn't really fit anywhere else.

AggressiveInstCombine was intended to be the dumping ground for regular InstCombine. :)

There was concern that InstCombine was taking too long per invocation and was getting added all over the default optimization pipelines just to clean up mess from other passes. So AIC was a place that would only run once or twice in the pipeline where we could offload peepholes to ease compile-time cost. But that hasn't happened - we don't seem to be as collectively concerned about the compiler getting slower as we were when AIC was added. And everyone defaults to patching InstCombine because that guarantees that their fold will run and mesh with more folds. AIC is still only run at -O3 AFAIK.

It might be that we need to consider a CostDrivenInstCombine pass or something instead? But then I'm not sure how much that will overlap with VectorCombine etc.

VectorCombine is definitely a more focused pass. This patch and D113291 do not have vector requirements...but yes, the code starts to look similar once we put TTI cost comparisons into the mix.

In D113291, Craig made the point that we can predicate InstCombine and other pass transforms based on type legality from the DataLayout, so why not on opcode/intrinsic legality too?

So I'm not opposed, but I also don't have a good sense about potential downside of allowing target-specific IR transforms earlier in the optimization pipeline. Clearly, there's demand to do this because we get this kind of request fairly regularly.

@efriedma @lattner - any thoughts?

tschuett added a subscriber: tschuett.EditedMay 18 2022, 8:00 AM

I remember that there was talk about a FloatCombine pass with costs.

I like the distinction between a container CostDrivenInstCombine and special-purpose cost-driven passes: Vector, Float, ...

If you start a modern CostDrivenInstCombine, it would be great to integrate a modern statistic tool to track hit rates of combines.

There's a continual struggle between pushing towards canonical IR vs. pushing towards things we think are cheap on some specific target. Normally, the way we've resolved that is by distinguishing "early" vs. "late" optimizations: we try to push towards a canonical form early on to make the code easier to analyze, then we start doing optimizations like vectorization etc. using target-specific heuristics. AggressiveInstCombine doesn't really have anything to do with "early" vs. "late"; if we want something that runs just before vectorization, we should probably add a dedicated pass.

If vectorization is involved, we have to worry about cost difference between vector and scalar versions. For example, for vectors, we might want to use floating-point min, but for scalars we prefer integer min. Not sure if this is actually true for any target, but worth considering. If we need to deal with situations like that, we can't cleanly query the cost model, so we should prefer some unified approach. For example, attach some range metadata to fptosi.sat, or add some sort of "combiner" to VPlan.

That said, what targets are we worried about here? I guess soft-float targets? For targets with native floats, it's hard for me to imagine "nnan fptosi.sat" is significantly more expensive than "fptosi+min+max". It looks like isel currently doesn't take advantage of nnan, but it probably should.
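The nnan case raised above is what makes the fallback expansion cheap: once NaN is excluded, clamping in the fp domain first guarantees the conversion is in range. A minimal standalone model of that expansion (a sketch, not LLVM's actual lowering; the function name is invented):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models the fptosi(fpmin(fpmax(x))) expansion of fptosi.sat, valid
// under the nnan assumption: clamp in the fp domain, then a plain
// conversion is always in range. A NaN input would additionally need a
// select to produce 0, which this sketch deliberately omits.
int8_t expand_fptosi_sat_nnan_i8(float x) {
  float clamped = std::fmin(std::fmax(x, -128.0f), 127.0f);
  return static_cast<int8_t>(clamped);
}
```

On targets with cheap fp min/max this expansion is close in cost to the original fptosi+min+max; on soft-float targets each fp operation is a libcall, which is where the regression comes from.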

Thanks for the comments.

There's a continual struggle between pushing towards canonical IR vs. pushing towards things we think are cheap on some specific target. Normally, the way we've resolved that is by distinguishing "early" vs. "late" optimizations: we try to push towards a canonical form early on to make the code easier to analyze, then we start doing optimizations like vectorization etc. using target-specific heuristics. AggressiveInstCombine doesn't really have anything to do with "early" vs. "late"; if we want something that runs just before vectorization, we should probably add a dedicated pass.

It's not really about vectorization - although that is where the biggest gains will come from. It's really about getting all the other cost-based decisions correct throughout the LLVM pipeline. If we don't, then there will always be performance left on the table.

The inliner runs early and has always considered costs. As @spatel / @craig.topper said above, most other passes are directed by the target through the datalayout; that is just a different form of cost modelling.

I had considered moving the code that calculates costs in this patch into a separate function in TTI. AggressiveInstCombine could then just call shouldChangeToFPSat, and that could be overridden by the target. The actual details inside shouldChangeToFPSat needn't come from the cost model, but in the case of fptosi.sat it might make the most sense: the intrinsics are usually custom legalized, and it's not always obvious when they will be profitable.
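The shape of the decision being described could look something like the following standalone sketch. shouldChangeToFPSat is a name floated in the comment, not an existing TTI hook, and the cost struct is invented for illustration:

```cpp
#include <cassert>

// Hypothetical inputs to the proposed hook: the cost of the single
// saturating intrinsic vs. the three instructions it would replace.
// These would come from TTI cost queries in a real implementation.
struct FPToSatCosts {
  int fptosi_sat;          // cost of llvm.fptosi.sat
  int fptosi, smin, smax;  // costs of the pattern being replaced
};

// Only perform the (irreversible) transform when the intrinsic is no
// more expensive than the instructions it replaces.
bool shouldChangeToFPSat(const FPToSatCosts &c) {
  return c.fptosi_sat <= c.fptosi + c.smin + c.smax;
}
```

A target override could then make the call on legality or custom-lowering knowledge rather than raw instruction costs.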

If vectorization is involved, we have to worry about cost difference between vector and scalar versions. For example, for vectors, we might want to use floating-point min, but for scalars we prefer integer min. Not sure if this is actually true for any target, but worth considering. If we need to deal with situations like that, we can't cleanly query the cost model, so we should prefer some unified approach. For example, attach some range metadata to fptosi.sat, or add some sort of "combiner" to VPlan.

There would be other ways to tackle this exact issue, but it doesn't solve the problem in general. I think for operations where there is a vector form but not a scalar form then there should be combiners in VPlan. MULH for example has vector instructions but not a scalar form. There was an example in D88152 of how that could work from a long time ago, but VPlan is still missing a few pieces (and has changed a lot since then).

We would also need to get the cost of smin(smax(fptosi)) correct, but I don't believe we can cost model multiple-element nodes like that well in general. Single instructions are usually easy enough to cost model. Two instructions like zext(load) are possible but it starts to get ugly. Three or more just doesn't work.

And none of that gets D113291 unstuck. Which is more about doing the transform early enough to benefit from other optimizations in the pipeline.

That said, what targets are we worried about here? I guess soft-float targets? For targets with native floats, it's hard for me to imagine "nnan fptosi.sat" is significantly more expensive than "fptosi+min+max". It looks like isel currently doesn't take advantage of nnan, but it probably should.

Yeah - soft-float Thumb1 targets are where I see significant losses from doing this unconditionally. You can always have targets where the integer smin/smax are cheaper than the fp min/max.

So I'm not opposed, but I also don't have a good sense about potential downside of allowing target-specific IR transforms earlier in the optimization pipeline. Clearly, there's demand to do this because we get this kind of request fairly regularly.

Do we know what the downsides presented in the past were? I've heard about an increased need for testing on all targets before, but that has really always been true with datalayout controlled combines. I believe that was more about the core of instcombine though, where more fundamental canonicalizations are happening. This and D113291 I feel are more about higher level irreversible transforms.

So I'm not opposed, but I also don't have a good sense about potential downside of allowing target-specific IR transforms earlier in the optimization pipeline. Clearly, there's demand to do this because we get this kind of request fairly regularly.

Do we know what the downsides presented in the past were? I've heard about an increased need for testing on all targets before, but that has really always been true with datalayout controlled combines. I believe that was more about the core of instcombine though, where more fundamental canonicalizations are happening. This and D113291 I feel are more about higher level irreversible transforms.

The primary downside of target-specific transforms is that it goes against canonicalization: if a combine is expecting a specific form of IR, that form will only show up on specific targets. So we either miss some transforms on some targets, or we write code to match the same thing in each possible form.

It's always been a bit of a spectrum; transforms like inlining are fundamentally driven by heuristics, and those heuristics are going to lead to different IR on different targets. But we want to encourage using canonical forms, even when we sometimes end up transforming from A->B, then later end up transforming B->A in the target's code generator. This shapes the way we define IR to some extent; for example, llvm.cttz has an "is_zero_poison" flag so we can use the same intrinsic on all targets.

This isn't to say we can never make a target-specific decision early, but we should explore alternatives that allow making target-specific decisions later, where possible.
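The is_zero_poison distinction mentioned above can be modeled in a few lines of standalone C++ (a sketch of the semantics, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// Models llvm.cttz with is_zero_poison=false: zero input is defined and
// yields the bit width, which is what lets the one intrinsic form be
// used as the canonical representation on every target.
// (__builtin_ctz itself is undefined for 0, matching is_zero_poison=true.)
uint32_t cttz32_zero_defined(uint32_t x) {
  return x == 0 ? 32u : static_cast<uint32_t>(__builtin_ctz(x));
}
```

Targets whose count-trailing-zeros instruction already handles zero lower this directly; others materialize the zero check, but the IR stays canonical either way.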

The primary downside of target-specific transforms is that it goes against canonicalization: if a combine is expecting a specific form of IR, that form will only show up on specific targets. So we either miss some transforms on some targets, or we write code to match the same thing in each possible form.

It's always been a bit of a spectrum; transforms like inlining are fundamentally driven by heuristics, and those heuristics are going to lead to different IR on different targets. But we want to encourage using canonical forms, even when we sometimes end up transforming from A->B, then later end up transforming B->A in the target's code generator. This shapes the way we define IR to some extent; for example, llvm.cttz has an "is_zero_poison" flag so we can use the same intrinsic on all targets.

This isn't to say we can never make a target-specific decision early, but we should explore alternatives that allow making target-specific decisions later, where possible.

OK, I would like to get this and D113291 unstuck if we can. There are other similar problems that would fall into the same camp. I agree that canonicalization can be very useful (and we should be careful not to break it needlessly), but the canonicalization doesn't need to be identical for every target. It is always going to be detrimental if it is - either the costs are too incorrect to make good decisions, or extra optimizations that could occur in the mid-end do not happen where they should. A single (semi-)target-independent canonicalization doesn't seem like a strong enough benefit to sacrifice optimization power or compile time.

For this patch - the transform needs to happen before loop unrolling to get the costs correct. And I feel it is less about canonicalization, as the transform is not reversible; neither form is canonical. There are other cost decisions like inlining, but I believe they would be less likely to be super important. That sounds like AggressiveInstCombine would be a good place for it, given its position in the pipeline: it is early enough to get the following costs correct, but doesn't muddy up InstCombine with target queries.

In summary, the cost of a saturating fptosi vs. plain fptosi+min+max varies so much across targets that we need two canonical forms: one for targets where saturating fptosi with the given types is cheap, and one for targets where it isn't. I guess that conclusion is fine, but we should state it in a comment in the code, so it's clear why we're cost-modeling this.

I'm not sure how D113291 is related to that conclusion, though. The cost of llvm.cttz should be the same as, or cheaper than, a table lookup. Worst case, SelectionDAG emits a table lookup itself. (SelectionDAG currently doesn't have code to emit a table lookup, I think, but that's straightforward to change.)

dmgreen updated this revision to Diff 432854.May 30 2022, 12:41 AM

Add a comment to tryToFPToSat explaining the situation.

efriedma accepted this revision.Jun 7 2022, 3:26 PM

LGTM

This revision is now accepted and ready to land.Jun 7 2022, 3:26 PM
This revision was landed with ongoing or failed builds.Jun 10 2022, 1:36 AM
This revision was automatically updated to reflect the committed changes.
dnsampaio added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
405–407

Hi @dmgreen, long time.
I'm updating our downstream compiler from llvm 12 to llvm 15, and this getCastInstrCost call is doing a rather strange thing: it asks for the cost of a sign extend from i32 to i8, which seems awkward.
Instead of using the fixed Instruction::SExt, shouldn't it be using CastInst::getCastOpcode(IntTy, true, SatTy, true) here?

Reproducer:

@a = global float 0.000000e+00
@b = global i32 0

define i32 @c() {
  %1 = load float, ptr @a
  %2 = fptosi float %1 to i32
  %3 = call i32 @llvm.smin.i32(i32 %2, i32 127)
  %4 = call i32 @llvm.smax.i32(i32 %3, i32 -128)
  store i32 %4, ptr @b
  ret i32 undef
}

declare i32 @llvm.smin.i32(i32, i32)

declare i32 @llvm.smax.i32(i32, i32)

Coming from a creduced code:

float a;
b;
c() {
  b = a;
  if (b > 127)
    b = 127;
  if (b < -128)
    b = -128;
}
dmgreen added inline comments.Jul 10 2023, 1:35 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
405–407

Hello. Hope you are well.

It sounds like the dst and src types are the wrong way around. I think that SatTy should always be smaller than IntTy. I can run a couple of quick tests and will put up a patch for it.
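The direction of the extend can be illustrated with a trivial standalone model (the SatTy/IntTy names mirror the inline comment above; this is a sketch, not LLVM code):

```cpp
#include <cassert>
#include <cstdint>

// The saturated type is the narrower one, so the extend being costed
// widens from it: i8 -> i32 here, never i32 -> i8. A cost query
// modeling this step should therefore use the narrow type as the
// source and the wide type as the destination.
int32_t widen_sat_result(int8_t sat_value) {
  return static_cast<int32_t>(sat_value);  // corresponds to sext i8 -> i32
}
```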