This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner] Improve MachineCombiner's cost model
Needs ReviewPublic

Authored by Carrot on May 13 2022, 3:40 PM.

Details

Summary

This patch contains the following improvements to MachineCombiner's cost model.

1. Ignore coalescable COPY instructions when computing latency, because they will be deleted after RA.
2. When computing CycleCount, use (FirstDepth + RootLatency) instead of (RootDepth + RootLatency), to avoid double counting instructions when there are multiple instructions in either the old or new instruction sequence.

Diff Detail

Event Timeline

Carrot created this revision. May 13 2022, 3:40 PM
Herald added a project: Restricted Project. · View Herald Transcript May 13 2022, 3:40 PM
Carrot requested review of this revision. May 13 2022, 3:40 PM

Thanks for splitting this out. I've been trying to run some tests.

Do you have any benchmark results? I worry that even though the old model was less correct, it still gave better performance results.

I tested SPEC2006 int on my skylake desktop.

base
400.perlbench    9770        241       40.6 *                                  
401.bzip2        9650        380       25.4 *                                  
403.gcc          8050        198       40.7 *                                  
429.mcf          9120        204       44.8 *                                  
445.gobmk       10490        343       30.6 *                                  
456.hmmer        9330        253       36.9 *                                  
458.sjeng       12100        355       34.1 *                                  
462.libquantum  20720        308       67.3 *                                  
464.h264ref     22130        325       68.1 *                                  
471.omnetpp      6250        239       26.1 *                                  
473.astar        7020        295       23.8 *                                  
483.xalancbmk    6900        144       47.8 *                                  
 Est. SPECint(R)_base2006              38.3

exp
400.perlbench    9770        241       40.6 *                                  
401.bzip2        9650        381       25.3 *                                  
403.gcc          8050        198       40.7 *                                  
429.mcf          9120        202       45.3 *                                  
445.gobmk       10490        342       30.7 *                                  
456.hmmer        9330        253       36.9 *                                  
458.sjeng       12100        354       34.1 *                                  
462.libquantum  20720        318       65.2 *                                  
464.h264ref     22130        324       68.3 *                                  
471.omnetpp      6250        237       26.3 *                                  
473.astar        7020        295       23.8 *                                  
483.xalancbmk    6900        145       47.5 *                                  
 Est. SPECint(R)_base2006              38.2

Only mcf and libquantum show a slightly larger delta; it is caused by run-to-run variation.

base
429.mcf          9120        207       44.1 S                                  
429.mcf          9120        204       44.8 *                                  
429.mcf          9120        198       46.0 S         
462.libquantum  20720        302       68.7 S                                  
462.libquantum  20720        334       62.1 S                                  
462.libquantum  20720        308       67.3 *                                  

exp
429.mcf          9120        202       45.3 *                                  
429.mcf          9120        207       44.0 S                                  
429.mcf          9120        197       46.2 S                                  
462.libquantum  20720        325       63.8 S                                  
462.libquantum  20720        318       65.2 *                                  
462.libquantum  20720        296       70.1 S

So basically no difference.

Thanks - do you have AArch64 numbers? The MachineCombiner does seem to be used on X86, but to a much smaller degree than on AArch64, where there are many more patterns. The reassociation might be useful, but it is the conversion of madd to mul+add and the fmla changes that are more worrying.

I tried SPEC2006 on an AArch64 machine, but due to some unknown problem I couldn't run 464.h264ref correctly; even gcc-compiled code generates the same output.

base
400.perlbench    9770        306       31.9 *
401.bzip2        9650        418       23.1 *
403.gcc          8050        206       39.1 *
429.mcf          9120        332       27.5 *
445.gobmk       10490        337       31.1 *
456.hmmer        9330        275       34.0 *
458.sjeng       12100        391       30.9 *
462.libquantum  20720        214       96.8 *
464.h264ref                                 NR
471.omnetpp      6250        229       27.3 *
473.astar        7020        313       22.4 *
483.xalancbmk    6900        174       39.6 *
 Est. SPECint(R)_base2006              33.6


exp
400.perlbench    9770        306       31.9 *
401.bzip2        9650        418       23.1 *
403.gcc          8050        207       38.9 *
429.mcf          9120        330       27.6 *
445.gobmk       10490        337       31.1 *
456.hmmer        9330        280       33.4 *
458.sjeng       12100        391       31.0 *
462.libquantum  20720        213       97.5 *
464.h264ref                                 NR
471.omnetpp      6250        230       27.2 *
473.astar        7020        313       22.4 *
483.xalancbmk    6900        177       39.1 *
 Est. SPECint(R)_base2006              33.5

The biggest difference is from 456.hmmer. I reran it and got the same result:
456.hmmer        9330        273       34.1 S
456.hmmer        9330        274       34.0 *
456.hmmer        9330        274       34.0 S

So there is no real performance impact on SPEC2006 on AArch64.

Thanks. Is that with -mcpu=native or -mcpu=generic, and on which type of machine was it run? It sounds like the noise level is pretty high, which is a perpetual problem.

There are more accurate results in the .csv file that it generates; the results there are not rounded to whole numbers, so they can be a little more accurate.

The test was run on a Neoverse N1 machine. I didn't specify any -mcpu, so I guess it is -mcpu=generic.

The .csv results are:

base
400.perlbench,9770,305.860896,31.942625,1,S,,,,,NR,"SelectedIteration (base #3)"
401.bzip2,9650,417.556045,23.11067,1,S,,,,,NR,"SelectedIteration (base #1)"
403.gcc,8050,205.940987,39.088868,1,S,,,,,NR,"SelectedIteration (base #2)"
429.mcf,9120,331.917529,27.476705,1,S,,,,,NR,"SelectedIteration (base #3)"
445.gobmk,10490,337.410193,31.089754,1,S,,,,,NR,"SelectedIteration (base #1)"
456.hmmer,9330,274.508242,33.988051,1,S,,,,,NR,"SelectedIteration (base #1)"
458.sjeng,12100,391.184073,30.93173,1,S,,,,,NR,"SelectedIteration (base #1)"
462.libquantum,20720,213.955433,96.842598,1,S,,,,,NR,"SelectedIteration (base #2)"
464.h264ref,,,,,NR,,,,,NR
471.omnetpp,6250,228.927823,27.301181,1,S,,,,,NR,"SelectedIteration (base #2)"
473.astar,7020,313.225501,22.411968,1,S,,,,,NR,"SelectedIteration (base #3)"
483.xalancbmk,6900,174.455772,39.551572,1,S,,,,,NR,"SelectedIteration (base #2)"
SPECint_base2006,33.555777,,33.555777


exp
400.perlbench,9770,306.425665,31.883752,1,S,,,,,NR,"SelectedIteration (base #2)"
401.bzip2,9650,417.524178,23.112434,1,S,,,,,NR,"SelectedIteration (base #1)"
403.gcc,8050,206.76557,38.932981,1,S,,,,,NR,"SelectedIteration (base #3)"
429.mcf,9120,330.285014,27.612515,1,S,,,,,NR,"SelectedIteration (base #3)"
445.gobmk,10490,336.858972,31.140628,1,S,,,,,NR,"SelectedIteration (base #1)"
456.hmmer,9330,279.698504,33.357347,1,S,,,,,NR,"SelectedIteration (base #2)"
458.sjeng,12100,390.854833,30.957785,1,S,,,,,NR,"SelectedIteration (base #3)"
462.libquantum,20720,212.563873,97.476583,1,S,,,,,NR,"SelectedIteration (base #2)"
464.h264ref,,,,,NR,,,,,NR
471.omnetpp,6250,230.126178,27.159014,1,S,,,,,NR,"SelectedIteration (base #3)"
473.astar,7020,313.361278,22.402257,1,S,,,,,NR,"SelectedIteration (base #3)"
483.xalancbmk,6900,176.504437,39.092502,1,S,,,,,NR,"SelectedIteration (base #3)"
SPECint_base2006,33.4708,,33.4708
Matt added a subscriber: Matt. Jun 13 2022, 4:04 PM

Hello - I have been trying to take a look at the problems here, I feel without a lot of success.

The codegen changes that turn madd into mul+add look worse on paper, and the performance results you've quoted seem to show a noisy decrease in performance. That matches the results I have, which don't seem fantastic.

I think in general there shouldn't be any reason for the machine combiner to choose mul+add over madd. (There might be certain times on in-order cores where the mul+add is faster due to the exact scheduling, but the machine combiner isn't considering those characteristics, and we need to consider general codegen even if -mcpu=generic is using an in-order schedule.) A MUL is really a MADD with a WZR addend register, and with late forwarding the MADD should be preferred. We may be fighting the schedule a bit - it doesn't always report the latencies correctly. I think I can look into trying to improve that by simplifying it a little, but that might take some time to get right. And I worry it might not exactly fix the issues if this isn't considering that instructions can have different latencies for each operand.

It sounds to me like the MADD is decoded into two uops, mul and add. Then the mul can execute immediately once the two multiply operands are available. Is this correct?

I agree that in this case MADD is always preferred. But the problem is that the current MachineCombiner latency computation doesn't model this behavior. It's an independent issue; I think it should be solved in another patch.

It sounds to me like the MADD is decoded into two uops, mul and add. Then the mul can execute immediately once the two multiply operands are available. Is this correct?

Not necessarily two micro-ops exactly, but late-forwarding into the add operand of the MADD. The two mul operands are needed on cycle 0, but the add operand is needed on cycle 2. The schedule has this information, but it can easily get tripped up by the way it tries to specify it. Whatever happens here, I think it might be worth trying to simplify that so that it is more correct more of the time, but that involves updating all codegen ever, so it needs to be carefully handled.

I agree that in this case MADD is always preferred. But the problem is that the current MachineCombiner latency computation doesn't model this behavior. It's an independent issue; I think it should be solved in another patch.

I think the old cost model was not great, but the new code is still incorrect for some cases. For the MADD case we don't handle the differing latencies of each input operand. (The latency for the mul operand might be 3, but for the add operand it is 1. The cost just always uses "depth" + 3.) There may also be differences when inserting multiple instructions. The old model was lastinstrdepth + sum(latencies). The new one is firstinstrdepth + sum(latencies), which I feel may not be more accurate if the instructions in InsInstrs have dependencies that are not in InsInstrs.

As a concrete way forward - can you take the isCoalescableCopy change and split it into a separate patch? It can also change the if (!MI->isCopy()) return false; to if (!MI->isCopy()) return MI->isTransient(); and maybe rename the function to isTransient. That should be fine to go separately, I believe, and the isCoalescableCopy makes a nice change. For the rest we may need something that more accurately calculates the final depth for instructions with forwarding, like the MADD case.

Carrot added a comment. Jul 1 2022, 3:05 PM

I think the old cost model was not great, but the new code is still incorrect for some cases. For the MADD case we don't handle the differing latencies of each input operand. (The latency for the mul operand might be 3, but for the add operand it is 1. The cost just always uses "depth" + 3.) There may also be differences when inserting multiple instructions. The old model was lastinstrdepth + sum(latencies). The new one is firstinstrdepth + sum(latencies), which I feel may not be more accurate if the instructions in InsInstrs have dependencies that are not in InsInstrs.

You are right about the comparison. It is also partly mentioned in the comment on the function getLatenciesForInstrSequences:

/// Estimate the latency of the new and original instruction sequence by summing
/// up the latencies of the inserted and deleted instructions. This assumes
/// that the inserted and deleted instructions are dependent instruction chains,
/// which might not hold in all cases.

So the new model, firstinstrdepth + sum(latencies), is not correct if InsInstrs is not a single dependence chain, but it is still correct in many cases.
The old model, lastinstrdepth + sum(latencies), is always wrong for a multiple-instruction sequence InsInstrs, because the latencies of the instructions in the range [0..n-2] are counted in both lastinstrdepth and sum(latencies).

As a concrete way forward - can you take the isCoalescableCopy change and split it into a separate patch? It can also change the if (!MI->isCopy()) return false; to if (!MI->isCopy()) return MI->isTransient(); and maybe rename the function to isTransient. That should be fine to go separately, I believe, and the isCoalescableCopy makes a nice change.

Will do.

For the rest we may need something that more accurately calculates the final depth for instructions with forwarding like the MADD case.

Agree.

Thanks.