This is an archive of the discontinued LLVM Phabricator instance.

[X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573)
Abandoned, Public

Authored by avt77 on May 11 2017, 6:08 AM.


Details

Reviewers

RKSimon
spatel
dtemirbulatov
andreadb

Summary

This patch closes the AVX part of bug https://bugs.llvm.org/show_bug.cgi?id=28573. One problem seems to remain: the throughput of some instructions can be a non-integer (fractional) value.


Event Timeline

avt77 created this revision. May 11 2017, 6:08 AM

avt77 added reviewers: RKSimon, spatel, dtemirbulatov. May 11 2017, 6:29 AM

avt77 added a subscriber: llvm-commits.

RKSimon added inline comments. May 11 2017, 6:41 AM

lib/Target/X86/X86Schedule.td
48 ↗	(On Diff #98624)	The LEA3 changes should be in their own patch.
lib/Target/X86/X86ScheduleBtVer2.td
137	Isn't this handled by the use of JALU01 grouping JALU0 + JALU1 together? So it has a choice of 2 pipes and it will have a tp of 1cy whichever it goes down.
test/CodeGen/X86/sse2-schedule.ll
6022	Jaguar has a max of 1 load/cycle - so the tp should still be 1.00

RKSimon added a reviewer: andreadb. May 11 2017, 6:51 AM

It seems I fixed all known issues except proper support of vzeroupper and vzeroall; I will try to do that in the next patch.

I slightly changed the throughput calculation: if the instruction scheduling model has no resource cycles for a given instruction but the scheduling class is valid, the throughput is taken to be equal to the latency.
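The rule described above can be sketched in isolation. This is only an illustrative, standalone approximation of the patched logic, not the LLVM code itself: `ResourceUse` and `reciprocalThroughput` are hypothetical names, `std::optional` stands in for `llvm::Optional`, and a resource group is assumed to count one unit per member.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for one write-resource entry together with the
// NumUnits of the resource it names. This is not an LLVM type.
struct ResourceUse {
  unsigned NumUnits; // units in the resource (assumed: a group counts its members)
  unsigned Cycles;   // cycles the instruction keeps the resource busy
};

// Reciprocal throughput in cycles per instruction.
// - No entries but a valid scheduling class: fall back to the latency.
// - Entries with zero cycles are skipped.
// - Nothing usable: the result is unknown (empty optional).
std::optional<double> reciprocalThroughput(const std::vector<ResourceUse> &Uses,
                                           unsigned Latency, bool Valid) {
  if (Uses.empty() && Valid)
    return static_cast<double>(Latency);

  const double Unknown = std::numeric_limits<double>::infinity();
  double Throughput = Unknown;
  for (const ResourceUse &U : Uses) {
    assert(U.NumUnits > 0 && "a resource needs at least one unit");
    if (U.Cycles == 0)
      continue;
    // The most contended resource limits the instructions issued per cycle.
    Throughput = std::min(Throughput, U.NumUnits * 1.0 / U.Cycles);
  }
  if (Throughput == Unknown)
    return std::nullopt;
  return 1.0 / Throughput;
}
```

Under these assumptions, a single unit held for two cycles gives 2.0 cycles per instruction, while a three-unit group held for two cycles gives 2/3, which is the kind of fractional value the summary mentions.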

RKSimon added inline comments. May 16 2017, 2:27 PM

lib/Target/X86/X86ScheduleBtVer2.td
71	I don't think adding these Cluster groups is necessary. TBH most of the ProcResource defs appear to be superfluous, since most aren't used at all; we're just using the JFPU0/JFPU1/JFPU01 defs, with a few others for the longer op chain instructions.
349	Better off using JFPU0 as that's what is actually bound to the buffer. Same for the others below.
357	Shouldn't this def be something like the below, to show it will consume the AGU for a cycle? Same for the other loads.
def WriteFAddYMLd: SchedWriteRes<[JLAGU,JFPU0]> {
  let Latency = 8;
  let ResourceCycles = [1,2];
}
test/CodeGen/X86/slow-unaligned-mem.ll
89	????

avt77 added inline comments. May 17 2017, 8:47 AM

lib/Target/X86/X86ScheduleBtVer2.td
71	Several instructions below can execute on either FPA or FPM, so we need a way to express that; that is why I created JFPFltCluster. Is it OK? I created JFPIntCluster just in case: should I remove it?
349	Are you sure it's a good idea? FP0 includes VALU0, VIMUL and FPA. I'm using FPA because this instruction uses exactly FPA. Also, if we use FPU0 then the throughput will be 2/3 = 0.666, and that's wrong. Or do you mean that our instruction is FP and should not deal with VALU0 and VIMUL? In that case we should change the algorithm again.
357	I thought about it, but the Software Optimization Guide does not show it (it mentions the AGU but does not include the additional cycle in its tables). Should I update the model?
test/CodeGen/X86/slow-unaligned-mem.ll
89	This test was written by hand, which makes it difficult to compare the results, but the new version generates:
BB#0:
vxorps %ymm0, %ymm0, %ymm0
movl 4(%esp), %eax
vmovups %ymm0, 32(%eax)
vmovups %ymm0, (%eax)
retl
As you can see, vxorps now sits between '# BB#0:' and 'movl'. I decided that is acceptable. Am I wrong?

I've fixed all issues raised by Simon. In addition I re-checked all numbers: it seems they are correct now.

I really don't understand why you are having to change the throughput calculation as part of this - split this as another patch?

lib/Target/X86/X86ScheduleBtVer2.td
353	Why WriteFAddYY not WriteFAddY ?
359	WriteFAddYLd ?
365	WriteFDivY?

In D33099#769806, @RKSimon wrote:

I really don't understand why you are having to change the throughput calculation as part of this - split this as another patch?

In this case I should move the test changes for vzeroall and vzeroupper into a separate patch as well, right?

RKSimon mentioned this in D33203: Add scheduler classes to integer/float horizontal operations. Jun 1 2017, 9:45 AM

I removed all changes related to throughput calculations. And I made all updates suggested by Simon.

RKSimon added inline comments. Jun 7 2017, 7:04 AM

lib/Target/X86/X86ScheduleBtVer2.td
22	It is still the Retire Control Unit, it's just that the FPU can only touch 44 of the entries.
let MicroOpBufferSize = 64; // Retire Control Unit
25	Don't remove whitespace.
97	Undo this whitespace
176	Don't remove whitespace.
373	WriteVMULYPD. For all these defs, please can you include the 'Y' to make it clear that it's just the 256-bit case?
445	What is AVX11? Spelling: deafault -> default
453	"VMOVAP(D|S)rm" etc. are memory loads - they should be in the Ld version
test/CodeGen/X86/avx-vzeroupper.ll
163 ↗	(On Diff #101712)	What is causing this?
test/CodeGen/X86/recip-fastmath.ll
345	Latency should be 5cy

avt77 added inline comments. Jun 8 2017, 4:41 AM

lib/Target/X86/X86ScheduleBtVer2.td
453	As I understand it, the rm version stores a register value to memory while the mr version loads a value from memory into a register. Am I right?
test/CodeGen/X86/recip-fastmath.ll
345	Why? In fact we should have tp 0.5 for XMM (see below). I'll fix it.
VMOVAPD  xmm1  xmm2  AVX  1  FPA|FPM  1  0.5
VMOVAPD  ymm1  ymm2  AVX  2  FPA|FPM  1  1
VMOVAPS  xmm1  xmm2  AVX  1  FPA|FPM  1  0.5
VMOVAPS  ymm1  ymm2  AVX  2  FPA|FPM  1  1

All notes from Simon were resolved. In addition I fixed numbers for some XMM versions of VMOVxxxx instructions.

The patch now covers only 256-bit ops, which makes it smaller.

RKSimon added inline comments. Jul 6 2017, 6:01 AM

lib/Target/X86/X86ScheduleBtVer2.td
389	VRVPPSYr -> VRCPPSYr ?
395	VRVPPSYm -> VRCPPSYm ?
397	WriteVDPPSY
403	WriteVDPPSYLd
419	VROUNDYP(S|D)rm ?

avt77 retitled this revision from "AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions" to "[X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573)". Jul 6 2017, 7:24 AM

Simon, thank you for all these catches: I fixed them.

avt77 mentioned this in D35198: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1. Jul 10 2017, 6:07 AM

I merged this patch with trunk. It is now part 2 of the initial patch.

RKSimon added inline comments. Aug 4 2017, 8:27 AM

lib/Target/X86/X86ScheduleBtVer2.td
374	JLAGU, JSTC, JFPU01 ?

Simon, I'm finally able to build the ClothAvx test executable with clang. I built it both with and without this patch, and got the following results on an AMD laptop (CPU AMD A10-8700P Radeon R6, 10 Compute Cores 4C+6G 1.80 GHz):

C:\Users\andre\Downloads\working\ClothExe>type avxcloth-patch.log
0.00 57.67 60.91 60.28 27.26 62.62 62.56 67.97
SIMD Width = 1
28.43 23.37 22.71 22.93 22.65 23.00 23.07 23.14 22.71 22.89 22.94 22.81 22.79 23.32 23.14
SIMD Width = 4
36.39 57.45 57.61 56.97 57.05 57.82 57.12 57.38 57.08 57.57 57.28 57.88 57.43 56.96 57.07 57.33
SIMD Width = 8
68.71 71.78 71.31 71.78 71.79 71.67 71.97 71.79 71.25 72.55 71.96 71.52 72.04 70.67 71.78 70.39 70.94
C:\Users\andre\Downloads\working\ClothExe>type avxcloth-trunk.log
0.00 55.19 59.88 58.43 19.96 60.22 58.58 57.37 59.34 60.11
SIMD Width = 1
24.51 21.64 21.29 21.42 21.37 21.38 21.43 21.19 22.37 23.09 24.14 23.77 23.23 23.12 22.43 22.30
SIMD Width = 4
35.74 58.77 56.04 55.87 56.56 55.44 55.24 55.26 55.44 54.91 56.47 57.75 56.36 56.72 55.56 56.11 56.59
SIMD Width = 8
65.76 70.74 70.04 70.26 70.95 72.26 73.50 70.77 69.86 69.93 70.76 70.25 70.41 71.99

As you can see, the patched version is slightly faster than trunk. Are you sure you saw a degradation with this patch?
BTW, the numbers are Flps per second (they are recalculated every second).
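One way to make a comparison like this less dependent on eyeballing the logs is to average each run and compare the means. The following is a minimal sketch with hypothetical helper names (`meanOf`, `percentChange`); it is not part of the benchmark, and whether to drop the first warm-up reading of each run is left to the caller.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of one run of per-second samples.
double meanOf(const std::vector<double> &Samples) {
  assert(!Samples.empty() && "need at least one sample");
  return std::accumulate(Samples.begin(), Samples.end(), 0.0) /
         static_cast<double>(Samples.size());
}

// Relative change of the patched mean against the trunk mean, in percent.
// A positive value means the patched build scored higher.
double percentChange(double Trunk, double Patched) {
  return (Patched - Trunk) / Trunk * 100.0;
}
```

Feeding the samples of matching runs from the two logs into `meanOf` and then into `percentChange` gives a single figure per SIMD width, which is easier to discuss than two rows of raw readings.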

I fixed an issue raised by Simon.

I re-based avx-schedule.ll test.

Thanks, please can you add the f16c-schedule.ll costs as well?

Also, please add DPPS/DPPD (xmm) costs as well.

All updates required by Simon were done.

I made changes related to SSE4.1 and F16C instructions in Jaguar.

RKSimon added inline comments. Oct 14 2017, 7:07 AM

lib/Target/X86/X86ScheduleBtVer2.td
346	let NumMicroOps = 10;
352	let NumMicroOps = 11;
357	This is a load so the AGU should be the first pipe:
def WriteDPPSLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
360	Give the MOVNT and ROUND instructions their own entries
368	def WriteDPPDLd: SchedWriteRes<[JLAGU, JFPU0, JFPU1]> {
374	Missing VTEST instructions
383	Latency is 3 according to AMD64_16h_InstrLatency_1.1.xlsx
386	You should probably just use a latency of 3 here as it's a convert+store.
389	There's no such instruction as VCVTPH2PSmr
398	WriteCVTPS2PHYSt
486	VPTESTD?

This patch was split into 4 related patches, which will be committed instead of this one.

Revision Contents

Path                                            Size
include/llvm/CodeGen/TargetSchedule.h           2 lines
lib/CodeGen/TargetSchedule.cpp                  36 lines
lib/Target/X86/X86ScheduleBtVer2.td             292 lines
test/CodeGen/X86/avx-schedule.ll                624 lines
test/CodeGen/X86/recip-fastmath.ll              32 lines
test/CodeGen/X86/recip-fastmath2.ll             48 lines
test/CodeGen/X86/slow-unaligned-mem.ll          2 lines
test/CodeGen/X86/sse-schedule.ll                12 lines
test/CodeGen/X86/sse2-schedule.ll               10 lines

Diff 99105

include/llvm/CodeGen/TargetSchedule.h

[34 lines not shown]	class TargetSchedModel {
const TargetSubtargetInfo *STI = nullptr;		const TargetSubtargetInfo *STI = nullptr;
const TargetInstrInfo *TII = nullptr;		const TargetInstrInfo *TII = nullptr;

SmallVector<unsigned, 16> ResourceFactors;		SmallVector<unsigned, 16> ResourceFactors;
unsigned MicroOpFactor; // Multiply to normalize microops to resource units.		unsigned MicroOpFactor; // Multiply to normalize microops to resource units.
unsigned ResourceLCM; // Resource units per cycle. Latency normalization factor.		unsigned ResourceLCM; // Resource units per cycle. Latency normalization factor.

unsigned computeInstrLatency(const MCSchedClassDesc &SCDesc) const;		unsigned computeInstrLatency(const MCSchedClassDesc &SCDesc) const;
		Optional<double>
		getRThroughputFromInstrSchedModel(const MCSchedClassDesc *SCDesc) const;

public:		public:
TargetSchedModel() : SchedModel(MCSchedModel::GetDefaultSchedModel()) {}		TargetSchedModel() : SchedModel(MCSchedModel::GetDefaultSchedModel()) {}

/// \brief Initialize the machine model for instruction scheduling.		/// \brief Initialize the machine model for instruction scheduling.
///		///
/// The machine model API keeps a copy of the top-level MCSchedModel table		/// The machine model API keeps a copy of the top-level MCSchedModel table
/// indices and may query TargetSubtargetInfo and TargetInstrInfo to resolve		/// indices and may query TargetSubtargetInfo and TargetInstrInfo to resolve
[150 lines not shown]

lib/CodeGen/TargetSchedule.cpp

[331 lines not shown]	if (SCDesc->isValid()) {
return 1;		return 1;
}		}
}		}
}		}
return 0;		return 0;
}		}

static Optional<double>		static Optional<double>
getRTroughputFromItineraries(unsigned schedClass,		getRThroughputFromItineraries(unsigned schedClass,
const InstrItineraryData *IID){		const InstrItineraryData *IID) {
double Unknown = std::numeric_limits<double>::infinity();		double Unknown = std::numeric_limits<double>::infinity();
double Throughput = Unknown;		double Throughput = Unknown;

for (const InstrStage *IS = IID->beginStage(schedClass),		for (const InstrStage *IS = IID->beginStage(schedClass),
*E = IID->endStage(schedClass);		*E = IID->endStage(schedClass);
IS != E; ++IS) {		IS != E; ++IS) {
unsigned Cycles = IS->getCycles();		unsigned Cycles = IS->getCycles();
if (!Cycles)		if (!Cycles)
continue;		continue;
Throughput =		Throughput =
std::min(Throughput, countPopulation(IS->getUnits()) * 1.0 / Cycles);		std::min(Throughput, countPopulation(IS->getUnits()) * 1.0 / Cycles);
}		}
		if (Throughput == Unknown)
		return Optional<double>();
// We need reciprocal throughput that's why we return such value.		// We need reciprocal throughput that's why we return such value.
return 1 / Throughput;		return 1 / Throughput;
}		}

static Optional<double>		Optional<double> TargetSchedModel::getRThroughputFromInstrSchedModel(
getRTroughputFromInstrSchedModel(const MCSchedClassDesc *SCDesc,		const MCSchedClassDesc *SCDesc) const {
const TargetSubtargetInfo *STI,
const MCSchedModel &SchedModel) {
double Unknown = std::numeric_limits<double>::infinity();		double Unknown = std::numeric_limits<double>::infinity();
double Throughput = Unknown;		double Throughput = Unknown;
		const MCWriteProcResEntry *WPR = STI->getWriteProcResBegin(SCDesc),
for (const MCWriteProcResEntry *WPR = STI->getWriteProcResBegin(SCDesc),
*WEnd = STI->getWriteProcResEnd(SCDesc);		*WEnd = STI->getWriteProcResEnd(SCDesc);
WPR != WEnd; ++WPR) {		if ((WPR == WEnd) && SCDesc->isValid())
		return computeInstrLatency(*SCDesc);
		for (; WPR != WEnd; ++WPR) {
unsigned Cycles = WPR->Cycles;		unsigned Cycles = WPR->Cycles;
if (!Cycles)		if (!Cycles)
return Optional<double>();		continue;

unsigned NumUnits =		unsigned NumUnits =
SchedModel.getProcResource(WPR->ProcResourceIdx)->NumUnits;		SchedModel.getProcResource(WPR->ProcResourceIdx)->NumUnits;
Throughput = std::min(Throughput, NumUnits * 1.0 / Cycles);		Throughput = std::min(Throughput, NumUnits * 1.0 / Cycles);
}		}
		if (Throughput == Unknown)
		return Optional<double>();
// We need reciprocal throughput that's why we return such value.		// We need reciprocal throughput that's why we return such value.
return 1 / Throughput;		return 1 / Throughput;
}		}

Optional<double>		Optional<double>
TargetSchedModel::computeInstrRThroughput(const MachineInstr *MI) const {		TargetSchedModel::computeInstrRThroughput(const MachineInstr *MI) const {
if (hasInstrItineraries())		if (hasInstrItineraries())
return getRTroughputFromItineraries(MI->getDesc().getSchedClass(),		return getRThroughputFromItineraries(MI->getDesc().getSchedClass(),
getInstrItineraries());		getInstrItineraries());
if (hasInstrSchedModel())		if (hasInstrSchedModel())
return getRTroughputFromInstrSchedModel(resolveSchedClass(MI), STI,		return getRThroughputFromInstrSchedModel(resolveSchedClass(MI));
SchedModel);
return Optional<double>();		return Optional<double>();
}		}

Optional<double>		Optional<double>
TargetSchedModel::computeInstrRThroughput(unsigned Opcode) const {		TargetSchedModel::computeInstrRThroughput(unsigned Opcode) const {
unsigned SchedClass = TII->get(Opcode).getSchedClass();		unsigned SchedClass = TII->get(Opcode).getSchedClass();
if (hasInstrItineraries())		if (hasInstrItineraries())
return getRTroughputFromItineraries(SchedClass, getInstrItineraries());		return getRThroughputFromItineraries(SchedClass, getInstrItineraries());
if (hasInstrSchedModel()) {		if (hasInstrSchedModel()) {
const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SchedClass);		const MCSchedClassDesc *SCDesc = SchedModel.getSchedClassDesc(SchedClass);
if (SCDesc->isValid() && !SCDesc->isVariant())		if (SCDesc->isValid() && !SCDesc->isVariant())
return getRTroughputFromInstrSchedModel(SCDesc, STI, SchedModel);		return getRThroughputFromInstrSchedModel(SCDesc);
}		}
return Optional<double>();		return Optional<double>();
}		}

lib/Target/X86/X86ScheduleBtVer2.td

[11 lines not shown]
// Optimization Guide for AMD Family 16h Processors & Instruction Latency appendix.		// Optimization Guide for AMD Family 16h Processors & Instruction Latency appendix.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def BtVer2Model : SchedMachineModel {		def BtVer2Model : SchedMachineModel {
// All x86 instructions are modeled as a single micro-op, and btver2 can		// All x86 instructions are modeled as a single micro-op, and btver2 can
// decode 2 instructions per cycle.		// decode 2 instructions per cycle.
let IssueWidth = 2;		let IssueWidth = 2;
let MicroOpBufferSize = 64; // Retire Control Unit		// FIXME: maximum of 44 macro-ops which have floating-point micro-op components can be
		// in-flight in the 64-macro-op in-flight window that the integer retire control unit provides.
		let MicroOpBufferSize = 64; // Integer Retire Control Unit
let LoadLatency = 5; // FPU latency (worse case cf Integer 3 cycle latency)		let LoadLatency = 5; // FPU latency (worse case cf Integer 3 cycle latency)
let HighLatency = 25;		let HighLatency = 25;
let MispredictPenalty = 14; // Minimum branch misdirection penalty		let MispredictPenalty = 14; // Minimum branch misdirection penalty
let PostRAScheduler = 1;		let PostRAScheduler = 1;

// FIXME: SSE4/AVX is unimplemented. This flag is set to allow		// FIXME: SSE4/AVX is unimplemented. This flag is set to allow
// the scheduler to assign a default model to unrecognized opcodes.		// the scheduler to assign a default model to unrecognized opcodes.
let CompleteModel = 0;		let CompleteModel = 0;
}		}

let SchedModel = BtVer2Model in {		let SchedModel = BtVer2Model in {

// Jaguar can issue up to 6 micro-ops in one cycle		// Jaguar can issue up to 6 micro-ops in one cycle
[26 lines not shown]
def JMul : ProcResource<1>; // integer multiplication		def JMul : ProcResource<1>; // integer multiplication
def JVALU0 : ProcResource<1>; // vector integer		def JVALU0 : ProcResource<1>; // vector integer
def JVALU1 : ProcResource<1>; // vector integer		def JVALU1 : ProcResource<1>; // vector integer
def JVIMUL : ProcResource<1>; // vector integer multiplication		def JVIMUL : ProcResource<1>; // vector integer multiplication
def JSTC : ProcResource<1>; // vector store/convert		def JSTC : ProcResource<1>; // vector store/convert
def JFPM : ProcResource<1>; // FP multiplication		def JFPM : ProcResource<1>; // FP multiplication
def JFPA : ProcResource<1>; // FP addition		def JFPA : ProcResource<1>; // FP addition

		def JFPFltCluster : ProcResGroup<[JFPA, JFPM]>;
		def JFPIntCluster : ProcResGroup<[JVALU0, JVALU1, JSTC]>;

// Integer loads are 3 cycles, so ReadAfterLd registers needn't be available until 3		// Integer loads are 3 cycles, so ReadAfterLd registers needn't be available until 3
// cycles after the memory operand.		// cycles after the memory operand.
def : ReadAdvance<ReadAfterLd, 3>;		def : ReadAdvance<ReadAfterLd, 3>;

// Many SchedWrites are defined in pairs with and without a folded load.		// Many SchedWrites are defined in pairs with and without a folded load.
// Instructions with folded loads are usually micro-fused, so they only appear		// Instructions with folded loads are usually micro-fused, so they only appear
// as two micro-ops when dispatched by the schedulers.		// as two micro-ops when dispatched by the schedulers.
// This multiclass defines the resource usage for variants with and without		// This multiclass defines the resource usage for variants with and without
[9 lines not shown]	multiclass JWriteResIntPair<X86FoldableSchedWrite SchedRW,
def : WriteRes<SchedRW.Folded, [JLAGU, ExePort]> {		def : WriteRes<SchedRW.Folded, [JLAGU, ExePort]> {
let Latency = !add(Lat, 3);		let Latency = !add(Lat, 3);
}		}
}		}

multiclass JWriteResFpuPair<X86FoldableSchedWrite SchedRW,		multiclass JWriteResFpuPair<X86FoldableSchedWrite SchedRW,
ProcResourceKind ExePort,		ProcResourceKind ExePort,
int Lat> {		int Lat> {

// Register variant is using a single cycle on ExePort.		// Register variant is using a single cycle on ExePort.
def : WriteRes<SchedRW, [ExePort]> { let Latency = Lat; }		def : WriteRes<SchedRW, [ExePort]> { let Latency = Lat; }

// Memory variant also uses a cycle on JLAGU and adds 5 cycles to the		// Memory variant also uses a cycle on JLAGU and adds 5 cycles to the
// latency.		// latency.
def : WriteRes<SchedRW.Folded, [JLAGU, ExePort]> {		def : WriteRes<SchedRW.Folded, [JLAGU, ExePort]> {
let Latency = !add(Lat, 5);		let Latency = !add(Lat, 5);
}		}
[23 lines not shown]	def : WriteRes<WriteIDivLd, [JALU1, JLAGU, JDiv]> {
let Latency = 41;		let Latency = 41;
let ResourceCycles = [1, 1, 25];		let ResourceCycles = [1, 1, 25];
}		}

// This is for simple LEAs with one or two input operands.		// This is for simple LEAs with one or two input operands.
// FIXME: SAGU 3-operand LEA		// FIXME: SAGU 3-operand LEA
def : WriteRes<WriteLEA, [JALU01]>;		def : WriteRes<WriteLEA, [JALU01]>;

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// Integer shifts and rotates.		// Integer shifts and rotates.
////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////

defm : JWriteResIntPair<WriteShift, JALU01, 1>;		defm : JWriteResIntPair<WriteShift, JALU01, 1>;

////////////////////////////////////////////////////////////////////////////////		////////////////////////////////////////////////////////////////////////////////
// Loads, stores, and moves, not folded with other operations.		// Loads, stores, and moves, not folded with other operations.
// FIXME: Split x86 and SSE load/store/moves		// FIXME: Split x86 and SSE load/store/moves
[27 lines not shown]

defm : JWriteResFpuPair<WriteFAdd, JFPU0, 3>;		defm : JWriteResFpuPair<WriteFAdd, JFPU0, 3>;
defm : JWriteResFpuPair<WriteFMul, JFPU1, 2>;		defm : JWriteResFpuPair<WriteFMul, JFPU1, 2>;
defm : JWriteResFpuPair<WriteFRcp, JFPU1, 2>;		defm : JWriteResFpuPair<WriteFRcp, JFPU1, 2>;
defm : JWriteResFpuPair<WriteFRsqrt, JFPU1, 2>;		defm : JWriteResFpuPair<WriteFRsqrt, JFPU1, 2>;
defm : JWriteResFpuPair<WriteFShuffle, JFPU01, 1>;		defm : JWriteResFpuPair<WriteFShuffle, JFPU01, 1>;
defm : JWriteResFpuPair<WriteFBlend, JFPU01, 1>;		defm : JWriteResFpuPair<WriteFBlend, JFPU01, 1>;
defm : JWriteResFpuPair<WriteFShuffle256, JFPU01, 1>;		defm : JWriteResFpuPair<WriteFShuffle256, JFPU01, 1>;

def : WriteRes<WriteFSqrt, [JFPU1, JLAGU, JFPM]> {		def : WriteRes<WriteFSqrt, [JFPU1, JLAGU, JFPM]> {
let Latency = 21;		let Latency = 21;
let ResourceCycles = [1, 1, 21];		let ResourceCycles = [1, 1, 21];
}		}
def : WriteRes<WriteFSqrtLd, [JFPU1, JLAGU, JFPM]> {		def : WriteRes<WriteFSqrtLd, [JFPU1, JLAGU, JFPM]> {
let Latency = 26;		let Latency = 26;
let ResourceCycles = [1, 1, 21];		let ResourceCycles = [1, 1, 21];
}		}
[147 lines not shown]	def : WriteRes<WriteCLMulLd, [JLAGU, JVIMUL]> {
let ResourceCycles = [1, 1];		let ResourceCycles = [1, 1];
}		}

// FIXME: pipe for system/microcode?		// FIXME: pipe for system/microcode?
def : WriteRes<WriteSystem, [JAny]> { let Latency = 100; }		def : WriteRes<WriteSystem, [JAny]> { let Latency = 100; }
def : WriteRes<WriteMicrocoded, [JAny]> { let Latency = 100; }		def : WriteRes<WriteMicrocoded, [JAny]> { let Latency = 100; }
def : WriteRes<WriteFence, [JSAGU]>;		def : WriteRes<WriteFence, [JSAGU]>;
def : WriteRes<WriteNop, []>;		def : WriteRes<WriteNop, []>;

		////////////////////////////////////////////////////////////////////////////////
		// AVX instructions.
		////////////////////////////////////////////////////////////////////////////////

		def WriteFAddYY: SchedWriteRes<[JFPA]> {
		let Latency = 3;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteFAddYY], (instregex "VADD(SUB)?P(S|D)Yrr", "VSUBP(S|D)Yrr")>;

		def WriteFAddYMLd: SchedWriteRes<[JFPA]> {
		let Latency = 8;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteFAddYMLd, ReadAfterLd], (instregex "VADD(SUB)?P(S|D)Yrm", "VSUBP(S|D)Yrm")>;

		def WriteVDIV: SchedWriteRes<[JFPM]> {
		let Latency = 38;
		let ResourceCycles = [38];
		}
		def : InstRW<[WriteVDIV], (instregex "VDIVP(D|S)Yrr")>;

		def WriteVDIVLd: SchedWriteRes<[JFPM]> {
		let Latency = 43;
		let ResourceCycles = [38];
		}
		def : InstRW<[WriteVDIVLd, ReadAfterLd], (instregex "VDIVP(S|D)Yrm")>;

		def WriteVMULPD: SchedWriteRes<[JFPM]> {
		let Latency = 4;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVMULPD], (instregex "VMULPDYrr")>;

		def WriteVMULPDLd: SchedWriteRes<[JFPM]> {
		let Latency = 9;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVMULPDLd, ReadAfterLd], (instregex "VMULPDYrm")>;

		def WriteVMULPS: SchedWriteRes<[JFPM]> {
		let Latency = 2;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVMULPS], (instregex "VMULPSYrr", "VRVPPSYr", "VRSQRTPSYr")>;

		def WriteVMULPSLd: SchedWriteRes<[JFPM]> {
		let Latency = 7;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVMULPSLd, ReadAfterLd], (instregex "VMULPSYrm", "VRVPPSYm", "VRSQRTPSYm")>;

		def WriteVDPPS: SchedWriteRes<[JFPM, JFPA]> {
		let Latency = 12;
		let ResourceCycles = [6, 6];
		}
		def : InstRW<[WriteVDPPS], (instregex "VDPPSYrr")>;

		def WriteVDPPSLd: SchedWriteRes<[JFPM, JFPA]> {
		let Latency = 17;
		let ResourceCycles = [6, 6];
		}
		def : InstRW<[WriteVDPPSLd, ReadAfterLd], (instregex "VDPPSYrm")>;

		def WriteVCVT: SchedWriteRes<[JSTC]> {
		let Latency = 3;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVCVT], (instregex "VCVTDQ2P(S|D)Yrr", "VMOVNTP(S|D)Ymr", "VROUNDYP(S|D)r")>;

		def WriteVCVTLd: SchedWriteRes<[JSTC]> {
		let Latency = 8;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVCVTLd, ReadAfterLd], (instregex "VCVTDQ2P(S|D)Yrm", "VROUNDYP(S|D)r")>;

		def WriteVCVTPD: SchedWriteRes<[JSTC, JFPFltCluster]> {
		let Latency = 6;
		let ResourceCycles = [2, 4];
		}
		def : InstRW<[WriteVCVTPD], (instregex "VCVTPD2(DQ|PS)Yrr")>;

		def WriteVCVTPDLd: SchedWriteRes<[JSTC]> {
		let Latency = 11;
		let ResourceCycles = [2, 2];
		}
		def : InstRW<[WriteVCVTPDLd, ReadAfterLd], (instregex "VCVTPD2(DQ|PS)Yrm")>;

		def WriteVCVTPS: SchedWriteRes<[JSTC]> {
		let Latency = 3;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVCVTPS], (instregex "VCVTPS2DQYrr")>;

		def WriteVCVTPSLd: SchedWriteRes<[JSTC]> {
		let Latency = 11;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVCVTPSLd, ReadAfterLd], (instregex "VCVTPS2DQYrm")>;

		// FIXME: We don't need 'Ld' version for AVX11 because deafult ResourceCycles == 1
		// TODO: How to use ResourceCycles from non-folding version like we do it for Latency?
		def WriteAVX11: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 6;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteAVX11], (instregex "VAND(N)?P(S\|D)Yrr", "VBLENDP(S\|D)Yrri",
		"VBROADCASTF128", "VBROADCASTSSrr", "VINSERTF128rr",
		"VMOVAP(D\|S)rm", "VMOVDDUPYrr", "VMOVS(H\|L)DUPYrr", "VMOVUP(D\|S)Yrm",
		RKSimon: "VMOVAP(D\|S)rm" etc. are memory loads - they should be in the Ld version
		avt77: From my point of view, the rm-version stores a register value into memory, while the mr-version loads the value from memory into the register. Am I right?
		"VORP(S\|D)Yrr", "VPERMILP(D\|S)Yri", "VSHUFP(D\|S)Yrri", "VUNPCK(H\|L)P(D\|S)rr",
		"VXORP(S\|D)Yrr")>;

		def WriteAVX11Ld: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 6;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteAVX11Ld, ReadAfterLd], (instregex "VAND(N)?P(S\|D)Yrm",
		"VBLENDP(S\|D)Yrmi", "VBROADCASTF128", "VBROADCASTSSrm",
		"VINSERTF128rm",
		"VMOVAP(D\|S)rm", "VMOVDDUPYrm", "VMOVS(H\|L)DUPYrr", "VMOVUP(D\|S)Ymr",
		"VORP(S\|D)Yrm", "VPERMILP(D\|S)Yrm", "VSHUFP(D\|S)Yrmi", "VUNPCK(H\|L)P(D\|S)rm",
		"VXORP(S\|D)Yrm")>;

		def WriteBlendVP: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 3;
		let ResourceCycles = [6];
		}
		def : InstRW<[WriteBlendVP], (instregex "VBLENDVP(S\|D)Yrr", "VPERMILP(D\|S)Yrr")>;
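On the rm/mr question raised in the WriteAVX11 thread above: in LLVM's X86 opcode names the operand suffix is read destination-first, so an "rm" form reads its source from memory (a load) and an "mr" form writes its destination to memory (a store). The following is a minimal sketch of that reading, not part of the patch; the function name `classify` and the regular expression are illustrative assumptions and do not cover every X86 opcode name.

```python
import re

# Assumed convention: the trailing suffix lists operands destination-first,
# with 'r' = register, 'm' = memory and an optional trailing 'i' = immediate.
_SUFFIX = re.compile(r"([rm])([rm])i?$")

def classify(opcode):
    """Return 'load', 'store', 'reg' or 'unknown' for an opcode name."""
    match = _SUFFIX.search(opcode)
    if match is None:
        return "unknown"
    dst, src = match.groups()
    if dst == "m":
        return "store"   # memory destination, e.g. VMOVUPSYmr
    if src == "m":
        return "load"    # memory source, e.g. VMOVAPSYrm
    return "reg"         # register-to-register, e.g. VANDPSYrr

for name in ("VMOVAPSYrm", "VMOVUPSYmr", "VANDPSYrr", "VBLENDPDYrmi"):
    print(name, classify(name))
```

Under this reading, only the "rm" forms belong with the load-folding (Ld) write, which is the point the reviewer makes.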

		def WriteBlendVPLd: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 8;
		let ResourceCycles = [6];
		}
		def : InstRW<[WriteBlendVPLd, ReadAfterLd], (instregex "VBLENDVP(S\|D)Yrm")>;

		def WriteVBROADCAST: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 1;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVBROADCAST], (instregex "VBROADCASTS(S\|D)Yrr")>;

		def WriteVBROADCASTLd: SchedWriteRes<[JFPFltCluster]> {
		RKSimon: VPTESTD?
		let Latency = 6;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVBROADCASTLd, ReadAfterLd], (instregex "VBROADCASTS(S\|D)Yrm")>;

		def WriteFPA22: SchedWriteRes<[JFPA]> {
		let Latency = 2;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteFPA22], (instregex "VCMPP(S\|D)Yrri", "VM(AX\|IN)P(D\|S)Yrr")>;

		def WriteFPA22Ld: SchedWriteRes<[JFPA]> {
		let Latency = 7;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteFPA22Ld, ReadAfterLd], (instregex "VCMPP(S\|D)Yrmi", "VM(AX\|IN)P(D\|S)Yrm")>;

		def WriteExtr128: SchedWriteRes<[JALU01]> {
		let Latency = 1;
		let ResourceCycles = [1];
		}
		def : InstRW<[WriteExtr128], (instregex "VEXTRACTF128rr")>;

		def WriteExtr128Ld: SchedWriteRes<[JALU01]> {
		let Latency = 6;
		let ResourceCycles = [1];
		}
		def : InstRW<[WriteExtr128Ld], (instregex "VEXTRACTF128mr")>;

		def WriteVHAddSub: SchedWriteRes<[JFPA]> {
		let Latency = 3;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVHAddSub], (instregex "VH(ADD\|SUB)P(D\|S)Yrr")>;

		def WriteVHAddSubLd: SchedWriteRes<[JFPA]> {
		let Latency = 8;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVHAddSubLd], (instregex "VH(ADD\|SUB)P(D\|S)Yrm")>;

		def WriteVMaskMovY: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 6;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVMaskMovY], (instregex "VMASKMOVP(D\|S)Yrm")>;

		def WriteVMaskMovYLd: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 11;
		let ResourceCycles = [4];
		}
		def : InstRW<[WriteVMaskMovYLd], (instregex "VMASKMOVP(D\|S)Ymr")>;

		def WriteVMaskMov: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 6;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVMaskMov], (instregex "VMASKMOVP(D\|S)rm")>;

		def WriteVMaskMovLd: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 11;
		let ResourceCycles = [2];
		}
		def : InstRW<[WriteVMaskMovLd], (instregex "VMASKMOVP(D\|S)mr")>;

		// TODO: In fact we have latency '2+i'. The +i represents an additional 1 cycle transfer
		// operation which moves the floating point result to the integer unit. During this
		// additional cycle the floating point unit execution resources are not occupied
		// and ALU0 in the integer unit is occupied instead.
		def WriteVMOVMSK: SchedWriteRes<[JFPA]> {
		let Latency = 3;
		let ResourceCycles = [1];
		}
		def : InstRW<[WriteVMOVMSK], (instregex "VMOVMSKP(D\|S)Yrr", "VTESTP(S\|D)rr")>;

		def WriteVTESTLd: SchedWriteRes<[JFPA]> {
		let Latency = 8;
		let ResourceCycles = [1];
		}
		def : InstRW<[WriteVTESTLd], (instregex "VTESTP(S\|D)rm")>;

		// TODO: In fact we have latency '3+i'. The +i represents an additional 1 cycle transfer
		// operation which moves the floating point result to the integer unit. During this
		// additional cycle the floating point unit execution resources are not occupied
		// and ALU0 in the integer unit is occupied instead.
		def WriteVTESTY: SchedWriteRes<[JFPFltCluster, JFPA]> {
		let Latency = 4;
		let ResourceCycles = [2, 2];
		}
		def : InstRW<[WriteVMOVMSK], (instregex "VTESTP(S\|D)Yrr")>;

		def WriteVTESTYLd: SchedWriteRes<[JFPFltCluster, JFPA]> {
		let Latency = 9;
		let ResourceCycles = [4, 2];
		}
		def : InstRW<[WriteVTESTYLd], (instregex "VTESTP(S\|D)Yrm")>;

		def WriteVPermilP: SchedWriteRes<[JFPFltCluster]> {
		let Latency = 1;
		let ResourceCycles = [1];
		}
		def : InstRW<[WriteVMaskMov], (instregex "VPERMILP(D\|S)ri")>;

		def WriteVSQRTPD: SchedWriteRes<[JFPM]> {
		let Latency = 54;
		let ResourceCycles = [54];
		}
		def : InstRW<[WriteVSQRTPD], (instregex "VSQRTPDYr")>;

		def WriteVSQRTPDLd: SchedWriteRes<[JFPM]> {
		let Latency = 59;
		let ResourceCycles = [54];
		}
		def : InstRW<[WriteVSQRTPDLd], (instregex "VSQRTPDYm")>;

		def WriteVSQRTPS: SchedWriteRes<[JFPM]> {
		let Latency = 42;
		let ResourceCycles = [42];
		}
		def : InstRW<[WriteVSQRTPD], (instregex "VSQRTPSYr")>;

		def WriteVSQRTPSLd: SchedWriteRes<[JFPM]> {
		let Latency = 47;
		let ResourceCycles = [42];
		}
		def : InstRW<[WriteVSQRTPSLd], (instregex "VSQRTPSYm")>;

		def WriteJVZEROALL: SchedWriteRes<[]> {
		let Latency = 90;
		let NumMicroOps = 73;
		}
		def : InstRW<[WriteJVZEROALL], (instregex "VZEROALL")>;

		def WriteJVZEROUPPER: SchedWriteRes<[]> {
		let Latency = 46;
		let NumMicroOps = 37;
		}
		def : InstRW<[WriteJVZEROUPPER], (instregex "VZEROUPPER")>;

} // SchedModel
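As a rough illustration of how the ResourceCycles values above could turn into the throughput figure checked in the tests, here is a minimal sketch. It assumes the reciprocal throughput is the largest cycles-per-unit ratio over the resources a write consumes, and it applies the fallback described in the review thread (no cycles listed, so throughput equals latency). The function name, the dictionaries and the unit counts are hypothetical and are not taken from the Jaguar model.

```python
from fractions import Fraction

def reciprocal_throughput(resource_cycles, num_units, latency):
    """Estimate cycles per instruction for one scheduling write.

    resource_cycles: {resource name: cycles the write occupies it}
    num_units:       {resource name: number of identical units}
    latency:         used as the fallback when no resources are listed
    """
    if not resource_cycles:
        # Fallback from the review thread: no cycles given, use the latency.
        return Fraction(latency)
    # The busiest resource decides, scaled by how many units share the work.
    return max(Fraction(cycles, num_units[name])
               for name, cycles in resource_cycles.items())

# Hypothetical unit counts, for illustration only.
print(float(reciprocal_throughput({"JFPA": 2}, {"JFPA": 1}, latency=2)))
print(float(reciprocal_throughput({}, {}, latency=46)))
```

With these assumptions, a write that holds a single-unit resource for 2 cycles yields 2.0, and a write with an empty resource list such as WriteJVZEROUPPER falls back to its latency.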

test/CodeGen/X86/avx-schedule.ll

	; HASWELL-LABEL: test_addpd:			; HASWELL-LABEL: test_addpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_addpd:			; BTVER2-LABEL: test_addpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_addpd:			; ZNVER1-LABEL: test_addpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vaddpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fadd <4 x double> %a0, %a1			%1 = fadd <4 x double> %a0, %a1
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fadd <4 x double> %1, %2			%3 = fadd <4 x double> %1, %2
	ret <4 x double> %3			ret <4 x double> %3
	}			}
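The CHECK lines in this test carry a `sched: [latency:throughput]` annotation, shown here once for the old column and once for the new column of the side-by-side diff. The following is a small sketch for pulling those pairs out of a line so that old and new values can be compared; the helper name `parse_sched` and the regular expression are illustrative assumptions, not part of LLVM's test tooling.

```python
import re

# Matches annotations such as "sched: [3:2.00]" (latency 3, throughput 2.00).
SCHED_RE = re.compile(r"sched: \[(\d+):(\d+(?:\.\d+)?)\]")

def parse_sched(line):
    """Return every (latency, throughput) pair found in a CHECK line."""
    return [(int(lat), float(tp)) for lat, tp in SCHED_RE.findall(line)]

line = ("; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]\t\t\t"
        "; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]")
old, new = parse_sched(line)
print(old, new)
```

For a side-by-side line the first pair is the value before the patch and the second is the value after it.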

	define <8 x float> @test_addps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_addps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_addps:			; SANDY-LABEL: test_addps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_addps:			; HASWELL-LABEL: test_addps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_addps:			; BTVER2-LABEL: test_addps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_addps:			; ZNVER1-LABEL: test_addps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vaddps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fadd <8 x float> %a0, %a1			%1 = fadd <8 x float> %a0, %a1
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = fadd <8 x float> %1, %2			%3 = fadd <8 x float> %1, %2
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define <4 x double> @test_addsubpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_addsubpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_addsubpd:			; SANDY-LABEL: test_addsubpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_addsubpd:			; HASWELL-LABEL: test_addsubpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_addsubpd:			; BTVER2-LABEL: test_addsubpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_addsubpd:			; ZNVER1-LABEL: test_addsubpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vaddsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double> %1, <4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double> %1, <4 x double> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.addsub.pd.256(<4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_addsubps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_addsubps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_addsubps:			; SANDY-LABEL: test_addsubps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_addsubps:			; HASWELL-LABEL: test_addsubps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_addsubps:			; BTVER2-LABEL: test_addsubps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_addsubps:			; ZNVER1-LABEL: test_addsubps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vaddsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float> %1, <8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float> %1, <8 x float> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.addsub.ps.256(<8 x float>, <8 x float>) nounwind readnone

	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_andnotpd:			; BTVER2-LABEL: test_andnotpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_andnotpd:			; ZNVER1-LABEL: test_andnotpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vandnpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vandnpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <4 x double> %a0 to <4 x i64>			%1 = bitcast <4 x double> %a0 to <4 x i64>
	%2 = bitcast <4 x double> %a1 to <4 x i64>			%2 = bitcast <4 x double> %a1 to <4 x i64>
	%3 = xor <4 x i64> %1, <i64 -1, i64 -1, i64 -1, i64 -1>			%3 = xor <4 x i64> %1, <i64 -1, i64 -1, i64 -1, i64 -1>
	%4 = and <4 x i64> %3, %2			%4 = and <4 x i64> %3, %2
	%5 = load <4 x double>, <4 x double> *%a2, align 32			%5 = load <4 x double>, <4 x double> *%a2, align 32
	%6 = bitcast <4 x double> %5 to <4 x i64>			%6 = bitcast <4 x double> %5 to <4 x i64>
	%7 = xor <4 x i64> %4, <i64 -1, i64 -1, i64 -1, i64 -1>			%7 = xor <4 x i64> %4, <i64 -1, i64 -1, i64 -1, i64 -1>
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_andnotps:			; BTVER2-LABEL: test_andnotps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_andnotps:			; ZNVER1-LABEL: test_andnotps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vandnps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vandnps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <8 x float> %a0 to <4 x i64>			%1 = bitcast <8 x float> %a0 to <4 x i64>
	%2 = bitcast <8 x float> %a1 to <4 x i64>			%2 = bitcast <8 x float> %a1 to <4 x i64>
	%3 = xor <4 x i64> %1, <i64 -1, i64 -1, i64 -1, i64 -1>			%3 = xor <4 x i64> %1, <i64 -1, i64 -1, i64 -1, i64 -1>
	%4 = and <4 x i64> %3, %2			%4 = and <4 x i64> %3, %2
	%5 = load <8 x float>, <8 x float> *%a2, align 32			%5 = load <8 x float>, <8 x float> *%a2, align 32
	%6 = bitcast <8 x float> %5 to <4 x i64>			%6 = bitcast <8 x float> %5 to <4 x i64>
	%7 = xor <4 x i64> %4, <i64 -1, i64 -1, i64 -1, i64 -1>			%7 = xor <4 x i64> %4, <i64 -1, i64 -1, i64 -1, i64 -1>
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_andpd:			; BTVER2-LABEL: test_andpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_andpd:			; ZNVER1-LABEL: test_andpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vandpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vandpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <4 x double> %a0 to <4 x i64>			%1 = bitcast <4 x double> %a0 to <4 x i64>
	%2 = bitcast <4 x double> %a1 to <4 x i64>			%2 = bitcast <4 x double> %a1 to <4 x i64>
	%3 = and <4 x i64> %1, %2			%3 = and <4 x i64> %1, %2
	%4 = load <4 x double>, <4 x double> *%a2, align 32			%4 = load <4 x double>, <4 x double> *%a2, align 32
	%5 = bitcast <4 x double> %4 to <4 x i64>			%5 = bitcast <4 x double> %4 to <4 x i64>
	%6 = and <4 x i64> %3, %5			%6 = and <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <4 x double>			%7 = bitcast <4 x i64> %6 to <4 x double>
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_andps:			; BTVER2-LABEL: test_andps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_andps:			; ZNVER1-LABEL: test_andps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vandps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vandps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <8 x float> %a0 to <4 x i64>			%1 = bitcast <8 x float> %a0 to <4 x i64>
	%2 = bitcast <8 x float> %a1 to <4 x i64>			%2 = bitcast <8 x float> %a1 to <4 x i64>
	%3 = and <4 x i64> %1, %2			%3 = and <4 x i64> %1, %2
	%4 = load <8 x float>, <8 x float> *%a2, align 32			%4 = load <8 x float>, <8 x float> *%a2, align 32
	%5 = bitcast <8 x float> %4 to <4 x i64>			%5 = bitcast <8 x float> %4 to <4 x i64>
	%6 = and <4 x i64> %3, %5			%6 = and <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <8 x float>			%7 = bitcast <4 x i64> %6 to <8 x float>
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [1:0.33]			; HASWELL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [1:0.33]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [5:0.50]			; HASWELL-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [5:0.50]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_blendpd:			; BTVER2-LABEL: test_blendpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [1:0.50]			; BTVER2-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [6:1.00]			; BTVER2-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendpd:			; ZNVER1-LABEL: test_blendpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [1:0.50]			; ZNVER1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3] sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [6:1.00]			; ZNVER1-NEXT: vblendpd {{.*#+}} ymm0 = ymm0[0],mem[1,2],ymm0[3] sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 0, i32 5, i32 6, i32 3>			%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fadd <4 x double> %a1, %1			%3 = fadd <4 x double> %a1, %1
	%4 = shufflevector <4 x double> %3, <4 x double> %2, <4 x i32> <i32 0, i32 5, i32 6, i32 3>			%4 = shufflevector <4 x double> %3, <4 x double> %2, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	define <8 x float> @test_blendps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_blendps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_blendps:			; SANDY-LABEL: test_blendps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.50]			; SANDY-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.50]
	; SANDY-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [5:0.50]			; SANDY-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [5:0.50]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_blendps:			; HASWELL-LABEL: test_blendps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.33]			; HASWELL-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.33]
	; HASWELL-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [5:0.50]			; HASWELL-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [5:0.50]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_blendps:			; BTVER2-LABEL: test_blendps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.50]			; BTVER2-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [6:1.00]
	; BTVER2-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [6:1.00]			; BTVER2-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendps:			; ZNVER1-LABEL: test_blendps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [1:0.50]			; ZNVER1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3,4,5,6,7] sched: [6:1.00]
	; ZNVER1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [6:1.00]			; ZNVER1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],mem[2],ymm0[3],mem[4,5,6],ymm0[7] sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 5, i32 6, i32 7>			%1 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> <i32 0, i32 9, i32 10, i32 3, i32 4, i32 5, i32 6, i32 7>
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = shufflevector <8 x float> %1, <8 x float> %2, <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 12, i32 13, i32 14, i32 7>			%3 = shufflevector <8 x float> %1, <8 x float> %2, <8 x i32> <i32 0, i32 1, i32 10, i32 3, i32 12, i32 13, i32 14, i32 7>
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define <4 x double> @test_blendvpd(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2, <4 x double> *%a3) {			define <4 x double> @test_blendvpd(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2, <4 x double> *%a3) {
	; SANDY-LABEL: test_blendvpd:			; SANDY-LABEL: test_blendvpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; SANDY-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; SANDY-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; SANDY-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_blendvpd:			; HASWELL-LABEL: test_blendvpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:2.00]			; HASWELL-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; HASWELL-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:2.00]			; HASWELL-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_blendvpd:			; BTVER2-LABEL: test_blendvpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; BTVER2-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [8:3.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendvpd:			; ZNVER1-LABEL: test_blendvpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vblendvpd %ymm2, %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; ZNVER1-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; ZNVER1-NEXT: vblendvpd %ymm2, (%rdi), %ymm0, %ymm0 # sched: [8:3.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2)			%1 = call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %a0, <4 x double> %a1, <4 x double> %a2)
	%2 = load <4 x double>, <4 x double> *%a3, align 32			%2 = load <4 x double>, <4 x double> *%a3, align 32
	%3 = call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %1, <4 x double> %2, <4 x double> %a2)			%3 = call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %1, <4 x double> %2, <4 x double> %a2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_blendvps(<8 x float> %a0, <8 x float> %a1, <8 x float> %a2, <8 x float> *%a3) {			define <8 x float> @test_blendvps(<8 x float> %a0, <8 x float> %a1, <8 x float> %a2, <8 x float> *%a3) {
	; SANDY-LABEL: test_blendvps:			; SANDY-LABEL: test_blendvps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; SANDY-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]
	; SANDY-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; SANDY-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_blendvps:			; HASWELL-LABEL: test_blendvps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:2.00]			; HASWELL-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; HASWELL-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:2.00]			; HASWELL-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [6:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_blendvps:			; BTVER2-LABEL: test_blendvps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; BTVER2-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [8:3.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_blendvps:			; ZNVER1-LABEL: test_blendvps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vblendvps %ymm2, %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; ZNVER1-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; ZNVER1-NEXT: vblendvps %ymm2, (%rdi), %ymm0, %ymm0 # sched: [8:3.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a0, <8 x float> %a1, <8 x float> %a2)			%1 = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a0, <8 x float> %a1, <8 x float> %a2)
	%2 = load <8 x float>, <8 x float> *%a3, align 32			%2 = load <8 x float>, <8 x float> *%a3, align 32
	%3 = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %1, <8 x float> %2, <8 x float> %a2)			%3 = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %1, <8 x float> %2, <8 x float> %a2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone

	[30 lines elided]
	;			;
	; HASWELL-LABEL: test_broadcastsd_ymm:			; HASWELL-LABEL: test_broadcastsd_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_broadcastsd_ymm:			; BTVER2-LABEL: test_broadcastsd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [6:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_broadcastsd_ymm:			; ZNVER1-LABEL: test_broadcastsd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vbroadcastsd (%rdi), %ymm0 # sched: [6:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load double, double *%a0, align 8			%1 = load double, double *%a0, align 8
	%2 = insertelement <4 x double> undef, double %1, i32 0			%2 = insertelement <4 x double> undef, double %1, i32 0
	%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> zeroinitializer			%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> zeroinitializer
	ret <4 x double> %3			ret <4 x double> %3
	}			}

	define <4 x float> @test_broadcastss(float *%a0) {			define <4 x float> @test_broadcastss(float *%a0) {
	; SANDY-LABEL: test_broadcastss:			; SANDY-LABEL: test_broadcastss:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [4:0.50]			; SANDY-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [4:0.50]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_broadcastss:			; HASWELL-LABEL: test_broadcastss:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [4:0.50]			; HASWELL-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [4:0.50]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_broadcastss:			; BTVER2-LABEL: test_broadcastss:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [5:1.00]			; BTVER2-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_broadcastss:			; ZNVER1-LABEL: test_broadcastss:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [5:1.00]			; ZNVER1-NEXT: vbroadcastss (%rdi), %xmm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load float, float *%a0, align 4			%1 = load float, float *%a0, align 4
	%2 = insertelement <4 x float> undef, float %1, i32 0			%2 = insertelement <4 x float> undef, float %1, i32 0
	%3 = shufflevector <4 x float> %2, <4 x float> undef, <4 x i32> zeroinitializer			%3 = shufflevector <4 x float> %2, <4 x float> undef, <4 x i32> zeroinitializer
	ret <4 x float> %3			ret <4 x float> %3
	}			}

	define <8 x float> @test_broadcastss_ymm(float *%a0) {			define <8 x float> @test_broadcastss_ymm(float *%a0) {
	; SANDY-LABEL: test_broadcastss_ymm:			; SANDY-LABEL: test_broadcastss_ymm:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_broadcastss_ymm:			; HASWELL-LABEL: test_broadcastss_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_broadcastss_ymm:			; BTVER2-LABEL: test_broadcastss_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [6:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_broadcastss_ymm:			; ZNVER1-LABEL: test_broadcastss_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vbroadcastss (%rdi), %ymm0 # sched: [6:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load float, float *%a0, align 4			%1 = load float, float *%a0, align 4
	%2 = insertelement <8 x float> undef, float %1, i32 0			%2 = insertelement <8 x float> undef, float %1, i32 0
	%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> zeroinitializer			%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> zeroinitializer
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define <4 x double> @test_cmppd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_cmppd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_cmppd:			; SANDY-LABEL: test_cmppd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; SANDY-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]
	; SANDY-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:0.33]			; SANDY-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:0.33]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_cmppd:			; HASWELL-LABEL: test_cmppd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; HASWELL-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]
	; HASWELL-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cmppd:			; BTVER2-LABEL: test_cmppd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; BTVER2-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [2:2.00]
	; BTVER2-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cmppd:			; ZNVER1-LABEL: test_cmppd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; ZNVER1-NEXT: vcmpeqpd %ymm1, %ymm0, %ymm1 # sched: [2:2.00]
	; ZNVER1-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vcmpeqpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vorpd %ymm0, %ymm1, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fcmp oeq <4 x double> %a0, %a1			%1 = fcmp oeq <4 x double> %a0, %a1
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fcmp oeq <4 x double> %a0, %2			%3 = fcmp oeq <4 x double> %a0, %2
	%4 = sext <4 x i1> %1 to <4 x i64>			%4 = sext <4 x i1> %1 to <4 x i64>
	%5 = sext <4 x i1> %3 to <4 x i64>			%5 = sext <4 x i1> %3 to <4 x i64>
	%6 = or <4 x i64> %4, %5			%6 = or <4 x i64> %4, %5
	%7 = bitcast <4 x i64> %6 to <4 x double>			%7 = bitcast <4 x i64> %6 to <4 x double>
	[12 lines elided]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; HASWELL-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [3:1.00]
	; HASWELL-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cmpps:			; BTVER2-LABEL: test_cmpps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; BTVER2-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [2:2.00]
	; BTVER2-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cmpps:			; ZNVER1-LABEL: test_cmpps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [3:1.00]			; ZNVER1-NEXT: vcmpeqps %ymm1, %ymm0, %ymm1 # sched: [2:2.00]
	; ZNVER1-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vcmpeqps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vorps %ymm0, %ymm1, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fcmp oeq <8 x float> %a0, %a1			%1 = fcmp oeq <8 x float> %a0, %a1
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = fcmp oeq <8 x float> %a0, %2			%3 = fcmp oeq <8 x float> %a0, %2
	%4 = sext <8 x i1> %1 to <8 x i32>			%4 = sext <8 x i1> %1 to <8 x i32>
	%5 = sext <8 x i1> %3 to <8 x i32>			%5 = sext <8 x i1> %3 to <8 x i32>
	%6 = or <8 x i32> %4, %5			%6 = or <8 x i32> %4, %5
	%7 = bitcast <8 x i32> %6 to <8 x float>			%7 = bitcast <8 x i32> %6 to <8 x float>
	[12 lines elided]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [6:1.00]			; HASWELL-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [6:1.00]
	; HASWELL-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:1.00]			; HASWELL-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:1.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cvtdq2pd:			; BTVER2-LABEL: test_cvtdq2pd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:1.00]			; BTVER2-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:2.00]
	; BTVER2-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cvtdq2pd:			; ZNVER1-LABEL: test_cvtdq2pd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:1.00]			; ZNVER1-NEXT: vcvtdq2pd (%rdi), %ymm1 # sched: [8:2.00]
	; ZNVER1-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vcvtdq2pd %xmm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = sitofp <4 x i32> %a0 to <4 x double>			%1 = sitofp <4 x i32> %a0 to <4 x double>
	%2 = load <4 x i32>, <4 x i32> *%a1, align 16			%2 = load <4 x i32>, <4 x i32> *%a1, align 16
	%3 = sitofp <4 x i32> %2 to <4 x double>			%3 = sitofp <4 x i32> %2 to <4 x double>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	[11 lines elided]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [4:1.00]			; HASWELL-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [4:1.00]
	; HASWELL-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:1.00]			; HASWELL-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:1.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cvtdq2ps:			; BTVER2-LABEL: test_cvtdq2ps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:1.00]			; BTVER2-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:2.00]
	; BTVER2-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cvtdq2ps:			; ZNVER1-LABEL: test_cvtdq2ps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:1.00]			; ZNVER1-NEXT: vcvtdq2ps (%rdi), %ymm1 # sched: [8:2.00]
	; ZNVER1-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vcvtdq2ps %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = sitofp <8 x i32> %a0 to <8 x float>			%1 = sitofp <8 x i32> %a0 to <8 x float>
	%2 = load <8 x i32>, <8 x i32> *%a1, align 16			%2 = load <8 x i32>, <8 x i32> *%a1, align 16
	%3 = sitofp <8 x i32> %2 to <8 x float>			%3 = sitofp <8 x i32> %2 to <8 x float>
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}

	[11 lines elided]
	; HASWELL-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [10:1.00]			; HASWELL-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [10:1.00]
	; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cvtpd2dq:			; BTVER2-LABEL: test_cvtpd2dq:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [8:1.00]			; BTVER2-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [8:1.00]
	; BTVER2-NEXT: vcvttpd2dq %ymm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vcvttpd2dq %ymm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cvtpd2dq:			; ZNVER1-LABEL: test_cvtpd2dq:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [8:1.00]			; ZNVER1-NEXT: vcvttpd2dqy (%rdi), %xmm1 # sched: [8:1.00]
	; ZNVER1-NEXT: vcvttpd2dq %ymm0, %xmm0 # sched: [3:1.00]			; ZNVER1-NEXT: vcvttpd2dq %ymm0, %xmm0 # sched: [3:1.00]
	; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fptosi <4 x double> %a0 to <4 x i32>			%1 = fptosi <4 x double> %a0 to <4 x i32>
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = fptosi <4 x double> %2 to <4 x i32>			%3 = fptosi <4 x double> %2 to <4 x i32>
	%4 = shufflevector <4 x i32> %1, <4 x i32> %3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%4 = shufflevector <4 x i32> %1, <4 x i32> %3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	ret <8 x i32> %4			ret <8 x i32> %4
	}			}

	[9 lines elided]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [5:1.00]			; HASWELL-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [5:1.00]
	; HASWELL-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [9:1.00]			; HASWELL-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [9:1.00]
	; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cvtpd2ps:			; BTVER2-LABEL: test_cvtpd2ps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [8:1.00]			; BTVER2-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [11:2.00]
	; BTVER2-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [6:2.00]
	; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cvtpd2ps:			; ZNVER1-LABEL: test_cvtpd2ps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [8:1.00]			; ZNVER1-NEXT: vcvtpd2psy (%rdi), %xmm1 # sched: [11:2.00]
	; ZNVER1-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [3:1.00]			; ZNVER1-NEXT: vcvtpd2ps %ymm0, %xmm0 # sched: [6:2.00]
	; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fptrunc <4 x double> %a0 to <4 x float>			%1 = fptrunc <4 x double> %a0 to <4 x float>
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = fptrunc <4 x double> %2 to <4 x float>			%3 = fptrunc <4 x double> %2 to <4 x float>
	%4 = shufflevector <4 x float> %1, <4 x float> %3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>			%4 = shufflevector <4 x float> %1, <4 x float> %3, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
	ret <8 x float> %4			ret <8 x float> %4
	}			}

	[11 lines elided]
	; HASWELL-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [7:1.00]			; HASWELL-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [7:1.00]
	; HASWELL-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_cvtps2dq:			; BTVER2-LABEL: test_cvtps2dq:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [8:1.00]			; BTVER2-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [8:1.00]
	; BTVER2-NEXT: vcvttps2dq %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vcvttps2dq %ymm0, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_cvtps2dq:			; ZNVER1-LABEL: test_cvtps2dq:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [8:1.00]			; ZNVER1-NEXT: vcvttps2dq (%rdi), %ymm1 # sched: [8:1.00]
	; ZNVER1-NEXT: vcvttps2dq %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vcvttps2dq %ymm0, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fptosi <8 x float> %a0 to <8 x i32>			%1 = fptosi <8 x float> %a0 to <8 x i32>
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = fptosi <8 x float> %2 to <8 x i32>			%3 = fptosi <8 x float> %2 to <8 x i32>
	%4 = or <8 x i32> %1, %3			%4 = or <8 x i32> %1, %3
	ret <8 x i32> %4			ret <8 x i32> %4
	}			}

	define <4 x double> @test_divpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_divpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_divpd:			; SANDY-LABEL: test_divpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [12:1.00]			; SANDY-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [12:1.00]
	; SANDY-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [16:1.00]			; SANDY-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [16:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_divpd:			; HASWELL-LABEL: test_divpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [27:2.00]			; HASWELL-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [27:2.00]
	; HASWELL-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [31:2.00]			; HASWELL-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [31:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_divpd:			; BTVER2-LABEL: test_divpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [19:19.00]			; BTVER2-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [38:38.00]
	; BTVER2-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [24:19.00]			; BTVER2-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [43:38.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_divpd:			; ZNVER1-LABEL: test_divpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [19:19.00]			; ZNVER1-NEXT: vdivpd %ymm1, %ymm0, %ymm0 # sched: [38:38.00]
	; ZNVER1-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [24:19.00]			; ZNVER1-NEXT: vdivpd (%rdi), %ymm0, %ymm0 # sched: [43:38.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fdiv <4 x double> %a0, %a1			%1 = fdiv <4 x double> %a0, %a1
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fdiv <4 x double> %1, %2			%3 = fdiv <4 x double> %1, %2
	ret <4 x double> %3			ret <4 x double> %3
	}			}

	define <8 x float> @test_divps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_divps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_divps:			; SANDY-LABEL: test_divps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [12:1.00]			; SANDY-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [12:1.00]
	; SANDY-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [16:1.00]			; SANDY-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [16:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_divps:			; HASWELL-LABEL: test_divps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [19:2.00]			; HASWELL-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [19:2.00]
	; HASWELL-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [23:2.00]			; HASWELL-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [23:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_divps:			; BTVER2-LABEL: test_divps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [19:19.00]			; BTVER2-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [38:38.00]
	; BTVER2-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [24:19.00]			; BTVER2-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [43:38.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_divps:			; ZNVER1-LABEL: test_divps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [19:19.00]			; ZNVER1-NEXT: vdivps %ymm1, %ymm0, %ymm0 # sched: [38:38.00]
	; ZNVER1-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [24:19.00]			; ZNVER1-NEXT: vdivps (%rdi), %ymm0, %ymm0 # sched: [43:38.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fdiv <8 x float> %a0, %a1			%1 = fdiv <8 x float> %a0, %a1
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = fdiv <8 x float> %1, %2			%3 = fdiv <8 x float> %1, %2
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define <8 x float> @test_dpps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_dpps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_dpps:			; SANDY-LABEL: test_dpps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_dpps:			; HASWELL-LABEL: test_dpps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [14:2.00]			; HASWELL-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [14:2.00]
	; HASWELL-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [18:2.00]			; HASWELL-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [18:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_dpps:			; BTVER2-LABEL: test_dpps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [12:6.00]
	; BTVER2-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [17:6.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_dpps:			; ZNVER1-LABEL: test_dpps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vdpps $7, %ymm1, %ymm0, %ymm0 # sched: [12:6.00]
	; ZNVER1-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vdpps $7, (%rdi), %ymm0, %ymm0 # sched: [17:6.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 7)			%1 = call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 7)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> %1, <8 x float> %2, i8 7)			%3 = call <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float> %1, <8 x float> %2, i8 7)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone			declare <8 x float> @llvm.x86.avx.dp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone

	define <4 x float> @test_extractf128(<8 x float> %a0, <8 x float> %a1, <4 x float> *%a2) {			define <4 x float> @test_extractf128(<8 x float> %a0, <8 x float> %a1, <4 x float> *%a2) {
	; SANDY-LABEL: test_extractf128:			; SANDY-LABEL: test_extractf128:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:1.00]			; SANDY-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:1.00]
	; SANDY-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [1:1.00]			; SANDY-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [1:1.00]
	; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]			; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_extractf128:			; HASWELL-LABEL: test_extractf128:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [4:1.00]			; HASWELL-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [4:1.00]
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_extractf128:			; BTVER2-LABEL: test_extractf128:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:0.50]
	; BTVER2-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [1:1.00]			; BTVER2-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [6:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_extractf128:			; ZNVER1-LABEL: test_extractf128:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vextractf128 $1, %ymm0, %xmm0 # sched: [1:0.50]
	; ZNVER1-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [1:1.00]			; ZNVER1-NEXT: vextractf128 $1, %ymm1, (%rdi) # sched: [6:0.50]
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>			%1 = shufflevector <8 x float> %a0, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
	%2 = shufflevector <8 x float> %a1, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>			%2 = shufflevector <8 x float> %a1, <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
	store <4 x float> %2, <4 x float> *%a2			store <4 x float> %2, <4 x float> *%a2
	ret <4 x float> %1			ret <4 x float> %1
	}			}

	define <4 x double> @test_haddpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_haddpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_haddpd:			; SANDY-LABEL: test_haddpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_haddpd:			; HASWELL-LABEL: test_haddpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [5:2.00]			; HASWELL-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [5:2.00]
	; HASWELL-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [9:2.00]			; HASWELL-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [9:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_haddpd:			; BTVER2-LABEL: test_haddpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_haddpd:			; ZNVER1-LABEL: test_haddpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vhaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vhaddpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %1, <4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %1, <4 x double> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_haddps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_haddps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_haddps:			; SANDY-LABEL: test_haddps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_haddps:			; HASWELL-LABEL: test_haddps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [5:2.00]			; HASWELL-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [5:2.00]
	; HASWELL-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [9:2.00]			; HASWELL-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [9:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_haddps:			; BTVER2-LABEL: test_haddps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_haddps:			; ZNVER1-LABEL: test_haddps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vhaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vhaddps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %1, <8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float> %1, <8 x float> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.hadd.ps.256(<8 x float>, <8 x float>) nounwind readnone

	define <4 x double> @test_hsubpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_hsubpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_hsubpd:			; SANDY-LABEL: test_hsubpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_hsubpd:			; HASWELL-LABEL: test_hsubpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [5:2.00]			; HASWELL-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [5:2.00]
	; HASWELL-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [9:2.00]			; HASWELL-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [9:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_hsubpd:			; BTVER2-LABEL: test_hsubpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_hsubpd:			; ZNVER1-LABEL: test_hsubpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vhsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vhsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double> %1, <4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double> %1, <4 x double> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.hsub.pd.256(<4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_hsubps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_hsubps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_hsubps:			; SANDY-LABEL: test_hsubps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_hsubps:			; HASWELL-LABEL: test_hsubps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [5:2.00]			; HASWELL-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [5:2.00]
	; HASWELL-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [9:2.00]			; HASWELL-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [9:2.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_hsubps:			; BTVER2-LABEL: test_hsubps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_hsubps:			; ZNVER1-LABEL: test_hsubps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vhsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vhsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float> %1, <8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float> %1, <8 x float> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.hsub.ps.256(<8 x float>, <8 x float>) nounwind readnone

	[9 lines collapsed in the diff view]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [3:1.00]			; HASWELL-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [3:1.00]
	; HASWELL-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_insertf128:			; BTVER2-LABEL: test_insertf128:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [1:0.50]			; BTVER2-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [6:1.00]
	; BTVER2-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_insertf128:			; ZNVER1-LABEL: test_insertf128:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [1:0.50]			; ZNVER1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1 # sched: [6:1.00]
	; ZNVER1-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vinsertf128 $1, (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a1, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%1 = shufflevector <4 x float> %a1, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	%2 = shufflevector <8 x float> %a0, <8 x float> %1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>			%2 = shufflevector <8 x float> %a0, <8 x float> %1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	%3 = load <4 x float>, <4 x float> *%a2, align 16			%3 = load <4 x float>, <4 x float> *%a2, align 16
	%4 = shufflevector <4 x float> %3, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			%4 = shufflevector <4 x float> %3, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	%5 = shufflevector <8 x float> %a0, <8 x float> %4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>			%5 = shufflevector <8 x float> %a0, <8 x float> %4, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	%6 = fadd <8 x float> %2, %5			%6 = fadd <8 x float> %2, %5
	ret <8 x float> %6			ret <8 x float> %6
	[36 lines collapsed in the diff view]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [4:2.00]			; HASWELL-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [4:2.00]
	; HASWELL-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [13:1.00]			; HASWELL-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [13:1.00]
	; HASWELL-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:1.00]			; HASWELL-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maskmovpd:			; BTVER2-LABEL: test_maskmovpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [6:1.00]
	; BTVER2-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [11:1.00]
	; BTVER2-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maskmovpd:			; ZNVER1-LABEL: test_maskmovpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovpd (%rdi), %xmm0, %xmm2 # sched: [6:1.00]
	; ZNVER1-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovpd %xmm1, %xmm0, (%rdi) # sched: [11:1.00]
	; ZNVER1-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vmovapd %xmm2, %xmm0 # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <2 x double> @llvm.x86.avx.maskload.pd(i8* %a0, <2 x i64> %a1)			%1 = call <2 x double> @llvm.x86.avx.maskload.pd(i8* %a0, <2 x i64> %a1)
	call void @llvm.x86.avx.maskstore.pd(i8* %a0, <2 x i64> %a1, <2 x double> %a2)			call void @llvm.x86.avx.maskstore.pd(i8* %a0, <2 x i64> %a1, <2 x double> %a2)
	ret <2 x double> %1			ret <2 x double> %1
	}			}
	declare <2 x double> @llvm.x86.avx.maskload.pd(i8*, <2 x i64>) nounwind readonly			declare <2 x double> @llvm.x86.avx.maskload.pd(i8*, <2 x i64>) nounwind readonly
	declare void @llvm.x86.avx.maskstore.pd(i8*, <2 x i64>, <2 x double>) nounwind			declare void @llvm.x86.avx.maskstore.pd(i8*, <2 x i64>, <2 x double>) nounwind
	[10 lines collapsed in the diff view]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [4:2.00]			; HASWELL-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [4:2.00]
	; HASWELL-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [14:1.00]			; HASWELL-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [14:1.00]
	; HASWELL-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maskmovpd_ymm:			; BTVER2-LABEL: test_maskmovpd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [6:2.00]
	; BTVER2-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [11:2.00]
	; BTVER2-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maskmovpd_ymm:			; ZNVER1-LABEL: test_maskmovpd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovpd (%rdi), %ymm0, %ymm2 # sched: [6:2.00]
	; ZNVER1-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovpd %ymm1, %ymm0, (%rdi) # sched: [11:2.00]
	; ZNVER1-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vmovapd %ymm2, %ymm0 # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8* %a0, <4 x i64> %a1)			%1 = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8* %a0, <4 x i64> %a1)
	call void @llvm.x86.avx.maskstore.pd.256(i8* %a0, <4 x i64> %a1, <4 x double> %a2)			call void @llvm.x86.avx.maskstore.pd.256(i8* %a0, <4 x i64> %a1, <4 x double> %a2)
	ret <4 x double> %1			ret <4 x double> %1
	}			}
	declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8*, <4 x i64>) nounwind readonly			declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8*, <4 x i64>) nounwind readonly
	declare void @llvm.x86.avx.maskstore.pd.256(i8*, <4 x i64>, <4 x double>) nounwind			declare void @llvm.x86.avx.maskstore.pd.256(i8*, <4 x i64>, <4 x double>) nounwind
	[10 lines collapsed in the diff view]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [4:2.00]			; HASWELL-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [4:2.00]
	; HASWELL-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [13:1.00]			; HASWELL-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [13:1.00]
	; HASWELL-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:1.00]			; HASWELL-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maskmovps:			; BTVER2-LABEL: test_maskmovps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [6:1.00]
	; BTVER2-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [11:1.00]
	; BTVER2-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maskmovps:			; ZNVER1-LABEL: test_maskmovps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovps (%rdi), %xmm0, %xmm2 # sched: [6:1.00]
	; ZNVER1-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovps %xmm1, %xmm0, (%rdi) # sched: [11:1.00]
	; ZNVER1-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vmovaps %xmm2, %xmm0 # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x float> @llvm.x86.avx.maskload.ps(i8* %a0, <4 x i32> %a1)			%1 = call <4 x float> @llvm.x86.avx.maskload.ps(i8* %a0, <4 x i32> %a1)
	call void @llvm.x86.avx.maskstore.ps(i8* %a0, <4 x i32> %a1, <4 x float> %a2)			call void @llvm.x86.avx.maskstore.ps(i8* %a0, <4 x i32> %a1, <4 x float> %a2)
	ret <4 x float> %1			ret <4 x float> %1
	}			}
	declare <4 x float> @llvm.x86.avx.maskload.ps(i8*, <4 x i32>) nounwind readonly			declare <4 x float> @llvm.x86.avx.maskload.ps(i8*, <4 x i32>) nounwind readonly
	declare void @llvm.x86.avx.maskstore.ps(i8*, <4 x i32>, <4 x float>) nounwind			declare void @llvm.x86.avx.maskstore.ps(i8*, <4 x i32>, <4 x float>) nounwind
	[10 lines collapsed in the diff view]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [4:2.00]			; HASWELL-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [4:2.00]
	; HASWELL-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [14:1.00]			; HASWELL-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [14:1.00]
	; HASWELL-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maskmovps_ymm:			; BTVER2-LABEL: test_maskmovps_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [6:2.00]
	; BTVER2-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [?:0.000000e+00]			; BTVER2-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [11:2.00]
	; BTVER2-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maskmovps_ymm:			; ZNVER1-LABEL: test_maskmovps_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovps (%rdi), %ymm0, %ymm2 # sched: [6:2.00]
	; ZNVER1-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vmaskmovps %ymm1, %ymm0, (%rdi) # sched: [11:2.00]
	; ZNVER1-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vmovaps %ymm2, %ymm0 # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8* %a0, <8 x i32> %a1)			%1 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8* %a0, <8 x i32> %a1)
	call void @llvm.x86.avx.maskstore.ps.256(i8* %a0, <8 x i32> %a1, <8 x float> %a2)			call void @llvm.x86.avx.maskstore.ps.256(i8* %a0, <8 x i32> %a1, <8 x float> %a2)
	ret <8 x float> %1			ret <8 x float> %1
	}			}
	declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8*, <8 x i32>) nounwind readonly			declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8*, <8 x i32>) nounwind readonly
	declare void @llvm.x86.avx.maskstore.ps.256(i8*, <8 x i32>, <8 x float>) nounwind			declare void @llvm.x86.avx.maskstore.ps.256(i8*, <8 x i32>, <8 x float>) nounwind

	define <4 x double> @test_maxpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_maxpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_maxpd:			; SANDY-LABEL: test_maxpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_maxpd:			; HASWELL-LABEL: test_maxpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maxpd:			; BTVER2-LABEL: test_maxpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maxpd:			; ZNVER1-LABEL: test_maxpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vmaxpd %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vmaxpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.max.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call <4 x double> @llvm.x86.avx.max.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.max.pd.256(<4 x double> %1, <4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.max.pd.256(<4 x double> %1, <4 x double> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.max.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.max.pd.256(<4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_maxps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_maxps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_maxps:			; SANDY-LABEL: test_maxps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_maxps:			; HASWELL-LABEL: test_maxps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_maxps:			; BTVER2-LABEL: test_maxps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_maxps:			; ZNVER1-LABEL: test_maxps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vmaxps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vmaxps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %1, <8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %1, <8 x float> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.max.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.max.ps.256(<8 x float>, <8 x float>) nounwind readnone

	define <4 x double> @test_minpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_minpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_minpd:			; SANDY-LABEL: test_minpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_minpd:			; HASWELL-LABEL: test_minpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_minpd:			; BTVER2-LABEL: test_minpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_minpd:			; ZNVER1-LABEL: test_minpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vminpd %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vminpd (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.min.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call <4 x double> @llvm.x86.avx.min.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.min.pd.256(<4 x double> %1, <4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.min.pd.256(<4 x double> %1, <4 x double> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.min.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.min.pd.256(<4 x double>, <4 x double>) nounwind readnone

	define <8 x float> @test_minps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_minps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_minps:			; SANDY-LABEL: test_minps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_minps:			; HASWELL-LABEL: test_minps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_minps:			; BTVER2-LABEL: test_minps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_minps:			; ZNVER1-LABEL: test_minps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vminps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vminps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.min.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call <8 x float> @llvm.x86.avx.min.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.min.ps.256(<8 x float> %1, <8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.min.ps.256(<8 x float> %1, <8 x float> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.min.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.min.ps.256(<8 x float>, <8 x float>) nounwind readnone

	[10 lines not shown]
	; HASWELL-NEXT: vmovapd (%rdi), %ymm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovapd (%rdi), %ymm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movapd:			; BTVER2-LABEL: test_movapd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovapd (%rdi), %ymm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovapd (%rdi), %ymm0 # sched: [5:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movapd:			; ZNVER1-LABEL: test_movapd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovapd (%rdi), %ymm0 # sched: [5:1.00]			; ZNVER1-NEXT: vmovapd (%rdi), %ymm0 # sched: [5:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovapd %ymm0, (%rsi) # sched: [1:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load <4 x double>, <4 x double> *%a0, align 32			%1 = load <4 x double>, <4 x double> *%a0, align 32
	%2 = fadd <4 x double> %1, %1			%2 = fadd <4 x double> %1, %1
	store <4 x double> %2, <4 x double> *%a1, align 32			store <4 x double> %2, <4 x double> *%a1, align 32
	ret <4 x double> %2			ret <4 x double> %2
	}			}

	[10 lines not shown]
	; HASWELL-NEXT: vmovaps (%rdi), %ymm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovaps (%rdi), %ymm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movaps:			; BTVER2-LABEL: test_movaps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps (%rdi), %ymm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovaps (%rdi), %ymm0 # sched: [5:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movaps:			; ZNVER1-LABEL: test_movaps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovaps (%rdi), %ymm0 # sched: [5:1.00]			; ZNVER1-NEXT: vmovaps (%rdi), %ymm0 # sched: [5:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovaps %ymm0, (%rsi) # sched: [1:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load <8 x float>, <8 x float> *%a0, align 32			%1 = load <8 x float>, <8 x float> *%a0, align 32
	%2 = fadd <8 x float> %1, %1			%2 = fadd <8 x float> %1, %1
	store <8 x float> %2, <8 x float> *%a1, align 32			store <8 x float> %2, <8 x float> *%a1, align 32
	ret <8 x float> %2			ret <8 x float> %2
	}			}

	[9 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [1:1.00]			; HASWELL-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [1:1.00]
	; HASWELL-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [4:0.50]			; HASWELL-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [4:0.50]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movddup:			; BTVER2-LABEL: test_movddup:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [5:1.00]			; BTVER2-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [6:1.00]
	; BTVER2-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [1:0.50]			; BTVER2-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movddup:			; ZNVER1-LABEL: test_movddup:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [5:1.00]			; ZNVER1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [6:1.00]
	; ZNVER1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2] sched: [1:0.50]			; ZNVER1-NEXT: vmovddup {{.*#+}} ymm1 = mem[0,0,2,2] sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>			%1 = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>			%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	define i32 @test_movmskpd(<4 x double> %a0) {			define i32 @test_movmskpd(<4 x double> %a0) {
	; SANDY-LABEL: test_movmskpd:			; SANDY-LABEL: test_movmskpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovmskpd %ymm0, %eax # sched: [1:0.33]			; SANDY-NEXT: vmovmskpd %ymm0, %eax # sched: [1:0.33]
	; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]			; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_movmskpd:			; HASWELL-LABEL: test_movmskpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovmskpd %ymm0, %eax # sched: [2:1.00]			; HASWELL-NEXT: vmovmskpd %ymm0, %eax # sched: [2:1.00]
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movmskpd:			; BTVER2-LABEL: test_movmskpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovmskpd %ymm0, %eax # sched: [1:0.50]			; BTVER2-NEXT: vmovmskpd %ymm0, %eax # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movmskpd:			; ZNVER1-LABEL: test_movmskpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovmskpd %ymm0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: vmovmskpd %ymm0, %eax # sched: [3:1.00]
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %a0)			%1 = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %a0)
	ret i32 %1			ret i32 %1
	}			}
	declare i32 @llvm.x86.avx.movmsk.pd.256(<4 x double>) nounwind readnone			declare i32 @llvm.x86.avx.movmsk.pd.256(<4 x double>) nounwind readnone

	define i32 @test_movmskps(<8 x float> %a0) {			define i32 @test_movmskps(<8 x float> %a0) {
	; SANDY-LABEL: test_movmskps:			; SANDY-LABEL: test_movmskps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovmskps %ymm0, %eax # sched: [1:0.33]			; SANDY-NEXT: vmovmskps %ymm0, %eax # sched: [1:0.33]
	; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]			; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_movmskps:			; HASWELL-LABEL: test_movmskps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovmskps %ymm0, %eax # sched: [2:1.00]			; HASWELL-NEXT: vmovmskps %ymm0, %eax # sched: [2:1.00]
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movmskps:			; BTVER2-LABEL: test_movmskps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovmskps %ymm0, %eax # sched: [1:0.50]			; BTVER2-NEXT: vmovmskps %ymm0, %eax # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movmskps:			; ZNVER1-LABEL: test_movmskps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovmskps %ymm0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: vmovmskps %ymm0, %eax # sched: [3:1.00]
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %a0)			%1 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %a0)
	ret i32 %1			ret i32 %1
	}			}
	declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>) nounwind readnone			declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>) nounwind readnone

	define <4 x double> @test_movntpd(<4 x double> %a0, <4 x double> *%a1) {			define <4 x double> @test_movntpd(<4 x double> %a0, <4 x double> *%a1) {
	; SANDY-LABEL: test_movntpd:			; SANDY-LABEL: test_movntpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]			; SANDY-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_movntpd:			; HASWELL-LABEL: test_movntpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]			; HASWELL-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movntpd:			; BTVER2-LABEL: test_movntpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]			; BTVER2-NEXT: vmovntpd %ymm0, (%rdi) # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movntpd:			; ZNVER1-LABEL: test_movntpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovntpd %ymm0, (%rdi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovntpd %ymm0, (%rdi) # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fadd <4 x double> %a0, %a0			%1 = fadd <4 x double> %a0, %a0
	store <4 x double> %1, <4 x double> *%a1, align 32, !nontemporal !0			store <4 x double> %1, <4 x double> *%a1, align 32, !nontemporal !0
	ret <4 x double> %1			ret <4 x double> %1
	}			}

	define <8 x float> @test_movntps(<8 x float> %a0, <8 x float> *%a1) {			define <8 x float> @test_movntps(<8 x float> %a0, <8 x float> *%a1) {
	; SANDY-LABEL: test_movntps:			; SANDY-LABEL: test_movntps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]			; SANDY-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_movntps:			; HASWELL-LABEL: test_movntps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]			; HASWELL-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movntps:			; BTVER2-LABEL: test_movntps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]			; BTVER2-NEXT: vmovntps %ymm0, (%rdi) # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movntps:			; ZNVER1-LABEL: test_movntps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovntps %ymm0, (%rdi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovntps %ymm0, (%rdi) # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fadd <8 x float> %a0, %a0			%1 = fadd <8 x float> %a0, %a0
	store <8 x float> %1, <8 x float> *%a1, align 32, !nontemporal !0			store <8 x float> %1, <8 x float> *%a1, align 32, !nontemporal !0
	ret <8 x float> %1			ret <8 x float> %1
	}			}

	define <8 x float> @test_movshdup(<8 x float> %a0, <8 x float> *%a1) {			define <8 x float> @test_movshdup(<8 x float> %a0, <8 x float> *%a1) {
	; SANDY-LABEL: test_movshdup:			; SANDY-LABEL: test_movshdup:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:1.00]			; SANDY-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:1.00]
	; SANDY-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [4:0.50]			; SANDY-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [4:0.50]
	; SANDY-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_movshdup:			; HASWELL-LABEL: test_movshdup:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:1.00]			; HASWELL-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:1.00]
	; HASWELL-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [4:0.50]			; HASWELL-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [4:0.50]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movshdup:			; BTVER2-LABEL: test_movshdup:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [6:1.00]
	; BTVER2-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [5:1.00]			; BTVER2-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [5:1.00]
	; BTVER2-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:0.50]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movshdup:			; ZNVER1-LABEL: test_movshdup:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [6:1.00]
	; ZNVER1-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [5:1.00]			; ZNVER1-NEXT: vmovshdup {{.*#+}} ymm1 = mem[1,1,3,3,5,5,7,7] sched: [5:1.00]
	; ZNVER1-NEXT: vmovshdup {{.*#+}} ymm0 = ymm0[1,1,3,3,5,5,7,7] sched: [1:0.50]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 1, i32 1, i32 3, i32 3, i32 5, i32 5, i32 7, i32 7>			%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 1, i32 1, i32 3, i32 3, i32 5, i32 5, i32 7, i32 7>
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 1, i32 1, i32 3, i32 3, i32 5, i32 5, i32 7, i32 7>			%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 1, i32 1, i32 3, i32 3, i32 5, i32 5, i32 7, i32 7>
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}

	[9 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [1:1.00]			; HASWELL-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [1:1.00]
	; HASWELL-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [4:0.50]			; HASWELL-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [4:0.50]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movsldup:			; BTVER2-LABEL: test_movsldup:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [6:1.00]
	; BTVER2-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [5:1.00]			; BTVER2-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [5:1.00]
	; BTVER2-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [1:0.50]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movsldup:			; ZNVER1-LABEL: test_movsldup:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [6:1.00]
	; ZNVER1-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [5:1.00]			; ZNVER1-NEXT: vmovsldup {{.*#+}} ymm1 = mem[0,0,2,2,4,4,6,6] sched: [5:1.00]
	; ZNVER1-NEXT: vmovsldup {{.*#+}} ymm0 = ymm0[0,0,2,2,4,4,6,6] sched: [1:0.50]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 0, i32 0, i32 2, i32 2, i32 4, i32 4, i32 6, i32 6>			%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 0, i32 0, i32 2, i32 2, i32 4, i32 4, i32 6, i32 6>
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 0, i32 0, i32 2, i32 2, i32 4, i32 4, i32 6, i32 6>			%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 0, i32 0, i32 2, i32 2, i32 4, i32 4, i32 6, i32 6>
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}

	[11 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovupd (%rdi), %ymm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovupd (%rdi), %ymm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovupd %ymm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovupd %ymm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movupd:			; BTVER2-LABEL: test_movupd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovupd (%rdi), %ymm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovupd (%rdi), %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovupd %ymm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovupd %ymm0, (%rsi) # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movupd:			; ZNVER1-LABEL: test_movupd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovupd (%rdi), %ymm0 # sched: [5:1.00]			; ZNVER1-NEXT: vmovupd (%rdi), %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovupd %ymm0, (%rsi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovupd %ymm0, (%rsi) # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load <4 x double>, <4 x double> *%a0, align 1			%1 = load <4 x double>, <4 x double> *%a0, align 1
	%2 = fadd <4 x double> %1, %1			%2 = fadd <4 x double> %1, %1
	store <4 x double> %2, <4 x double> *%a1, align 1			store <4 x double> %2, <4 x double> *%a1, align 1
	ret <4 x double> %2			ret <4 x double> %2
	}			}

	define <8 x float> @test_movups(<8 x float> %a0, <8 x float> %a1) {			define <8 x float> @test_movups(<8 x float> %a0, <8 x float> %a1) {
	[10 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovups (%rdi), %ymm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovups (%rdi), %ymm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovups %ymm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovups %ymm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movups:			; BTVER2-LABEL: test_movups:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovups (%rdi), %ymm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovups (%rdi), %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmovups %ymm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovups %ymm0, (%rsi) # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_movups:			; ZNVER1-LABEL: test_movups:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmovups (%rdi), %ymm0 # sched: [5:1.00]			; ZNVER1-NEXT: vmovups (%rdi), %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vmovups %ymm0, (%rsi) # sched: [1:1.00]			; ZNVER1-NEXT: vmovups %ymm0, (%rsi) # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = load <8 x float>, <8 x float> *%a0, align 1			%1 = load <8 x float>, <8 x float> *%a0, align 1
	%2 = fadd <8 x float> %1, %1			%2 = fadd <8 x float> %1, %1
	store <8 x float> %2, <8 x float> *%a1, align 1			store <8 x float> %2, <8 x float> *%a1, align 1
	ret <8 x float> %2			ret <8 x float> %2
	}			}

	define <4 x double> @test_mulpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_mulpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_mulpd:			; SANDY-LABEL: test_mulpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:1.00]			; SANDY-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_mulpd:			; HASWELL-LABEL: test_mulpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:1.00]			; HASWELL-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_mulpd:			; BTVER2-LABEL: test_mulpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [4:4.00]
	; BTVER2-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:4.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_mulpd:			; ZNVER1-LABEL: test_mulpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vmulpd %ymm1, %ymm0, %ymm0 # sched: [4:4.00]
	; ZNVER1-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; ZNVER1-NEXT: vmulpd (%rdi), %ymm0, %ymm0 # sched: [9:4.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fmul <4 x double> %a0, %a1			%1 = fmul <4 x double> %a0, %a1
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fmul <4 x double> %1, %2			%3 = fmul <4 x double> %1, %2
	ret <4 x double> %3			ret <4 x double> %3
	}			}

	define <8 x float> @test_mulps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_mulps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_mulps:			; SANDY-LABEL: test_mulps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [9:1.00]			; SANDY-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_mulps:			; HASWELL-LABEL: test_mulps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [9:1.00]			; HASWELL-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [9:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_mulps:			; BTVER2-LABEL: test_mulps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_mulps:			; ZNVER1-LABEL: test_mulps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; ZNVER1-NEXT: vmulps (%rdi), %ymm0, %ymm0 # sched: [7:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fmul <8 x float> %a0, %a1			%1 = fmul <8 x float> %a0, %a1
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = fmul <8 x float> %1, %2			%3 = fmul <8 x float> %1, %2
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define <4 x double> @orpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @orpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: orpd:			; SANDY-LABEL: orpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.33]			; SANDY-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.33]
	; SANDY-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [5:0.50]			; SANDY-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [5:0.50]
	; SANDY-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: orpd:			; HASWELL-LABEL: orpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: orpd:			; BTVER2-LABEL: orpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: orpd:			; ZNVER1-LABEL: orpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vorpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <4 x double> %a0 to <4 x i64>			%1 = bitcast <4 x double> %a0 to <4 x i64>
	%2 = bitcast <4 x double> %a1 to <4 x i64>			%2 = bitcast <4 x double> %a1 to <4 x i64>
	%3 = or <4 x i64> %1, %2			%3 = or <4 x i64> %1, %2
	%4 = load <4 x double>, <4 x double> *%a2, align 32			%4 = load <4 x double>, <4 x double> *%a2, align 32
	%5 = bitcast <4 x double> %4 to <4 x i64>			%5 = bitcast <4 x double> %4 to <4 x i64>
	%6 = or <4 x i64> %3, %5			%6 = or <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <4 x double>			%7 = bitcast <4 x i64> %6 to <4 x double>
	[13 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_orps:			; BTVER2-LABEL: test_orps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_orps:			; ZNVER1-LABEL: test_orps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <8 x float> %a0 to <4 x i64>			%1 = bitcast <8 x float> %a0 to <4 x i64>
	%2 = bitcast <8 x float> %a1 to <4 x i64>			%2 = bitcast <8 x float> %a1 to <4 x i64>
	%3 = or <4 x i64> %1, %2			%3 = or <4 x i64> %1, %2
	%4 = load <8 x float>, <8 x float> *%a2, align 32			%4 = load <8 x float>, <8 x float> *%a2, align 32
	%5 = bitcast <8 x float> %4 to <4 x i64>			%5 = bitcast <8 x float> %4 to <4 x i64>
	%6 = or <4 x i64> %3, %5			%6 = or <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <8 x float>			%7 = bitcast <4 x i64> %6 to <8 x float>
	[13 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [1:1.00]			; HASWELL-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [1:1.00]
	; HASWELL-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [5:1.00]			; HASWELL-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilpd:			; BTVER2-LABEL: test_permilpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [6:1.00]
	; BTVER2-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [6:1.00]			; BTVER2-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [6:1.00]
	; BTVER2-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [1:0.50]
	; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilpd:			; ZNVER1-LABEL: test_permilpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [6:1.00]			; ZNVER1-NEXT: vpermilpd {{.*#+}} xmm1 = mem[1,0] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0] sched: [1:0.50]
	; ZNVER1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> <i32 1, i32 0>			%1 = shufflevector <2 x double> %a0, <2 x double> undef, <2 x i32> <i32 1, i32 0>
	%2 = load <2 x double>, <2 x double> *%a1, align 16			%2 = load <2 x double>, <2 x double> *%a1, align 16
	%3 = shufflevector <2 x double> %2, <2 x double> undef, <2 x i32> <i32 1, i32 0>			%3 = shufflevector <2 x double> %2, <2 x double> undef, <2 x i32> <i32 1, i32 0>
	%4 = fadd <2 x double> %1, %3			%4 = fadd <2 x double> %1, %3
	ret <2 x double> %4			ret <2 x double> %4
	}			}
	[10 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [1:1.00]			; HASWELL-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [1:1.00]
	; HASWELL-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [5:1.00]			; HASWELL-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilpd_ymm:			; BTVER2-LABEL: test_permilpd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [6:1.00]
	; BTVER2-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [6:1.00]			; BTVER2-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [6:1.00]
	; BTVER2-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [1:0.50]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilpd_ymm:			; ZNVER1-LABEL: test_permilpd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [6:1.00]			; ZNVER1-NEXT: vpermilpd {{.*#+}} ymm1 = mem[1,0,2,3] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,3] sched: [1:0.50]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>			%1 = shufflevector <4 x double> %a0, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>			%3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	[9 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [1:1.00]			; HASWELL-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [1:1.00]
	; HASWELL-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [5:1.00]			; HASWELL-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [5:1.00]
	; HASWELL-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilps:			; BTVER2-LABEL: test_permilps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [6:1.00]
	; BTVER2-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [6:1.00]			; BTVER2-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [6:1.00]
	; BTVER2-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [1:0.50]
	; BTVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilps:			; ZNVER1-LABEL: test_permilps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [6:1.00]			; ZNVER1-NEXT: vpermilps {{.*#+}} xmm1 = mem[3,2,1,0] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,1,0] sched: [1:0.50]
	; ZNVER1-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			%1 = shufflevector <4 x float> %a0, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	%2 = load <4 x float>, <4 x float> *%a1, align 16			%2 = load <4 x float>, <4 x float> *%a1, align 16
	%3 = shufflevector <4 x float> %2, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			%3 = shufflevector <4 x float> %2, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	%4 = fadd <4 x float> %1, %3			%4 = fadd <4 x float> %1, %3
	ret <4 x float> %4			ret <4 x float> %4
	}			}
	[10 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [1:1.00]			; HASWELL-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [1:1.00]
	; HASWELL-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [5:1.00]			; HASWELL-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [5:1.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilps_ymm:			; BTVER2-LABEL: test_permilps_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
				; BTVER2-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [6:1.00]
	; BTVER2-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [6:1.00]			; BTVER2-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [6:1.00]
	; BTVER2-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [1:0.50]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilps_ymm:			; ZNVER1-LABEL: test_permilps_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
				; ZNVER1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [6:1.00]			; ZNVER1-NEXT: vpermilps {{.*#+}} ymm1 = mem[3,2,1,0,7,6,5,4] sched: [6:1.00]
	; ZNVER1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,2,1,0,7,6,5,4] sched: [1:0.50]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>			%1 = shufflevector <8 x float> %a0, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>			%3 = shufflevector <8 x float> %2, <8 x float> undef, <8 x i32> <i32 3, i32 2, i32 1, i32 0, i32 7, i32 6, i32 5, i32 4>
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}

	[38 lines of unchanged context not shown]
	; HASWELL-LABEL: test_permilvarpd_ymm:			; HASWELL-LABEL: test_permilvarpd_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilvarpd_ymm:			; BTVER2-LABEL: test_permilvarpd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; BTVER2-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilvarpd_ymm:			; ZNVER1-LABEL: test_permilvarpd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vpermilpd %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; ZNVER1-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vpermilpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double> %a0, <4 x i64> %a1)			%1 = call <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double> %a0, <4 x i64> %a1)
	%2 = load <4 x i64>, <4 x i64> *%a2, align 32			%2 = load <4 x i64>, <4 x i64> *%a2, align 32
	%3 = call <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double> %1, <4 x i64> %2)			%3 = call <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double> %1, <4 x i64> %2)
	ret <4 x double> %3			ret <4 x double> %3
	}			}
	declare <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double>, <4 x i64>) nounwind readnone			declare <4 x double> @llvm.x86.avx.vpermilvar.pd.256(<4 x double>, <4 x i64>) nounwind readnone
	[39 lines of unchanged context not shown]
	; HASWELL-LABEL: test_permilvarps_ymm:			; HASWELL-LABEL: test_permilvarps_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_permilvarps_ymm:			; BTVER2-LABEL: test_permilvarps_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; BTVER2-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_permilvarps_ymm:			; ZNVER1-LABEL: test_permilvarps_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vpermilps %ymm1, %ymm0, %ymm0 # sched: [3:3.00]
	; ZNVER1-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vpermilps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float> %a0, <8 x i32> %a1)			%1 = call <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float> %a0, <8 x i32> %a1)
	%2 = load <8 x i32>, <8 x i32> *%a2, align 32			%2 = load <8 x i32>, <8 x i32> *%a2, align 32
	%3 = call <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float> %1, <8 x i32> %2)			%3 = call <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float> %1, <8 x i32> %2)
	ret <8 x float> %3			ret <8 x float> %3
	}			}
	declare <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float>, <8 x i32>) nounwind readnone			declare <8 x float> @llvm.x86.avx.vpermilvar.ps.256(<8 x float>, <8 x i32>) nounwind readnone
	[12 lines of unchanged context not shown]
	; HASWELL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]			; HASWELL-NEXT: vrcpps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_rcpps:			; BTVER2-LABEL: test_rcpps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrcpps (%rdi), %ymm1 # sched: [7:1.00]			; BTVER2-NEXT: vrcpps (%rdi), %ymm1 # sched: [7:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_rcpps:			; ZNVER1-LABEL: test_rcpps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vrcpps (%rdi), %ymm1 # sched: [7:1.00]			; ZNVER1-NEXT: vrcpps (%rdi), %ymm1 # sched: [7:1.00]
	; ZNVER1-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float> %a0)			%1 = call <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float> %a0)
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = call <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float> %2)
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}
	declare <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float>) nounwind readnone
	[11 lines of unchanged context not shown]
	; HASWELL-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [6:2.00]			; HASWELL-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [6:2.00]
	; HASWELL-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [10:2.00]			; HASWELL-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [10:2.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_roundpd:			; BTVER2-LABEL: test_roundpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [8:1.00]			; BTVER2-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [8:1.00]
	; BTVER2-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_roundpd:			; ZNVER1-LABEL: test_roundpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [8:1.00]			; ZNVER1-NEXT: vroundpd $7, (%rdi), %ymm1 # sched: [8:1.00]
	; ZNVER1-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vroundpd $7, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %a0, i32 7)			%1 = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %a0, i32 7)
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %2, i32 7)			%3 = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %2, i32 7)
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}
	declare <4 x double> @llvm.x86.avx.round.pd.256(<4 x double>, i32) nounwind readnone			declare <4 x double> @llvm.x86.avx.round.pd.256(<4 x double>, i32) nounwind readnone
	[11 lines of unchanged context not shown]
	; HASWELL-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [6:2.00]			; HASWELL-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [6:2.00]
	; HASWELL-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [10:2.00]			; HASWELL-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [10:2.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_roundps:			; BTVER2-LABEL: test_roundps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [8:1.00]			; BTVER2-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [8:1.00]
	; BTVER2-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_roundps:			; ZNVER1-LABEL: test_roundps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [8:1.00]			; ZNVER1-NEXT: vroundps $7, (%rdi), %ymm1 # sched: [8:1.00]
	; ZNVER1-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vroundps $7, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %a0, i32 7)			%1 = call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %a0, i32 7)
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %2, i32 7)			%3 = call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %2, i32 7)
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}
	declare <8 x float> @llvm.x86.avx.round.ps.256(<8 x float>, i32) nounwind readnone			declare <8 x float> @llvm.x86.avx.round.ps.256(<8 x float>, i32) nounwind readnone
	[10 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [11:2.00]			; HASWELL-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [11:2.00]
	; HASWELL-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [7:2.00]			; HASWELL-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [7:2.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_rsqrtps:			; BTVER2-LABEL: test_rsqrtps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [7:1.00]			; BTVER2-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [7:2.00]
	; BTVER2-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_rsqrtps:			; ZNVER1-LABEL: test_rsqrtps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [7:1.00]			; ZNVER1-NEXT: vrsqrtps (%rdi), %ymm1 # sched: [7:2.00]
	; ZNVER1-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [2:1.00]			; ZNVER1-NEXT: vrsqrtps %ymm0, %ymm0 # sched: [2:2.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %a0)			%1 = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %a0)
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %2)
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}
	declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone
	[10 lines of unchanged context not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [1:1.00]			; HASWELL-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [1:1.00]
	; HASWELL-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [5:1.00]			; HASWELL-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_shufpd:			; BTVER2-LABEL: test_shufpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [1:0.50]			; BTVER2-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [6:1.00]
	; BTVER2-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [6:1.00]			; BTVER2-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_shufpd:			; ZNVER1-LABEL: test_shufpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [1:0.50]			; ZNVER1-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[1],ymm1[0],ymm0[2],ymm1[3] sched: [6:1.00]
	; ZNVER1-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [6:1.00]			; ZNVER1-NEXT: vshufpd {{.*#+}} ymm1 = ymm1[1],mem[0],ymm1[2],mem[3] sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 4, i32 2, i32 7>			%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 4, i32 2, i32 7>
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 1, i32 4, i32 2, i32 7>			%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 1, i32 4, i32 2, i32 7>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	define <8 x float> @test_shufps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) nounwind {			define <8 x float> @test_shufps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) nounwind {
	; SANDY-LABEL: test_shufps:			; SANDY-LABEL: test_shufps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:1.00]			; SANDY-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:1.00]
	; SANDY-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [5:1.00]			; SANDY-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [5:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_shufps:			; HASWELL-LABEL: test_shufps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:1.00]			; HASWELL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:1.00]
	; HASWELL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [5:1.00]			; HASWELL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_shufps:			; BTVER2-LABEL: test_shufps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:0.50]			; BTVER2-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [6:1.00]
	; BTVER2-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [6:1.00]			; BTVER2-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_shufps:			; ZNVER1-LABEL: test_shufps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [1:0.50]			; ZNVER1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,0],ymm1[0,0],ymm0[4,4],ymm1[4,4] sched: [6:1.00]
	; ZNVER1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [6:1.00]			; ZNVER1-NEXT: vshufps {{.*#+}} ymm0 = ymm0[0,3],mem[0,0],ymm0[4,7],mem[4,4] sched: [6:1.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> <i32 0, i32 0, i32 8, i32 8, i32 4, i32 4, i32 12, i32 12>			%1 = shufflevector <8 x float> %a0, <8 x float> %a1, <8 x i32> <i32 0, i32 0, i32 8, i32 8, i32 4, i32 4, i32 12, i32 12>
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = shufflevector <8 x float> %1, <8 x float> %2, <8 x i32> <i32 0, i32 3, i32 8, i32 8, i32 4, i32 7, i32 12, i32 12>			%3 = shufflevector <8 x float> %1, <8 x float> %2, <8 x i32> <i32 0, i32 3, i32 8, i32 8, i32 4, i32 7, i32 12, i32 12>
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	[9 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [32:2.00]			; HASWELL-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [32:2.00]
	; HASWELL-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [28:2.00]			; HASWELL-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [28:2.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_sqrtpd:			; BTVER2-LABEL: test_sqrtpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [59:54.00]
	; BTVER2-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [21:21.00]			; BTVER2-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [54:54.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_sqrtpd:			; ZNVER1-LABEL: test_sqrtpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [26:21.00]			; ZNVER1-NEXT: vsqrtpd (%rdi), %ymm1 # sched: [59:54.00]
	; ZNVER1-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [21:21.00]			; ZNVER1-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [54:54.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %a0)			%1 = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %a0)
	%2 = load <4 x double>, <4 x double> *%a1, align 32			%2 = load <4 x double>, <4 x double> *%a1, align 32
	%3 = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %2)			%3 = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %2)
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}
	declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone			declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone
	[10 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vsqrtps (%rdi), %ymm1 # sched: [23:2.00]			; HASWELL-NEXT: vsqrtps (%rdi), %ymm1 # sched: [23:2.00]
	; HASWELL-NEXT: vsqrtps %ymm0, %ymm0 # sched: [19:2.00]			; HASWELL-NEXT: vsqrtps %ymm0, %ymm0 # sched: [19:2.00]
	; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_sqrtps:			; BTVER2-LABEL: test_sqrtps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vsqrtps (%rdi), %ymm1 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtps %ymm0, %ymm0 # sched: [54:54.00]
	; BTVER2-NEXT: vsqrtps %ymm0, %ymm0 # sched: [21:21.00]			; BTVER2-NEXT: vsqrtps (%rdi), %ymm1 # sched: [47:42.00]
	; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_sqrtps:			; ZNVER1-LABEL: test_sqrtps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vsqrtps (%rdi), %ymm1 # sched: [26:21.00]			; ZNVER1-NEXT: vsqrtps %ymm0, %ymm0 # sched: [54:54.00]
	; ZNVER1-NEXT: vsqrtps %ymm0, %ymm0 # sched: [21:21.00]			; ZNVER1-NEXT: vsqrtps (%rdi), %ymm1 # sched: [47:42.00]
	; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %a0)			%1 = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %a0)
	%2 = load <8 x float>, <8 x float> *%a1, align 32			%2 = load <8 x float>, <8 x float> *%a1, align 32
	%3 = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %2)			%3 = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %2)
	%4 = fadd <8 x float> %1, %3			%4 = fadd <8 x float> %1, %3
	ret <8 x float> %4			ret <8 x float> %4
	}			}
	declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone			declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone

	define <4 x double> @test_subpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {			define <4 x double> @test_subpd(<4 x double> %a0, <4 x double> %a1, <4 x double> *%a2) {
	; SANDY-LABEL: test_subpd:			; SANDY-LABEL: test_subpd:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_subpd:			; HASWELL-LABEL: test_subpd:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_subpd:			; BTVER2-LABEL: test_subpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_subpd:			; ZNVER1-LABEL: test_subpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vsubpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vsubpd (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fsub <4 x double> %a0, %a1			%1 = fsub <4 x double> %a0, %a1
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = fsub <4 x double> %1, %2			%3 = fsub <4 x double> %1, %2
	ret <4 x double> %3			ret <4 x double> %3
	}			}

	define <8 x float> @test_subps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {			define <8 x float> @test_subps(<8 x float> %a0, <8 x float> %a1, <8 x float> *%a2) {
	; SANDY-LABEL: test_subps:			; SANDY-LABEL: test_subps:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; SANDY-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; SANDY-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_subps:			; HASWELL-LABEL: test_subps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]			; HASWELL-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [7:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_subps:			; BTVER2-LABEL: test_subps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; BTVER2-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_subps:			; ZNVER1-LABEL: test_subps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vsubps %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [8:1.00]			; ZNVER1-NEXT: vsubps (%rdi), %ymm0, %ymm0 # sched: [8:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = fsub <8 x float> %a0, %a1			%1 = fsub <8 x float> %a0, %a1
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = fsub <8 x float> %1, %2			%3 = fsub <8 x float> %1, %2
	ret <8 x float> %3			ret <8 x float> %3
	}			}

	define i32 @test_testpd(<2 x double> %a0, <2 x double> %a1, <2 x double> *%a2) {			define i32 @test_testpd(<2 x double> %a0, <2 x double> %a1, <2 x double> *%a2) {
	[13 lines not shown]
	; HASWELL-NEXT: setb %al # sched: [1:0.50]			; HASWELL-NEXT: setb %al # sched: [1:0.50]
	; HASWELL-NEXT: vtestpd (%rdi), %xmm0 # sched: [5:0.50]			; HASWELL-NEXT: vtestpd (%rdi), %xmm0 # sched: [5:0.50]
	; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]			; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_testpd:			; BTVER2-LABEL: test_testpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd %xmm1, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestpd %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd (%rdi), %xmm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestpd (%rdi), %xmm0 # sched: [8:1.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testpd:			; ZNVER1-LABEL: test_testpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vtestpd %xmm1, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vtestpd %xmm1, %xmm0 # sched: [3:1.00]
	; ZNVER1-NEXT: setb %al # sched: [1:0.50]			; ZNVER1-NEXT: setb %al # sched: [1:0.50]
	; ZNVER1-NEXT: vtestpd (%rdi), %xmm0 # sched: [6:1.00]			; ZNVER1-NEXT: vtestpd (%rdi), %xmm0 # sched: [8:1.00]
	; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.vtestc.pd(<2 x double> %a0, <2 x double> %a1)			%1 = call i32 @llvm.x86.avx.vtestc.pd(<2 x double> %a0, <2 x double> %a1)
	%2 = load <2 x double>, <2 x double> *%a2, align 16			%2 = load <2 x double>, <2 x double> *%a2, align 16
	%3 = call i32 @llvm.x86.avx.vtestc.pd(<2 x double> %a0, <2 x double> %2)			%3 = call i32 @llvm.x86.avx.vtestc.pd(<2 x double> %a0, <2 x double> %2)
	%4 = add i32 %1, %3			%4 = add i32 %1, %3
	ret i32 %4			ret i32 %4
	}			}
	[12 lines not shown]
	;			;
	; HASWELL-LABEL: test_testpd_ymm:			; HASWELL-LABEL: test_testpd_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: xorl %eax, %eax # sched: [1:0.25]			; HASWELL-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; HASWELL-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.33]			; HASWELL-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.33]
	; HASWELL-NEXT: setb %al # sched: [1:0.50]			; HASWELL-NEXT: setb %al # sched: [1:0.50]
	; HASWELL-NEXT: vtestpd (%rdi), %ymm0 # sched: [5:0.50]			; HASWELL-NEXT: vtestpd (%rdi), %ymm0 # sched: [5:0.50]
	; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]			; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_testpd_ymm:			; BTVER2-LABEL: test_testpd_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestpd %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestpd (%rdi), %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:3.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testpd_ymm:			; ZNVER1-LABEL: test_testpd_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vtestpd %ymm1, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vtestpd %ymm1, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: setb %al # sched: [1:0.50]			; ZNVER1-NEXT: setb %al # sched: [1:0.50]
	; ZNVER1-NEXT: vtestpd (%rdi), %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vtestpd (%rdi), %ymm0 # sched: [9:3.00]
	; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.vtestc.pd.256(<4 x double> %a0, <4 x double> %a1)			%1 = call i32 @llvm.x86.avx.vtestc.pd.256(<4 x double> %a0, <4 x double> %a1)
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = call i32 @llvm.x86.avx.vtestc.pd.256(<4 x double> %a0, <4 x double> %2)			%3 = call i32 @llvm.x86.avx.vtestc.pd.256(<4 x double> %a0, <4 x double> %2)
	%4 = add i32 %1, %3			%4 = add i32 %1, %3
	ret i32 %4			ret i32 %4
	}			}
	declare i32 @llvm.x86.avx.vtestc.pd.256(<4 x double>, <4 x double>) nounwind readnone			declare i32 @llvm.x86.avx.vtestc.pd.256(<4 x double>, <4 x double>) nounwind readnone
	[15 lines not shown]
	; HASWELL-NEXT: setb %al # sched: [1:0.50]			; HASWELL-NEXT: setb %al # sched: [1:0.50]
	; HASWELL-NEXT: vtestps (%rdi), %xmm0 # sched: [5:0.50]			; HASWELL-NEXT: vtestps (%rdi), %xmm0 # sched: [5:0.50]
	; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]			; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_testps:			; BTVER2-LABEL: test_testps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestps %xmm1, %xmm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestps %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestps (%rdi), %xmm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestps (%rdi), %xmm0 # sched: [8:1.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testps:			; ZNVER1-LABEL: test_testps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vtestps %xmm1, %xmm0 # sched: [1:0.50]			; ZNVER1-NEXT: vtestps %xmm1, %xmm0 # sched: [3:1.00]
	; ZNVER1-NEXT: setb %al # sched: [1:0.50]			; ZNVER1-NEXT: setb %al # sched: [1:0.50]
	; ZNVER1-NEXT: vtestps (%rdi), %xmm0 # sched: [6:1.00]			; ZNVER1-NEXT: vtestps (%rdi), %xmm0 # sched: [8:1.00]
	; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.vtestc.ps(<4 x float> %a0, <4 x float> %a1)			%1 = call i32 @llvm.x86.avx.vtestc.ps(<4 x float> %a0, <4 x float> %a1)
	%2 = load <4 x float>, <4 x float> *%a2, align 16			%2 = load <4 x float>, <4 x float> *%a2, align 16
	%3 = call i32 @llvm.x86.avx.vtestc.ps(<4 x float> %a0, <4 x float> %2)			%3 = call i32 @llvm.x86.avx.vtestc.ps(<4 x float> %a0, <4 x float> %2)
	%4 = add i32 %1, %3			%4 = add i32 %1, %3
	ret i32 %4			ret i32 %4
	}			}
	[12 lines not shown]
	;			;
	; HASWELL-LABEL: test_testps_ymm:			; HASWELL-LABEL: test_testps_ymm:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: xorl %eax, %eax # sched: [1:0.25]			; HASWELL-NEXT: xorl %eax, %eax # sched: [1:0.25]
	; HASWELL-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.33]			; HASWELL-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.33]
	; HASWELL-NEXT: setb %al # sched: [1:0.50]			; HASWELL-NEXT: setb %al # sched: [1:0.50]
	; HASWELL-NEXT: vtestps (%rdi), %ymm0 # sched: [5:0.50]			; HASWELL-NEXT: vtestps (%rdi), %ymm0 # sched: [5:0.50]
	; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]			; HASWELL-NEXT: adcl $0, %eax # sched: [2:0.50]
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_testps_ymm:			; BTVER2-LABEL: test_testps_ymm:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]			; BTVER2-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; BTVER2-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vtestps %ymm1, %ymm0 # sched: [3:1.00]
	; BTVER2-NEXT: setb %al # sched: [1:0.50]			; BTVER2-NEXT: setb %al # sched: [1:0.50]
	; BTVER2-NEXT: vtestps (%rdi), %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vtestps (%rdi), %ymm0 # sched: [9:3.00]
	; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]			; BTVER2-NEXT: adcl $0, %eax # sched: [1:0.50]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_testps_ymm:			; ZNVER1-LABEL: test_testps_ymm:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]			; ZNVER1-NEXT: xorl %eax, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vtestps %ymm1, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vtestps %ymm1, %ymm0 # sched: [3:1.00]
	; ZNVER1-NEXT: setb %al # sched: [1:0.50]			; ZNVER1-NEXT: setb %al # sched: [1:0.50]
	; ZNVER1-NEXT: vtestps (%rdi), %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vtestps (%rdi), %ymm0 # sched: [9:3.00]
	; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]			; ZNVER1-NEXT: adcl $0, %eax # sched: [1:0.50]
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = call i32 @llvm.x86.avx.vtestc.ps.256(<8 x float> %a0, <8 x float> %a1)			%1 = call i32 @llvm.x86.avx.vtestc.ps.256(<8 x float> %a0, <8 x float> %a1)
	%2 = load <8 x float>, <8 x float> *%a2, align 32			%2 = load <8 x float>, <8 x float> *%a2, align 32
	%3 = call i32 @llvm.x86.avx.vtestc.ps.256(<8 x float> %a0, <8 x float> %2)			%3 = call i32 @llvm.x86.avx.vtestc.ps.256(<8 x float> %a0, <8 x float> %2)
	%4 = add i32 %1, %3			%4 = add i32 %1, %3
	ret i32 %4			ret i32 %4
	}			}
	declare i32 @llvm.x86.avx.vtestc.ps.256(<8 x float>, <8 x float>) nounwind readnone			declare i32 @llvm.x86.avx.vtestc.ps.256(<8 x float>, <8 x float>) nounwind readnone
	[12 lines not shown]
	; HASWELL-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [5:1.00]			; HASWELL-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpckhpd:			; BTVER2-LABEL: test_unpckhpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3] sched: [1:0.50]			; BTVER2-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3] sched: [1:0.50]
	; BTVER2-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [6:1.00]			; BTVER2-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_unpckhpd:			; ZNVER1-LABEL: test_unpckhpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3] sched: [1:0.50]			; ZNVER1-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3] sched: [1:0.50]
	; ZNVER1-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [6:1.00]			; ZNVER1-NEXT: vunpckhpd {{.*#+}} ymm1 = ymm1[1],mem[1],ymm1[3],mem[3] sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	[41 lines not shown]
	; HASWELL-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [5:1.00]			; HASWELL-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpcklpd:			; BTVER2-LABEL: test_unpcklpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2] sched: [1:0.50]			; BTVER2-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2] sched: [1:0.50]
	; BTVER2-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [6:1.00]			; BTVER2-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_unpcklpd:			; ZNVER1-LABEL: test_unpcklpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2] sched: [1:0.50]			; ZNVER1-NEXT: vunpcklpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2] sched: [1:0.50]
	; ZNVER1-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [6:1.00]			; ZNVER1-NEXT: vunpcklpd {{.*#+}} ymm1 = ymm1[0],mem[0],ymm1[2],mem[2] sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm1, %ymm0, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%1 = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	%2 = load <4 x double>, <4 x double> *%a2, align 32			%2 = load <4 x double>, <4 x double> *%a2, align 32
	%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%3 = shufflevector <4 x double> %a1, <4 x double> %2, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	%4 = fadd <4 x double> %1, %3			%4 = fadd <4 x double> %1, %3
	ret <4 x double> %4			ret <4 x double> %4
	}			}

	[39 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_xorpd:			; BTVER2-LABEL: test_xorpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_xorpd:			; ZNVER1-LABEL: test_xorpd:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vxorpd %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vxorpd (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddpd %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <4 x double> %a0 to <4 x i64>			%1 = bitcast <4 x double> %a0 to <4 x i64>
	%2 = bitcast <4 x double> %a1 to <4 x i64>			%2 = bitcast <4 x double> %a1 to <4 x i64>
	%3 = xor <4 x i64> %1, %2			%3 = xor <4 x i64> %1, %2
	%4 = load <4 x double>, <4 x double> *%a2, align 32			%4 = load <4 x double>, <4 x double> *%a2, align 32
	%5 = bitcast <4 x double> %4 to <4 x i64>			%5 = bitcast <4 x double> %4 to <4 x i64>
	%6 = xor <4 x i64> %3, %5			%6 = xor <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <4 x double>			%7 = bitcast <4 x i64> %6 to <4 x double>
	[13 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]			; HASWELL-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [1:1.00]
	; HASWELL-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]			; HASWELL-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [5:1.00]
	; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_xorps:			; BTVER2-LABEL: test_xorps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; BTVER2-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; BTVER2-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_xorps:			; ZNVER1-LABEL: test_xorps:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [1:0.50]			; ZNVER1-NEXT: vxorps %ymm1, %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]			; ZNVER1-NEXT: vxorps (%rdi), %ymm0, %ymm0 # sched: [6:1.00]
	; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; ZNVER1-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	%1 = bitcast <8 x float> %a0 to <4 x i64>			%1 = bitcast <8 x float> %a0 to <4 x i64>
	%2 = bitcast <8 x float> %a1 to <4 x i64>			%2 = bitcast <8 x float> %a1 to <4 x i64>
	%3 = xor <4 x i64> %1, %2			%3 = xor <4 x i64> %1, %2
	%4 = load <8 x float>, <8 x float> *%a2, align 32			%4 = load <8 x float>, <8 x float> *%a2, align 32
	%5 = bitcast <8 x float> %4 to <4 x i64>			%5 = bitcast <8 x float> %4 to <4 x i64>
	%6 = xor <4 x i64> %3, %5			%6 = xor <4 x i64> %3, %5
	%7 = bitcast <4 x i64> %6 to <8 x float>			%7 = bitcast <4 x i64> %6 to <8 x float>
	%8 = fadd <8 x float> %a1, %7			%8 = fadd <8 x float> %a1, %7
	ret <8 x float> %8			ret <8 x float> %8
	}			}

	define void @test_zeroall() {			define void @test_zeroall() {
	; SANDY-LABEL: test_zeroall:			; SANDY-LABEL: test_zeroall:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vzeroall # sched: [?:0.000000e+00]			; SANDY-NEXT: vzeroall # sched: [?:0.000000e+00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_zeroall:			; HASWELL-LABEL: test_zeroall:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vzeroall # sched: [1:0.00]			; HASWELL-NEXT: vzeroall # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_zeroall:			; BTVER2-LABEL: test_zeroall:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vzeroall # sched: [?:0.000000e+00]			; BTVER2-NEXT: vzeroall # sched: [90:90.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_zeroall:			; ZNVER1-LABEL: test_zeroall:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vzeroall # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroall # sched: [90:90.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	call void @llvm.x86.avx.vzeroall()			call void @llvm.x86.avx.vzeroall()
	ret void			ret void
	}			}
	declare void @llvm.x86.avx.vzeroall() nounwind			declare void @llvm.x86.avx.vzeroall() nounwind

	define void @test_zeroupper() {			define void @test_zeroupper() {
	; SANDY-LABEL: test_zeroupper:			; SANDY-LABEL: test_zeroupper:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]			; SANDY-NEXT: vzeroupper # sched: [?:0.000000e+00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	; HASWELL-LABEL: test_zeroupper:			; HASWELL-LABEL: test_zeroupper:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vzeroupper # sched: [1:0.00]			; HASWELL-NEXT: vzeroupper # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_zeroupper:			; BTVER2-LABEL: test_zeroupper:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vzeroupper # sched: [?:0.000000e+00]			; BTVER2-NEXT: vzeroupper # sched: [46:46.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; ZNVER1-LABEL: test_zeroupper:			; ZNVER1-LABEL: test_zeroupper:
	; ZNVER1: # BB#0:			; ZNVER1: # BB#0:
	; ZNVER1-NEXT: vzeroupper # sched: [?:0.000000e+00]			; ZNVER1-NEXT: vzeroupper # sched: [46:46.00]
	; ZNVER1-NEXT: retq # sched: [4:1.00]			; ZNVER1-NEXT: retq # sched: [4:1.00]
	call void @llvm.x86.avx.vzeroupper()			call void @llvm.x86.avx.vzeroupper()
	ret void			ret void
	}			}
	declare void @llvm.x86.avx.vzeroupper() nounwind			declare void @llvm.x86.avx.vzeroupper() nounwind

	!0 = !{i32 1}			!0 = !{i32 1}
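
The `# sched: [N:M]` suffix on the CHECK lines in this file appears to carry the scheduling model's latency and reciprocal throughput for each instruction, with `?` printed where the model has no latency. As a reading aid for a side-by-side diff like this one, the following Python sketch pulls that pair out of the old and new halves of a row and reports whether it changed. The helper names `parse_sched` and `sched_changed` are hypothetical and are not part of LLVM or of this patch.

```python
import re

# Matches the trailing "sched: [latency:throughput]" annotation on a CHECK line.
# The latency may be "?" when the model has no value for the instruction.
SCHED_RE = re.compile(r"sched: \[(\?|\d+):([0-9.eE+-]+)\]")


def parse_sched(line):
    """Return (latency, throughput) from one CHECK line, or None if absent.

    latency is an int, or None when printed as "?"; throughput is a float.
    """
    match = SCHED_RE.search(line)
    if match is None:
        return None
    latency = None if match.group(1) == "?" else int(match.group(1))
    return latency, float(match.group(2))


def sched_changed(old_line, new_line):
    """True when the old and new halves of a diff row carry different values."""
    return parse_sched(old_line) != parse_sched(new_line)


# Hypothetical usage on one row taken from the test_sqrtpd hunk.
old = "; BTVER2-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [21:21.00]"
new = "; BTVER2-NEXT: vsqrtpd %ymm0, %ymm0 # sched: [54:54.00]"
print(parse_sched(old), parse_sched(new), sched_changed(old, new))
```

Running such a helper over both columns would flag rows like the BTVER2 `vsqrtpd` one, where the pair moves from `[21:21.00]` to `[54:54.00]`, while leaving rows whose annotation is identical on both sides unflagged.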

test/CodeGen/X86/recip-fastmath.ll

	[272 lines not shown]
	; FMA-RECIP-LABEL: v4f32_no_estimate:			; FMA-RECIP-LABEL: v4f32_no_estimate:
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; FMA-RECIP-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; FMA-RECIP-NEXT: vdivps %xmm0, %xmm1, %xmm0			; FMA-RECIP-NEXT: vdivps %xmm0, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_no_estimate:			; BTVER2-LABEL: v4f32_no_estimate:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [19:19.00]			; BTVER2-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [19:19.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_no_estimate:			; SANDY-LABEL: v4f32_no_estimate:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} xmm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [12:1.00]			; SANDY-NEXT: vdivps %xmm0, %xmm1, %xmm0 # sched: [12:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step:			; BTVER2-LABEL: v4f32_one_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
				RKSimon: Latency should be 5cy
				avt77: Why? In fact we should have tp 0.5 for XMM (see below). I'll fix it.
				VMOVAPD xmm1 xmm2 AVX 1 FPA|FPM 1 0,5
				VMOVAPD ymm1 ymm2 AVX 2 FPA|FPM 1 1
				VMOVAPS xmm1 xmm2 AVX 1 FPA|FPM 1 0,5
				VMOVAPS ymm1 ymm2 AVX 2 FPA|FPM 1 1
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v4f32_one_step:			; SANDY-LABEL: v4f32_one_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %xmm0, %xmm1 # sched: [5:1.00]
	▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm0, %xmm3
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_two_step:			; BTVER2-LABEL: v4f32_two_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]			; FMA-RECIP-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00]
	; FMA-RECIP-NEXT: vdivps %ymm0, %ymm1, %ymm0			; FMA-RECIP-NEXT: vdivps %ymm0, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_no_estimate:			; BTVER2-LABEL: v8f32_no_estimate:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [19:19.00]			; BTVER2-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [38:38.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_no_estimate:			; SANDY-LABEL: v8f32_no_estimate:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm1 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [12:1.00]			; SANDY-NEXT: vdivps %ymm0, %ymm1, %ymm0 # sched: [12:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %ymm1, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step:			; BTVER2-LABEL: v8f32_one_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step:			; SANDY-LABEL: v8f32_one_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfnmadd213ps %ymm2, %ymm3, %ymm0
	; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_two_step:			; BTVER2-LABEL: v8f32_two_step:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_two_step:			; SANDY-LABEL: v8f32_two_step:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	▲ Show 20 Lines • Show All 62 Lines • Show Last 20 Lines

test/CodeGen/X86/recip-fastmath2.ll

	Show First 20 Lines • Show All 386 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1			; FMA-RECIP-NEXT: vrcpps %xmm0, %xmm1
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step2:			; BTVER2-LABEL: v4f32_one_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0 # sched: [7:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps {{.*}}(%rip), %xmm1, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1
	; FMA-RECIP-NEXT: vmulps %xmm0, %xmm1, %xmm0			; FMA-RECIP-NEXT: vmulps %xmm0, %xmm1, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_one_step_2_divs:			; BTVER2-LABEL: v4f32_one_step_2_divs:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm0, %xmm2, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm1 # sched: [7:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3			; FMA-RECIP-NEXT: vfmadd132ps %xmm1, %xmm1, %xmm3
	; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfnmadd213ps %xmm2, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0			; FMA-RECIP-NEXT: vfmadd132ps %xmm3, %xmm3, %xmm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %xmm0, %xmm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v4f32_two_step2:			; BTVER2-LABEL: v4f32_two_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} xmm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [6:1.00]
	; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %xmm0, %xmm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm2, %xmm3, %xmm2 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm2, %xmm1, %xmm2 # sched: [2:1.00]
	; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm2, %xmm1, %xmm1 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm1, %xmm0, %xmm0 # sched: [2:1.00]
	; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %xmm0, %xmm3, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %xmm0, %xmm1, %xmm0 # sched: [2:1.00]
	▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm1, %ymm1, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step2:			; BTVER2-LABEL: v8f32_one_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step2:			; SANDY-LABEL: v8f32_one_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1
	; FMA-RECIP-NEXT: vmulps %ymm0, %ymm1, %ymm0			; FMA-RECIP-NEXT: vmulps %ymm0, %ymm1, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_one_step_2_divs:			; BTVER2-LABEL: v8f32_one_step_2_divs:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm1 # sched: [7:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_one_step_2_divs:			; SANDY-LABEL: v8f32_one_step_2_divs:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm2 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm0, %ymm2, %ymm0 # sched: [3:1.00]
	▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines
	; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0			; FMA-RECIP-NEXT: vfmadd132ps %ymm3, %ymm3, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_two_step2:			; BTVER2-LABEL: v8f32_two_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]			; BTVER2-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [5:1.00]
	; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm1 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm2, %ymm1, %ymm2 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm2, %ymm1, %ymm1 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm1, %ymm0, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vsubps %ymm0, %ymm3, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vmulps %ymm0, %ymm1, %ymm0 # sched: [2:2.00]
	; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %ymm0, %ymm1, %ymm0 # sched: [3:2.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_two_step2:			; SANDY-LABEL: v8f32_two_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm1 # sched: [5:1.00]
	; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]			; SANDY-NEXT: vmulps %ymm1, %ymm0, %ymm2 # sched: [5:1.00]
	; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]			; SANDY-NEXT: vmovaps {{.*#+}} ymm3 = [1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00,1.000000e+00] sched: [4:0.50]
	; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]			; SANDY-NEXT: vsubps %ymm2, %ymm3, %ymm2 # sched: [3:1.00]
	▲ Show 20 Lines • Show All 129 Lines • ▼ Show 20 Lines
	; FMA-RECIP: # BB#0:			; FMA-RECIP: # BB#0:
	; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0			; FMA-RECIP-NEXT: vrcpps %ymm0, %ymm0
	; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0			; FMA-RECIP-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0
	; FMA-RECIP-NEXT: retq			; FMA-RECIP-NEXT: retq
	;			;
	; BTVER2-LABEL: v8f32_no_step2:			; BTVER2-LABEL: v8f32_no_step2:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]			; BTVER2-NEXT: vrcpps %ymm0, %ymm0 # sched: [2:1.00]
	; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:1.00]			; BTVER2-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [7:2.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	;			;
	; SANDY-LABEL: v8f32_no_step2:			; SANDY-LABEL: v8f32_no_step2:
	; SANDY: # BB#0:			; SANDY: # BB#0:
	; SANDY-NEXT: vrcpps %ymm0, %ymm0 # sched: [5:1.00]			; SANDY-NEXT: vrcpps %ymm0, %ymm0 # sched: [5:1.00]
	; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]			; SANDY-NEXT: vmulps {{.*}}(%rip), %ymm0, %ymm0 # sched: [9:1.00]
	; SANDY-NEXT: retq # sched: [5:1.00]			; SANDY-NEXT: retq # sched: [5:1.00]
	;			;
	Show All 32 Lines

test/CodeGen/X86/slow-unaligned-mem.ll

	Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
	; SLOW-NEXT: movl			; SLOW-NEXT: movl
	; SLOW-NEXT: movl			; SLOW-NEXT: movl
	; SLOW-NEXT: movl			; SLOW-NEXT: movl
	; SLOW-NEXT: movl			; SLOW-NEXT: movl
	;			;
	; FAST-NOT: not a recognized processor			; FAST-NOT: not a recognized processor
	; FAST-LABEL: store_zeros:			; FAST-LABEL: store_zeros:
	; FAST: # BB#0:			; FAST: # BB#0:
	; FAST-NEXT: movl {{[0-9]+}}(%esp), %eax			; FAST: movl {{[0-9]+}}(%esp), %eax
				RKSimon: ????
				avt77: This test was written by hand, which is why it's difficult to compare the results, but the new version generates:
				BB#0:
				vxorps %ymm0, %ymm0, %ymm0
				movl 4(%esp), %eax
				vmovups %ymm0, 32(%eax)
				vmovups %ymm0, (%eax)
				retl
				As you can see, we have vxorps between '# BB#0:' and 'movl'. I decided it's acceptable. Am I wrong?
	; FAST-NOT: movl			; FAST-NOT: movl
	call void @llvm.memset.p0i8.i64(i8* %a, i8 0, i64 64, i32 1, i1 false)			call void @llvm.memset.p0i8.i64(i8* %a, i8 0, i64 64, i32 1, i1 false)
	ret void			ret void
	}			}

	declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1)			declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i32, i1)

test/CodeGen/X86/sse-schedule.ll

	Show First 20 Lines • Show All 1,025 Lines • ▼ Show 20 Lines
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovaps (%rdi), %xmm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovaps (%rdi), %xmm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddps %xmm0, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %xmm0, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovaps %xmm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovaps %xmm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movaps:			; BTVER2-LABEL: test_movaps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps (%rdi), %xmm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovaps (%rdi), %xmm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm0, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmovaps %xmm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovaps %xmm0, (%rsi) # sched: [1:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = load <4 x float>, <4 x float> *%a0, align 16			%1 = load <4 x float>, <4 x float> *%a0, align 16
	%2 = fadd <4 x float> %1, %1			%2 = fadd <4 x float> %1, %1
	store <4 x float> %2, <4 x float> *%a1, align 16			store <4 x float> %2, <4 x float> *%a1, align 16
	ret void			ret void
	}			}
	Show All 29 Lines
	;			;
	; HASWELL-LABEL: test_movhlps:			; HASWELL-LABEL: test_movhlps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm1[1],xmm0[1] sched: [1:1.00]			; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm1[1],xmm0[1] sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movhlps:			; BTVER2-LABEL: test_movhlps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm1[1],xmm0[1] sched: [1:0.50]			; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm1[1],xmm0[1] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 6, i32 7, i32 2, i32 3>			%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
	ret <4 x float> %1			ret <4 x float> %1
	}			}

	; TODO (v)movhps			; TODO (v)movhps

	define void @test_movhps(<4 x float> %a0, <4 x float> %a1, x86_mmx *%a2) {			define void @test_movhps(<4 x float> %a0, <4 x float> %a1, x86_mmx *%a2) {
	▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
	; HASWELL-LABEL: test_movlhps:			; HASWELL-LABEL: test_movlhps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:1.00]			; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:1.00]
	; HASWELL-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movlhps:			; BTVER2-LABEL: test_movlhps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:0.50]			; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [6:1.00]
	; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm0, %xmm1, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 1, i32 4, i32 5>			%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	%2 = fadd <4 x float> %a1, %1			%2 = fadd <4 x float> %a1, %1
	ret <4 x float> %2			ret <4 x float> %2
	}			}

	define void @test_movlps(<4 x float> %a0, <4 x float> %a1, x86_mmx *%a2) {			define void @test_movlps(<4 x float> %a0, <4 x float> %a1, x86_mmx *%a2) {
	▲ Show 20 Lines • Show All 842 Lines • ▼ Show 20 Lines
	; HASWELL-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 # sched: [19:1.00]			; HASWELL-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 # sched: [19:1.00]
	; HASWELL-NEXT: vmovaps (%rdi), %xmm1 # sched: [4:0.50]			; HASWELL-NEXT: vmovaps (%rdi), %xmm1 # sched: [4:0.50]
	; HASWELL-NEXT: vsqrtss %xmm1, %xmm1, %xmm1 # sched: [19:1.00]			; HASWELL-NEXT: vsqrtss %xmm1, %xmm1, %xmm1 # sched: [19:1.00]
	; HASWELL-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_sqrtss:			; BTVER2-LABEL: test_sqrtss:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovaps (%rdi), %xmm1 # sched: [5:1.00]			; BTVER2-NEXT: vmovaps (%rdi), %xmm1 # sched: [6:1.00]
	; BTVER2-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtss %xmm0, %xmm0, %xmm0 # sched: [26:21.00]
	; BTVER2-NEXT: vsqrtss %xmm1, %xmm1, %xmm1 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtss %xmm1, %xmm1, %xmm1 # sched: [26:21.00]
	; BTVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddps %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a0)			%1 = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %a0)
	%2 = load <4 x float>, <4 x float> *%a1, align 16			%2 = load <4 x float>, <4 x float> *%a1, align 16
	%3 = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %2)			%3 = call <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float> %2)
	%4 = fadd <4 x float> %1, %3			%4 = fadd <4 x float> %1, %3
	[253 lines not shown]
	; HASWELL-LABEL: test_unpckhps:			; HASWELL-LABEL: test_unpckhps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] sched: [1:1.00]			; HASWELL-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] sched: [1:1.00]
	; HASWELL-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],mem[2],xmm0[3],mem[3] sched: [5:1.00]			; HASWELL-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],mem[2],xmm0[3],mem[3] sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpckhps:			; BTVER2-LABEL: test_unpckhps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] sched: [1:0.50]			; BTVER2-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3] sched: [6:1.00]
	; BTVER2-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],mem[2],xmm0[3],mem[3] sched: [6:1.00]			; BTVER2-NEXT: vunpckhps {{.*#+}} xmm0 = xmm0[2],mem[2],xmm0[3],mem[3] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	%2 = load <4 x float>, <4 x float> *%a2, align 16			%2 = load <4 x float>, <4 x float> *%a2, align 16
	%3 = shufflevector <4 x float> %1, <4 x float> %2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			%3 = shufflevector <4 x float> %1, <4 x float> %2, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	ret <4 x float> %3			ret <4 x float> %3
	}			}

	[29 lines not shown]
	; HASWELL-LABEL: test_unpcklps:			; HASWELL-LABEL: test_unpcklps:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] sched: [1:1.00]			; HASWELL-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] sched: [1:1.00]
	; HASWELL-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] sched: [5:1.00]			; HASWELL-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] sched: [5:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpcklps:			; BTVER2-LABEL: test_unpcklps:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] sched: [1:0.50]			; BTVER2-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1] sched: [6:1.00]
	; BTVER2-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] sched: [6:1.00]			; BTVER2-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%1 = shufflevector <4 x float> %a0, <4 x float> %a1, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	%2 = load <4 x float>, <4 x float> *%a2, align 16			%2 = load <4 x float>, <4 x float> *%a2, align 16
	%3 = shufflevector <4 x float> %1, <4 x float> %2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%3 = shufflevector <4 x float> %1, <4 x float> %2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	ret <4 x float> %3			ret <4 x float> %3
	}			}

	[55 lines not shown]

test/CodeGen/X86/sse2-schedule.ll

	[1,622 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vmovapd (%rdi), %xmm0 # sched: [4:0.50]			; HASWELL-NEXT: vmovapd (%rdi), %xmm0 # sched: [4:0.50]
	; HASWELL-NEXT: vaddpd %xmm0, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %xmm0, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: vmovapd %xmm0, (%rsi) # sched: [1:1.00]			; HASWELL-NEXT: vmovapd %xmm0, (%rsi) # sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movapd:			; BTVER2-LABEL: test_movapd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovapd (%rdi), %xmm0 # sched: [5:1.00]			; BTVER2-NEXT: vmovapd (%rdi), %xmm0 # sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %xmm0, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %xmm0, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: vmovapd %xmm0, (%rsi) # sched: [1:1.00]			; BTVER2-NEXT: vmovapd %xmm0, (%rsi) # sched: [1:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = load <2 x double>, <2 x double> *%a0, align 16			%1 = load <2 x double>, <2 x double> *%a0, align 16
	%2 = fadd <2 x double> %1, %1			%2 = fadd <2 x double> %1, %1
	store <2 x double> %2, <2 x double> *%a1, align 16			store <2 x double> %2, <2 x double> *%a1, align 16
	ret void			ret void
	}			}
	[630 lines not shown]
	;			;
	; HASWELL-LABEL: test_movsd_reg:			; HASWELL-LABEL: test_movsd_reg:
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0] sched: [1:1.00]			; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0] sched: [1:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_movsd_reg:			; BTVER2-LABEL: test_movsd_reg:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0] sched: [1:0.50]			; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm1[0],xmm0[0] sched: [6:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 2, i32 0>			%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 2, i32 0>
	ret <2 x double> %1			ret <2 x double> %1
	}			}

	define void @test_movupd(<2 x double> %a0, <2 x double> %a1) {			define void @test_movupd(<2 x double> %a0, <2 x double> %a1) {
	; GENERIC-LABEL: test_movupd:			; GENERIC-LABEL: test_movupd:
	; GENERIC: # BB#0:			; GENERIC: # BB#0:
	[3,444 lines not shown]
	; HASWELL-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 # sched: [19:1.00]			; HASWELL-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 # sched: [19:1.00]
	; HASWELL-NEXT: vmovapd (%rdi), %xmm1 # sched: [4:0.50]			; HASWELL-NEXT: vmovapd (%rdi), %xmm1 # sched: [4:0.50]
	; HASWELL-NEXT: vsqrtsd %xmm1, %xmm1, %xmm1 # sched: [19:1.00]			; HASWELL-NEXT: vsqrtsd %xmm1, %xmm1, %xmm1 # sched: [19:1.00]
	; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_sqrtsd:			; BTVER2-LABEL: test_sqrtsd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vmovapd (%rdi), %xmm1 # sched: [5:1.00]			; BTVER2-NEXT: vmovapd (%rdi), %xmm1 # sched: [6:1.00]
	; BTVER2-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtsd %xmm0, %xmm0, %xmm0 # sched: [26:21.00]
	; BTVER2-NEXT: vsqrtsd %xmm1, %xmm1, %xmm1 # sched: [26:21.00]			; BTVER2-NEXT: vsqrtsd %xmm1, %xmm1, %xmm1 # sched: [26:21.00]
	; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a0)			%1 = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a0)
	%2 = load <2 x double>, <2 x double> *%a1, align 16			%2 = load <2 x double>, <2 x double> *%a1, align 16
	%3 = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)			%3 = call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)
	%4 = fadd <2 x double> %1, %3			%4 = fadd <2 x double> %1, %3
	[210 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1] sched: [1:1.00]			; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1] sched: [1:1.00]
	; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm1 = xmm1[1],mem[1] sched: [5:1.00]			; HASWELL-NEXT: vunpckhpd {{.*#+}} xmm1 = xmm1[1],mem[1] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpckhpd:			; BTVER2-LABEL: test_unpckhpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1] sched: [1:0.50]			; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1] sched: [6:1.00]
	; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm1 = xmm1[1],mem[1] sched: [6:1.00]			; BTVER2-NEXT: vunpckhpd {{.*#+}} xmm1 = xmm1[1],mem[1] sched: [6:1.00]
	; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 1, i32 3>			%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 1, i32 3>
	%2 = load <2 x double>, <2 x double> *%a2, align 16			%2 = load <2 x double>, <2 x double> *%a2, align 16
	%3 = shufflevector <2 x double> %a1, <2 x double> %2, <2 x i32> <i32 1, i32 3>			%3 = shufflevector <2 x double> %a1, <2 x double> %2, <2 x i32> <i32 1, i32 3>
	%4 = fadd <2 x double> %1, %3			%4 = fadd <2 x double> %1, %3
	ret <2 x double> %4			ret <2 x double> %4
	[38 lines not shown]
	; HASWELL: # BB#0:			; HASWELL: # BB#0:
	; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:1.00]			; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:1.00]
	; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [5:1.00]			; HASWELL-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [5:1.00]
	; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; HASWELL-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; HASWELL-NEXT: retq # sched: [1:1.00]			; HASWELL-NEXT: retq # sched: [1:1.00]
	;			;
	; BTVER2-LABEL: test_unpcklpd:			; BTVER2-LABEL: test_unpcklpd:
	; BTVER2: # BB#0:			; BTVER2: # BB#0:
	; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [1:0.50]			; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],xmm1[0] sched: [6:1.00]
	; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:1.00]			; BTVER2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0] sched: [6:1.00]
				RKSimon (inline comment, not done): Jaguar has a max of 1 load/cycle - so the tp should still be 1.00
	; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]			; BTVER2-NEXT: vaddpd %xmm1, %xmm0, %xmm0 # sched: [3:1.00]
	; BTVER2-NEXT: retq # sched: [4:1.00]			; BTVER2-NEXT: retq # sched: [4:1.00]
	%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 0, i32 2>			%1 = shufflevector <2 x double> %a0, <2 x double> %a1, <2 x i32> <i32 0, i32 2>
	%2 = load <2 x double>, <2 x double> *%a2, align 16			%2 = load <2 x double>, <2 x double> *%a2, align 16
	%3 = shufflevector <2 x double> %1, <2 x double> %2, <2 x i32> <i32 0, i32 2>			%3 = shufflevector <2 x double> %1, <2 x double> %2, <2 x i32> <i32 0, i32 2>
	%4 = fadd <2 x double> %1, %3			%4 = fadd <2 x double> %1, %3
	ret <2 x double> %4			ret <2 x double> %4
	}			}
	[55 lines not shown]
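
As a rough illustration of the inline comment on test_unpcklpd above (a form with a folded load cannot do better than 1.00 on Jaguar, even if a two-pipe group would allow 0.50 for a register-only form), the sketch below takes reciprocal throughput as the worst cycles-per-unit ratio over the resources an instruction consumes. This is not the code in lib/CodeGen/TargetSchedule.cpp; `ResourceUse`, `reciprocalThroughput`, and the unit counts used in the note below (a two-pipe FPU group, a single load unit) are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record: one resource consumed by an instruction.
struct ResourceUse {
  unsigned Cycles;   // cycles the resource is held
  unsigned NumUnits; // identical units able to serve the request (assumed > 0)
};

// Reciprocal throughput is bounded by the most contended resource:
// the largest Cycles / NumUnits ratio over all consumed resources.
double reciprocalThroughput(const std::vector<ResourceUse> &Uses) {
  double Worst = 0.0;
  for (const ResourceUse &U : Uses)
    Worst = std::max(Worst, static_cast<double>(U.Cycles) / U.NumUnits);
  return Worst;
}
```

Under these assumptions, a register-only shuffle that holds a two-unit group for one cycle evaluates to 0.5, while the same shuffle with a folded load that also holds a single load unit for one cycle evaluates to 1.0, which matches the reviewer's point that the load form should stay at 1.00.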

Revision Contents

Diff 99105

include/llvm/CodeGen/TargetSchedule.h

lib/CodeGen/TargetSchedule.cpp

lib/Target/X86/X86ScheduleBtVer2.td

test/CodeGen/X86/avx-schedule.ll

test/CodeGen/X86/recip-fastmath.ll

test/CodeGen/X86/recip-fastmath2.ll

test/CodeGen/X86/slow-unaligned-mem.ll

test/CodeGen/X86/sse-schedule.ll

test/CodeGen/X86/sse2-schedule.ll
