Diff 491152

llvm/include/llvm/CodeGen/TargetLowering.h

Show First 20 Lines • Show All 2,874 Lines • ▼ Show 20 Lines	public:
virtual bool hasPairedLoad(EVT /*LoadedType*/,		virtual bool hasPairedLoad(EVT /*LoadedType*/,
Align & /*RequiredAlignment*/) const {		Align & /*RequiredAlignment*/) const {
return false;		return false;
}		}

/// Return true if the target has a vector blend instruction.		/// Return true if the target has a vector blend instruction.
virtual bool hasVectorBlend() const { return false; }		virtual bool hasVectorBlend() const { return false; }

		/// Return true if the target can efficiently produce a VecVT-typed vector of
		/// Elt elements, which may be either a scalar, or a vector itself,
		/// to replace a chain of scalar stores of the Elt with that vector.
		virtual bool shouldVectorizeScalarElementSplattingStores(SDValue Elt,
		EVT VecVT) const {
		return false;
		}

/// Get the maximum supported factor for interleaved memory accesses.		/// Get the maximum supported factor for interleaved memory accesses.
/// Default to be the minimum interleave factor: 2.		/// Default to be the minimum interleave factor: 2.
virtual unsigned getMaxSupportedInterleaveFactor() const { return 2; }		virtual unsigned getMaxSupportedInterleaveFactor() const { return 2; }

/// Lower an interleaved load to target specific intrinsics. Return		/// Lower an interleaved load to target specific intrinsics. Return
/// true on success.		/// true on success.
///		///
/// \p LI is the vector load instruction.		/// \p LI is the vector load instruction.
▲ Show 20 Lines • Show All 2,308 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 656 Lines • ▼ Show 20 Lines	struct MemOpLink {
// Offset from the base ptr.		// Offset from the base ptr.
int64_t OffsetFromBase;		int64_t OffsetFromBase;

MemOpLink(LSBaseSDNode *N, int64_t Offset)		MemOpLink(LSBaseSDNode *N, int64_t Offset)
: MemNode(N), OffsetFromBase(Offset) {}		: MemNode(N), OffsetFromBase(Offset) {}
};		};

// Classify the origin of a stored value.		// Classify the origin of a stored value.
enum class StoreSource { Unknown, Constant, Extract, Load };		enum class StoreSource { Unknown, Splat, Constant, Extract, Load };
StoreSource getStoreSource(SDValue StoreVal) {		StoreSource getStoreSource(SDValue StoreVal) {
switch (StoreVal.getOpcode()) {		switch (StoreVal.getOpcode()) {
case ISD::Constant:		case ISD::Constant:
case ISD::ConstantFP:		case ISD::ConstantFP:
return StoreSource::Constant;		return StoreSource::Constant;
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
case ISD::EXTRACT_SUBVECTOR:		case ISD::EXTRACT_SUBVECTOR:
return StoreSource::Extract;		return StoreSource::Extract;
case ISD::LOAD:		case ISD::LOAD:
return StoreSource::Load;		return StoreSource::Load;
default:		default:
return StoreSource::Unknown;		return StoreSource::Splat;
		pengfei: Is it too strong to assume that all the other cases are `Splat`? If that is the case, why add it to `StoreSource` rather than replacing `Unknown`?
		lebedev.ri (author): What do you mean by "too strong"? Originally we'd just return `StoreSource::Unknown` and bail out without looking at other stores. If the other stores don't store the same value, then we'll bail out just the same. There is probably some generalization possible (just because we've matched a more fine-grained `StoreSource` doesn't mean that the stores aren't splatting the same value), but that is not a required part of this enhancement. As for `Unknown`, I don't know, but it doesn't seem useful to just drop it, since it can be used to signal an invalid state.
		pengfei: Alright, how about using `MayBeSplat`? I get the logic now. There is a `getStoreSource` call before calling `getStoreMergeCandidates`, and there is another `getStoreSource` call in `getStoreMergeCandidates`. I think this is just a trick to make it work. It introduces uncertainty into the result. People may get unexpected results if they use this function somewhere else.
		lebedev.ri (author): Essentially, we want to guess which strategy will succeed in merging the stores by looking at a single store. `StoreSource` doesn't really describe the source but names a strategy; it's not an ideal enum name. But likewise, I'm not sure why `MayBeSplat` is a better strategy name. It either is a splat if the strategy matched, or isn't if it didn't.
		> I get the logic now. There is a `getStoreSource` call before calling `getStoreMergeCandidates`, and there is another `getStoreSource` call in `getStoreMergeCandidates`.
		Yes.
		> I think this is just a trick to make it work.
		What is?
		> It introduces uncertainty into the result. People may get unexpected results if they use this function somewhere else.
		Sorry, I do not follow.
		pengfei:
		> `StoreSource` doesn't really describe the source but names a strategy; it's not an ideal enum name.
		I understand it now. Thanks for the explanation! I had a wrong assumption about the function. Please ignore the comments.
}		}
}		}
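
The point made in the thread above, that `StoreSource` names a merging strategy chosen from the first store rather than a fact about every store, can be illustrated with a standalone sketch. It does not use LLVM headers: `Opcode`, `Value`, and `isCandidate` are hypothetical stand-ins for `ISD` opcodes, `SDValue`, and the candidate-matching lambda, and the non-splat strategies are reduced to "same classification", which is a simplification of the real checks.

```cpp
#include <cassert>

// Hypothetical stand-ins for ISD opcodes and SDValue (not the LLVM types).
enum class Opcode { Constant, ConstantFP, ExtractElt, ExtractSubvector, Load, Add };

struct Value {
  Opcode Op;
  int Id; // identity of the node that produces the value
  bool operator==(const Value &O) const { return Op == O.Op && Id == O.Id; }
};

enum class StoreSource { Unknown, Splat, Constant, Extract, Load };

// Mirrors the patched classification: anything that is not a constant,
// an extract, or a load is optimistically treated as a splat candidate.
StoreSource getStoreSource(const Value &V) {
  switch (V.Op) {
  case Opcode::Constant:
  case Opcode::ConstantFP:
    return StoreSource::Constant;
  case Opcode::ExtractElt:
  case Opcode::ExtractSubvector:
    return StoreSource::Extract;
  case Opcode::Load:
    return StoreSource::Load;
  default:
    return StoreSource::Splat;
  }
}

// The strategy is picked from the first store. Under the Splat strategy
// another store qualifies only if it stores the very same value.
// Assumption: the other strategies are simplified to "same classification".
bool isCandidate(const Value &First, const Value &Other) {
  const StoreSource Strategy = getStoreSource(First);
  if (Strategy == StoreSource::Splat)
    return Other == First;
  return getStoreSource(Other) == Strategy;
}
```

In this sketch, two stores of different non-constant values both classify as `Splat`, yet neither is a merge candidate for the other, which is the "bail out just the same" behaviour described in the reply.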

/// This is a helper function for visitMUL to check the profitability		/// This is a helper function for visitMUL to check the profitability
/// of folding (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2).		/// of folding (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2).
/// MulNode is the original multiply, AddNode is (add x, c1),		/// MulNode is the original multiply, AddNode is (add x, c1),
/// and ConstNode is c2.		/// and ConstNode is c2.
bool isMulAddWithConstProfitable(SDNode *MulNode, SDValue AddNode,		bool isMulAddWithConstProfitable(SDNode *MulNode, SDValue AddNode,
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	private:
/// This is a helper function for mergeConsecutiveStores. Given a list of		/// This is a helper function for mergeConsecutiveStores. Given a list of
/// store candidates, find the first N that are consecutive in memory.		/// store candidates, find the first N that are consecutive in memory.
/// Returns 0 if there are not at least 2 consecutive stores to try merging.		/// Returns 0 if there are not at least 2 consecutive stores to try merging.
unsigned getConsecutiveStores(SmallVectorImpl<MemOpLink> &StoreNodes,		unsigned getConsecutiveStores(SmallVectorImpl<MemOpLink> &StoreNodes,
int64_t ElementSizeBytes) const;		int64_t ElementSizeBytes) const;

/// This is a helper function for mergeConsecutiveStores. It is used for		/// This is a helper function for mergeConsecutiveStores. It is used for
/// store chains that are composed entirely of constant values.		/// store chains that are composed entirely of constant values.
bool tryStoreMergeOfConstants(SmallVectorImpl<MemOpLink> &StoreNodes,		bool tryStoreMergeOfConstantsOrEltSplat(
unsigned NumConsecutiveStores,		SmallVectorImpl<MemOpLink> &StoreNodes, unsigned NumConsecutiveStores,
EVT MemVT, SDNode *Root, bool AllowVectors);		EVT MemVT, SDNode *Root, bool IsConstantSrc, bool AllowVectors);

/// This is a helper function for mergeConsecutiveStores. It is used for		/// This is a helper function for mergeConsecutiveStores. It is used for
/// store chains that are composed entirely of extracted vector elements.		/// store chains that are composed entirely of extracted vector elements.
/// When extracting multiple vector elements, try to store them in one		/// When extracting multiple vector elements, try to store them in one
/// vector store rather than a sequence of scalar stores.		/// vector store rather than a sequence of scalar stores.
bool tryStoreMergeOfExtracts(SmallVectorImpl<MemOpLink> &StoreNodes,		bool tryStoreMergeOfExtracts(SmallVectorImpl<MemOpLink> &StoreNodes,
unsigned NumConsecutiveStores, EVT MemVT,		unsigned NumConsecutiveStores, EVT MemVT,
SDNode *Root);		SDNode *Root);
▲ Show 20 Lines • Show All 17,974 Lines • ▼ Show 20 Lines	case StoreSource::Extract:
if (Other->isTruncatingStore())		if (Other->isTruncatingStore())
return false;		return false;
if (!MemVT.bitsEq(OtherBC.getValueType()))		if (!MemVT.bitsEq(OtherBC.getValueType()))
return false;		return false;
if (OtherBC.getOpcode() != ISD::EXTRACT_VECTOR_ELT &&		if (OtherBC.getOpcode() != ISD::EXTRACT_VECTOR_ELT &&
OtherBC.getOpcode() != ISD::EXTRACT_SUBVECTOR)		OtherBC.getOpcode() != ISD::EXTRACT_SUBVECTOR)
return false;		return false;
break;		break;
		case StoreSource::Splat:
		if (OtherBC != Val)
		return false;
		break;
default:		default:
llvm_unreachable("Unhandled store source for merging");		llvm_unreachable("Unhandled store source for merging");
}		}
Ptr = BaseIndexOffset::match(Other, DAG);		Ptr = BaseIndexOffset::match(Other, DAG);
return (BasePtr.equalBaseIndex(Ptr, DAG, Offset));		return (BasePtr.equalBaseIndex(Ptr, DAG, Offset));
};		};

// Check if the pair of StoreNode and the RootNode already bail out many		// Check if the pair of StoreNode and the RootNode already bail out many
▲ Show 20 Lines • Show All 166 Lines • ▼ Show 20 Lines	if (NumConsecutiveStores > 1)
return NumConsecutiveStores;		return NumConsecutiveStores;

// There are no consecutive stores at the start of the list.		// There are no consecutive stores at the start of the list.
// Remove the first store and try again.		// Remove the first store and try again.
StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + 1);		StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + 1);
}		}
}		}

bool DAGCombiner::tryStoreMergeOfConstants(		bool DAGCombiner::tryStoreMergeOfConstantsOrEltSplat(
SmallVectorImpl<MemOpLink> &StoreNodes, unsigned NumConsecutiveStores,		SmallVectorImpl<MemOpLink> &StoreNodes, unsigned NumConsecutiveStores,
EVT MemVT, SDNode *RootNode, bool AllowVectors) {		EVT MemVT, SDNode *RootNode, bool IsConstantSrc, bool AllowVectors) {
LLVMContext &Context = *DAG.getContext();		LLVMContext &Context = *DAG.getContext();
const DataLayout &DL = DAG.getDataLayout();		const DataLayout &DL = DAG.getDataLayout();
int64_t ElementSizeBytes = MemVT.getStoreSize();		int64_t ElementSizeBytes = MemVT.getStoreSize();
unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1;		unsigned NumMemElts = MemVT.isVector() ? MemVT.getVectorNumElements() : 1;
bool MadeChange = false;		bool MadeChange = false;

// Store the constants into memory as one consecutive store.		// Store the constants/same element into memory as one consecutive store.
while (NumConsecutiveStores >= 2) {		while (NumConsecutiveStores >= 2) {
LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;		LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;
unsigned FirstStoreAS = FirstInChain->getAddressSpace();		unsigned FirstStoreAS = FirstInChain->getAddressSpace();
Align FirstStoreAlign = FirstInChain->getAlign();		Align FirstStoreAlign = FirstInChain->getAlign();
unsigned LastLegalType = 1;		unsigned LastLegalType = 1;
unsigned LastLegalVectorType = 1;		unsigned LastLegalVectorType = 1;
bool LastIntegerTrunc = false;		bool LastIntegerTrunc = false;
bool NonZero = false;		bool NonZero = false;
unsigned FirstZeroAfterNonZero = NumConsecutiveStores;		unsigned FirstZeroAfterNonZero = NumConsecutiveStores;
for (unsigned i = 0; i < NumConsecutiveStores; ++i) {		for (unsigned i = 0; i < NumConsecutiveStores; ++i) {
StoreSDNode *ST = cast<StoreSDNode>(StoreNodes[i].MemNode);		StoreSDNode *ST = cast<StoreSDNode>(StoreNodes[i].MemNode);
SDValue StoredVal = ST->getValue();		SDValue StoredVal = ST->getValue();
		if (IsConstantSrc) {
bool IsElementZero = false;		bool IsElementZero = false;
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(StoredVal))		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(StoredVal))
IsElementZero = C->isZero();		IsElementZero = C->isZero();
else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(StoredVal))		else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(StoredVal))
IsElementZero = C->getConstantFPValue()->isNullValue();		IsElementZero = C->getConstantFPValue()->isNullValue();
if (IsElementZero) {		if (IsElementZero) {
if (NonZero && FirstZeroAfterNonZero == NumConsecutiveStores)		if (NonZero && FirstZeroAfterNonZero == NumConsecutiveStores)
FirstZeroAfterNonZero = i;		FirstZeroAfterNonZero = i;
}		}
NonZero |= !IsElementZero;		NonZero |= !IsElementZero;
		}

// Find a legal type for the constant store.		// Find a legal type for the constant store.
unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8;		unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8;
EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);		EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);
unsigned IsFast = 0;		unsigned IsFast = 0;

// Break early when size is too large to be legal.		// Break early when size is too large to be legal.
if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits)		if (StoreTy.getSizeInBits() > MaximumLegalStoreInBits)
Show All 20 Lines	for (unsigned i = 0; i < NumConsecutiveStores; ++i) {
IsFast) {		IsFast) {
LastIntegerTrunc = true;		LastIntegerTrunc = true;
LastLegalType = i + 1;		LastLegalType = i + 1;
}		}
}		}

// We only use vectors if the constant is known to be zero or the		// We only use vectors if the constant is known to be zero or the
// target allows it and the function is not marked with the		// target allows it and the function is not marked with the
// noimplicitfloat attribute.		// noimplicitfloat attribute, or we are merging splatting stores.
if ((!NonZero ||		if (!IsConstantSrc || ((!NonZero || TLI.storeOfVectorConstantIsCheap(
TLI.storeOfVectorConstantIsCheap(MemVT, i + 1, FirstStoreAS)) &&		MemVT, i + 1, FirstStoreAS)) &&
AllowVectors) {		AllowVectors)) {
// Find a legal type for the vector store.		// Find a legal type for the vector store.
unsigned Elts = (i + 1) * NumMemElts;		unsigned Elts = (i + 1) * NumMemElts;
EVT Ty = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts);		EVT Ty = EVT::getVectorVT(Context, MemVT.getScalarType(), Elts);
if (TLI.isTypeLegal(Ty) && TLI.isTypeLegal(MemVT) &&		if (TLI.isTypeLegal(Ty) && TLI.isTypeLegal(MemVT) &&
TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG.getMachineFunction()) &&		TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG.getMachineFunction()) &&
TLI.allowsMemoryAccess(Context, DL, Ty,		TLI.allowsMemoryAccess(Context, DL, Ty,
*FirstInChain->getMemOperand(), &IsFast) &&		*FirstInChain->getMemOperand(), &IsFast) &&
IsFast)		IsFast)
LastLegalVectorType = i + 1;		LastLegalVectorType = i + 1;
}		}
}		}

bool UseVector = (LastLegalVectorType > LastLegalType) && AllowVectors;		bool UseVector = !IsConstantSrc ||
		((LastLegalVectorType > LastLegalType) && AllowVectors);
unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType;		unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType;
bool UseTrunc = LastIntegerTrunc && !UseVector;		bool UseTrunc = LastIntegerTrunc && !UseVector;

// Check if we found a legal integer type that creates a meaningful		// Check if we found a legal integer type that creates a meaningful
// merge.		// merge.
if (NumElem < 2) {		if (NumElem < 2) {
// We know that candidate stores are in order and of correct		// We know that candidate stores are in order and of correct
// shape. While there is no mergeable sequence from the		// shape. While there is no mergeable sequence from the
Show All 16 Lines	while (NumConsecutiveStores >= 2) {
// Check that we can merge these candidates without causing a cycle.		// Check that we can merge these candidates without causing a cycle.
if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem,		if (!checkMergeStoreCandidatesForDependencies(StoreNodes, NumElem,
RootNode)) {		RootNode)) {
StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);		StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);
NumConsecutiveStores -= NumElem;		NumConsecutiveStores -= NumElem;
continue;		continue;
}		}

MadeChange |= mergeStoresOfConstantsOrVecElts(StoreNodes, MemVT, NumElem,		// If we are producing a splat-store, we want a profitability check.
/*IsConstantSrc*/ true,		if (!IsConstantSrc) {
UseVector, UseTrunc);		// Get the type for the merged vector store.
		unsigned Elts = NumElem * NumMemElts;
		EVT StoreTy =
		EVT::getVectorVT(*DAG.getContext(), MemVT.getScalarType(), Elts);
		// Can we efficiently produce a vector of the value we are splatting?
		if (!TLI.shouldVectorizeScalarElementSplattingStores(
		cast<StoreSDNode>(StoreNodes[0].MemNode)->getValue(), StoreTy)) {
		StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);
		NumConsecutiveStores -= NumElem;
		continue;
		}
		}

		MadeChange |= mergeStoresOfConstantsOrVecElts(
		StoreNodes, MemVT, NumElem, IsConstantSrc, UseVector, UseTrunc);

// Remove merged stores for next iteration.		// Remove merged stores for next iteration.
StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);		StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);
NumConsecutiveStores -= NumElem;		NumConsecutiveStores -= NumElem;
}		}
return MadeChange;		return MadeChange;
}		}
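
The control flow that the patch adds to this function for the splat case (find the longest legal vector width, ask the target whether the splat is worthwhile, and skip the group if it is not) can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the DAGCombiner code: `mergeSplatStores`, `IsLegalVecBits`, and `IsProfitable` are hypothetical names, legality is reduced to a predicate on the total bit width, and the elided handling of the `NumElem < 2` case is replaced by dropping a single store.

```cpp
#include <cassert>
#include <functional>

// Returns how many of NumStores consecutive stores of the same EltBits-wide
// value end up folded into vector stores.
unsigned mergeSplatStores(unsigned NumStores, unsigned EltBits,
                          const std::function<bool(unsigned)> &IsLegalVecBits,
                          const std::function<bool(unsigned)> &IsProfitable) {
  unsigned Merged = 0;
  while (NumStores >= 2) {
    // Find the longest prefix whose combined width forms a legal vector.
    unsigned NumElem = 1;
    for (unsigned i = 0; i < NumStores; ++i)
      if (IsLegalVecBits((i + 1) * EltBits))
        NumElem = i + 1;

    if (NumElem < 2) {
      // Simplification: drop the first store and retry with the rest.
      --NumStores;
      continue;
    }

    // Profitability check: merge only if the target says that building
    // the splat vector is efficient. Otherwise skip this group unmerged.
    if (IsProfitable(NumElem * EltBits))
      Merged += NumElem;
    NumStores -= NumElem;
  }
  return Merged;
}
```

With 64-bit and 128-bit vectors legal and a profitability rule that requires at least 128 bits, sixteen `i8` stores merge into one vector store, while a trailing group of eight is skipped because 64 bits fails the profitability rule.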

▲ Show 20 Lines • Show All 366 Lines • ▼ Show 20 Lines	bool DAGCombiner::mergeConsecutiveStores(StoreSDNode *St) {
// This function cannot currently deal with non-byte-sized memory sizes.		// This function cannot currently deal with non-byte-sized memory sizes.
int64_t ElementSizeBytes = MemVT.getStoreSize();		int64_t ElementSizeBytes = MemVT.getStoreSize();
if (ElementSizeBytes * 8 != (int64_t)MemVT.getSizeInBits())		if (ElementSizeBytes * 8 != (int64_t)MemVT.getSizeInBits())
return false;		return false;

// Do not bother looking at stored values that are not constants, loads, or		// Do not bother looking at stored values that are not constants, loads, or
// extracted vector elements.		// extracted vector elements.
SDValue StoredVal = peekThroughBitcasts(St->getValue());		SDValue StoredVal = peekThroughBitcasts(St->getValue());
const StoreSource StoreSrc = getStoreSource(StoredVal);		const StoreSource StoreSrc = getStoreSource(StoredVal);
if (StoreSrc == StoreSource::Unknown)		assert(StoreSrc != StoreSource::Unknown && "Expected known source for store");
		pengfei: This is dead now.
return false;

SmallVector<MemOpLink, 8> StoreNodes;		SmallVector<MemOpLink, 8> StoreNodes;
SDNode *RootNode;		SDNode *RootNode;
// Find potential store merge candidates by searching through chain sub-DAG		// Find potential store merge candidates by searching through chain sub-DAG
getStoreMergeCandidates(St, StoreNodes, RootNode);		getStoreMergeCandidates(St, StoreNodes, RootNode);

// Check if there is anything to merge.		// Check if there is anything to merge.
if (StoreNodes.size() < 2)		if (StoreNodes.size() < 2)
Show All 25 Lines	while (StoreNodes.size() > 1) {
// There are no more stores in the list to examine.		// There are no more stores in the list to examine.
if (NumConsecutiveStores == 0)		if (NumConsecutiveStores == 0)
return MadeChange;		return MadeChange;

// We have at least 2 consecutive stores. Try to merge them.		// We have at least 2 consecutive stores. Try to merge them.
assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores");		assert(NumConsecutiveStores >= 2 && "Expected at least 2 stores");
switch (StoreSrc) {		switch (StoreSrc) {
case StoreSource::Constant:		case StoreSource::Constant:
MadeChange |= tryStoreMergeOfConstants(StoreNodes, NumConsecutiveStores,		case StoreSource::Splat:
MemVT, RootNode, AllowVectors);		MadeChange |= tryStoreMergeOfConstantsOrEltSplat(
		StoreNodes, NumConsecutiveStores, MemVT, RootNode,
		/*IsConstantSrc=*/StoreSrc == StoreSource::Constant, AllowVectors);
break;		break;

case StoreSource::Extract:		case StoreSource::Extract:
MadeChange |= tryStoreMergeOfExtracts(StoreNodes, NumConsecutiveStores,		MadeChange |= tryStoreMergeOfExtracts(StoreNodes, NumConsecutiveStores,
MemVT, RootNode);		MemVT, RootNode);
break;		break;

case StoreSource::Load:		case StoreSource::Load:
▲ Show 20 Lines • Show All 6,867 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.h

Show First 20 Lines • Show All 1,510 Lines • ▼ Show 20 Lines	public:
bool hasStackProbeSymbol(const MachineFunction &MF) const override;		bool hasStackProbeSymbol(const MachineFunction &MF) const override;
bool hasInlineStackProbe(const MachineFunction &MF) const override;		bool hasInlineStackProbe(const MachineFunction &MF) const override;
StringRef getStackProbeSymbolName(const MachineFunction &MF) const override;		StringRef getStackProbeSymbolName(const MachineFunction &MF) const override;

unsigned getStackProbeSize(const MachineFunction &MF) const;		unsigned getStackProbeSize(const MachineFunction &MF) const;

bool hasVectorBlend() const override { return true; }		bool hasVectorBlend() const override { return true; }

		bool shouldVectorizeScalarElementSplattingStores(SDValue Elt,
		EVT VecVT) const override;

unsigned getMaxSupportedInterleaveFactor() const override { return 4; }		unsigned getMaxSupportedInterleaveFactor() const override { return 4; }

bool isInlineAsmTargetBranch(const SmallVectorImpl<StringRef> &AsmStrs,		bool isInlineAsmTargetBranch(const SmallVectorImpl<StringRef> &AsmStrs,
unsigned OpNo) const override;		unsigned OpNo) const override;

/// Lower interleaved load(s) into target specific		/// Lower interleaved load(s) into target specific
/// instructions/intrinsics.		/// instructions/intrinsics.
bool lowerInterleavedLoad(LoadInst *LI,		bool lowerInterleavedLoad(LoadInst *LI,
▲ Show 20 Lines • Show All 329 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp

	Show First 20 Lines • Show All 57,751 Lines • ▼ Show 20 Lines
	}			}

	Align X86TargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {			Align X86TargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
	if (ML->isInnermost() &&			if (ML->isInnermost() &&
	ExperimentalPrefInnermostLoopAlignment.getNumOccurrences())			ExperimentalPrefInnermostLoopAlignment.getNumOccurrences())
	return Align(1ULL << ExperimentalPrefInnermostLoopAlignment);			return Align(1ULL << ExperimentalPrefInnermostLoopAlignment);
	return TargetLowering::getPrefLoopAlignment();			return TargetLowering::getPrefLoopAlignment();
	}			}

				bool X86TargetLowering::shouldVectorizeScalarElementSplattingStores(
				SDValue Elt, EVT VecVT) const {
				return Subtarget.hasSSE2() && VecVT.getSizeInBits() >= 128;
				}
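
How the default hook and the X86 override relate can be shown with a small standalone model. The types below are hypothetical stand-ins (`VecType` for `EVT`, `BaseLowering` for `TargetLowering`, `X86LikeLowering` for `X86TargetLowering`), and the `Elt` argument is omitted because the X86 implementation in this diff does not inspect it.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::EVT: only the total vector width matters.
struct VecType {
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return NumElts * EltBits; }
};

// Hypothetical stand-in for TargetLowering: targets opt out by default.
struct BaseLowering {
  virtual ~BaseLowering() = default;
  virtual bool shouldVectorizeSplatStores(VecType) const { return false; }
};

// Hypothetical stand-in for X86TargetLowering, mirroring the rule in the
// patch: SSE2 must be available and the merged vector must span at least
// 128 bits.
struct X86LikeLowering : BaseLowering {
  bool HasSSE2;
  explicit X86LikeLowering(bool SSE2) : HasSSE2(SSE2) {}
  bool shouldVectorizeSplatStores(VecType VT) const override {
    return HasSSE2 && VT.sizeInBits() >= 128;
  }
};
```

Under this model, sixteen `i8` stores (128 bits) qualify on an SSE2 target, while two `i32` stores (64 bits) do not, and a target without SSE2 always declines.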

llvm/test/CodeGen/X86/MergeConsecutiveStores.ll

	Show First 20 Lines • Show All 766 Lines • ▼ Show 20 Lines

	}			}

	; Merging vector stores when sourced from a constant vector is not currently handled.			; Merging vector stores when sourced from a constant vector is not currently handled.
	define void @merge_vec_stores_of_constants(<4 x i32>* %ptr) {			define void @merge_vec_stores_of_constants(<4 x i32>* %ptr) {
	; CHECK-LABEL: merge_vec_stores_of_constants:			; CHECK-LABEL: merge_vec_stores_of_constants:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0			; CHECK-NEXT: vxorps %xmm0, %xmm0, %xmm0
	; CHECK-NEXT: vmovaps %xmm0, 48(%rdi)			; CHECK-NEXT: vmovups %ymm0, 48(%rdi)
	; CHECK-NEXT: vmovaps %xmm0, 64(%rdi)			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%idx0 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 3			%idx0 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 3
	%idx1 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 4			%idx1 = getelementptr inbounds <4 x i32>, <4 x i32>* %ptr, i64 4
	store <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32>* %idx0, align 16			store <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32>* %idx0, align 16
	store <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32>* %idx1, align 16			store <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32>* %idx1, align 16
	ret void			ret void

	}			}
	▲ Show 20 Lines • Show All 168 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/elementwise-store-of-scalar-splat.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=-sse2 | FileCheck %s --check-prefixes=ALL,SCALAR		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=-sse2 | FileCheck %s --check-prefixes=ALL,SCALAR
		RKSimon: Why is this 'SCALAR' given that the triple implicitly assumes SSE2?
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=ALL,SSE,SSE2,SSE2-ONLY		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse2 | FileCheck %s --check-prefixes=ALL,SSE,SSE2,SSE2-ONLY
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse3 | FileCheck %s --check-prefixes=ALL,SSE,SSE2,SSE3		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse3 | FileCheck %s --check-prefixes=ALL,SSE,SSE2,SSE3
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+ssse3 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSSE3-ONLY		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+ssse3 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSSE3-ONLY
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.1 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSE41		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.1 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSE41
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.2 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSE42		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+sse4.2 | FileCheck %s --check-prefixes=ALL,SSE,SSSE3,SSE42
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=ALL,AVX,AVX1		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx | FileCheck %s --check-prefixes=ALL,AVX,AVX1
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=ALL,AVX,AVX2		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 | FileCheck %s --check-prefixes=ALL,AVX,AVX2
; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512vl | FileCheck %s --check-prefixes=ALL,AVX512,AVX512F		; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512vl | FileCheck %s --check-prefixes=ALL,AVX512,AVX512F
▲ Show 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	; ALL-NEXT: retq
%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0
store float %in.elt, ptr %out.elt0.ptr, align 64		store float %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1
store float %in.elt, ptr %out.elt1.ptr, align 4		store float %in.elt, ptr %out.elt1.ptr, align 4
ret void		ret void
}		}

define void @vec128_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_i8:		; SCALAR-LABEL: vec128_i8:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movzbl (%rdi), %eax		; SCALAR-NEXT: movzbl (%rdi), %eax
; ALL-NEXT: notb %al		; SCALAR-NEXT: notb %al
; ALL-NEXT: movb %al, (%rsi)		; SCALAR-NEXT: movb %al, (%rsi)
; ALL-NEXT: movb %al, 1(%rsi)		; SCALAR-NEXT: movb %al, 1(%rsi)
; ALL-NEXT: movb %al, 2(%rsi)		; SCALAR-NEXT: movb %al, 2(%rsi)
; ALL-NEXT: movb %al, 3(%rsi)		; SCALAR-NEXT: movb %al, 3(%rsi)
; ALL-NEXT: movb %al, 4(%rsi)		; SCALAR-NEXT: movb %al, 4(%rsi)
; ALL-NEXT: movb %al, 5(%rsi)		; SCALAR-NEXT: movb %al, 5(%rsi)
; ALL-NEXT: movb %al, 6(%rsi)		; SCALAR-NEXT: movb %al, 6(%rsi)
; ALL-NEXT: movb %al, 7(%rsi)		; SCALAR-NEXT: movb %al, 7(%rsi)
; ALL-NEXT: movb %al, 8(%rsi)		; SCALAR-NEXT: movb %al, 8(%rsi)
; ALL-NEXT: movb %al, 9(%rsi)		; SCALAR-NEXT: movb %al, 9(%rsi)
; ALL-NEXT: movb %al, 10(%rsi)		; SCALAR-NEXT: movb %al, 10(%rsi)
; ALL-NEXT: movb %al, 11(%rsi)		; SCALAR-NEXT: movb %al, 11(%rsi)
; ALL-NEXT: movb %al, 12(%rsi)		; SCALAR-NEXT: movb %al, 12(%rsi)
; ALL-NEXT: movb %al, 13(%rsi)		; SCALAR-NEXT: movb %al, 13(%rsi)
; ALL-NEXT: movb %al, 14(%rsi)		; SCALAR-NEXT: movb %al, 14(%rsi)
; ALL-NEXT: movb %al, 15(%rsi)		; SCALAR-NEXT: movb %al, 15(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE2-LABEL: vec128_i8:
		; SSE2: # %bb.0:
		; SSE2-NEXT: movzbl (%rdi), %eax
		; SSE2-NEXT: notb %al
		; SSE2-NEXT: movzbl %al, %eax
		; SSE2-NEXT: movd %eax, %xmm0
		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE2-NEXT: movdqa %xmm0, (%rsi)
		; SSE2-NEXT: retq
		;
		; SSSE3-LABEL: vec128_i8:
		; SSSE3: # %bb.0:
		; SSSE3-NEXT: movzbl (%rdi), %eax
		; SSSE3-NEXT: notb %al
		; SSSE3-NEXT: movzbl %al, %eax
		; SSSE3-NEXT: movd %eax, %xmm0
		; SSSE3-NEXT: pxor %xmm1, %xmm1
		; SSSE3-NEXT: pshufb %xmm1, %xmm0
		; SSSE3-NEXT: movdqa %xmm0, (%rsi)
		; SSSE3-NEXT: retq
		;
		; AVX1-LABEL: vec128_i8:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movzbl (%rdi), %eax
		; AVX1-NEXT: notb %al
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
		; AVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm0
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_i8:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movzbl (%rdi), %eax
		; AVX2-NEXT: notb %al
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastb %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec128_i8:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movzbl (%rdi), %eax
		; AVX512F-NEXT: notb %al
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastb %xmm0, %xmm0
		; AVX512F-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec128_i8:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movzbl (%rdi), %eax
		; AVX512BW-NEXT: notb %al
		; AVX512BW-NEXT: vpbroadcastb %eax, %xmm0
		; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512BW-NEXT: retq
%in.elt.not = load i8, ptr %in.elt.ptr, align 64		%in.elt.not = load i8, ptr %in.elt.ptr, align 64
%in.elt = xor i8 %in.elt.not, -1		%in.elt = xor i8 %in.elt.not, -1
%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0
store i8 %in.elt, ptr %out.elt0.ptr, align 64		store i8 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1
store i8 %in.elt, ptr %out.elt1.ptr, align 1		store i8 %in.elt, ptr %out.elt1.ptr, align 1
%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2
store i8 %in.elt, ptr %out.elt2.ptr, align 2		store i8 %in.elt, ptr %out.elt2.ptr, align 2
Show All 22 Lines	; AVX512BW-NEXT: retq
%out.elt14.ptr = getelementptr i8, ptr %out.vec.ptr, i64 14		%out.elt14.ptr = getelementptr i8, ptr %out.vec.ptr, i64 14
store i8 %in.elt, ptr %out.elt14.ptr, align 2		store i8 %in.elt, ptr %out.elt14.ptr, align 2
%out.elt15.ptr = getelementptr i8, ptr %out.vec.ptr, i64 15		%out.elt15.ptr = getelementptr i8, ptr %out.vec.ptr, i64 15
store i8 %in.elt, ptr %out.elt15.ptr, align 1		store i8 %in.elt, ptr %out.elt15.ptr, align 1
ret void		ret void
}		}

define void @vec128_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_i16:		; SCALAR-LABEL: vec128_i16:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movw %ax, (%rsi)		; SCALAR-NEXT: movw %ax, (%rsi)
; ALL-NEXT: movw %ax, 2(%rsi)		; SCALAR-NEXT: movw %ax, 2(%rsi)
; ALL-NEXT: movw %ax, 4(%rsi)		; SCALAR-NEXT: movw %ax, 4(%rsi)
; ALL-NEXT: movw %ax, 6(%rsi)		; SCALAR-NEXT: movw %ax, 6(%rsi)
; ALL-NEXT: movw %ax, 8(%rsi)		; SCALAR-NEXT: movw %ax, 8(%rsi)
; ALL-NEXT: movw %ax, 10(%rsi)		; SCALAR-NEXT: movw %ax, 10(%rsi)
; ALL-NEXT: movw %ax, 12(%rsi)		; SCALAR-NEXT: movw %ax, 12(%rsi)
; ALL-NEXT: movw %ax, 14(%rsi)		; SCALAR-NEXT: movw %ax, 14(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec128_i16:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec128_i16:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_i16:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastw %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec128_i16:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movl (%rdi), %eax
		; AVX512F-NEXT: notl %eax
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastw %xmm0, %xmm0
		; AVX512F-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec128_i16:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movl (%rdi), %eax
		; AVX512BW-NEXT: notl %eax
		; AVX512BW-NEXT: vpbroadcastw %eax, %xmm0
		; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512BW-NEXT: retq
%in.elt.not = load i16, ptr %in.elt.ptr, align 64		%in.elt.not = load i16, ptr %in.elt.ptr, align 64
%in.elt = xor i16 %in.elt.not, -1		%in.elt = xor i16 %in.elt.not, -1
%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0
store i16 %in.elt, ptr %out.elt0.ptr, align 64		store i16 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1
store i16 %in.elt, ptr %out.elt1.ptr, align 2		store i16 %in.elt, ptr %out.elt1.ptr, align 2
%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2
store i16 %in.elt, ptr %out.elt2.ptr, align 4		store i16 %in.elt, ptr %out.elt2.ptr, align 4
%out.elt3.ptr = getelementptr i16, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i16, ptr %out.vec.ptr, i64 3
store i16 %in.elt, ptr %out.elt3.ptr, align 2		store i16 %in.elt, ptr %out.elt3.ptr, align 2
%out.elt4.ptr = getelementptr i16, ptr %out.vec.ptr, i64 4		%out.elt4.ptr = getelementptr i16, ptr %out.vec.ptr, i64 4
store i16 %in.elt, ptr %out.elt4.ptr, align 8		store i16 %in.elt, ptr %out.elt4.ptr, align 8
%out.elt5.ptr = getelementptr i16, ptr %out.vec.ptr, i64 5		%out.elt5.ptr = getelementptr i16, ptr %out.vec.ptr, i64 5
store i16 %in.elt, ptr %out.elt5.ptr, align 2		store i16 %in.elt, ptr %out.elt5.ptr, align 2
%out.elt6.ptr = getelementptr i16, ptr %out.vec.ptr, i64 6		%out.elt6.ptr = getelementptr i16, ptr %out.vec.ptr, i64 6
store i16 %in.elt, ptr %out.elt6.ptr, align 4		store i16 %in.elt, ptr %out.elt6.ptr, align 4
%out.elt7.ptr = getelementptr i16, ptr %out.vec.ptr, i64 7		%out.elt7.ptr = getelementptr i16, ptr %out.vec.ptr, i64 7
store i16 %in.elt, ptr %out.elt7.ptr, align 2		store i16 %in.elt, ptr %out.elt7.ptr, align 2
ret void		ret void
}		}

define void @vec128_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_i32:		; SCALAR-LABEL: vec128_i32:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec128_i32:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec128_i32:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_i32:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec128_i32:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %xmm0
		; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt = xor i32 %in.elt.not, -1		%in.elt = xor i32 %in.elt.not, -1
%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0
store i32 %in.elt, ptr %out.elt0.ptr, align 64		store i32 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1
store i32 %in.elt, ptr %out.elt1.ptr, align 4		store i32 %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2
store i32 %in.elt, ptr %out.elt2.ptr, align 8		store i32 %in.elt, ptr %out.elt2.ptr, align 8
%out.elt3.ptr = getelementptr i32, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i32, ptr %out.vec.ptr, i64 3
store i32 %in.elt, ptr %out.elt3.ptr, align 4		store i32 %in.elt, ptr %out.elt3.ptr, align 4
ret void		ret void
}		}

define void @vec128_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_float:		; SCALAR-LABEL: vec128_float:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec128_float:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec128_float:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_float:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec128_float:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %xmm0
		; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt.int = xor i32 %in.elt.not, -1		%in.elt.int = xor i32 %in.elt.not, -1
%in.elt = bitcast i32 %in.elt.int to float		%in.elt = bitcast i32 %in.elt.int to float
%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0
store float %in.elt, ptr %out.elt0.ptr, align 64		store float %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1
store float %in.elt, ptr %out.elt1.ptr, align 4		store float %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2
store float %in.elt, ptr %out.elt2.ptr, align 8		store float %in.elt, ptr %out.elt2.ptr, align 8
%out.elt3.ptr = getelementptr float, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr float, ptr %out.vec.ptr, i64 3
store float %in.elt, ptr %out.elt3.ptr, align 4		store float %in.elt, ptr %out.elt3.ptr, align 4
ret void		ret void
}		}

define void @vec128_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_i64:		; SCALAR-LABEL: vec128_i64:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec128_i64:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec128_i64:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_i64:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec128_i64:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %xmm0
		; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt = xor i64 %in.elt.not, -1		%in.elt = xor i64 %in.elt.not, -1
%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0
store i64 %in.elt, ptr %out.elt0.ptr, align 64		store i64 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1
store i64 %in.elt, ptr %out.elt1.ptr, align 8		store i64 %in.elt, ptr %out.elt1.ptr, align 8
ret void		ret void
}		}

define void @vec128_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec128_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec128_double:		; SCALAR-LABEL: vec128_double:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec128_double:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec128_double:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec128_double:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %xmm0
		; AVX2-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec128_double:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %xmm0
		; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt.int = xor i64 %in.elt.not, -1		%in.elt.int = xor i64 %in.elt.not, -1
%in.elt = bitcast i64 %in.elt.int to double		%in.elt = bitcast i64 %in.elt.int to double
%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0
store double %in.elt, ptr %out.elt0.ptr, align 64		store double %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1
store double %in.elt, ptr %out.elt1.ptr, align 8		store double %in.elt, ptr %out.elt1.ptr, align 8
ret void		ret void
}		}

define void @vec256_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_i8:		; SCALAR-LABEL: vec256_i8:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movzbl (%rdi), %eax		; SCALAR-NEXT: movzbl (%rdi), %eax
; ALL-NEXT: notb %al		; SCALAR-NEXT: notb %al
; ALL-NEXT: movb %al, (%rsi)		; SCALAR-NEXT: movb %al, (%rsi)
; ALL-NEXT: movb %al, 1(%rsi)		; SCALAR-NEXT: movb %al, 1(%rsi)
; ALL-NEXT: movb %al, 2(%rsi)		; SCALAR-NEXT: movb %al, 2(%rsi)
; ALL-NEXT: movb %al, 3(%rsi)		; SCALAR-NEXT: movb %al, 3(%rsi)
; ALL-NEXT: movb %al, 4(%rsi)		; SCALAR-NEXT: movb %al, 4(%rsi)
; ALL-NEXT: movb %al, 5(%rsi)		; SCALAR-NEXT: movb %al, 5(%rsi)
; ALL-NEXT: movb %al, 6(%rsi)		; SCALAR-NEXT: movb %al, 6(%rsi)
; ALL-NEXT: movb %al, 7(%rsi)		; SCALAR-NEXT: movb %al, 7(%rsi)
; ALL-NEXT: movb %al, 8(%rsi)		; SCALAR-NEXT: movb %al, 8(%rsi)
; ALL-NEXT: movb %al, 9(%rsi)		; SCALAR-NEXT: movb %al, 9(%rsi)
; ALL-NEXT: movb %al, 10(%rsi)		; SCALAR-NEXT: movb %al, 10(%rsi)
; ALL-NEXT: movb %al, 11(%rsi)		; SCALAR-NEXT: movb %al, 11(%rsi)
; ALL-NEXT: movb %al, 12(%rsi)		; SCALAR-NEXT: movb %al, 12(%rsi)
; ALL-NEXT: movb %al, 13(%rsi)		; SCALAR-NEXT: movb %al, 13(%rsi)
; ALL-NEXT: movb %al, 14(%rsi)		; SCALAR-NEXT: movb %al, 14(%rsi)
; ALL-NEXT: movb %al, 15(%rsi)		; SCALAR-NEXT: movb %al, 15(%rsi)
; ALL-NEXT: movb %al, 16(%rsi)		; SCALAR-NEXT: movb %al, 16(%rsi)
; ALL-NEXT: movb %al, 17(%rsi)		; SCALAR-NEXT: movb %al, 17(%rsi)
; ALL-NEXT: movb %al, 18(%rsi)		; SCALAR-NEXT: movb %al, 18(%rsi)
; ALL-NEXT: movb %al, 19(%rsi)		; SCALAR-NEXT: movb %al, 19(%rsi)
; ALL-NEXT: movb %al, 20(%rsi)		; SCALAR-NEXT: movb %al, 20(%rsi)
; ALL-NEXT: movb %al, 21(%rsi)		; SCALAR-NEXT: movb %al, 21(%rsi)
; ALL-NEXT: movb %al, 22(%rsi)		; SCALAR-NEXT: movb %al, 22(%rsi)
; ALL-NEXT: movb %al, 23(%rsi)		; SCALAR-NEXT: movb %al, 23(%rsi)
; ALL-NEXT: movb %al, 24(%rsi)		; SCALAR-NEXT: movb %al, 24(%rsi)
; ALL-NEXT: movb %al, 25(%rsi)		; SCALAR-NEXT: movb %al, 25(%rsi)
; ALL-NEXT: movb %al, 26(%rsi)		; SCALAR-NEXT: movb %al, 26(%rsi)
; ALL-NEXT: movb %al, 27(%rsi)		; SCALAR-NEXT: movb %al, 27(%rsi)
; ALL-NEXT: movb %al, 28(%rsi)		; SCALAR-NEXT: movb %al, 28(%rsi)
; ALL-NEXT: movb %al, 29(%rsi)		; SCALAR-NEXT: movb %al, 29(%rsi)
; ALL-NEXT: movb %al, 30(%rsi)		; SCALAR-NEXT: movb %al, 30(%rsi)
; ALL-NEXT: movb %al, 31(%rsi)		; SCALAR-NEXT: movb %al, 31(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE2-LABEL: vec256_i8:
		; SSE2: # %bb.0:
		; SSE2-NEXT: movzbl (%rdi), %eax
		; SSE2-NEXT: notb %al
		; SSE2-NEXT: movzbl %al, %eax
		; SSE2-NEXT: movd %eax, %xmm0
		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE2-NEXT: movdqa %xmm0, (%rsi)
		; SSE2-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE2-NEXT: retq
		;
		; SSSE3-LABEL: vec256_i8:
		; SSSE3: # %bb.0:
		; SSSE3-NEXT: movzbl (%rdi), %eax
		; SSSE3-NEXT: notb %al
		; SSSE3-NEXT: movzbl %al, %eax
		; SSSE3-NEXT: movd %eax, %xmm0
		; SSSE3-NEXT: pxor %xmm1, %xmm1
		; SSSE3-NEXT: pshufb %xmm1, %xmm0
		; SSSE3-NEXT: movdqa %xmm0, (%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 16(%rsi)
		; SSSE3-NEXT: retq
		;
		; AVX1-LABEL: vec256_i8:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movzbl (%rdi), %eax
		; AVX1-NEXT: notb %al
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
		; AVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm0
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_i8:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movzbl (%rdi), %eax
		; AVX2-NEXT: notb %al
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec256_i8:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movzbl (%rdi), %eax
		; AVX512F-NEXT: notb %al
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec256_i8:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movzbl (%rdi), %eax
		; AVX512BW-NEXT: notb %al
		; AVX512BW-NEXT: vpbroadcastb %eax, %ymm0
		; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i8, ptr %in.elt.ptr, align 64		%in.elt.not = load i8, ptr %in.elt.ptr, align 64
%in.elt = xor i8 %in.elt.not, -1		%in.elt = xor i8 %in.elt.not, -1
%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0
store i8 %in.elt, ptr %out.elt0.ptr, align 64		store i8 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1
store i8 %in.elt, ptr %out.elt1.ptr, align 1		store i8 %in.elt, ptr %out.elt1.ptr, align 1
%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2
store i8 %in.elt, ptr %out.elt2.ptr, align 2		store i8 %in.elt, ptr %out.elt2.ptr, align 2
; ... (54 unchanged lines not shown)
%out.elt30.ptr = getelementptr i8, ptr %out.vec.ptr, i64 30		%out.elt30.ptr = getelementptr i8, ptr %out.vec.ptr, i64 30
store i8 %in.elt, ptr %out.elt30.ptr, align 2		store i8 %in.elt, ptr %out.elt30.ptr, align 2
%out.elt31.ptr = getelementptr i8, ptr %out.vec.ptr, i64 31		%out.elt31.ptr = getelementptr i8, ptr %out.vec.ptr, i64 31
store i8 %in.elt, ptr %out.elt31.ptr, align 1		store i8 %in.elt, ptr %out.elt31.ptr, align 1
ret void		ret void
}		}

define void @vec256_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_i16:		; SCALAR-LABEL: vec256_i16:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movw %ax, (%rsi)		; SCALAR-NEXT: movw %ax, (%rsi)
; ALL-NEXT: movw %ax, 2(%rsi)		; SCALAR-NEXT: movw %ax, 2(%rsi)
; ALL-NEXT: movw %ax, 4(%rsi)		; SCALAR-NEXT: movw %ax, 4(%rsi)
; ALL-NEXT: movw %ax, 6(%rsi)		; SCALAR-NEXT: movw %ax, 6(%rsi)
; ALL-NEXT: movw %ax, 8(%rsi)		; SCALAR-NEXT: movw %ax, 8(%rsi)
; ALL-NEXT: movw %ax, 10(%rsi)		; SCALAR-NEXT: movw %ax, 10(%rsi)
; ALL-NEXT: movw %ax, 12(%rsi)		; SCALAR-NEXT: movw %ax, 12(%rsi)
; ALL-NEXT: movw %ax, 14(%rsi)		; SCALAR-NEXT: movw %ax, 14(%rsi)
; ALL-NEXT: movw %ax, 16(%rsi)		; SCALAR-NEXT: movw %ax, 16(%rsi)
; ALL-NEXT: movw %ax, 18(%rsi)		; SCALAR-NEXT: movw %ax, 18(%rsi)
; ALL-NEXT: movw %ax, 20(%rsi)		; SCALAR-NEXT: movw %ax, 20(%rsi)
; ALL-NEXT: movw %ax, 22(%rsi)		; SCALAR-NEXT: movw %ax, 22(%rsi)
; ALL-NEXT: movw %ax, 24(%rsi)		; SCALAR-NEXT: movw %ax, 24(%rsi)
; ALL-NEXT: movw %ax, 26(%rsi)		; SCALAR-NEXT: movw %ax, 26(%rsi)
; ALL-NEXT: movw %ax, 28(%rsi)		; SCALAR-NEXT: movw %ax, 28(%rsi)
; ALL-NEXT: movw %ax, 30(%rsi)		; SCALAR-NEXT: movw %ax, 30(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec256_i16:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec256_i16:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_i16:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec256_i16:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movl (%rdi), %eax
		; AVX512F-NEXT: notl %eax
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec256_i16:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movl (%rdi), %eax
		; AVX512BW-NEXT: notl %eax
		; AVX512BW-NEXT: vpbroadcastw %eax, %ymm0
		; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i16, ptr %in.elt.ptr, align 64		%in.elt.not = load i16, ptr %in.elt.ptr, align 64
%in.elt = xor i16 %in.elt.not, -1		%in.elt = xor i16 %in.elt.not, -1
%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0
store i16 %in.elt, ptr %out.elt0.ptr, align 64		store i16 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1
store i16 %in.elt, ptr %out.elt1.ptr, align 2		store i16 %in.elt, ptr %out.elt1.ptr, align 2
%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2
store i16 %in.elt, ptr %out.elt2.ptr, align 4		store i16 %in.elt, ptr %out.elt2.ptr, align 4
; ... (22 unchanged lines not shown)
%out.elt14.ptr = getelementptr i16, ptr %out.vec.ptr, i64 14		%out.elt14.ptr = getelementptr i16, ptr %out.vec.ptr, i64 14
store i16 %in.elt, ptr %out.elt14.ptr, align 4		store i16 %in.elt, ptr %out.elt14.ptr, align 4
%out.elt15.ptr = getelementptr i16, ptr %out.vec.ptr, i64 15		%out.elt15.ptr = getelementptr i16, ptr %out.vec.ptr, i64 15
store i16 %in.elt, ptr %out.elt15.ptr, align 2		store i16 %in.elt, ptr %out.elt15.ptr, align 2
ret void		ret void
}		}

define void @vec256_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_i32:		; SCALAR-LABEL: vec256_i32:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec256_i32:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec256_i32:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_i32:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec256_i32:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt = xor i32 %in.elt.not, -1		%in.elt = xor i32 %in.elt.not, -1
%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0
store i32 %in.elt, ptr %out.elt0.ptr, align 64		store i32 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1
store i32 %in.elt, ptr %out.elt1.ptr, align 4		store i32 %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2
store i32 %in.elt, ptr %out.elt2.ptr, align 8		store i32 %in.elt, ptr %out.elt2.ptr, align 8
%out.elt3.ptr = getelementptr i32, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i32, ptr %out.vec.ptr, i64 3
store i32 %in.elt, ptr %out.elt3.ptr, align 4		store i32 %in.elt, ptr %out.elt3.ptr, align 4
%out.elt4.ptr = getelementptr i32, ptr %out.vec.ptr, i64 4		%out.elt4.ptr = getelementptr i32, ptr %out.vec.ptr, i64 4
store i32 %in.elt, ptr %out.elt4.ptr, align 16		store i32 %in.elt, ptr %out.elt4.ptr, align 16
%out.elt5.ptr = getelementptr i32, ptr %out.vec.ptr, i64 5		%out.elt5.ptr = getelementptr i32, ptr %out.vec.ptr, i64 5
store i32 %in.elt, ptr %out.elt5.ptr, align 4		store i32 %in.elt, ptr %out.elt5.ptr, align 4
%out.elt6.ptr = getelementptr i32, ptr %out.vec.ptr, i64 6		%out.elt6.ptr = getelementptr i32, ptr %out.vec.ptr, i64 6
store i32 %in.elt, ptr %out.elt6.ptr, align 8		store i32 %in.elt, ptr %out.elt6.ptr, align 8
%out.elt7.ptr = getelementptr i32, ptr %out.vec.ptr, i64 7		%out.elt7.ptr = getelementptr i32, ptr %out.vec.ptr, i64 7
store i32 %in.elt, ptr %out.elt7.ptr, align 4		store i32 %in.elt, ptr %out.elt7.ptr, align 4
ret void		ret void
}		}

define void @vec256_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_float:		; SCALAR-LABEL: vec256_float:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec256_float:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec256_float:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_float:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec256_float:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt.int = xor i32 %in.elt.not, -1		%in.elt.int = xor i32 %in.elt.not, -1
%in.elt = bitcast i32 %in.elt.int to float		%in.elt = bitcast i32 %in.elt.int to float
%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0
store float %in.elt, ptr %out.elt0.ptr, align 64		store float %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1
store float %in.elt, ptr %out.elt1.ptr, align 4		store float %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2
store float %in.elt, ptr %out.elt2.ptr, align 8		store float %in.elt, ptr %out.elt2.ptr, align 8
%out.elt3.ptr = getelementptr float, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr float, ptr %out.vec.ptr, i64 3
store float %in.elt, ptr %out.elt3.ptr, align 4		store float %in.elt, ptr %out.elt3.ptr, align 4
%out.elt4.ptr = getelementptr float, ptr %out.vec.ptr, i64 4		%out.elt4.ptr = getelementptr float, ptr %out.vec.ptr, i64 4
store float %in.elt, ptr %out.elt4.ptr, align 16		store float %in.elt, ptr %out.elt4.ptr, align 16
%out.elt5.ptr = getelementptr float, ptr %out.vec.ptr, i64 5		%out.elt5.ptr = getelementptr float, ptr %out.vec.ptr, i64 5
store float %in.elt, ptr %out.elt5.ptr, align 4		store float %in.elt, ptr %out.elt5.ptr, align 4
%out.elt6.ptr = getelementptr float, ptr %out.vec.ptr, i64 6		%out.elt6.ptr = getelementptr float, ptr %out.vec.ptr, i64 6
store float %in.elt, ptr %out.elt6.ptr, align 8		store float %in.elt, ptr %out.elt6.ptr, align 8
%out.elt7.ptr = getelementptr float, ptr %out.vec.ptr, i64 7		%out.elt7.ptr = getelementptr float, ptr %out.vec.ptr, i64 7
store float %in.elt, ptr %out.elt7.ptr, align 4		store float %in.elt, ptr %out.elt7.ptr, align 4
ret void		ret void
}		}

define void @vec256_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_i64:		; SCALAR-LABEL: vec256_i64:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec256_i64:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec256_i64:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_i64:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec256_i64:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt = xor i64 %in.elt.not, -1		%in.elt = xor i64 %in.elt.not, -1
%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0
store i64 %in.elt, ptr %out.elt0.ptr, align 64		store i64 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1
store i64 %in.elt, ptr %out.elt1.ptr, align 8		store i64 %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2
store i64 %in.elt, ptr %out.elt2.ptr, align 16		store i64 %in.elt, ptr %out.elt2.ptr, align 16
%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3
store i64 %in.elt, ptr %out.elt3.ptr, align 8		store i64 %in.elt, ptr %out.elt3.ptr, align 8
ret void		ret void
}		}

define void @vec256_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec256_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec256_double:		; SCALAR-LABEL: vec256_double:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec256_double:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec256_double:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec256_double:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec256_double:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt.int = xor i64 %in.elt.not, -1		%in.elt.int = xor i64 %in.elt.not, -1
%in.elt = bitcast i64 %in.elt.int to double		%in.elt = bitcast i64 %in.elt.int to double
%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0
store double %in.elt, ptr %out.elt0.ptr, align 64		store double %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1
store double %in.elt, ptr %out.elt1.ptr, align 8		store double %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2
[... 20 lines elided ...]
%out.elt0.ptr = getelementptr i128, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i128, ptr %out.vec.ptr, i64 0
store i128 %in.elt, ptr %out.elt0.ptr, align 64		store i128 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i128, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i128, ptr %out.vec.ptr, i64 1
store i128 %in.elt, ptr %out.elt1.ptr, align 16		store i128 %in.elt, ptr %out.elt1.ptr, align 16
ret void		ret void
}		}

define void @vec384_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_i8:		; SCALAR-LABEL: vec384_i8:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movzbl (%rdi), %eax		; SCALAR-NEXT: movzbl (%rdi), %eax
; ALL-NEXT: notb %al		; SCALAR-NEXT: notb %al
; ALL-NEXT: movb %al, (%rsi)		; SCALAR-NEXT: movb %al, (%rsi)
; ALL-NEXT: movb %al, 1(%rsi)		; SCALAR-NEXT: movb %al, 1(%rsi)
; ALL-NEXT: movb %al, 2(%rsi)		; SCALAR-NEXT: movb %al, 2(%rsi)
; ALL-NEXT: movb %al, 3(%rsi)		; SCALAR-NEXT: movb %al, 3(%rsi)
; ALL-NEXT: movb %al, 4(%rsi)		; SCALAR-NEXT: movb %al, 4(%rsi)
; ALL-NEXT: movb %al, 5(%rsi)		; SCALAR-NEXT: movb %al, 5(%rsi)
; ALL-NEXT: movb %al, 6(%rsi)		; SCALAR-NEXT: movb %al, 6(%rsi)
; ALL-NEXT: movb %al, 7(%rsi)		; SCALAR-NEXT: movb %al, 7(%rsi)
; ALL-NEXT: movb %al, 8(%rsi)		; SCALAR-NEXT: movb %al, 8(%rsi)
; ALL-NEXT: movb %al, 9(%rsi)		; SCALAR-NEXT: movb %al, 9(%rsi)
; ALL-NEXT: movb %al, 10(%rsi)		; SCALAR-NEXT: movb %al, 10(%rsi)
; ALL-NEXT: movb %al, 11(%rsi)		; SCALAR-NEXT: movb %al, 11(%rsi)
; ALL-NEXT: movb %al, 12(%rsi)		; SCALAR-NEXT: movb %al, 12(%rsi)
; ALL-NEXT: movb %al, 13(%rsi)		; SCALAR-NEXT: movb %al, 13(%rsi)
; ALL-NEXT: movb %al, 14(%rsi)		; SCALAR-NEXT: movb %al, 14(%rsi)
; ALL-NEXT: movb %al, 15(%rsi)		; SCALAR-NEXT: movb %al, 15(%rsi)
; ALL-NEXT: movb %al, 16(%rsi)		; SCALAR-NEXT: movb %al, 16(%rsi)
; ALL-NEXT: movb %al, 17(%rsi)		; SCALAR-NEXT: movb %al, 17(%rsi)
; ALL-NEXT: movb %al, 18(%rsi)		; SCALAR-NEXT: movb %al, 18(%rsi)
; ALL-NEXT: movb %al, 19(%rsi)		; SCALAR-NEXT: movb %al, 19(%rsi)
; ALL-NEXT: movb %al, 20(%rsi)		; SCALAR-NEXT: movb %al, 20(%rsi)
; ALL-NEXT: movb %al, 21(%rsi)		; SCALAR-NEXT: movb %al, 21(%rsi)
; ALL-NEXT: movb %al, 22(%rsi)		; SCALAR-NEXT: movb %al, 22(%rsi)
; ALL-NEXT: movb %al, 23(%rsi)		; SCALAR-NEXT: movb %al, 23(%rsi)
; ALL-NEXT: movb %al, 24(%rsi)		; SCALAR-NEXT: movb %al, 24(%rsi)
; ALL-NEXT: movb %al, 25(%rsi)		; SCALAR-NEXT: movb %al, 25(%rsi)
; ALL-NEXT: movb %al, 26(%rsi)		; SCALAR-NEXT: movb %al, 26(%rsi)
; ALL-NEXT: movb %al, 27(%rsi)		; SCALAR-NEXT: movb %al, 27(%rsi)
; ALL-NEXT: movb %al, 28(%rsi)		; SCALAR-NEXT: movb %al, 28(%rsi)
; ALL-NEXT: movb %al, 29(%rsi)		; SCALAR-NEXT: movb %al, 29(%rsi)
; ALL-NEXT: movb %al, 30(%rsi)		; SCALAR-NEXT: movb %al, 30(%rsi)
; ALL-NEXT: movb %al, 31(%rsi)		; SCALAR-NEXT: movb %al, 31(%rsi)
; ALL-NEXT: movb %al, 32(%rsi)		; SCALAR-NEXT: movb %al, 32(%rsi)
; ALL-NEXT: movb %al, 33(%rsi)		; SCALAR-NEXT: movb %al, 33(%rsi)
; ALL-NEXT: movb %al, 34(%rsi)		; SCALAR-NEXT: movb %al, 34(%rsi)
; ALL-NEXT: movb %al, 35(%rsi)		; SCALAR-NEXT: movb %al, 35(%rsi)
; ALL-NEXT: movb %al, 36(%rsi)		; SCALAR-NEXT: movb %al, 36(%rsi)
; ALL-NEXT: movb %al, 37(%rsi)		; SCALAR-NEXT: movb %al, 37(%rsi)
; ALL-NEXT: movb %al, 38(%rsi)		; SCALAR-NEXT: movb %al, 38(%rsi)
; ALL-NEXT: movb %al, 39(%rsi)		; SCALAR-NEXT: movb %al, 39(%rsi)
; ALL-NEXT: movb %al, 40(%rsi)		; SCALAR-NEXT: movb %al, 40(%rsi)
; ALL-NEXT: movb %al, 41(%rsi)		; SCALAR-NEXT: movb %al, 41(%rsi)
; ALL-NEXT: movb %al, 42(%rsi)		; SCALAR-NEXT: movb %al, 42(%rsi)
; ALL-NEXT: movb %al, 43(%rsi)		; SCALAR-NEXT: movb %al, 43(%rsi)
; ALL-NEXT: movb %al, 44(%rsi)		; SCALAR-NEXT: movb %al, 44(%rsi)
; ALL-NEXT: movb %al, 45(%rsi)		; SCALAR-NEXT: movb %al, 45(%rsi)
; ALL-NEXT: movb %al, 46(%rsi)		; SCALAR-NEXT: movb %al, 46(%rsi)
; ALL-NEXT: movb %al, 47(%rsi)		; SCALAR-NEXT: movb %al, 47(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE2-LABEL: vec384_i8:
		; SSE2: # %bb.0:
		; SSE2-NEXT: movzbl (%rdi), %eax
		; SSE2-NEXT: notb %al
		; SSE2-NEXT: movzbl %al, %eax
		; SSE2-NEXT: movd %eax, %xmm0
		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE2-NEXT: movdqa %xmm0, (%rsi)
		; SSE2-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE2-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE2-NEXT: retq
		;
		; SSSE3-LABEL: vec384_i8:
		; SSSE3: # %bb.0:
		; SSSE3-NEXT: movzbl (%rdi), %eax
		; SSSE3-NEXT: notb %al
		; SSSE3-NEXT: movzbl %al, %eax
		; SSSE3-NEXT: movd %eax, %xmm0
		; SSSE3-NEXT: pxor %xmm1, %xmm1
		; SSSE3-NEXT: pshufb %xmm1, %xmm0
		; SSSE3-NEXT: movdqa %xmm0, (%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 16(%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 32(%rsi)
		; SSSE3-NEXT: retq
		;
		; AVX1-LABEL: vec384_i8:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movzbl (%rdi), %eax
		; AVX1-NEXT: notb %al
		; AVX1-NEXT: vpxor %xmm0, %xmm0, %xmm0
		; AVX1-NEXT: vmovd %eax, %xmm1
		; AVX1-NEXT: vpshufb %xmm0, %xmm1, %xmm0
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_i8:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movzbl (%rdi), %eax
		; AVX2-NEXT: notb %al
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec384_i8:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movzbl (%rdi), %eax
		; AVX512F-NEXT: notb %al
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512F-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec384_i8:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movzbl (%rdi), %eax
		; AVX512BW-NEXT: notb %al
		; AVX512BW-NEXT: vpbroadcastb %eax, %ymm0
		; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512BW-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i8, ptr %in.elt.ptr, align 64		%in.elt.not = load i8, ptr %in.elt.ptr, align 64
%in.elt = xor i8 %in.elt.not, -1		%in.elt = xor i8 %in.elt.not, -1
%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0
store i8 %in.elt, ptr %out.elt0.ptr, align 64		store i8 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1
store i8 %in.elt, ptr %out.elt1.ptr, align 1		store i8 %in.elt, ptr %out.elt1.ptr, align 1
%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2
store i8 %in.elt, ptr %out.elt2.ptr, align 2		store i8 %in.elt, ptr %out.elt2.ptr, align 2
[... 86 lines elided ...]
%out.elt46.ptr = getelementptr i8, ptr %out.vec.ptr, i64 46		%out.elt46.ptr = getelementptr i8, ptr %out.vec.ptr, i64 46
store i8 %in.elt, ptr %out.elt46.ptr, align 2		store i8 %in.elt, ptr %out.elt46.ptr, align 2
%out.elt47.ptr = getelementptr i8, ptr %out.vec.ptr, i64 47		%out.elt47.ptr = getelementptr i8, ptr %out.vec.ptr, i64 47
store i8 %in.elt, ptr %out.elt47.ptr, align 1		store i8 %in.elt, ptr %out.elt47.ptr, align 1
ret void		ret void
}		}

define void @vec384_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_i16:		; SCALAR-LABEL: vec384_i16:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movw %ax, (%rsi)		; SCALAR-NEXT: movw %ax, (%rsi)
; ALL-NEXT: movw %ax, 2(%rsi)		; SCALAR-NEXT: movw %ax, 2(%rsi)
; ALL-NEXT: movw %ax, 4(%rsi)		; SCALAR-NEXT: movw %ax, 4(%rsi)
; ALL-NEXT: movw %ax, 6(%rsi)		; SCALAR-NEXT: movw %ax, 6(%rsi)
; ALL-NEXT: movw %ax, 8(%rsi)		; SCALAR-NEXT: movw %ax, 8(%rsi)
; ALL-NEXT: movw %ax, 10(%rsi)		; SCALAR-NEXT: movw %ax, 10(%rsi)
; ALL-NEXT: movw %ax, 12(%rsi)		; SCALAR-NEXT: movw %ax, 12(%rsi)
; ALL-NEXT: movw %ax, 14(%rsi)		; SCALAR-NEXT: movw %ax, 14(%rsi)
; ALL-NEXT: movw %ax, 16(%rsi)		; SCALAR-NEXT: movw %ax, 16(%rsi)
; ALL-NEXT: movw %ax, 18(%rsi)		; SCALAR-NEXT: movw %ax, 18(%rsi)
; ALL-NEXT: movw %ax, 20(%rsi)		; SCALAR-NEXT: movw %ax, 20(%rsi)
; ALL-NEXT: movw %ax, 22(%rsi)		; SCALAR-NEXT: movw %ax, 22(%rsi)
; ALL-NEXT: movw %ax, 24(%rsi)		; SCALAR-NEXT: movw %ax, 24(%rsi)
; ALL-NEXT: movw %ax, 26(%rsi)		; SCALAR-NEXT: movw %ax, 26(%rsi)
; ALL-NEXT: movw %ax, 28(%rsi)		; SCALAR-NEXT: movw %ax, 28(%rsi)
; ALL-NEXT: movw %ax, 30(%rsi)		; SCALAR-NEXT: movw %ax, 30(%rsi)
; ALL-NEXT: movw %ax, 32(%rsi)		; SCALAR-NEXT: movw %ax, 32(%rsi)
; ALL-NEXT: movw %ax, 34(%rsi)		; SCALAR-NEXT: movw %ax, 34(%rsi)
; ALL-NEXT: movw %ax, 36(%rsi)		; SCALAR-NEXT: movw %ax, 36(%rsi)
; ALL-NEXT: movw %ax, 38(%rsi)		; SCALAR-NEXT: movw %ax, 38(%rsi)
; ALL-NEXT: movw %ax, 40(%rsi)		; SCALAR-NEXT: movw %ax, 40(%rsi)
; ALL-NEXT: movw %ax, 42(%rsi)		; SCALAR-NEXT: movw %ax, 42(%rsi)
; ALL-NEXT: movw %ax, 44(%rsi)		; SCALAR-NEXT: movw %ax, 44(%rsi)
; ALL-NEXT: movw %ax, 46(%rsi)		; SCALAR-NEXT: movw %ax, 46(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec384_i16:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec384_i16:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm1
		; AVX1-NEXT: vmovaps %ymm1, (%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_i16:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec384_i16:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movl (%rdi), %eax
		; AVX512F-NEXT: notl %eax
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512F-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec384_i16:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movl (%rdi), %eax
		; AVX512BW-NEXT: notl %eax
		; AVX512BW-NEXT: vpbroadcastw %eax, %ymm0
		; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512BW-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i16, ptr %in.elt.ptr, align 64		%in.elt.not = load i16, ptr %in.elt.ptr, align 64
%in.elt = xor i16 %in.elt.not, -1		%in.elt = xor i16 %in.elt.not, -1
%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0
store i16 %in.elt, ptr %out.elt0.ptr, align 64		store i16 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1
store i16 %in.elt, ptr %out.elt1.ptr, align 2		store i16 %in.elt, ptr %out.elt1.ptr, align 2
%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2
store i16 %in.elt, ptr %out.elt2.ptr, align 4		store i16 %in.elt, ptr %out.elt2.ptr, align 4
[... 38 lines elided ...]
%out.elt22.ptr = getelementptr i16, ptr %out.vec.ptr, i64 22		%out.elt22.ptr = getelementptr i16, ptr %out.vec.ptr, i64 22
store i16 %in.elt, ptr %out.elt22.ptr, align 4		store i16 %in.elt, ptr %out.elt22.ptr, align 4
%out.elt23.ptr = getelementptr i16, ptr %out.vec.ptr, i64 23		%out.elt23.ptr = getelementptr i16, ptr %out.vec.ptr, i64 23
store i16 %in.elt, ptr %out.elt23.ptr, align 2		store i16 %in.elt, ptr %out.elt23.ptr, align 2
ret void		ret void
}		}

define void @vec384_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_i32:		; SCALAR-LABEL: vec384_i32:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: movl %eax, 32(%rsi)		; SCALAR-NEXT: movl %eax, 32(%rsi)
; ALL-NEXT: movl %eax, 36(%rsi)		; SCALAR-NEXT: movl %eax, 36(%rsi)
; ALL-NEXT: movl %eax, 40(%rsi)		; SCALAR-NEXT: movl %eax, 40(%rsi)
; ALL-NEXT: movl %eax, 44(%rsi)		; SCALAR-NEXT: movl %eax, 44(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec384_i32:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec384_i32:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_i32:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec384_i32:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt = xor i32 %in.elt.not, -1		%in.elt = xor i32 %in.elt.not, -1
%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0
store i32 %in.elt, ptr %out.elt0.ptr, align 64		store i32 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1
store i32 %in.elt, ptr %out.elt1.ptr, align 4		store i32 %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2
store i32 %in.elt, ptr %out.elt2.ptr, align 8		store i32 %in.elt, ptr %out.elt2.ptr, align 8
[... 14 lines elided ...]
%out.elt10.ptr = getelementptr i32, ptr %out.vec.ptr, i64 10		%out.elt10.ptr = getelementptr i32, ptr %out.vec.ptr, i64 10
store i32 %in.elt, ptr %out.elt10.ptr, align 8		store i32 %in.elt, ptr %out.elt10.ptr, align 8
%out.elt11.ptr = getelementptr i32, ptr %out.vec.ptr, i64 11		%out.elt11.ptr = getelementptr i32, ptr %out.vec.ptr, i64 11
store i32 %in.elt, ptr %out.elt11.ptr, align 4		store i32 %in.elt, ptr %out.elt11.ptr, align 4
ret void		ret void
}		}

define void @vec384_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_float:		; SCALAR-LABEL: vec384_float:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: movl %eax, 32(%rsi)		; SCALAR-NEXT: movl %eax, 32(%rsi)
; ALL-NEXT: movl %eax, 36(%rsi)		; SCALAR-NEXT: movl %eax, 36(%rsi)
; ALL-NEXT: movl %eax, 40(%rsi)		; SCALAR-NEXT: movl %eax, 40(%rsi)
; ALL-NEXT: movl %eax, 44(%rsi)		; SCALAR-NEXT: movl %eax, 44(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec384_float:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec384_float:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vmovdqa %xmm0, 16(%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
		; AVX1-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_float:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec384_float:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt.int = xor i32 %in.elt.not, -1		%in.elt.int = xor i32 %in.elt.not, -1
%in.elt = bitcast i32 %in.elt.int to float		%in.elt = bitcast i32 %in.elt.int to float
%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0
store float %in.elt, ptr %out.elt0.ptr, align 64		store float %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1
store float %in.elt, ptr %out.elt1.ptr, align 4		store float %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2
[... 15 lines elided ...]
%out.elt10.ptr = getelementptr float, ptr %out.vec.ptr, i64 10		%out.elt10.ptr = getelementptr float, ptr %out.vec.ptr, i64 10
store float %in.elt, ptr %out.elt10.ptr, align 8		store float %in.elt, ptr %out.elt10.ptr, align 8
%out.elt11.ptr = getelementptr float, ptr %out.vec.ptr, i64 11		%out.elt11.ptr = getelementptr float, ptr %out.vec.ptr, i64 11
store float %in.elt, ptr %out.elt11.ptr, align 4		store float %in.elt, ptr %out.elt11.ptr, align 4
ret void		ret void
}		}

define void @vec384_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_i64:		; SCALAR-LABEL: vec384_i64:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: movq %rax, 32(%rsi)		; SCALAR-NEXT: movq %rax, 32(%rsi)
; ALL-NEXT: movq %rax, 40(%rsi)		; SCALAR-NEXT: movq %rax, 40(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec384_i64:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec384_i64:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %xmm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_i64:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec384_i64:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt = xor i64 %in.elt.not, -1		%in.elt = xor i64 %in.elt.not, -1
%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0
store i64 %in.elt, ptr %out.elt0.ptr, align 64		store i64 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1
store i64 %in.elt, ptr %out.elt1.ptr, align 8		store i64 %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2
store i64 %in.elt, ptr %out.elt2.ptr, align 16		store i64 %in.elt, ptr %out.elt2.ptr, align 16
%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3
store i64 %in.elt, ptr %out.elt3.ptr, align 8		store i64 %in.elt, ptr %out.elt3.ptr, align 8
%out.elt4.ptr = getelementptr i64, ptr %out.vec.ptr, i64 4		%out.elt4.ptr = getelementptr i64, ptr %out.vec.ptr, i64 4
store i64 %in.elt, ptr %out.elt4.ptr, align 32		store i64 %in.elt, ptr %out.elt4.ptr, align 32
%out.elt5.ptr = getelementptr i64, ptr %out.vec.ptr, i64 5		%out.elt5.ptr = getelementptr i64, ptr %out.vec.ptr, i64 5
store i64 %in.elt, ptr %out.elt5.ptr, align 8		store i64 %in.elt, ptr %out.elt5.ptr, align 8
ret void		ret void
}		}

define void @vec384_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec384_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec384_double:		; SCALAR-LABEL: vec384_double:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: movq %rax, 32(%rsi)		; SCALAR-NEXT: movq %rax, 32(%rsi)
; ALL-NEXT: movq %rax, 40(%rsi)		; SCALAR-NEXT: movq %rax, 40(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec384_double:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec384_double:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %xmm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec384_double:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec384_double:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %ymm0
		; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX512-NEXT: vmovdqa %xmm0, 32(%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt.int = xor i64 %in.elt.not, -1		%in.elt.int = xor i64 %in.elt.not, -1
%in.elt = bitcast i64 %in.elt.int to double		%in.elt = bitcast i64 %in.elt.int to double
%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0
store double %in.elt, ptr %out.elt0.ptr, align 64		store double %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1
store double %in.elt, ptr %out.elt1.ptr, align 8		store double %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2
[... 28 lines elided ...]
%out.elt1.ptr = getelementptr i128, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i128, ptr %out.vec.ptr, i64 1
store i128 %in.elt, ptr %out.elt1.ptr, align 16		store i128 %in.elt, ptr %out.elt1.ptr, align 16
%out.elt2.ptr = getelementptr i128, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i128, ptr %out.vec.ptr, i64 2
store i128 %in.elt, ptr %out.elt2.ptr, align 32		store i128 %in.elt, ptr %out.elt2.ptr, align 32
ret void		ret void
}		}

define void @vec512_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_i8(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_i8:		; SCALAR-LABEL: vec512_i8:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movzbl (%rdi), %eax		; SCALAR-NEXT: movzbl (%rdi), %eax
; ALL-NEXT: notb %al		; SCALAR-NEXT: notb %al
; ALL-NEXT: movb %al, (%rsi)		; SCALAR-NEXT: movb %al, (%rsi)
; ALL-NEXT: movb %al, 1(%rsi)		; SCALAR-NEXT: movb %al, 1(%rsi)
; ALL-NEXT: movb %al, 2(%rsi)		; SCALAR-NEXT: movb %al, 2(%rsi)
; ALL-NEXT: movb %al, 3(%rsi)		; SCALAR-NEXT: movb %al, 3(%rsi)
; ALL-NEXT: movb %al, 4(%rsi)		; SCALAR-NEXT: movb %al, 4(%rsi)
; ALL-NEXT: movb %al, 5(%rsi)		; SCALAR-NEXT: movb %al, 5(%rsi)
; ALL-NEXT: movb %al, 6(%rsi)		; SCALAR-NEXT: movb %al, 6(%rsi)
; ALL-NEXT: movb %al, 7(%rsi)		; SCALAR-NEXT: movb %al, 7(%rsi)
; ALL-NEXT: movb %al, 8(%rsi)		; SCALAR-NEXT: movb %al, 8(%rsi)
; ALL-NEXT: movb %al, 9(%rsi)		; SCALAR-NEXT: movb %al, 9(%rsi)
; ALL-NEXT: movb %al, 10(%rsi)		; SCALAR-NEXT: movb %al, 10(%rsi)
; ALL-NEXT: movb %al, 11(%rsi)		; SCALAR-NEXT: movb %al, 11(%rsi)
; ALL-NEXT: movb %al, 12(%rsi)		; SCALAR-NEXT: movb %al, 12(%rsi)
; ALL-NEXT: movb %al, 13(%rsi)		; SCALAR-NEXT: movb %al, 13(%rsi)
; ALL-NEXT: movb %al, 14(%rsi)		; SCALAR-NEXT: movb %al, 14(%rsi)
; ALL-NEXT: movb %al, 15(%rsi)		; SCALAR-NEXT: movb %al, 15(%rsi)
; ALL-NEXT: movb %al, 16(%rsi)		; SCALAR-NEXT: movb %al, 16(%rsi)
; ALL-NEXT: movb %al, 17(%rsi)		; SCALAR-NEXT: movb %al, 17(%rsi)
; ALL-NEXT: movb %al, 18(%rsi)		; SCALAR-NEXT: movb %al, 18(%rsi)
; ALL-NEXT: movb %al, 19(%rsi)		; SCALAR-NEXT: movb %al, 19(%rsi)
; ALL-NEXT: movb %al, 20(%rsi)		; SCALAR-NEXT: movb %al, 20(%rsi)
; ALL-NEXT: movb %al, 21(%rsi)		; SCALAR-NEXT: movb %al, 21(%rsi)
; ALL-NEXT: movb %al, 22(%rsi)		; SCALAR-NEXT: movb %al, 22(%rsi)
; ALL-NEXT: movb %al, 23(%rsi)		; SCALAR-NEXT: movb %al, 23(%rsi)
; ALL-NEXT: movb %al, 24(%rsi)		; SCALAR-NEXT: movb %al, 24(%rsi)
; ALL-NEXT: movb %al, 25(%rsi)		; SCALAR-NEXT: movb %al, 25(%rsi)
; ALL-NEXT: movb %al, 26(%rsi)		; SCALAR-NEXT: movb %al, 26(%rsi)
; ALL-NEXT: movb %al, 27(%rsi)		; SCALAR-NEXT: movb %al, 27(%rsi)
; ALL-NEXT: movb %al, 28(%rsi)		; SCALAR-NEXT: movb %al, 28(%rsi)
; ALL-NEXT: movb %al, 29(%rsi)		; SCALAR-NEXT: movb %al, 29(%rsi)
; ALL-NEXT: movb %al, 30(%rsi)		; SCALAR-NEXT: movb %al, 30(%rsi)
; ALL-NEXT: movb %al, 31(%rsi)		; SCALAR-NEXT: movb %al, 31(%rsi)
; ALL-NEXT: movb %al, 32(%rsi)		; SCALAR-NEXT: movb %al, 32(%rsi)
; ALL-NEXT: movb %al, 33(%rsi)		; SCALAR-NEXT: movb %al, 33(%rsi)
; ALL-NEXT: movb %al, 34(%rsi)		; SCALAR-NEXT: movb %al, 34(%rsi)
; ALL-NEXT: movb %al, 35(%rsi)		; SCALAR-NEXT: movb %al, 35(%rsi)
; ALL-NEXT: movb %al, 36(%rsi)		; SCALAR-NEXT: movb %al, 36(%rsi)
; ALL-NEXT: movb %al, 37(%rsi)		; SCALAR-NEXT: movb %al, 37(%rsi)
; ALL-NEXT: movb %al, 38(%rsi)		; SCALAR-NEXT: movb %al, 38(%rsi)
; ALL-NEXT: movb %al, 39(%rsi)		; SCALAR-NEXT: movb %al, 39(%rsi)
; ALL-NEXT: movb %al, 40(%rsi)		; SCALAR-NEXT: movb %al, 40(%rsi)
; ALL-NEXT: movb %al, 41(%rsi)		; SCALAR-NEXT: movb %al, 41(%rsi)
; ALL-NEXT: movb %al, 42(%rsi)		; SCALAR-NEXT: movb %al, 42(%rsi)
; ALL-NEXT: movb %al, 43(%rsi)		; SCALAR-NEXT: movb %al, 43(%rsi)
; ALL-NEXT: movb %al, 44(%rsi)		; SCALAR-NEXT: movb %al, 44(%rsi)
; ALL-NEXT: movb %al, 45(%rsi)		; SCALAR-NEXT: movb %al, 45(%rsi)
; ALL-NEXT: movb %al, 46(%rsi)		; SCALAR-NEXT: movb %al, 46(%rsi)
; ALL-NEXT: movb %al, 47(%rsi)		; SCALAR-NEXT: movb %al, 47(%rsi)
; ALL-NEXT: movb %al, 48(%rsi)		; SCALAR-NEXT: movb %al, 48(%rsi)
; ALL-NEXT: movb %al, 49(%rsi)		; SCALAR-NEXT: movb %al, 49(%rsi)
; ALL-NEXT: movb %al, 50(%rsi)		; SCALAR-NEXT: movb %al, 50(%rsi)
; ALL-NEXT: movb %al, 51(%rsi)		; SCALAR-NEXT: movb %al, 51(%rsi)
; ALL-NEXT: movb %al, 52(%rsi)		; SCALAR-NEXT: movb %al, 52(%rsi)
; ALL-NEXT: movb %al, 53(%rsi)		; SCALAR-NEXT: movb %al, 53(%rsi)
; ALL-NEXT: movb %al, 54(%rsi)		; SCALAR-NEXT: movb %al, 54(%rsi)
; ALL-NEXT: movb %al, 55(%rsi)		; SCALAR-NEXT: movb %al, 55(%rsi)
; ALL-NEXT: movb %al, 56(%rsi)		; SCALAR-NEXT: movb %al, 56(%rsi)
; ALL-NEXT: movb %al, 57(%rsi)		; SCALAR-NEXT: movb %al, 57(%rsi)
; ALL-NEXT: movb %al, 58(%rsi)		; SCALAR-NEXT: movb %al, 58(%rsi)
; ALL-NEXT: movb %al, 59(%rsi)		; SCALAR-NEXT: movb %al, 59(%rsi)
; ALL-NEXT: movb %al, 60(%rsi)		; SCALAR-NEXT: movb %al, 60(%rsi)
; ALL-NEXT: movb %al, 61(%rsi)		; SCALAR-NEXT: movb %al, 61(%rsi)
; ALL-NEXT: movb %al, 62(%rsi)		; SCALAR-NEXT: movb %al, 62(%rsi)
; ALL-NEXT: movb %al, 63(%rsi)		; SCALAR-NEXT: movb %al, 63(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE2-LABEL: vec512_i8:
		; SSE2: # %bb.0:
		; SSE2-NEXT: movzbl (%rdi), %eax
		; SSE2-NEXT: notb %al
		; SSE2-NEXT: movzbl %al, %eax
		; SSE2-NEXT: movd %eax, %xmm0
		; SSE2-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE2-NEXT: movdqa %xmm0, (%rsi)
		; SSE2-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE2-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE2-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE2-NEXT: retq
		;
		; SSSE3-LABEL: vec512_i8:
		; SSSE3: # %bb.0:
		; SSSE3-NEXT: movzbl (%rdi), %eax
		; SSSE3-NEXT: notb %al
		; SSSE3-NEXT: movzbl %al, %eax
		; SSSE3-NEXT: movd %eax, %xmm0
		; SSSE3-NEXT: pxor %xmm1, %xmm1
		; SSSE3-NEXT: pshufb %xmm1, %xmm0
		; SSSE3-NEXT: movdqa %xmm0, (%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 16(%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 32(%rsi)
		; SSSE3-NEXT: movdqa %xmm0, 48(%rsi)
		; SSSE3-NEXT: retq
		;
		; AVX1-LABEL: vec512_i8:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movzbl (%rdi), %eax
		; AVX1-NEXT: notb %al
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
		; AVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm0
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_i8:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movzbl (%rdi), %eax
		; AVX2-NEXT: notb %al
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec512_i8:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movzbl (%rdi), %eax
		; AVX512F-NEXT: notb %al
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastb %xmm0, %ymm0
		; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
		; AVX512F-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec512_i8:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movzbl (%rdi), %eax
		; AVX512BW-NEXT: notb %al
		; AVX512BW-NEXT: vpbroadcastb %eax, %zmm0
		; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i8, ptr %in.elt.ptr, align 64		%in.elt.not = load i8, ptr %in.elt.ptr, align 64
%in.elt = xor i8 %in.elt.not, -1		%in.elt = xor i8 %in.elt.not, -1
%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i8, ptr %out.vec.ptr, i64 0
store i8 %in.elt, ptr %out.elt0.ptr, align 64		store i8 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i8, ptr %out.vec.ptr, i64 1
store i8 %in.elt, ptr %out.elt1.ptr, align 1		store i8 %in.elt, ptr %out.elt1.ptr, align 1
%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i8, ptr %out.vec.ptr, i64 2
store i8 %in.elt, ptr %out.elt2.ptr, align 2		store i8 %in.elt, ptr %out.elt2.ptr, align 2
[118 lines not shown]
%out.elt62.ptr = getelementptr i8, ptr %out.vec.ptr, i64 62		%out.elt62.ptr = getelementptr i8, ptr %out.vec.ptr, i64 62
store i8 %in.elt, ptr %out.elt62.ptr, align 2		store i8 %in.elt, ptr %out.elt62.ptr, align 2
%out.elt63.ptr = getelementptr i8, ptr %out.vec.ptr, i64 63		%out.elt63.ptr = getelementptr i8, ptr %out.vec.ptr, i64 63
store i8 %in.elt, ptr %out.elt63.ptr, align 1		store i8 %in.elt, ptr %out.elt63.ptr, align 1
ret void		ret void
}		}

define void @vec512_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_i16(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_i16:		; SCALAR-LABEL: vec512_i16:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movw %ax, (%rsi)		; SCALAR-NEXT: movw %ax, (%rsi)
; ALL-NEXT: movw %ax, 2(%rsi)		; SCALAR-NEXT: movw %ax, 2(%rsi)
; ALL-NEXT: movw %ax, 4(%rsi)		; SCALAR-NEXT: movw %ax, 4(%rsi)
; ALL-NEXT: movw %ax, 6(%rsi)		; SCALAR-NEXT: movw %ax, 6(%rsi)
; ALL-NEXT: movw %ax, 8(%rsi)		; SCALAR-NEXT: movw %ax, 8(%rsi)
; ALL-NEXT: movw %ax, 10(%rsi)		; SCALAR-NEXT: movw %ax, 10(%rsi)
; ALL-NEXT: movw %ax, 12(%rsi)		; SCALAR-NEXT: movw %ax, 12(%rsi)
; ALL-NEXT: movw %ax, 14(%rsi)		; SCALAR-NEXT: movw %ax, 14(%rsi)
; ALL-NEXT: movw %ax, 16(%rsi)		; SCALAR-NEXT: movw %ax, 16(%rsi)
; ALL-NEXT: movw %ax, 18(%rsi)		; SCALAR-NEXT: movw %ax, 18(%rsi)
; ALL-NEXT: movw %ax, 20(%rsi)		; SCALAR-NEXT: movw %ax, 20(%rsi)
; ALL-NEXT: movw %ax, 22(%rsi)		; SCALAR-NEXT: movw %ax, 22(%rsi)
; ALL-NEXT: movw %ax, 24(%rsi)		; SCALAR-NEXT: movw %ax, 24(%rsi)
; ALL-NEXT: movw %ax, 26(%rsi)		; SCALAR-NEXT: movw %ax, 26(%rsi)
; ALL-NEXT: movw %ax, 28(%rsi)		; SCALAR-NEXT: movw %ax, 28(%rsi)
; ALL-NEXT: movw %ax, 30(%rsi)		; SCALAR-NEXT: movw %ax, 30(%rsi)
; ALL-NEXT: movw %ax, 32(%rsi)		; SCALAR-NEXT: movw %ax, 32(%rsi)
; ALL-NEXT: movw %ax, 34(%rsi)		; SCALAR-NEXT: movw %ax, 34(%rsi)
; ALL-NEXT: movw %ax, 36(%rsi)		; SCALAR-NEXT: movw %ax, 36(%rsi)
; ALL-NEXT: movw %ax, 38(%rsi)		; SCALAR-NEXT: movw %ax, 38(%rsi)
; ALL-NEXT: movw %ax, 40(%rsi)		; SCALAR-NEXT: movw %ax, 40(%rsi)
; ALL-NEXT: movw %ax, 42(%rsi)		; SCALAR-NEXT: movw %ax, 42(%rsi)
; ALL-NEXT: movw %ax, 44(%rsi)		; SCALAR-NEXT: movw %ax, 44(%rsi)
; ALL-NEXT: movw %ax, 46(%rsi)		; SCALAR-NEXT: movw %ax, 46(%rsi)
; ALL-NEXT: movw %ax, 48(%rsi)		; SCALAR-NEXT: movw %ax, 48(%rsi)
; ALL-NEXT: movw %ax, 50(%rsi)		; SCALAR-NEXT: movw %ax, 50(%rsi)
; ALL-NEXT: movw %ax, 52(%rsi)		; SCALAR-NEXT: movw %ax, 52(%rsi)
; ALL-NEXT: movw %ax, 54(%rsi)		; SCALAR-NEXT: movw %ax, 54(%rsi)
; ALL-NEXT: movw %ax, 56(%rsi)		; SCALAR-NEXT: movw %ax, 56(%rsi)
; ALL-NEXT: movw %ax, 58(%rsi)		; SCALAR-NEXT: movw %ax, 58(%rsi)
; ALL-NEXT: movw %ax, 60(%rsi)		; SCALAR-NEXT: movw %ax, 60(%rsi)
; ALL-NEXT: movw %ax, 62(%rsi)		; SCALAR-NEXT: movw %ax, 62(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec512_i16:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec512_i16:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,0,0,4,5,6,7]
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_i16:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512F-LABEL: vec512_i16:
		; AVX512F: # %bb.0:
		; AVX512F-NEXT: movl (%rdi), %eax
		; AVX512F-NEXT: notl %eax
		; AVX512F-NEXT: vmovd %eax, %xmm0
		; AVX512F-NEXT: vpbroadcastw %xmm0, %ymm0
		; AVX512F-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
		; AVX512F-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512F-NEXT: vzeroupper
		; AVX512F-NEXT: retq
		;
		; AVX512BW-LABEL: vec512_i16:
		; AVX512BW: # %bb.0:
		; AVX512BW-NEXT: movl (%rdi), %eax
		; AVX512BW-NEXT: notl %eax
		; AVX512BW-NEXT: vpbroadcastw %eax, %zmm0
		; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512BW-NEXT: vzeroupper
		; AVX512BW-NEXT: retq
%in.elt.not = load i16, ptr %in.elt.ptr, align 64		%in.elt.not = load i16, ptr %in.elt.ptr, align 64
%in.elt = xor i16 %in.elt.not, -1		%in.elt = xor i16 %in.elt.not, -1
%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i16, ptr %out.vec.ptr, i64 0
store i16 %in.elt, ptr %out.elt0.ptr, align 64		store i16 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i16, ptr %out.vec.ptr, i64 1
store i16 %in.elt, ptr %out.elt1.ptr, align 2		store i16 %in.elt, ptr %out.elt1.ptr, align 2
%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i16, ptr %out.vec.ptr, i64 2
store i16 %in.elt, ptr %out.elt2.ptr, align 4		store i16 %in.elt, ptr %out.elt2.ptr, align 4
[54 lines not shown]
%out.elt30.ptr = getelementptr i16, ptr %out.vec.ptr, i64 30		%out.elt30.ptr = getelementptr i16, ptr %out.vec.ptr, i64 30
store i16 %in.elt, ptr %out.elt30.ptr, align 4		store i16 %in.elt, ptr %out.elt30.ptr, align 4
%out.elt31.ptr = getelementptr i16, ptr %out.vec.ptr, i64 31		%out.elt31.ptr = getelementptr i16, ptr %out.vec.ptr, i64 31
store i16 %in.elt, ptr %out.elt31.ptr, align 2		store i16 %in.elt, ptr %out.elt31.ptr, align 2
ret void		ret void
}		}

define void @vec512_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_i32(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_i32:		; SCALAR-LABEL: vec512_i32:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: movl %eax, 32(%rsi)		; SCALAR-NEXT: movl %eax, 32(%rsi)
; ALL-NEXT: movl %eax, 36(%rsi)		; SCALAR-NEXT: movl %eax, 36(%rsi)
; ALL-NEXT: movl %eax, 40(%rsi)		; SCALAR-NEXT: movl %eax, 40(%rsi)
; ALL-NEXT: movl %eax, 44(%rsi)		; SCALAR-NEXT: movl %eax, 44(%rsi)
; ALL-NEXT: movl %eax, 48(%rsi)		; SCALAR-NEXT: movl %eax, 48(%rsi)
; ALL-NEXT: movl %eax, 52(%rsi)		; SCALAR-NEXT: movl %eax, 52(%rsi)
; ALL-NEXT: movl %eax, 56(%rsi)		; SCALAR-NEXT: movl %eax, 56(%rsi)
; ALL-NEXT: movl %eax, 60(%rsi)		; SCALAR-NEXT: movl %eax, 60(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec512_i32:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec512_i32:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_i32:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec512_i32:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %zmm0
		; AVX512-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt = xor i32 %in.elt.not, -1		%in.elt = xor i32 %in.elt.not, -1
%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i32, ptr %out.vec.ptr, i64 0
store i32 %in.elt, ptr %out.elt0.ptr, align 64		store i32 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i32, ptr %out.vec.ptr, i64 1
store i32 %in.elt, ptr %out.elt1.ptr, align 4		store i32 %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i32, ptr %out.vec.ptr, i64 2
store i32 %in.elt, ptr %out.elt2.ptr, align 8		store i32 %in.elt, ptr %out.elt2.ptr, align 8
[22 lines not shown]
%out.elt14.ptr = getelementptr i32, ptr %out.vec.ptr, i64 14		%out.elt14.ptr = getelementptr i32, ptr %out.vec.ptr, i64 14
store i32 %in.elt, ptr %out.elt14.ptr, align 8		store i32 %in.elt, ptr %out.elt14.ptr, align 8
%out.elt15.ptr = getelementptr i32, ptr %out.vec.ptr, i64 15		%out.elt15.ptr = getelementptr i32, ptr %out.vec.ptr, i64 15
store i32 %in.elt, ptr %out.elt15.ptr, align 4		store i32 %in.elt, ptr %out.elt15.ptr, align 4
ret void		ret void
}		}

define void @vec512_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_float(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_float:		; SCALAR-LABEL: vec512_float:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movl (%rdi), %eax		; SCALAR-NEXT: movl (%rdi), %eax
; ALL-NEXT: notl %eax		; SCALAR-NEXT: notl %eax
; ALL-NEXT: movl %eax, (%rsi)		; SCALAR-NEXT: movl %eax, (%rsi)
; ALL-NEXT: movl %eax, 4(%rsi)		; SCALAR-NEXT: movl %eax, 4(%rsi)
; ALL-NEXT: movl %eax, 8(%rsi)		; SCALAR-NEXT: movl %eax, 8(%rsi)
; ALL-NEXT: movl %eax, 12(%rsi)		; SCALAR-NEXT: movl %eax, 12(%rsi)
; ALL-NEXT: movl %eax, 16(%rsi)		; SCALAR-NEXT: movl %eax, 16(%rsi)
; ALL-NEXT: movl %eax, 20(%rsi)		; SCALAR-NEXT: movl %eax, 20(%rsi)
; ALL-NEXT: movl %eax, 24(%rsi)		; SCALAR-NEXT: movl %eax, 24(%rsi)
; ALL-NEXT: movl %eax, 28(%rsi)		; SCALAR-NEXT: movl %eax, 28(%rsi)
; ALL-NEXT: movl %eax, 32(%rsi)		; SCALAR-NEXT: movl %eax, 32(%rsi)
; ALL-NEXT: movl %eax, 36(%rsi)		; SCALAR-NEXT: movl %eax, 36(%rsi)
; ALL-NEXT: movl %eax, 40(%rsi)		; SCALAR-NEXT: movl %eax, 40(%rsi)
; ALL-NEXT: movl %eax, 44(%rsi)		; SCALAR-NEXT: movl %eax, 44(%rsi)
; ALL-NEXT: movl %eax, 48(%rsi)		; SCALAR-NEXT: movl %eax, 48(%rsi)
; ALL-NEXT: movl %eax, 52(%rsi)		; SCALAR-NEXT: movl %eax, 52(%rsi)
; ALL-NEXT: movl %eax, 56(%rsi)		; SCALAR-NEXT: movl %eax, 56(%rsi)
; ALL-NEXT: movl %eax, 60(%rsi)		; SCALAR-NEXT: movl %eax, 60(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec512_float:
		; SSE: # %bb.0:
		; SSE-NEXT: movl (%rdi), %eax
		; SSE-NEXT: notl %eax
		; SSE-NEXT: movd %eax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec512_float:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movl (%rdi), %eax
		; AVX1-NEXT: notl %eax
		; AVX1-NEXT: vmovd %eax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_float:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movl (%rdi), %eax
		; AVX2-NEXT: notl %eax
		; AVX2-NEXT: vmovd %eax, %xmm0
		; AVX2-NEXT: vpbroadcastd %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec512_float:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movl (%rdi), %eax
		; AVX512-NEXT: notl %eax
		; AVX512-NEXT: vpbroadcastd %eax, %zmm0
		; AVX512-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i32, ptr %in.elt.ptr, align 64		%in.elt.not = load i32, ptr %in.elt.ptr, align 64
%in.elt.int = xor i32 %in.elt.not, -1		%in.elt.int = xor i32 %in.elt.not, -1
%in.elt = bitcast i32 %in.elt.int to float		%in.elt = bitcast i32 %in.elt.int to float
%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr float, ptr %out.vec.ptr, i64 0
store float %in.elt, ptr %out.elt0.ptr, align 64		store float %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr float, ptr %out.vec.ptr, i64 1
store float %in.elt, ptr %out.elt1.ptr, align 4		store float %in.elt, ptr %out.elt1.ptr, align 4
%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr float, ptr %out.vec.ptr, i64 2
[23 lines not shown]
%out.elt14.ptr = getelementptr float, ptr %out.vec.ptr, i64 14		%out.elt14.ptr = getelementptr float, ptr %out.vec.ptr, i64 14
store float %in.elt, ptr %out.elt14.ptr, align 8		store float %in.elt, ptr %out.elt14.ptr, align 8
%out.elt15.ptr = getelementptr float, ptr %out.vec.ptr, i64 15		%out.elt15.ptr = getelementptr float, ptr %out.vec.ptr, i64 15
store float %in.elt, ptr %out.elt15.ptr, align 4		store float %in.elt, ptr %out.elt15.ptr, align 4
ret void		ret void
}		}

define void @vec512_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_i64(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_i64:		; SCALAR-LABEL: vec512_i64:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: movq %rax, 32(%rsi)		; SCALAR-NEXT: movq %rax, 32(%rsi)
; ALL-NEXT: movq %rax, 40(%rsi)		; SCALAR-NEXT: movq %rax, 40(%rsi)
; ALL-NEXT: movq %rax, 48(%rsi)		; SCALAR-NEXT: movq %rax, 48(%rsi)
; ALL-NEXT: movq %rax, 56(%rsi)		; SCALAR-NEXT: movq %rax, 56(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec512_i64:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec512_i64:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_i64:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec512_i64:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %zmm0
		; AVX512-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt = xor i64 %in.elt.not, -1		%in.elt = xor i64 %in.elt.not, -1
%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i64, ptr %out.vec.ptr, i64 0
store i64 %in.elt, ptr %out.elt0.ptr, align 64		store i64 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i64, ptr %out.vec.ptr, i64 1
store i64 %in.elt, ptr %out.elt1.ptr, align 8		store i64 %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr i64, ptr %out.vec.ptr, i64 2
store i64 %in.elt, ptr %out.elt2.ptr, align 16		store i64 %in.elt, ptr %out.elt2.ptr, align 16
%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3		%out.elt3.ptr = getelementptr i64, ptr %out.vec.ptr, i64 3
store i64 %in.elt, ptr %out.elt3.ptr, align 8		store i64 %in.elt, ptr %out.elt3.ptr, align 8
%out.elt4.ptr = getelementptr i64, ptr %out.vec.ptr, i64 4		%out.elt4.ptr = getelementptr i64, ptr %out.vec.ptr, i64 4
store i64 %in.elt, ptr %out.elt4.ptr, align 32		store i64 %in.elt, ptr %out.elt4.ptr, align 32
%out.elt5.ptr = getelementptr i64, ptr %out.vec.ptr, i64 5		%out.elt5.ptr = getelementptr i64, ptr %out.vec.ptr, i64 5
store i64 %in.elt, ptr %out.elt5.ptr, align 8		store i64 %in.elt, ptr %out.elt5.ptr, align 8
%out.elt6.ptr = getelementptr i64, ptr %out.vec.ptr, i64 6		%out.elt6.ptr = getelementptr i64, ptr %out.vec.ptr, i64 6
store i64 %in.elt, ptr %out.elt6.ptr, align 16		store i64 %in.elt, ptr %out.elt6.ptr, align 16
%out.elt7.ptr = getelementptr i64, ptr %out.vec.ptr, i64 7		%out.elt7.ptr = getelementptr i64, ptr %out.vec.ptr, i64 7
store i64 %in.elt, ptr %out.elt7.ptr, align 8		store i64 %in.elt, ptr %out.elt7.ptr, align 8
ret void		ret void
}		}

define void @vec512_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {		define void @vec512_double(ptr %in.elt.ptr, ptr %out.vec.ptr) nounwind {
; ALL-LABEL: vec512_double:		; SCALAR-LABEL: vec512_double:
; ALL: # %bb.0:		; SCALAR: # %bb.0:
; ALL-NEXT: movq (%rdi), %rax		; SCALAR-NEXT: movq (%rdi), %rax
; ALL-NEXT: notq %rax		; SCALAR-NEXT: notq %rax
; ALL-NEXT: movq %rax, (%rsi)		; SCALAR-NEXT: movq %rax, (%rsi)
; ALL-NEXT: movq %rax, 8(%rsi)		; SCALAR-NEXT: movq %rax, 8(%rsi)
; ALL-NEXT: movq %rax, 16(%rsi)		; SCALAR-NEXT: movq %rax, 16(%rsi)
; ALL-NEXT: movq %rax, 24(%rsi)		; SCALAR-NEXT: movq %rax, 24(%rsi)
; ALL-NEXT: movq %rax, 32(%rsi)		; SCALAR-NEXT: movq %rax, 32(%rsi)
; ALL-NEXT: movq %rax, 40(%rsi)		; SCALAR-NEXT: movq %rax, 40(%rsi)
; ALL-NEXT: movq %rax, 48(%rsi)		; SCALAR-NEXT: movq %rax, 48(%rsi)
; ALL-NEXT: movq %rax, 56(%rsi)		; SCALAR-NEXT: movq %rax, 56(%rsi)
; ALL-NEXT: retq		; SCALAR-NEXT: retq
		;
		; SSE-LABEL: vec512_double:
		; SSE: # %bb.0:
		; SSE-NEXT: movq (%rdi), %rax
		; SSE-NEXT: notq %rax
		; SSE-NEXT: movq %rax, %xmm0
		; SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; SSE-NEXT: movdqa %xmm0, (%rsi)
		; SSE-NEXT: movdqa %xmm0, 16(%rsi)
		; SSE-NEXT: movdqa %xmm0, 32(%rsi)
		; SSE-NEXT: movdqa %xmm0, 48(%rsi)
		; SSE-NEXT: retq
		;
		; AVX1-LABEL: vec512_double:
		; AVX1: # %bb.0:
		; AVX1-NEXT: movq (%rdi), %rax
		; AVX1-NEXT: notq %rax
		; AVX1-NEXT: vmovq %rax, %xmm0
		; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
		; AVX1-NEXT: vmovaps %ymm0, (%rsi)
		; AVX1-NEXT: vmovaps %ymm0, 32(%rsi)
		; AVX1-NEXT: vzeroupper
		; AVX1-NEXT: retq
		;
		; AVX2-LABEL: vec512_double:
		; AVX2: # %bb.0:
		; AVX2-NEXT: movq (%rdi), %rax
		; AVX2-NEXT: notq %rax
		; AVX2-NEXT: vmovq %rax, %xmm0
		; AVX2-NEXT: vpbroadcastq %xmm0, %ymm0
		; AVX2-NEXT: vmovdqa %ymm0, (%rsi)
		; AVX2-NEXT: vmovdqa %ymm0, 32(%rsi)
		; AVX2-NEXT: vzeroupper
		; AVX2-NEXT: retq
		;
		; AVX512-LABEL: vec512_double:
		; AVX512: # %bb.0:
		; AVX512-NEXT: movq (%rdi), %rax
		; AVX512-NEXT: notq %rax
		; AVX512-NEXT: vpbroadcastq %rax, %zmm0
		; AVX512-NEXT: vmovdqa64 %zmm0, (%rsi)
		; AVX512-NEXT: vzeroupper
		; AVX512-NEXT: retq
%in.elt.not = load i64, ptr %in.elt.ptr, align 64		%in.elt.not = load i64, ptr %in.elt.ptr, align 64
%in.elt.int = xor i64 %in.elt.not, -1		%in.elt.int = xor i64 %in.elt.not, -1
%in.elt = bitcast i64 %in.elt.int to double		%in.elt = bitcast i64 %in.elt.int to double
%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr double, ptr %out.vec.ptr, i64 0
store double %in.elt, ptr %out.elt0.ptr, align 64		store double %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr double, ptr %out.vec.ptr, i64 1
store double %in.elt, ptr %out.elt1.ptr, align 8		store double %in.elt, ptr %out.elt1.ptr, align 8
%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2		%out.elt2.ptr = getelementptr double, ptr %out.vec.ptr, i64 2
[65 lines not shown]
%out.elt0.ptr = getelementptr i256, ptr %out.vec.ptr, i64 0		%out.elt0.ptr = getelementptr i256, ptr %out.vec.ptr, i64 0
store i256 %in.elt, ptr %out.elt0.ptr, align 64		store i256 %in.elt, ptr %out.elt0.ptr, align 64
%out.elt1.ptr = getelementptr i256, ptr %out.vec.ptr, i64 1		%out.elt1.ptr = getelementptr i256, ptr %out.vec.ptr, i64 1
store i256 %in.elt, ptr %out.elt1.ptr, align 32		store i256 %in.elt, ptr %out.elt1.ptr, align 32
ret void		ret void
}		}
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; AVX: {{.*}}		; AVX: {{.*}}
; AVX1: {{.*}}
; AVX2: {{.*}}
; AVX512: {{.*}}
; AVX512BW: {{.*}}
; AVX512F: {{.*}}
; SCALAR: {{.*}}
; SSE: {{.*}}
; SSE2: {{.*}}
; SSE2-ONLY: {{.*}}		; SSE2-ONLY: {{.*}}
; SSE3: {{.*}}		; SSE3: {{.*}}
; SSE41: {{.*}}		; SSE41: {{.*}}
; SSE42: {{.*}}		; SSE42: {{.*}}
; SSSE3: {{.*}}
; SSSE3-ONLY: {{.*}}		; SSSE3-ONLY: {{.*}}

llvm/test/CodeGen/X86/legalize-shl-vec.ll

[235 lines not shown]
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rcx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi			; X64-NEXT: movq {{[0-9]+}}(%rsp), %rsi
	; X64-NEXT: shrdq $6, %rsi, %r9			; X64-NEXT: shrdq $6, %rsi, %r9
	; X64-NEXT: shrdq $6, %rdx, %rsi			; X64-NEXT: shrdq $6, %rdx, %rsi
	; X64-NEXT: shrdq $6, %rcx, %rdx			; X64-NEXT: shrdq $6, %rcx, %rdx
	; X64-NEXT: sarq $63, %r8
	; X64-NEXT: sarq $6, %rcx			; X64-NEXT: sarq $6, %rcx
	; X64-NEXT: movq %rcx, 56(%rdi)			; X64-NEXT: movq %rcx, 56(%rdi)
	; X64-NEXT: movq %rdx, 48(%rdi)			; X64-NEXT: movq %rdx, 48(%rdi)
	; X64-NEXT: movq %rsi, 40(%rdi)			; X64-NEXT: movq %rsi, 40(%rdi)
	; X64-NEXT: movq %r9, 32(%rdi)			; X64-NEXT: movq %r9, 32(%rdi)
	; X64-NEXT: movq %r8, 24(%rdi)			; X64-NEXT: movq %r8, %xmm0
	; X64-NEXT: movq %r8, 16(%rdi)			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
	; X64-NEXT: movq %r8, 8(%rdi)			; X64-NEXT: psrad $31, %xmm0
	; X64-NEXT: movq %r8, (%rdi)			; X64-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
				; X64-NEXT: movdqa %xmm0, 16(%rdi)
				; X64-NEXT: movdqa %xmm0, (%rdi)
	; X64-NEXT: retq			; X64-NEXT: retq
				pengfei: Looks like a regression here.
				lebedev.ri (author): We've traded 4 scalar stores for two vector stores + a GPR->XMM transfer + two shuffles + a shift. I wouldn't call it an obvious regression, since we get less contention in the CPU store unit, but it's not really an improvement either. Can you help spot issues in the tests in `elementwise-store-of-scalar-splat.ll` / `subvectorwise-store-of-vector-splat.ll`? If those do not have regressions, then we need to restrict some other fold.
				lebedev.ri (author): This is an unrelated shuffle combining issue:
				```
				movq %r8, %xmm0
				pshufd $68, %xmm0, %xmm0 # xmm0 = xmm0[0,1,0,1]
				psrad $31, %xmm0
				pshufd $245, %xmm0, %xmm0 # xmm0 = xmm0[1,1,3,3]
				```
				What we want here is to splat the sign bit of a 64-bit scalar to the entire XMM register. This should just be (note the decoded shuffle mask):
				```
				movq %r8, %xmm0
				pshufd $<???>, %xmm0, %xmm0 # xmm0 = xmm0[1,1,1,1]
				psrad $31, %xmm0
				```
				lebedev.ri (author): https://reviews.llvm.org/D141806 will address this. Does anyone see any other regressions?
				pengfei: > Can you help spot issues in the tests in `elementwise-store-of-scalar-splat.ll` / `subvectorwise-store-of-vector-splat.ll`?
				I like the AVX512 broadcast version, i.e., GPR->XMM directly. The AVX2 and some AVX512F versions are suboptimal. I cannot tell whether replacing the splat stores with a shuffle is good or not on SSE2. Should a `rep stos` be better?
				lebedev.ri (author): > Should a rep stos be better?
				Almost certainly not.
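				The claim in the shuffle-combining comment above, that the four-instruction sequence and the shorter three-instruction one compute the same value, can be checked with a small lane-level model. This is only an illustrative sketch in portable C++, not code from the patch: the helper names (`movq_to_xmm`, `pshufd`, `psrad31`, `splat_sign_current`, `splat_sign_wanted`) are hypothetical, and each helper models only the behaviour of the corresponding instruction on four 32-bit lanes.

```cpp
#include <array>
#include <cstdint>

// Four 32-bit lanes of an XMM register, lane 0 first.
using Vec4 = std::array<int32_t, 4>;

// Models `movq %r8, %xmm0`: low 64 bits hold the scalar, upper 64 bits are zero.
Vec4 movq_to_xmm(int64_t v) {
  const uint64_t u = static_cast<uint64_t>(v);
  return Vec4{{static_cast<int32_t>(static_cast<uint32_t>(u)),
               static_cast<int32_t>(static_cast<uint32_t>(u >> 32)), 0, 0}};
}

// Models `pshufd` with the decoded mask xmm0[a,b,c,d].
Vec4 pshufd(const Vec4 &x, int a, int b, int c, int d) {
  return Vec4{{x[a], x[b], x[c], x[d]}};
}

// Models `psrad $31`: every lane becomes all-ones if negative, else zero.
Vec4 psrad31(const Vec4 &x) {
  Vec4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = x[i] < 0 ? -1 : 0;
  return r;
}

// The sequence currently emitted: shuffle [0,1,0,1], shift, shuffle [1,1,3,3].
Vec4 splat_sign_current(int64_t v) {
  Vec4 x = movq_to_xmm(v);
  x = pshufd(x, 0, 1, 0, 1);
  x = psrad31(x);
  return pshufd(x, 1, 1, 3, 3);
}

// The sequence the comment asks for: shuffle [1,1,1,1], then shift.
Vec4 splat_sign_wanted(int64_t v) {
  Vec4 x = movq_to_xmm(v);
  x = pshufd(x, 1, 1, 1, 1);
  return psrad31(x);
}
```

				In this model both functions depend only on lane 1, the high half of the 64-bit scalar, so the first shuffle and the trailing shuffle of the current sequence collapse into a single broadcast of that lane.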
	%Amt = insertelement <2 x i256> <i256 5, i256 6>, i256 255, i32 0			%Amt = insertelement <2 x i256> <i256 5, i256 6>, i256 255, i32 0
	%Out = ashr <2 x i256> %In, %Amt			%Out = ashr <2 x i256> %In, %Amt
	ret <2 x i256> %Out			ret <2 x i256> %Out
	}			}

llvm/test/CodeGen/X86/subvectorwise-store-of-vector-splat.ll

	[1,123 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v2i64:			; AVX-LABEL: vec256_v2i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0
	store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1
	store <2 x i64> %in.subvec, ptr %out.subvec1.ptr, align 16			store <2 x i64> %in.subvec, ptr %out.subvec1.ptr, align 16
	[24 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v2f64:			; AVX-LABEL: vec256_v2f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>			%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>
	store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0
	store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 1
	[210 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v4i32:			; AVX-LABEL: vec256_v4i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0
	store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1
	store <4 x i32> %in.subvec, ptr %out.subvec1.ptr, align 16			store <4 x i32> %in.subvec, ptr %out.subvec1.ptr, align 16
	[19 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v4f32:			; AVX-LABEL: vec256_v4f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>			%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>
	store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0
	store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 1
	[177 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v8i16:			; AVX-LABEL: vec256_v8i16:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0
	store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1
	store <8 x i16> %in.subvec, ptr %out.subvec1.ptr, align 16			store <8 x i16> %in.subvec, ptr %out.subvec1.ptr, align 16
	[149 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec256_v16i8:			; AVX-LABEL: vec256_v16i8:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0
	store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1
	store <16 x i8> %in.subvec, ptr %out.subvec1.ptr, align 16			store <16 x i8> %in.subvec, ptr %out.subvec1.ptr, align 16
	[500 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v2i64:			; AVX-LABEL: vec384_v2i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0
	store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1
	[30 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v2f64:			; AVX-LABEL: vec384_v2f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>			%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>
	store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0
	store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64
	[1,356 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v4i32:			; AVX-LABEL: vec384_v4i32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0
	store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1
	[24 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v4f32:			; AVX-LABEL: vec384_v4f32:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>			%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>
	store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0
	store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64
	[808 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v8i16:			; AVX-LABEL: vec384_v8i16:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0
	store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1
	[485 lines not shown]
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec384_v16i8:			; AVX-LABEL: vec384_v16i8:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0
	store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1
	[810 lines not shown]
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v2i64:			; AVX1-LABEL: vec512_v2i64:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v2i64:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v2i64:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 0
	store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 1
	store <2 x i64> %in.subvec, ptr %out.subvec1.ptr, align 16			store <2 x i64> %in.subvec, ptr %out.subvec1.ptr, align 16
	%out.subvec2.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 2			%out.subvec2.ptr = getelementptr <2 x i64>, ptr %out.vec.ptr, i64 2
	[28 lines not shown]
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v2f64:			; AVX1-LABEL: vec512_v2f64:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v2f64:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v2f64:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <2 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>			%in.subvec.int = xor <2 x i64> %in.subvec.not, <i64 -1, i64 -1>
	%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>			%in.subvec = bitcast <2 x i64> %in.subvec.int to <2 x double>
	store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 0
	store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64			store <2 x double> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <2 x double>, ptr %out.vec.ptr, i64 1
	store <2 x double> %in.subvec, ptr %out.subvec1.ptr, align 16			store <2 x double> %in.subvec, ptr %out.subvec1.ptr, align 16
	[337 lines not shown]
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v4i32:			; AVX1-LABEL: vec512_v4i32:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v4i32:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v4i32:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 0
	store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 1
	store <4 x i32> %in.subvec, ptr %out.subvec1.ptr, align 16			store <4 x i32> %in.subvec, ptr %out.subvec1.ptr, align 16
	%out.subvec2.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 2			%out.subvec2.ptr = getelementptr <4 x i32>, ptr %out.vec.ptr, i64 2
	[21 lines not shown]
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v4f32:			; AVX1-LABEL: vec512_v4f32:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v4f32:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v4f32:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec.int = xor <4 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1>
	%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>			%in.subvec = bitcast <4 x i32> %in.subvec.int to <4 x float>
	store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 0
	store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x float> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x float>, ptr %out.vec.ptr, i64 1
	store <4 x float> %in.subvec, ptr %out.subvec1.ptr, align 16			store <4 x float> %in.subvec, ptr %out.subvec1.ptr, align 16
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v4i64:			; AVX2-ONLY-LABEL: vec512_v4i64:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v4i64:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <4 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <4 x i64> %in.subvec.not, <i64 -1, i64 -1, i64 -1, i64 -1>			%in.subvec = xor <4 x i64> %in.subvec.not, <i64 -1, i64 -1, i64 -1, i64 -1>
	store <4 x i64> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x i64> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x i64>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x i64>, ptr %out.vec.ptr, i64 0
	store <4 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x i64> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x i64>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x i64>, ptr %out.vec.ptr, i64 1
	store <4 x i64> %in.subvec, ptr %out.subvec1.ptr, align 32			store <4 x i64> %in.subvec, ptr %out.subvec1.ptr, align 32
	ret void			ret void
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v4f64:			; AVX2-ONLY-LABEL: vec512_v4f64:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v4f64:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <4 x i64>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <4 x i64>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <4 x i64> %in.subvec.not, <i64 -1, i64 -1, i64 -1, i64 -1>			%in.subvec.int = xor <4 x i64> %in.subvec.not, <i64 -1, i64 -1, i64 -1, i64 -1>
	%in.subvec = bitcast <4 x i64> %in.subvec.int to <4 x double>			%in.subvec = bitcast <4 x i64> %in.subvec.int to <4 x double>
	store <4 x double> %in.subvec, ptr %out.subvec.ptr, align 64			store <4 x double> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <4 x double>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <4 x double>, ptr %out.vec.ptr, i64 0
	store <4 x double> %in.subvec, ptr %out.subvec0.ptr, align 64			store <4 x double> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <4 x double>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <4 x double>, ptr %out.vec.ptr, i64 1
	store <4 x double> %in.subvec, ptr %out.subvec1.ptr, align 32			store <4 x double> %in.subvec, ptr %out.subvec1.ptr, align 32
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v8i16:			; AVX1-LABEL: vec512_v8i16:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v8i16:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512F-LABEL: vec512_v8i16:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512F-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512F-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512F-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512F-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX512F-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX512F-NEXT: vzeroupper
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: vec512_v8i16:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512BW-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512BW-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512BW-NEXT: vzeroupper
				; AVX512BW-NEXT: retq
	%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <8 x i16>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			%in.subvec = xor <8 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 0
	store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64			store <8 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 1
	store <8 x i16> %in.subvec, ptr %out.subvec1.ptr, align 16			store <8 x i16> %in.subvec, ptr %out.subvec1.ptr, align 16
	%out.subvec2.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 2			%out.subvec2.ptr = getelementptr <8 x i16>, ptr %out.vec.ptr, i64 2
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v8i32:			; AVX2-ONLY-LABEL: vec512_v8i32:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v8i32:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <8 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <8 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <8 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec = xor <8 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
	store <8 x i32> %in.subvec, ptr %out.subvec.ptr, align 64			store <8 x i32> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <8 x i32>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <8 x i32>, ptr %out.vec.ptr, i64 0
	store <8 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64			store <8 x i32> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <8 x i32>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <8 x i32>, ptr %out.vec.ptr, i64 1
	store <8 x i32> %in.subvec, ptr %out.subvec1.ptr, align 32			store <8 x i32> %in.subvec, ptr %out.subvec1.ptr, align 32
	ret void			ret void
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v8f32:			; AVX2-ONLY-LABEL: vec512_v8f32:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512-LABEL: vec512_v8f32:
				; AVX512: # %bb.0:
				; AVX512-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512-NEXT: vzeroupper
				; AVX512-NEXT: retq
	%in.subvec.not = load <8 x i32>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <8 x i32>, ptr %in.subvec.ptr, align 64
	%in.subvec.int = xor <8 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>			%in.subvec.int = xor <8 x i32> %in.subvec.not, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
	%in.subvec = bitcast <8 x i32> %in.subvec.int to <8 x float>			%in.subvec = bitcast <8 x i32> %in.subvec.int to <8 x float>
	store <8 x float> %in.subvec, ptr %out.subvec.ptr, align 64			store <8 x float> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <8 x float>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <8 x float>, ptr %out.vec.ptr, i64 0
	store <8 x float> %in.subvec, ptr %out.subvec0.ptr, align 64			store <8 x float> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <8 x float>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <8 x float>, ptr %out.vec.ptr, i64 1
	store <8 x float> %in.subvec, ptr %out.subvec1.ptr, align 32			store <8 x float> %in.subvec, ptr %out.subvec1.ptr, align 32
	; SSE2-NEXT: pxor (%rdi), %xmm0			; SSE2-NEXT: pxor (%rdi), %xmm0
	; SSE2-NEXT: movdqa %xmm0, (%rsi)			; SSE2-NEXT: movdqa %xmm0, (%rsi)
	; SSE2-NEXT: movdqa %xmm0, (%rdx)			; SSE2-NEXT: movdqa %xmm0, (%rdx)
	; SSE2-NEXT: movdqa %xmm0, 16(%rdx)			; SSE2-NEXT: movdqa %xmm0, 16(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 32(%rdx)			; SSE2-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE2-NEXT: movdqa %xmm0, 48(%rdx)			; SSE2-NEXT: movdqa %xmm0, 48(%rdx)
	; SSE2-NEXT: retq			; SSE2-NEXT: retq
	;			;
	; AVX-LABEL: vec512_v16i8:			; AVX1-LABEL: vec512_v16i8:
	; AVX: # %bb.0:			; AVX1: # %bb.0:
	; AVX-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX1-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX-NEXT: vpxor (%rdi), %xmm0, %xmm0			; AVX1-NEXT: vpxor (%rdi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, (%rsi)			; AVX1-NEXT: vmovdqa %xmm0, (%rsi)
	; AVX-NEXT: vmovdqa %xmm0, (%rdx)			; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 48(%rdx)			; AVX1-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX1-NEXT: retq
				;
				; AVX2-ONLY-LABEL: vec512_v16i8:
				; AVX2-ONLY: # %bb.0:
				; AVX2-ONLY-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX2-ONLY-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX2-ONLY-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX2-ONLY-NEXT: vzeroupper
				; AVX2-ONLY-NEXT: retq
				;
				; AVX512F-LABEL: vec512_v16i8:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512F-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512F-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512F-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512F-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX512F-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX512F-NEXT: vzeroupper
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: vec512_v16i8:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
				; AVX512BW-NEXT: vpxor (%rdi), %xmm0, %xmm0
				; AVX512BW-NEXT: vmovdqa %xmm0, (%rsi)
				; AVX512BW-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0
				; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512BW-NEXT: vzeroupper
				; AVX512BW-NEXT: retq
	%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <16 x i8>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			%in.subvec = xor <16 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 0
	store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64			store <16 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 1
	store <16 x i8> %in.subvec, ptr %out.subvec1.ptr, align 16			store <16 x i8> %in.subvec, ptr %out.subvec1.ptr, align 16
	%out.subvec2.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 2			%out.subvec2.ptr = getelementptr <16 x i8>, ptr %out.vec.ptr, i64 2
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v16i16:			; AVX2-ONLY-LABEL: vec512_v16i16:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512F-LABEL: vec512_v16i16:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512F-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512F-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX512F-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX512F-NEXT: vzeroupper
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: vec512_v16i16:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512BW-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512BW-NEXT: vzeroupper
				; AVX512BW-NEXT: retq
	%in.subvec.not = load <16 x i16>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <16 x i16>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <16 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>			%in.subvec = xor <16 x i16> %in.subvec.not, <i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1, i16 -1>
	store <16 x i16> %in.subvec, ptr %out.subvec.ptr, align 64			store <16 x i16> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <16 x i16>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <16 x i16>, ptr %out.vec.ptr, i64 0
	store <16 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64			store <16 x i16> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <16 x i16>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <16 x i16>, ptr %out.vec.ptr, i64 1
	store <16 x i16> %in.subvec, ptr %out.subvec1.ptr, align 32			store <16 x i16> %in.subvec, ptr %out.subvec1.ptr, align 32
	ret void			ret void
	; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0			; AVX1-NEXT: vcmptrueps %ymm0, %ymm0, %ymm0
	; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0			; AVX1-NEXT: vxorps (%rdi), %ymm0, %ymm0
	; AVX1-NEXT: vmovaps %ymm0, (%rsi)			; AVX1-NEXT: vmovaps %ymm0, (%rsi)
	; AVX1-NEXT: vmovaps %ymm0, (%rdx)			; AVX1-NEXT: vmovaps %ymm0, (%rdx)
	; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdx)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: vec512_v32i8:			; AVX2-ONLY-LABEL: vec512_v32i8:
	; AVX2: # %bb.0:			; AVX2-ONLY: # %bb.0:
	; AVX2-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
	; AVX2-NEXT: vpxor (%rdi), %ymm0, %ymm0			; AVX2-ONLY-NEXT: vpxor (%rdi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, (%rsi)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rsi)
	; AVX2-NEXT: vmovdqa %ymm0, (%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, (%rdx)
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-ONLY-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-ONLY-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-ONLY-NEXT: retq
				;
				; AVX512F-LABEL: vec512_v32i8:
				; AVX512F: # %bb.0:
				; AVX512F-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512F-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512F-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512F-NEXT: vmovdqa %ymm0, 32(%rdx)
				; AVX512F-NEXT: vmovdqa %ymm0, (%rdx)
				; AVX512F-NEXT: vzeroupper
				; AVX512F-NEXT: retq
				;
				; AVX512BW-LABEL: vec512_v32i8:
				; AVX512BW: # %bb.0:
				; AVX512BW-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
				; AVX512BW-NEXT: vpxor (%rdi), %ymm0, %ymm0
				; AVX512BW-NEXT: vmovdqa %ymm0, (%rsi)
				; AVX512BW-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0
				; AVX512BW-NEXT: vmovdqa64 %zmm0, (%rdx)
				; AVX512BW-NEXT: vzeroupper
				; AVX512BW-NEXT: retq
	%in.subvec.not = load <32 x i8>, ptr %in.subvec.ptr, align 64			%in.subvec.not = load <32 x i8>, ptr %in.subvec.ptr, align 64
	%in.subvec = xor <32 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>			%in.subvec = xor <32 x i8> %in.subvec.not, <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>
	store <32 x i8> %in.subvec, ptr %out.subvec.ptr, align 64			store <32 x i8> %in.subvec, ptr %out.subvec.ptr, align 64
	%out.subvec0.ptr = getelementptr <32 x i8>, ptr %out.vec.ptr, i64 0			%out.subvec0.ptr = getelementptr <32 x i8>, ptr %out.vec.ptr, i64 0
	store <32 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64			store <32 x i8> %in.subvec, ptr %out.subvec0.ptr, align 64
	%out.subvec1.ptr = getelementptr <32 x i8>, ptr %out.vec.ptr, i64 1			%out.subvec1.ptr = getelementptr <32 x i8>, ptr %out.vec.ptr, i64 1
	store <32 x i8> %in.subvec, ptr %out.subvec1.ptr, align 32			store <32 x i8> %in.subvec, ptr %out.subvec1.ptr, align 32
	ret void			ret void
	}			}
	;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:			;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
	; SSSE3: {{.*}}			; SSSE3: {{.*}}

llvm/test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll

	; X32-SSE42-NEXT: movl 4(%edx), %edi			; X32-SSE42-NEXT: movl 4(%edx), %edi
	; X32-SSE42-NEXT: movl 8(%edx), %ebx			; X32-SSE42-NEXT: movl 8(%edx), %ebx
	; X32-SSE42-NEXT: movl 12(%edx), %edx			; X32-SSE42-NEXT: movl 12(%edx), %edx
	; X32-SSE42-NEXT: movzbl (%ecx), %ecx			; X32-SSE42-NEXT: movzbl (%ecx), %ecx
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %esi, (%esp)			; X32-SSE42-NEXT: movl %esi, (%esp)
	; X32-SSE42-NEXT: sarl $31, %edx			; X32-SSE42-NEXT: movd %edx, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: psrad $31, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: andl $15, %ecx			; X32-SSE42-NEXT: andl $15, %ecx
	; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0			; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0
	; X32-SSE42-NEXT: movups %xmm0, (%eax)			; X32-SSE42-NEXT: movups %xmm0, (%eax)
	; X32-SSE42-NEXT: addl $32, %esp			; X32-SSE42-NEXT: addl $32, %esp
	; X32-SSE42-NEXT: popl %esi			; X32-SSE42-NEXT: popl %esi
	; X32-SSE42-NEXT: popl %edi			; X32-SSE42-NEXT: popl %edi
	; X32-SSE42-NEXT: popl %ebx			; X32-SSE42-NEXT: popl %ebx
	; X32-SSE42-NEXT: retl			; X32-SSE42-NEXT: retl
	;			;
	; X32-AVX-LABEL: ashr_16bytes:			; X32-AVX1-LABEL: ashr_16bytes:
	; X32-AVX: # %bb.0:			; X32-AVX1: # %bb.0:
	; X32-AVX-NEXT: pushl %ebx			; X32-AVX1-NEXT: pushl %ebx
	; X32-AVX-NEXT: pushl %edi			; X32-AVX1-NEXT: pushl %edi
	; X32-AVX-NEXT: pushl %esi			; X32-AVX1-NEXT: pushl %esi
	; X32-AVX-NEXT: subl $32, %esp			; X32-AVX1-NEXT: subl $32, %esp
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %edx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X32-AVX-NEXT: movl (%edx), %esi			; X32-AVX1-NEXT: movl (%edx), %esi
	; X32-AVX-NEXT: movl 4(%edx), %edi			; X32-AVX1-NEXT: movl 4(%edx), %edi
	; X32-AVX-NEXT: movl 8(%edx), %ebx			; X32-AVX1-NEXT: movl 8(%edx), %ebx
	; X32-AVX-NEXT: movl 12(%edx), %edx			; X32-AVX1-NEXT: movl 12(%edx), %edx
	; X32-AVX-NEXT: movzbl (%ecx), %ecx			; X32-AVX1-NEXT: movzbl (%ecx), %ecx
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %esi, (%esp)			; X32-AVX1-NEXT: movl %esi, (%esp)
	; X32-AVX-NEXT: sarl $31, %edx			; X32-AVX1-NEXT: vmovd %edx, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpsrad $31, %xmm0, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: andl $15, %ecx
	; X32-AVX-NEXT: andl $15, %ecx			; X32-AVX1-NEXT: vmovups (%esp,%ecx), %xmm0
	; X32-AVX-NEXT: vmovups (%esp,%ecx), %xmm0			; X32-AVX1-NEXT: vmovups %xmm0, (%eax)
	; X32-AVX-NEXT: vmovups %xmm0, (%eax)			; X32-AVX1-NEXT: addl $32, %esp
	; X32-AVX-NEXT: addl $32, %esp			; X32-AVX1-NEXT: popl %esi
	; X32-AVX-NEXT: popl %esi			; X32-AVX1-NEXT: popl %edi
	; X32-AVX-NEXT: popl %edi			; X32-AVX1-NEXT: popl %ebx
	; X32-AVX-NEXT: popl %ebx			; X32-AVX1-NEXT: retl
	; X32-AVX-NEXT: retl			;
				; X32-AVX512-LABEL: ashr_16bytes:
				; X32-AVX512: # %bb.0:
				; X32-AVX512-NEXT: pushl %ebx
				; X32-AVX512-NEXT: pushl %edi
				; X32-AVX512-NEXT: pushl %esi
				; X32-AVX512-NEXT: subl $32, %esp
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X32-AVX512-NEXT: movl (%edx), %esi
				; X32-AVX512-NEXT: movl 4(%edx), %edi
				; X32-AVX512-NEXT: movl 8(%edx), %ebx
				; X32-AVX512-NEXT: movl 12(%edx), %edx
				; X32-AVX512-NEXT: movzbl (%ecx), %ecx
				; X32-AVX512-NEXT: movl %edx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %ebx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %edi, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %esi, (%esp)
				; X32-AVX512-NEXT: sarl $31, %edx
				; X32-AVX512-NEXT: vpbroadcastd %edx, %xmm0
				; X32-AVX512-NEXT: vmovdqu %xmm0, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: andl $15, %ecx
				; X32-AVX512-NEXT: vmovups (%esp,%ecx), %xmm0
				; X32-AVX512-NEXT: vmovups %xmm0, (%eax)
				; X32-AVX512-NEXT: addl $32, %esp
				; X32-AVX512-NEXT: popl %esi
				; X32-AVX512-NEXT: popl %edi
				; X32-AVX512-NEXT: popl %ebx
				; X32-AVX512-NEXT: retl
	%src = load i128, ptr %src.ptr, align 1			%src = load i128, ptr %src.ptr, align 1
	%byteOff = load i128, ptr %byteOff.ptr, align 1			%byteOff = load i128, ptr %byteOff.ptr, align 1
	%bitOff = shl i128 %byteOff, 3			%bitOff = shl i128 %byteOff, 3
	%res = ashr i128 %src, %bitOff			%res = ashr i128 %src, %bitOff
	store i128 %res, ptr %dst, align 1			store i128 %res, ptr %dst, align 1
	ret void			ret void
	}			}

	; X64-SSE42: # %bb.0:			; X64-SSE42: # %bb.0:
	; X64-SSE42-NEXT: movups (%rdi), %xmm0			; X64-SSE42-NEXT: movups (%rdi), %xmm0
	; X64-SSE42-NEXT: movq 16(%rdi), %rax			; X64-SSE42-NEXT: movq 16(%rdi), %rax
	; X64-SSE42-NEXT: movq 24(%rdi), %rcx			; X64-SSE42-NEXT: movq 24(%rdi), %rcx
	; X64-SSE42-NEXT: movzbl (%rsi), %esi			; X64-SSE42-NEXT: movzbl (%rsi), %esi
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rax, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movups %xmm0, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movups %xmm0, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: sarq $63, %rcx			; X64-SSE42-NEXT: movq %rcx, %xmm0
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pxor %xmm1, %xmm1
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pcmpgtq %xmm0, %xmm1
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
				; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: andl $31, %esi			; X64-SSE42-NEXT: andl $31, %esi
	; X64-SSE42-NEXT: movups -64(%rsp,%rsi), %xmm0			; X64-SSE42-NEXT: movups -64(%rsp,%rsi), %xmm0
	; X64-SSE42-NEXT: movups -48(%rsp,%rsi), %xmm1			; X64-SSE42-NEXT: movups -48(%rsp,%rsi), %xmm1
	; X64-SSE42-NEXT: movups %xmm1, 16(%rdx)
	; X64-SSE42-NEXT: movups %xmm0, (%rdx)			; X64-SSE42-NEXT: movups %xmm0, (%rdx)
				; X64-SSE42-NEXT: movups %xmm1, 16(%rdx)
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: ashr_32bytes:			; X64-AVX1-LABEL: ashr_32bytes:
	; X64-AVX: # %bb.0:			; X64-AVX1: # %bb.0:
	; X64-AVX-NEXT: vmovups (%rdi), %xmm0			; X64-AVX1-NEXT: vmovups (%rdi), %xmm0
	; X64-AVX-NEXT: movq 16(%rdi), %rax			; X64-AVX1-NEXT: movq 16(%rdi), %rax
	; X64-AVX-NEXT: movq 24(%rdi), %rcx			; X64-AVX1-NEXT: movq 24(%rdi), %rcx
	; X64-AVX-NEXT: movzbl (%rsi), %esi			; X64-AVX1-NEXT: movzbl (%rsi), %esi
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: movq %rax, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: vmovups %xmm0, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovups %xmm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: sarq $63, %rcx			; X64-AVX1-NEXT: vmovq %rcx, %xmm0
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovdqu %xmm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: andl $31, %esi			; X64-AVX1-NEXT: vmovdqu %xmm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: vmovups -64(%rsp,%rsi), %xmm0			; X64-AVX1-NEXT: andl $31, %esi
	; X64-AVX-NEXT: vmovups -48(%rsp,%rsi), %xmm1			; X64-AVX1-NEXT: vmovups -64(%rsp,%rsi), %xmm0
	; X64-AVX-NEXT: vmovups %xmm1, 16(%rdx)			; X64-AVX1-NEXT: vmovups -48(%rsp,%rsi), %xmm1
	; X64-AVX-NEXT: vmovups %xmm0, (%rdx)			; X64-AVX1-NEXT: vmovups %xmm0, (%rdx)
	; X64-AVX-NEXT: retq			; X64-AVX1-NEXT: vmovups %xmm1, 16(%rdx)
				; X64-AVX1-NEXT: retq
				;
				; X64-AVX512-LABEL: ashr_32bytes:
				; X64-AVX512: # %bb.0:
				; X64-AVX512-NEXT: vmovups (%rdi), %xmm0
				; X64-AVX512-NEXT: movq 16(%rdi), %rax
				; X64-AVX512-NEXT: movq 24(%rdi), %rcx
				; X64-AVX512-NEXT: movzbl (%rsi), %esi
				; X64-AVX512-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: vmovups %xmm0, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: sarq $63, %rcx
				; X64-AVX512-NEXT: vpbroadcastq %rcx, %ymm0
				; X64-AVX512-NEXT: vmovdqu %ymm0, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: andl $31, %esi
				; X64-AVX512-NEXT: vmovups -64(%rsp,%rsi), %xmm0
				; X64-AVX512-NEXT: vmovups -48(%rsp,%rsi), %xmm1
				; X64-AVX512-NEXT: vmovups %xmm0, (%rdx)
				; X64-AVX512-NEXT: vmovups %xmm1, 16(%rdx)
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	;			;
	; X32-SSE2-LABEL: ashr_32bytes:			; X32-SSE2-LABEL: ashr_32bytes:
	; X32-SSE2: # %bb.0:			; X32-SSE2: # %bb.0:
	; X32-SSE2-NEXT: pushl %ebp			; X32-SSE2-NEXT: pushl %ebp
	; X32-SSE2-NEXT: pushl %ebx			; X32-SSE2-NEXT: pushl %ebx
	; X32-SSE2-NEXT: pushl %edi			; X32-SSE2-NEXT: pushl %edi
	; X32-SSE2-NEXT: pushl %esi			; X32-SSE2-NEXT: pushl %esi
	; X32-SSE2-NEXT: subl $72, %esp			; X32-SSE2-NEXT: subl $72, %esp
	[73 lines not shown]
	; X32-SSE42-NEXT: movl 24(%edx), %ebx			; X32-SSE42-NEXT: movl 24(%edx), %ebx
	; X32-SSE42-NEXT: movl 28(%edx), %edx			; X32-SSE42-NEXT: movl 28(%edx), %edx
	; X32-SSE42-NEXT: movzbl (%ecx), %ecx			; X32-SSE42-NEXT: movzbl (%ecx), %ecx
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %esi, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %esi, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movups %xmm0, (%esp)			; X32-SSE42-NEXT: movups %xmm0, (%esp)
	; X32-SSE42-NEXT: sarl $31, %edx			; X32-SSE42-NEXT: movd %edx, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: psrad $31, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: andl $31, %ecx			; X32-SSE42-NEXT: andl $31, %ecx
	; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0			; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0
	; X32-SSE42-NEXT: movups 16(%esp,%ecx), %xmm1			; X32-SSE42-NEXT: movups 16(%esp,%ecx), %xmm1
	; X32-SSE42-NEXT: movups %xmm1, 16(%eax)
	; X32-SSE42-NEXT: movups %xmm0, (%eax)			; X32-SSE42-NEXT: movups %xmm0, (%eax)
				; X32-SSE42-NEXT: movups %xmm1, 16(%eax)
	; X32-SSE42-NEXT: addl $64, %esp			; X32-SSE42-NEXT: addl $64, %esp
	; X32-SSE42-NEXT: popl %esi			; X32-SSE42-NEXT: popl %esi
	; X32-SSE42-NEXT: popl %edi			; X32-SSE42-NEXT: popl %edi
	; X32-SSE42-NEXT: popl %ebx			; X32-SSE42-NEXT: popl %ebx
	; X32-SSE42-NEXT: retl			; X32-SSE42-NEXT: retl
	;			;
	; X32-AVX-LABEL: ashr_32bytes:			; X32-AVX1-LABEL: ashr_32bytes:
	; X32-AVX: # %bb.0:			; X32-AVX1: # %bb.0:
	; X32-AVX-NEXT: pushl %ebx			; X32-AVX1-NEXT: pushl %ebx
	; X32-AVX-NEXT: pushl %edi			; X32-AVX1-NEXT: pushl %edi
	; X32-AVX-NEXT: pushl %esi			; X32-AVX1-NEXT: pushl %esi
	; X32-AVX-NEXT: subl $64, %esp			; X32-AVX1-NEXT: subl $64, %esp
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %edx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X32-AVX-NEXT: vmovups (%edx), %xmm0			; X32-AVX1-NEXT: vmovups (%edx), %xmm0
	; X32-AVX-NEXT: movl 16(%edx), %esi			; X32-AVX1-NEXT: movl 16(%edx), %esi
	; X32-AVX-NEXT: movl 20(%edx), %edi			; X32-AVX1-NEXT: movl 20(%edx), %edi
	; X32-AVX-NEXT: movl 24(%edx), %ebx			; X32-AVX1-NEXT: movl 24(%edx), %ebx
	; X32-AVX-NEXT: movl 28(%edx), %edx			; X32-AVX1-NEXT: movl 28(%edx), %edx
	; X32-AVX-NEXT: movzbl (%ecx), %ecx			; X32-AVX1-NEXT: movzbl (%ecx), %ecx
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %esi, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %esi, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: vmovups %xmm0, (%esp)			; X32-AVX1-NEXT: vmovups %xmm0, (%esp)
	; X32-AVX-NEXT: sarl $31, %edx			; X32-AVX1-NEXT: vmovd %edx, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpsrad $31, %xmm0, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: andl $31, %ecx
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups (%esp,%ecx), %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups 16(%esp,%ecx), %xmm1
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm0, (%eax)
	; X32-AVX-NEXT: andl $31, %ecx			; X32-AVX1-NEXT: vmovups %xmm1, 16(%eax)
	; X32-AVX-NEXT: vmovups (%esp,%ecx), %xmm0			; X32-AVX1-NEXT: addl $64, %esp
	; X32-AVX-NEXT: vmovups 16(%esp,%ecx), %xmm1			; X32-AVX1-NEXT: popl %esi
	; X32-AVX-NEXT: vmovups %xmm1, 16(%eax)			; X32-AVX1-NEXT: popl %edi
	; X32-AVX-NEXT: vmovups %xmm0, (%eax)			; X32-AVX1-NEXT: popl %ebx
	; X32-AVX-NEXT: addl $64, %esp			; X32-AVX1-NEXT: retl
	; X32-AVX-NEXT: popl %esi			;
	; X32-AVX-NEXT: popl %edi			; X32-AVX512-LABEL: ashr_32bytes:
	; X32-AVX-NEXT: popl %ebx			; X32-AVX512: # %bb.0:
	; X32-AVX-NEXT: retl			; X32-AVX512-NEXT: pushl %ebx
				; X32-AVX512-NEXT: pushl %edi
				; X32-AVX512-NEXT: pushl %esi
				; X32-AVX512-NEXT: subl $64, %esp
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx
				; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx
				; X32-AVX512-NEXT: vmovups (%edx), %xmm0
				; X32-AVX512-NEXT: movl 16(%edx), %esi
				; X32-AVX512-NEXT: movl 20(%edx), %edi
				; X32-AVX512-NEXT: movl 24(%edx), %ebx
				; X32-AVX512-NEXT: movl 28(%edx), %edx
				; X32-AVX512-NEXT: movzbl (%ecx), %ecx
				; X32-AVX512-NEXT: movl %edx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %ebx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %edi, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %esi, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: vmovups %xmm0, (%esp)
				; X32-AVX512-NEXT: sarl $31, %edx
				; X32-AVX512-NEXT: vpbroadcastd %edx, %ymm0
				; X32-AVX512-NEXT: vmovdqu %ymm0, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: andl $31, %ecx
				; X32-AVX512-NEXT: vmovups (%esp,%ecx), %xmm0
				; X32-AVX512-NEXT: vmovups 16(%esp,%ecx), %xmm1
				; X32-AVX512-NEXT: vmovups %xmm0, (%eax)
				; X32-AVX512-NEXT: vmovups %xmm1, 16(%eax)
				; X32-AVX512-NEXT: addl $64, %esp
				; X32-AVX512-NEXT: popl %esi
				; X32-AVX512-NEXT: popl %edi
				; X32-AVX512-NEXT: popl %ebx
				; X32-AVX512-NEXT: vzeroupper
				; X32-AVX512-NEXT: retl
	%src = load i256, ptr %src.ptr, align 1			%src = load i256, ptr %src.ptr, align 1
	%byteOff = load i256, ptr %byteOff.ptr, align 1			%byteOff = load i256, ptr %byteOff.ptr, align 1
	%bitOff = shl i256 %byteOff, 3			%bitOff = shl i256 %byteOff, 3
	%res = ashr i256 %src, %bitOff			%res = ashr i256 %src, %bitOff
	store i256 %res, ptr %dst, align 1			store i256 %res, ptr %dst, align 1
	ret void			ret void
	}			}

	[757 lines not shown]
	; X64-SSE2-NEXT: movq %rsi, 8(%rdx)			; X64-SSE2-NEXT: movq %rsi, 8(%rdx)
	; X64-SSE2-NEXT: popq %rbx			; X64-SSE2-NEXT: popq %rbx
	; X64-SSE2-NEXT: popq %r14			; X64-SSE2-NEXT: popq %r14
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
	;			;
	; X64-SSE42-LABEL: ashr_64bytes:			; X64-SSE42-LABEL: ashr_64bytes:
	; X64-SSE42: # %bb.0:			; X64-SSE42: # %bb.0:
	; X64-SSE42-NEXT: movups (%rdi), %xmm0			; X64-SSE42-NEXT: movups (%rdi), %xmm0
	; X64-SSE42-NEXT: movups 16(%rdi), %xmm1			; X64-SSE42-NEXT: movdqu 16(%rdi), %xmm1
	; X64-SSE42-NEXT: movups 32(%rdi), %xmm2			; X64-SSE42-NEXT: movups 32(%rdi), %xmm2
	; X64-SSE42-NEXT: movq 48(%rdi), %rax			; X64-SSE42-NEXT: movq 48(%rdi), %rax
	; X64-SSE42-NEXT: movq 56(%rdi), %rcx			; X64-SSE42-NEXT: movq 56(%rdi), %rcx
	; X64-SSE42-NEXT: movl (%rsi), %esi			; X64-SSE42-NEXT: movl (%rsi), %esi
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rax, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movups %xmm2, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movups %xmm2, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movups %xmm1, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movups %xmm0, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movups %xmm0, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: sarq $63, %rcx			; X64-SSE42-NEXT: movq %rcx, %xmm0
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pxor %xmm1, %xmm1
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: pcmpgtq %xmm0, %xmm1
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-SSE42-NEXT: movdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; X64-SSE42-NEXT: andl $63, %esi			; X64-SSE42-NEXT: andl $63, %esi
	; X64-SSE42-NEXT: movups -128(%rsp,%rsi), %xmm0			; X64-SSE42-NEXT: movups -128(%rsp,%rsi), %xmm0
	; X64-SSE42-NEXT: movups -112(%rsp,%rsi), %xmm1			; X64-SSE42-NEXT: movups -112(%rsp,%rsi), %xmm1
	; X64-SSE42-NEXT: movups -96(%rsp,%rsi), %xmm2			; X64-SSE42-NEXT: movups -96(%rsp,%rsi), %xmm2
	; X64-SSE42-NEXT: movups -80(%rsp,%rsi), %xmm3			; X64-SSE42-NEXT: movups -80(%rsp,%rsi), %xmm3
	; X64-SSE42-NEXT: movups %xmm1, 16(%rdx)
	; X64-SSE42-NEXT: movups %xmm2, 32(%rdx)			; X64-SSE42-NEXT: movups %xmm2, 32(%rdx)
	; X64-SSE42-NEXT: movups %xmm3, 48(%rdx)			; X64-SSE42-NEXT: movups %xmm1, 16(%rdx)
	; X64-SSE42-NEXT: movups %xmm0, (%rdx)			; X64-SSE42-NEXT: movups %xmm0, (%rdx)
				; X64-SSE42-NEXT: movups %xmm3, 48(%rdx)
	; X64-SSE42-NEXT: retq			; X64-SSE42-NEXT: retq
	;			;
	; X64-AVX-LABEL: ashr_64bytes:			; X64-AVX1-LABEL: ashr_64bytes:
	; X64-AVX: # %bb.0:			; X64-AVX1: # %bb.0:
	; X64-AVX-NEXT: vmovups (%rdi), %ymm0			; X64-AVX1-NEXT: vmovups (%rdi), %ymm0
	; X64-AVX-NEXT: vmovups 32(%rdi), %xmm1			; X64-AVX1-NEXT: vmovdqu 32(%rdi), %xmm1
	; X64-AVX-NEXT: movq 48(%rdi), %rax			; X64-AVX1-NEXT: movq 48(%rdi), %rax
	; X64-AVX-NEXT: movq 56(%rdi), %rcx			; X64-AVX1-NEXT: movq 56(%rdi), %rcx
	; X64-AVX-NEXT: movl (%rsi), %esi			; X64-AVX1-NEXT: movl (%rsi), %esi
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: movq %rax, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: vmovups %xmm1, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovdqu %xmm1, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: sarq $63, %rcx			; X64-AVX1-NEXT: vmovq %rcx, %xmm0
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vpcmpgtq %xmm0, %xmm1, %xmm0
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: andl $63, %esi
	; X64-AVX-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)			; X64-AVX1-NEXT: vmovups -128(%rsp,%rsi), %xmm0
	; X64-AVX-NEXT: andl $63, %esi			; X64-AVX1-NEXT: vmovups -112(%rsp,%rsi), %xmm1
	; X64-AVX-NEXT: vmovups -128(%rsp,%rsi), %xmm0			; X64-AVX1-NEXT: vmovups -96(%rsp,%rsi), %xmm2
	; X64-AVX-NEXT: vmovups -112(%rsp,%rsi), %xmm1			; X64-AVX1-NEXT: vmovups -80(%rsp,%rsi), %xmm3
	; X64-AVX-NEXT: vmovups -96(%rsp,%rsi), %xmm2			; X64-AVX1-NEXT: vmovups %xmm2, 32(%rdx)
	; X64-AVX-NEXT: vmovups -80(%rsp,%rsi), %xmm3			; X64-AVX1-NEXT: vmovups %xmm1, 16(%rdx)
	; X64-AVX-NEXT: vmovups %xmm1, 16(%rdx)			; X64-AVX1-NEXT: vmovups %xmm0, (%rdx)
	; X64-AVX-NEXT: vmovups %xmm2, 32(%rdx)			; X64-AVX1-NEXT: vmovups %xmm3, 48(%rdx)
	; X64-AVX-NEXT: vmovups %xmm3, 48(%rdx)			; X64-AVX1-NEXT: vzeroupper
	; X64-AVX-NEXT: vmovups %xmm0, (%rdx)			; X64-AVX1-NEXT: retq
	; X64-AVX-NEXT: vzeroupper			;
	; X64-AVX-NEXT: retq			; X64-AVX512-LABEL: ashr_64bytes:
				; X64-AVX512: # %bb.0:
				; X64-AVX512-NEXT: vmovups (%rdi), %ymm0
				; X64-AVX512-NEXT: vmovups 32(%rdi), %xmm1
				; X64-AVX512-NEXT: movq 48(%rdi), %rax
				; X64-AVX512-NEXT: movq 56(%rdi), %rcx
				; X64-AVX512-NEXT: movl (%rsi), %esi
				; X64-AVX512-NEXT: movq %rcx, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: movq %rax, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: vmovups %xmm1, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: vmovups %ymm0, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: sarq $63, %rcx
				; X64-AVX512-NEXT: vpbroadcastq %rcx, %zmm0
				; X64-AVX512-NEXT: vmovdqu64 %zmm0, -{{[0-9]+}}(%rsp)
				; X64-AVX512-NEXT: andl $63, %esi
				; X64-AVX512-NEXT: vmovups -128(%rsp,%rsi), %xmm0
				; X64-AVX512-NEXT: vmovups -112(%rsp,%rsi), %xmm1
				; X64-AVX512-NEXT: vmovups -96(%rsp,%rsi), %xmm2
				; X64-AVX512-NEXT: vmovups -80(%rsp,%rsi), %xmm3
				; X64-AVX512-NEXT: vmovups %xmm2, 32(%rdx)
				; X64-AVX512-NEXT: vmovups %xmm1, 16(%rdx)
				; X64-AVX512-NEXT: vmovups %xmm0, (%rdx)
				; X64-AVX512-NEXT: vmovups %xmm3, 48(%rdx)
				; X64-AVX512-NEXT: vzeroupper
				; X64-AVX512-NEXT: retq
	;			;
	; X32-SSE2-LABEL: ashr_64bytes:			; X32-SSE2-LABEL: ashr_64bytes:
	; X32-SSE2: # %bb.0:			; X32-SSE2: # %bb.0:
	; X32-SSE2-NEXT: pushl %ebp			; X32-SSE2-NEXT: pushl %ebp
	; X32-SSE2-NEXT: pushl %ebx			; X32-SSE2-NEXT: pushl %ebx
	; X32-SSE2-NEXT: pushl %edi			; X32-SSE2-NEXT: pushl %edi
	; X32-SSE2-NEXT: pushl %esi			; X32-SSE2-NEXT: pushl %esi
	; X32-SSE2-NEXT: subl $168, %esp			; X32-SSE2-NEXT: subl $168, %esp
	[149 lines not shown]
	; X32-SSE42-NEXT: movl (%ecx), %ecx			; X32-SSE42-NEXT: movl (%ecx), %ecx
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %esi, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movl %esi, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movups %xmm2, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movups %xmm2, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movups %xmm1, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movups %xmm1, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movups %xmm0, (%esp)			; X32-SSE42-NEXT: movups %xmm0, (%esp)
	; X32-SSE42-NEXT: sarl $31, %edx			; X32-SSE42-NEXT: movd %edx, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: psrad $31, %xmm0
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-SSE42-NEXT: movdqu %xmm0, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-SSE42-NEXT: andl $63, %ecx			; X32-SSE42-NEXT: andl $63, %ecx
	; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0			; X32-SSE42-NEXT: movups (%esp,%ecx), %xmm0
	; X32-SSE42-NEXT: movups 16(%esp,%ecx), %xmm1			; X32-SSE42-NEXT: movups 16(%esp,%ecx), %xmm1
	; X32-SSE42-NEXT: movups 32(%esp,%ecx), %xmm2			; X32-SSE42-NEXT: movups 32(%esp,%ecx), %xmm2
	; X32-SSE42-NEXT: movups 48(%esp,%ecx), %xmm3			; X32-SSE42-NEXT: movups 48(%esp,%ecx), %xmm3
	; X32-SSE42-NEXT: movups %xmm3, 48(%eax)			; X32-SSE42-NEXT: movups %xmm3, 48(%eax)
	; X32-SSE42-NEXT: movups %xmm2, 32(%eax)			; X32-SSE42-NEXT: movups %xmm2, 32(%eax)
	; X32-SSE42-NEXT: movups %xmm1, 16(%eax)			; X32-SSE42-NEXT: movups %xmm1, 16(%eax)
	; X32-SSE42-NEXT: movups %xmm0, (%eax)			; X32-SSE42-NEXT: movups %xmm0, (%eax)
	; X32-SSE42-NEXT: addl $128, %esp			; X32-SSE42-NEXT: addl $128, %esp
	; X32-SSE42-NEXT: popl %esi			; X32-SSE42-NEXT: popl %esi
	; X32-SSE42-NEXT: popl %edi			; X32-SSE42-NEXT: popl %edi
	; X32-SSE42-NEXT: popl %ebx			; X32-SSE42-NEXT: popl %ebx
	; X32-SSE42-NEXT: retl			; X32-SSE42-NEXT: retl
	;			;
	; X32-AVX-LABEL: ashr_64bytes:			; X32-AVX1-LABEL: ashr_64bytes:
	; X32-AVX: # %bb.0:			; X32-AVX1: # %bb.0:
	; X32-AVX-NEXT: pushl %ebx			; X32-AVX1-NEXT: pushl %ebx
	; X32-AVX-NEXT: pushl %edi			; X32-AVX1-NEXT: pushl %edi
	; X32-AVX-NEXT: pushl %esi			; X32-AVX1-NEXT: pushl %esi
	; X32-AVX-NEXT: subl $128, %esp			; X32-AVX1-NEXT: subl $128, %esp
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-AVX-NEXT: movl {{[0-9]+}}(%esp), %edx			; X32-AVX1-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X32-AVX-NEXT: vmovups (%edx), %ymm0			; X32-AVX1-NEXT: vmovups (%edx), %ymm0
	; X32-AVX-NEXT: vmovups 32(%edx), %xmm1			; X32-AVX1-NEXT: vmovups 32(%edx), %xmm1
	; X32-AVX-NEXT: movl 48(%edx), %esi			; X32-AVX1-NEXT: movl 48(%edx), %esi
	; X32-AVX-NEXT: movl 52(%edx), %edi			; X32-AVX1-NEXT: movl 52(%edx), %edi
	; X32-AVX-NEXT: movl 56(%edx), %ebx			; X32-AVX1-NEXT: movl 56(%edx), %ebx
	; X32-AVX-NEXT: movl 60(%edx), %edx			; X32-AVX1-NEXT: movl 60(%edx), %edx
	; X32-AVX-NEXT: movl (%ecx), %ecx			; X32-AVX1-NEXT: movl (%ecx), %ecx
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %ebx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %ebx, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edi, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %edi, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %esi, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: movl %esi, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: vmovups %xmm1, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm1, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: vmovups %ymm0, (%esp)			; X32-AVX1-NEXT: vmovups %ymm0, (%esp)
	; X32-AVX-NEXT: sarl $31, %edx			; X32-AVX1-NEXT: vmovd %edx, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vpsrad $31, %xmm0, %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %ymm0, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %ymm0, {{[0-9]+}}(%esp)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: andl $63, %ecx
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups (%esp,%ecx), %xmm0
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups 16(%esp,%ecx), %xmm1
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups 32(%esp,%ecx), %xmm2
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups 48(%esp,%ecx), %xmm3
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm3, 48(%eax)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm2, 32(%eax)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm1, 16(%eax)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: vmovups %xmm0, (%eax)
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: addl $128, %esp
	; X32-AVX-NEXT: movl %edx, {{[0-9]+}}(%esp)			; X32-AVX1-NEXT: popl %esi
	; X32-AVX-NEXT: andl $63, %ecx			; X32-AVX1-NEXT: popl %edi
	; X32-AVX-NEXT: vmovups (%esp,%ecx), %xmm0			; X32-AVX1-NEXT: popl %ebx
	; X32-AVX-NEXT: vmovups 16(%esp,%ecx), %xmm1			; X32-AVX1-NEXT: vzeroupper
	; X32-AVX-NEXT: vmovups 32(%esp,%ecx), %xmm2			; X32-AVX1-NEXT: retl
	; X32-AVX-NEXT: vmovups 48(%esp,%ecx), %xmm3			;
	; X32-AVX-NEXT: vmovups %xmm3, 48(%eax)			; X32-AVX512-LABEL: ashr_64bytes:
	; X32-AVX-NEXT: vmovups %xmm2, 32(%eax)			; X32-AVX512: # %bb.0:
	; X32-AVX-NEXT: vmovups %xmm1, 16(%eax)			; X32-AVX512-NEXT: pushl %ebx
	; X32-AVX-NEXT: vmovups %xmm0, (%eax)			; X32-AVX512-NEXT: pushl %edi
	; X32-AVX-NEXT: addl $128, %esp			; X32-AVX512-NEXT: pushl %esi
	; X32-AVX-NEXT: popl %esi			; X32-AVX512-NEXT: subl $128, %esp
	; X32-AVX-NEXT: popl %edi			; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-AVX-NEXT: popl %ebx			; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-AVX-NEXT: vzeroupper			; X32-AVX512-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X32-AVX-NEXT: retl			; X32-AVX512-NEXT: vmovups (%edx), %ymm0
				; X32-AVX512-NEXT: vmovups 32(%edx), %xmm1
				; X32-AVX512-NEXT: movl 48(%edx), %esi
				; X32-AVX512-NEXT: movl 52(%edx), %edi
				; X32-AVX512-NEXT: movl 56(%edx), %ebx
				; X32-AVX512-NEXT: movl 60(%edx), %edx
				; X32-AVX512-NEXT: movl (%ecx), %ecx
				; X32-AVX512-NEXT: movl %edx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %ebx, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %edi, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: movl %esi, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: vmovups %xmm1, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: vmovups %ymm0, (%esp)
				; X32-AVX512-NEXT: sarl $31, %edx
				; X32-AVX512-NEXT: vpbroadcastd %edx, %zmm0
				; X32-AVX512-NEXT: vmovdqu64 %zmm0, {{[0-9]+}}(%esp)
				; X32-AVX512-NEXT: andl $63, %ecx
				; X32-AVX512-NEXT: vmovups (%esp,%ecx), %xmm0
				; X32-AVX512-NEXT: vmovups 16(%esp,%ecx), %xmm1
				; X32-AVX512-NEXT: vmovups 32(%esp,%ecx), %xmm2
				; X32-AVX512-NEXT: vmovups 48(%esp,%ecx), %xmm3
				; X32-AVX512-NEXT: vmovups %xmm3, 48(%eax)
				; X32-AVX512-NEXT: vmovups %xmm2, 32(%eax)
				; X32-AVX512-NEXT: vmovups %xmm1, 16(%eax)
				; X32-AVX512-NEXT: vmovups %xmm0, (%eax)
				; X32-AVX512-NEXT: addl $128, %esp
				; X32-AVX512-NEXT: popl %esi
				; X32-AVX512-NEXT: popl %edi
				; X32-AVX512-NEXT: popl %ebx
				; X32-AVX512-NEXT: vzeroupper
				; X32-AVX512-NEXT: retl
	%src = load i512, ptr %src.ptr, align 1			%src = load i512, ptr %src.ptr, align 1
	%byteOff = load i512, ptr %byteOff.ptr, align 1			%byteOff = load i512, ptr %byteOff.ptr, align 1
	%bitOff = shl i512 %byteOff, 3			%bitOff = shl i512 %byteOff, 3
	%res = ashr i512 %src, %bitOff			%res = ashr i512 %src, %bitOff
	store i512 %res, ptr %dst, align 1			store i512 %res, ptr %dst, align 1
	ret void			ret void
	}			}
	;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:			;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
	[35 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][X86] `mergeConsecutiveStores()`: support merging splat-stores of the same value
Needs Review, Public

Details

Diff Detail

Unit Tests: Failed

Event Timeline

Revision Contents

Diff 491152

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/X86/X86ISelLowering.h

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/MergeConsecutiveStores.ll

llvm/test/CodeGen/X86/elementwise-store-of-scalar-splat.ll

llvm/test/CodeGen/X86/legalize-shl-vec.ll

llvm/test/CodeGen/X86/subvectorwise-store-of-vector-splat.ll

llvm/test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll
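
The ashr_32bytes checks in wide-scalar-shift-by-byte-multiple-legalization.ll show the effect of the change. The old output stores the sign word to the stack slot with four identical `movq %rcx` stores, while the new output materializes a splat vector and writes it with wide stores. The following is a minimal C++ sketch of that equivalence, not the DAGCombiner code itself. The helper name `ashrBytes32` and the `wideSplat` flag are hypothetical and exist only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of LLVM): arithmetic right shift of a 32-byte
// little-endian integer by a whole number of bytes, done through a 64-byte
// stack slot in the same way the generated code in the test does.
std::array<std::uint8_t, 32> ashrBytes32(const std::array<std::uint8_t, 32> &src,
                                         unsigned byteOff, bool wideSplat) {
  std::array<std::uint8_t, 64> slot{};
  // Low half of the slot holds the value itself.
  std::memcpy(slot.data(), src.data(), 32);

  // Sign fill byte: 0xFF if the top bit is set, otherwise 0x00.
  const std::uint8_t fill = (src[31] & 0x80) ? 0xFF : 0x00;

  if (wideSplat) {
    // New output: build one splat value and write it with a wide store.
    std::array<std::uint8_t, 32> splat;
    splat.fill(fill);
    std::memcpy(slot.data() + 32, splat.data(), 32);
  } else {
    // Old output: the same 8-byte sign word stored four times in a row.
    const std::uint64_t word = fill ? ~std::uint64_t{0} : std::uint64_t{0};
    for (std::size_t i = 0; i < 4; ++i)
      std::memcpy(slot.data() + 32 + 8 * i, &word, 8);
  }

  // Corresponds to `andl $31` followed by the indexed loads from the slot.
  std::array<std::uint8_t, 32> res;
  std::memcpy(res.data(), slot.data() + (byteOff & 31), 32);
  return res;
}
```

Both paths leave the same bytes in the upper half of the slot, so the loaded result is identical. Only the number and width of the stores differ, which is what the changed CHECK lines reflect.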
