This is an archive of the discontinued LLVM Phabricator instance.

Type legalizer for masked gather/scatter intrinsics
ClosedPublic

Authored by delena on Oct 11 2015, 4:08 AM.

Details

Reviewers

qcolombet
mkuper
hfinkel

Commits

rG6015f5c8237c: Type legalizer for masked gather and scatter intrinsics.
rL255629: Type legalizer for masked gather and scatter intrinsics.

Summary

Full type legalizer that works with all vector lengths from 2 to 16 and with i32, i64, float, and double element types.

For example, the intrinsic
void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 align, <2 x i1> %mask)
requires type widening for data and type promotion for mask.
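
To make the summary concrete, here is a minimal C++ model of widening a two-lane masked scatter to four lanes. It is a sketch only, not the code from this patch: the `Scatter` struct and the `widen` and `run` helpers are hypothetical names, and plain placeholder values stand in for undef. It shows why the added mask lanes must be false: the padding lanes then never touch memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one masked scatter: lane i stores data[i] to
// *ptrs[i] only when mask[i] is true.
struct Scatter {
  std::vector<float> data;
  std::vector<float *> ptrs;
  std::vector<bool> mask;
};

// Widen to `lanes` elements. The added data and pointer lanes hold
// placeholder values (standing in for undef); the added mask lanes are
// false, so the padding is never stored.
Scatter widen(Scatter s, std::size_t lanes) {
  s.data.resize(lanes, 0.0f);
  s.ptrs.resize(lanes, nullptr);
  s.mask.resize(lanes, false);
  return s;
}

// Execute the scatter lane by lane, skipping inactive lanes.
void run(const Scatter &s) {
  for (std::size_t i = 0; i < s.data.size(); ++i)
    if (s.mask[i])
      *s.ptrs[i] = s.data[i];
}
```

Running the widened scatter has the same memory effect as the original two-lane one, because lanes 2 and 3 are masked off.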

Diff Detail

Repository: rL LLVM

Event Timeline

delena updated this revision to Diff 37052. Oct 11 2015, 4:08 AM

delena retitled this revision to Type legalizer for masked gather/scatter intrinsics.

delena updated this object.

delena added reviewers: hfinkel, spatel, qcolombet.

delena set the repository for this revision to rL LLVM.

delena added a subscriber: llvm-commits.

Ping.

Sorry, but I haven't looked at scatter/gather at all, so I don't think I can provide any help with this patch.

mbodart added a subscriber: mbodart. Oct 29 2015, 8:53 AM

mbodart added inline comments. Oct 29 2015, 1:43 PM

../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
518	Minor typos: "ther" => "the". And a little below, and in several other places in this file/change set: "Legalized the chain result ..." => "Legalize the chain result ...".
1239	This is the first time I've looked at vector type legalization, so there's something I don't quite understand. A masked store (PromoteIntOp_MSTORE) is in some sense just a degenerate case of a masked scatter. So intuitively it would seem PromoteIntOp_MSTORE would have a simpler implementation than PromoteIntOp_MSCATTER. But here it seems the reverse is true. Why does PromoteIntOp_MSTORE try to legalize both the mask and data simultaneously, while for scatter we just do the one operand? Could PromoteIntOp_MSTORE be simplified? If not, what is the essential difference?
../lib/CodeGen/SelectionDAG/LegalizeTypes.h
752	Can you please change the parameter name from WidenVT to NVT, so that it matches this comment and the code in LegalizeVectorTypes.cpp? Also a typo: defalut => default
754–758	The implementation of UseExistingVal gives it an ambiguous meaning. If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value. Otherwise, only its first element is replicated. Was that the intent? If so, it seems like a difficult interface to use robustly. An enum with possible values of Zext, Splat or Undef would make the widening interface more clear, assuming the Splat ambiguity is resolved to consistently replicate all original values. But as a further question, when would a larger vector length not be a multiple of a smaller one? Aren't they all powers of 2?
../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2703–2706	Now that ModifyToType supports zero fill, why aren't we using it here (and in WidenVecOp_MSTORE)?
3631–3632	I would think we want this moved below the check for "if (InVT == NVT)".
../lib/Target/X86/X86ISelLowering.cpp
1588–1589	Why would 512-bit gathers and scatters have different operation actions here?
19661	It's unfortunate that we have to duplicate much of the functionality of the generic type modifier in CodeGen's ModifyToType. Is there a way to unify them? If not, please add a source comment describing how this implementation differs from the one in CodeGen.
../test/CodeGen/X86/masked_gather_scatter.ll
2	Have you tried your changes with gathers/scatters targeting 32-bit X86? Poking around in the X86 test area, it seems all triples use x86_64 for gather/scatter tests.

delena marked 3 inline comments as done. Nov 2 2015, 5:07 AM

delena added inline comments.

../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1239	I wanted to simplify MLOAD / MSTORE but it requires additional changes in X86ISelLowering.cpp. I can do this in a separate patch.
../lib/CodeGen/SelectionDAG/LegalizeTypes.h
754–758	"The implementation of UseExistingVal gives it an ambiguous meaning. If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value. Otherwise, only its first element is replicated. Was that the intent? If so, it seems like a difficult interface to use robustly." When I extend a vector of indices, I want to add existing values (a replicated small vector or a replicated first element, it does not matter). As far as I know, the gather/scatter will be faster that way (I'll check it again). But now I think that maybe this is too X86-specific? Maybe I should fill the indices with "undef" and then replace these "undefs" in the target-specific part? "An enum with possible values of Zext, Splat or Undef would make the widening interface more clear." I thought about this, but I don't see too many enums in LLVM. All enums have a global sense, not a per-function one. "But as a further question, when would a larger vector length not be a multiple of a smaller one?" The original type may not be a power of 2. This is the type legalizer; it should be able to deal with any type, e.g. <3 x i64> -> <4 x i64>.
../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2703–2706	I'll simplify MSTORE in a separate patch.
../lib/Target/X86/X86ISelLowering.cpp
1588–1589	SCATTER is more problematic than GATHER. The MSCATTER node returns only the chain. The VPSCATTER instruction on X86 zeroes the mask operand, so I have to specify the mask as a "return value".
19661	I simplified the X86 version and call it ExtendToType. It works with legal types only and is optimized for X86.
../test/CodeGen/X86/masked_gather_scatter.ll
2	I tried before, saw that it works. I added tests for 32-bit.

Updated the patch according to Mitch's comments.

mbodart added inline comments. Nov 3 2015, 9:57 AM

../include/llvm/CodeGen/SelectionDAGNodes.h
2120–2122	This new assertion is fine. But why are we removing the assertion that the mask's element type is i1?
../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1239	OK
../lib/CodeGen/SelectionDAG/LegalizeTypes.h
754–758	I agree that this optimization is too X86 specific, and CodeGen should simply extend with undef values. If you want to proceed with those X86-specific extensions, please do so in a separate change set. That will simplify this change set, and render the other comments here moot for now. If it turns out that replication is needed, we will want to define its behavior more crisply.
../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2703–2706	That's fine. But then can you please add a "FIX ME" comment here indicating such a change is desired?
2758	A simple source comment here to the effect "// Zero extend the mask" would help readers remember the meaning of "true".
2773	Another instance of "Legalized" => "Legalize".
../lib/Target/X86/X86ISelLowering.cpp
11876–11877	Unrelated to your change, I'm finding the optimizations of MVT::i1 here confusing as there are no checks for SubVecVT.getVectorNumElements(). The code appears to be making the assumption that when IdxVal is 0, or half the full vector length, then SubVec is exactly half the size of Vec. Why is that not being checked? It's also unclear to me how VSHLI/VSHRI operate on a vector of MVT::i1 elements. When operating on a vector of i32 or i64 elements, I would have thought these instructions perform a bit shift of each individual element (not cross element). But it seems the code here is trying to shift the whole i1 vector left and right, as if it is one big integer. Maybe that's how VSHLI/VSHRI are defined to behave, I don't know. Can you clarify?
19666	Another case where this new Undef creation should probably be moved below the InVT == NVT early return.
../test/CodeGen/X86/masked_gather_scatter.ll
290	I don't understand how SKX can use 32-bit indices here (i.e., vpgatherdd instead of vpgatherqd), when targeting x86_64. How does lowering of the @llvm.masked.gather.v8i32 calls know that only the low 32 bits of the <8 x i32> pointer values are needed? Is there some analysis of the insertelement/shufflevector/getelementptr instructions which feed the <8 x i32> pointers?

Extending indices does not require replication. I simplified the functions that modify values to a new type.

delena added inline comments. Nov 3 2015, 11:48 AM

../lib/CodeGen/SelectionDAG/LegalizeTypes.h
754–758	I removed "Splat". I don't need it. "Zext" extension is not target specific, at least for the mask.
../lib/Target/X86/X86ISelLowering.cpp
11876–11877	There are special shift instructions for mask vectors, KSHIFTR and KSHIFTL (in AVX-512). When we insert v8i1 into v16i1, the index should be 8 (I'll add an assertion) or 0. If the v16i1 is all zeros, it's enough to shift the input vector left and right.
../test/CodeGen/X86/masked_gather_scatter.ll
290	The base address is in %rdi (in %edi for 32-bit). %ymm0 holds only the indices. The real address of each element is "base + index*scale" -> %rdi + ymm[i]*4.
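
The base-plus-scaled-index addressing described in the reply above can be modelled in a few lines of C++. This is an illustrative sketch, not LLVM or X86 backend code: `maskedGather` is a hypothetical helper, and the 32-bit element type is an assumption chosen to match the scale of 4. Each active lane loads from base + index[i] * scale, and each inactive lane keeps its pass-through value, which is why 32-bit indices suffice when the 64-bit base is held separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of a masked gather of 32-bit elements.
// Active lanes load from base + index[i] * scale; inactive lanes keep
// the corresponding pass-through value.
std::vector<std::int32_t> maskedGather(const unsigned char *base,
                                       const std::vector<std::int32_t> &index,
                                       std::ptrdiff_t scale,
                                       const std::vector<bool> &mask,
                                       std::vector<std::int32_t> passThru) {
  for (std::size_t i = 0; i < index.size(); ++i) {
    if (!mask[i])
      continue; // inactive lane: leave passThru[i] untouched
    const unsigned char *addr = base + index[i] * scale;
    std::memcpy(&passThru[i], addr, sizeof(std::int32_t));
  }
  return passThru;
}
```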

delena marked an inline comment as done. Nov 5 2015, 5:28 AM

delena added inline comments.

../include/llvm/CodeGen/SelectionDAGNodes.h
2120–2122	Mask types are not always legal. When <2xi1> is illegal, the type legalizer promotes it to <2 x i64>. I create the node with <2 x i64> mask and then handle it in X86 code.

mbodart added inline comments. Nov 6 2015, 4:04 PM

../include/llvm/CodeGen/SelectionDAGNodes.h
2120–2122	Allowing non-i1 element types, even temporarily, seems like it breaks the definition of MGATHER/MSCATTER. Do we need to update the LLVM documentation to allow SIMD masks here? Or is it the intent to require i1 masks throughout LLVM IR, and only allow SIMD masks while we perform type legalization during DAG selection? If the latter, that still seems a bit risky as type legalization can use utilities in the non-target-specific parts of CodeGen, and they would not like to see a non-i1 mask element. That's just paranoid speculation on my part. I don't know whether DAG selection is in general allowed to take such liberties. A related question: Why isn't <2xi1> legalized to a full mask register size (<64xi1>, <32xi1>, <16xi1> or <8xi1>) instead of <2xi64>? Is this just inherent in type legalization of Nxi1 vectors, or is it needed for an AVX2 SIMD mask?
../test/CodeGen/X86/masked_gather_scatter.ll
290	OK, thanks.

I added a function WidenTargetBoolean() for mask legalization in all masked operations load/store/gather/scatter.

Hi Elena,

I added just a few more comments in line.

regards,

mitch

../include/llvm/CodeGen/SelectionDAGNodes.h
2116	Not your change, but minor typo: PathThru => PassThru
../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
521	Is there any particular reason that the mask operand legalization is handled in a different place for MLOAD and MGATHER? For MLOAD it is processed during PromoteIntRes_MLOAD, while for MGATHER it is processed when legalizing its operands. The MGATHER method seems cleaner, but I'm curious as to why MLOAD does it differently. Also note that for MLOAD, the call to PromoteTargetBoolean is conditioned on whether the type mismatches, while in PromoteIntOp_MGATHER, the call is unconditional. I don't know if these inconsistencies are innocuous from a functionality standpoint, but they certainly make it difficult to understand the requirements.
1239	There appears to be an extraneous blank line here.
../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
1115	It seems important here that the value returned by GetPromotedInteger matches the promotion operation (e.g., sign extension) performed by PromoteTargetBoolean. How is that guaranteed?
../lib/CodeGen/SelectionDAG/LegalizeTypes.h
177	Should that be "of ValVT", not "if"?
../lib/Target/X86/X86ISelLowering.cpp
19738	It seems odd to me to allow a masked store/scatter where the value element size differs from the memory element size. Is this an unavoidable result of type legalization? What are the semantics of such an operation? If the value is i64 and the memory is i32, apparently we just store the low 32 bits of each i64 element. Is that the expected behavior? Do we ever have to worry about a size difference in the opposite direction, e.g. storing an i32 to an i64?
../test/CodeGen/X86/masked_gather_scatter.ll
1244	Please keep the last <3 x i32> all on the same line.

Fixed WidenTargetBoolean() and added more tests to check that the mask is sign-extended.
Thanks to Mitch for catching this.

Addressed all other issues.

Sorry for adding a line to X86InstrAVX512.td, just one of the new tests triggered the failure.
I'm trying to reproduce the same failure on a standalone test to make a separate commit.

mbodart added inline comments. Dec 2 2015, 5:06 PM

../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
1115	Using SExt unconditionally seems incorrect. Don't we want to make the choice of ZExt vs SExt dependent on getBooleanContents? Perhaps we should add getExtendForVectorContent, similar to the existing scalar getExtendForContent.
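
The concern in the comment above can be illustrated with a small standalone C++ sketch. It does not use the LLVM API: the `BoolContents` enum is a hypothetical stand-in for what `getBooleanContents` reports, and `promoteMaskBit` and `signBitSet` are illustrative helpers. The point is that a promoted "true" is 1 under zero extension and -1 under sign extension, so a consumer that inspects the sign bit of a mask element only sees "true" when the extension matches the target's boolean representation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the target's boolean-contents setting.
enum class BoolContents { ZeroOrOne, ZeroOrNegativeOne };

// Promote a one-bit mask element to 64 bits using the extension that
// matches the target's representation of "true".
std::int64_t promoteMaskBit(bool bit, BoolContents contents) {
  if (!bit)
    return 0;
  return contents == BoolContents::ZeroOrNegativeOne ? -1 : 1;
}

// A consumer that tests only the sign bit of a promoted mask element.
bool signBitSet(std::int64_t v) { return v < 0; }
```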

delena added inline comments. Dec 3 2015, 3:55 AM

../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
521	No reason. Just the old code. I removed the mask promotion and all tests still pass.
1239	I did not see any blank line. I'll run dos2unix on all files again.
../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp
1115	Fixed! Thank you.
../lib/Target/X86/X86ISelLowering.cpp
19738	This is the case of promotion of v2i32 to v2i64. Only MemVT keeps the original VT. I actually "redo" the type legalizer's work: the type legalizer promoted v2i32 to v2i64, and I retrieve v2i32 with the shuffle {0, 2, -1, -1} and then widen the result to v4i32. I'll add comments.
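
The shuffle mentioned in the reply above can be checked with a small C++ model. This is a sketch under stated assumptions, not the X86ISelLowering code: `bitcastToV4I32` and `shuffle` are hypothetical helpers, little-endian lane order is assumed for the bitcast, and don't-care lanes (-1) are modelled as zero. Viewing the promoted <2 x i64> as <4 x i32> puts the low half of each element in lanes 0 and 2, so the mask {0, 2, -1, -1} recovers the original two i32 values in the first two lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// View <2 x i64> as <4 x i32> in little-endian lane order: the low half
// of each 64-bit element comes first.
std::array<std::uint32_t, 4>
bitcastToV4I32(const std::array<std::uint64_t, 2> &v) {
  return {static_cast<std::uint32_t>(v[0]),
          static_cast<std::uint32_t>(v[0] >> 32),
          static_cast<std::uint32_t>(v[1]),
          static_cast<std::uint32_t>(v[1] >> 32)};
}

// Apply a four-lane shuffle mask; -1 marks a don't-care lane, which is
// modelled here as zero.
std::array<std::uint32_t, 4> shuffle(const std::array<std::uint32_t, 4> &v,
                                     const std::array<int, 4> &mask) {
  std::array<std::uint32_t, 4> out{};
  for (int i = 0; i < 4; ++i)
    if (mask[i] >= 0)
      out[i] = v[mask[i]];
  return out;
}
```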

Updated revision according to Mitch's comments.

Hi Elena,

I added a few more minor comments and a question, but in general it LGTM.

regards,

mitch

../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3137	Redundant assert, already checked a few lines above.
../lib/Target/X86/X86ISelLowering.cpp
19792	Minor mechanics question: should we first check if Mask's element type is i1, and if so, avoid inserting the TRUNCATE?
19833	For a 32-bit target, we can gather 16 elements with a single gather. Though the comment says 8 is the minimum number of elements in a gather, which is true, the code also seems to be treating this as the maximum number. How are 16-element gathers supported?
26638	It is hard to interpret this comment, and thus the correctness of this routine, without any additional context. Why is a mask always truncated at this point? How do you know that it is truncated down to the element size of the SIGN_EXTEND_INREG's operand? Please add some supporting comments answering these kinds of questions.
../test/CodeGen/X86/masked_gather_scatter.ll
2–4	I would think we also want a test for i386 and avx512vl.

delena marked an inline comment as done. Dec 15 2015, 12:24 AM

delena added inline comments.

../lib/Target/X86/X86ISelLowering.cpp
19792	getNode(ISD::TRUNCATE..) does this optimization.
19833	We always can sign-extend the indices. VGATHERQPS is supported in 32-bit mode.
26638	I added more words here:
// Gather and Scatter instructions use k-registers for masks. The type of
// the masks is v*i1. So the mask will be truncated anyway.
// The SIGN_EXTEND_INREG may be dropped.
../test/CodeGen/X86/masked_gather_scatter.ll
2–4	I added.

Closed by commit rL255629: Type legalizer for masked gather and scatter intrinsics. (authored by delena). Dec 15 2015, 12:43 AM

This revision was automatically updated to reflect the committed changes.

jevinskie added a subscriber: jevinskie. Dec 16 2015, 3:04 PM

Revision Contents

Path	Size
../include/llvm/CodeGen/SelectionDAGNodes.h	18 lines
../lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp	6 lines
../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	94 lines
../lib/CodeGen/SelectionDAG/LegalizeTypes.h	16 lines
../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp	23 lines
../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	142 lines
../lib/Target/X86/X86ISelLowering.cpp	228 lines
../test/CodeGen/X86/masked_gather_scatter.ll	1226 lines

Diff 41020

../include/llvm/CodeGen/SelectionDAGNodes.h

	[... 2,107 lines not shown ...]
	class MaskedGatherSDNode : public MaskedGatherScatterSDNode {			class MaskedGatherSDNode : public MaskedGatherScatterSDNode {
	public:			public:
	friend class SelectionDAG;			friend class SelectionDAG;
	MaskedGatherSDNode(unsigned Order, DebugLoc dl, ArrayRef<SDValue> Operands,			MaskedGatherSDNode(unsigned Order, DebugLoc dl, ArrayRef<SDValue> Operands,
	SDVTList VTs, EVT MemVT, MachineMemOperand *MMO)			SDVTList VTs, EVT MemVT, MachineMemOperand *MMO)
	: MaskedGatherScatterSDNode(ISD::MGATHER, Order, dl, Operands, VTs, MemVT,			: MaskedGatherScatterSDNode(ISD::MGATHER, Order, dl, Operands, VTs, MemVT,
	MMO) {			MMO) {
	assert(getValue().getValueType() == getValueType(0) &&			assert(getValue().getValueType() == getValueType(0) &&
	"Incompatible type of the PathThru value in MaskedGatherSDNode");			"Incompatible type of the PathThru value in MaskedGatherSDNode");
	assert(getMask().getValueType().getVectorNumElements() ==			assert(getMask().getValueType().getVectorNumElements() ==
	getValueType(0).getVectorNumElements() &&			getValueType(0).getVectorNumElements() &&
	"Vector width mismatch between mask and data");			"Vector width mismatch between mask and data");
	assert(getMask().getValueType().getScalarType() == MVT::i1 &&			assert(getIndex().getValueType().getVectorNumElements() ==
	"Vector width mismatch between mask and data");			getValueType(0).getVectorNumElements() &&
				"Vector width mismatch between index and data");
	}			}

	static bool classof(const SDNode *N) {			static bool classof(const SDNode *N) {
	return N->getOpcode() == ISD::MGATHER;			return N->getOpcode() == ISD::MGATHER;
	}			}
	};			};

	/// This class is used to represent an MSCATTER node			/// This class is used to represent an MSCATTER node
	///			///
	class MaskedScatterSDNode : public MaskedGatherScatterSDNode {			class MaskedScatterSDNode : public MaskedGatherScatterSDNode {

	public:			public:
	friend class SelectionDAG;			friend class SelectionDAG;
	MaskedScatterSDNode(unsigned Order, DebugLoc dl,ArrayRef<SDValue> Operands,			MaskedScatterSDNode(unsigned Order, DebugLoc dl,ArrayRef<SDValue> Operands,
	SDVTList VTs, EVT MemVT, MachineMemOperand *MMO)			SDVTList VTs, EVT MemVT, MachineMemOperand *MMO)
	: MaskedGatherScatterSDNode(ISD::MSCATTER, Order, dl, Operands, VTs,			: MaskedGatherScatterSDNode(ISD::MSCATTER, Order, dl, Operands, VTs, MemVT,
	MemVT, MMO) {			MMO) {
	assert(getMask().getValueType().getVectorNumElements() ==			assert(getMask().getValueType().getVectorNumElements() ==
	getValue().getValueType().getVectorNumElements() &&			getValue().getValueType().getVectorNumElements() &&
	"Vector width mismatch between mask and data");			"Vector width mismatch between mask and data");
	assert(getMask().getValueType().getScalarType() == MVT::i1 &&			assert(getIndex().getValueType().getVectorNumElements() ==
	"Vector width mismatch between mask and data");			getValue().getValueType().getVectorNumElements() &&
				"Vector width mismatch between index and data");
	}			}

	static bool classof(const SDNode *N) {			static bool classof(const SDNode *N) {
	return N->getOpcode() == ISD::MSCATTER;			return N->getOpcode() == ISD::MSCATTER;
	}			}
	};			};

	/// An SDNode that represents everything that will be needed			/// An SDNode that represents everything that will be needed
	[... 161 lines not shown ...]

../lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

[... 587 lines not shown ...]	SDValue DAGTypeLegalizer::SoftenFloatRes_LOAD(SDNode *N) {

SDValue NewL;		SDValue NewL;
if (L->getExtensionType() == ISD::NON_EXTLOAD) {		if (L->getExtensionType() == ISD::NON_EXTLOAD) {
NewL = DAG.getLoad(L->getAddressingMode(), L->getExtensionType(),		NewL = DAG.getLoad(L->getAddressingMode(), L->getExtensionType(),
NVT, dl, L->getChain(), L->getBasePtr(), L->getOffset(),		NVT, dl, L->getChain(), L->getBasePtr(), L->getOffset(),
L->getPointerInfo(), NVT, L->isVolatile(),		L->getPointerInfo(), NVT, L->isVolatile(),
L->isNonTemporal(), false, L->getAlignment(),		L->isNonTemporal(), false, L->getAlignment(),
L->getAAInfo());		L->getAAInfo());
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), NewL.getValue(1));		ReplaceValueWith(SDValue(N, 1), NewL.getValue(1));
return NewL;		return NewL;
}		}

// Do a non-extending load followed by FP_EXTEND.		// Do a non-extending load followed by FP_EXTEND.
NewL = DAG.getLoad(L->getAddressingMode(), ISD::NON_EXTLOAD,		NewL = DAG.getLoad(L->getAddressingMode(), ISD::NON_EXTLOAD,
L->getMemoryVT(), dl, L->getChain(),		L->getMemoryVT(), dl, L->getChain(),
L->getBasePtr(), L->getOffset(), L->getPointerInfo(),		L->getBasePtr(), L->getOffset(), L->getPointerInfo(),
L->getMemoryVT(), L->isVolatile(),		L->getMemoryVT(), L->isVolatile(),
L->isNonTemporal(), false, L->getAlignment(),		L->isNonTemporal(), false, L->getAlignment(),
L->getAAInfo());		L->getAAInfo());
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), NewL.getValue(1));		ReplaceValueWith(SDValue(N, 1), NewL.getValue(1));
return BitConvertToInteger(DAG.getNode(ISD::FP_EXTEND, dl, VT, NewL));		return BitConvertToInteger(DAG.getNode(ISD::FP_EXTEND, dl, VT, NewL));
}		}

SDValue DAGTypeLegalizer::SoftenFloatRes_SELECT(SDNode *N) {		SDValue DAGTypeLegalizer::SoftenFloatRes_SELECT(SDNode *N) {
SDValue LHS = GetSoftenedFloat(N->getOperand(1));		SDValue LHS = GetSoftenedFloat(N->getOperand(1));
SDValue RHS = GetSoftenedFloat(N->getOperand(2));		SDValue RHS = GetSoftenedFloat(N->getOperand(2));
[... 20 lines not shown ...]	SDValue DAGTypeLegalizer::SoftenFloatRes_VAARG(SDNode *N) {
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
SDLoc dl(N);		SDLoc dl(N);

SDValue NewVAARG;		SDValue NewVAARG;
NewVAARG = DAG.getVAArg(NVT, dl, Chain, Ptr, N->getOperand(2),		NewVAARG = DAG.getVAArg(NVT, dl, Chain, Ptr, N->getOperand(2),
N->getConstantOperandVal(3));		N->getConstantOperandVal(3));

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), NewVAARG.getValue(1));		ReplaceValueWith(SDValue(N, 1), NewVAARG.getValue(1));
return NewVAARG;		return NewVAARG;
}		}

SDValue DAGTypeLegalizer::SoftenFloatRes_XINT_TO_FP(SDNode *N) {		SDValue DAGTypeLegalizer::SoftenFloatRes_XINT_TO_FP(SDNode *N) {
bool Signed = N->getOpcode() == ISD::SINT_TO_FP;		bool Signed = N->getOpcode() == ISD::SINT_TO_FP;
EVT SVT = N->getOperand(0).getValueType();		EVT SVT = N->getOperand(0).getValueType();
[... 1,364 lines not shown ...]

../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

[... 60 lines not shown ...]	case ISD::CONVERT_RNDSAT:
Res = PromoteIntRes_CONVERT_RNDSAT(N); break;		Res = PromoteIntRes_CONVERT_RNDSAT(N); break;
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
case ISD::CTLZ: Res = PromoteIntRes_CTLZ(N); break;		case ISD::CTLZ: Res = PromoteIntRes_CTLZ(N); break;
case ISD::CTPOP: Res = PromoteIntRes_CTPOP(N); break;		case ISD::CTPOP: Res = PromoteIntRes_CTPOP(N); break;
case ISD::CTTZ_ZERO_UNDEF:		case ISD::CTTZ_ZERO_UNDEF:
case ISD::CTTZ: Res = PromoteIntRes_CTTZ(N); break;		case ISD::CTTZ: Res = PromoteIntRes_CTTZ(N); break;
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
Res = PromoteIntRes_EXTRACT_VECTOR_ELT(N); break;		Res = PromoteIntRes_EXTRACT_VECTOR_ELT(N); break;
case ISD::LOAD: Res = PromoteIntRes_LOAD(cast<LoadSDNode>(N));break;		case ISD::LOAD: Res = PromoteIntRes_LOAD(cast<LoadSDNode>(N)); break;
case ISD::MLOAD: Res = PromoteIntRes_MLOAD(cast<MaskedLoadSDNode>(N));break;		case ISD::MLOAD: Res = PromoteIntRes_MLOAD(cast<MaskedLoadSDNode>(N));
		break;
		case ISD::MGATHER: Res = PromoteIntRes_MGATHER(cast<MaskedGatherSDNode>(N));
		break;
case ISD::SELECT: Res = PromoteIntRes_SELECT(N); break;		case ISD::SELECT: Res = PromoteIntRes_SELECT(N); break;
case ISD::VSELECT: Res = PromoteIntRes_VSELECT(N); break;		case ISD::VSELECT: Res = PromoteIntRes_VSELECT(N); break;
case ISD::SELECT_CC: Res = PromoteIntRes_SELECT_CC(N); break;		case ISD::SELECT_CC: Res = PromoteIntRes_SELECT_CC(N); break;
case ISD::SETCC: Res = PromoteIntRes_SETCC(N); break;		case ISD::SETCC: Res = PromoteIntRes_SETCC(N); break;
case ISD::SMIN:		case ISD::SMIN:
case ISD::SMAX:		case ISD::SMAX:
case ISD::UMIN:		case ISD::UMIN:
case ISD::UMAX: Res = PromoteIntRes_SimpleIntBinOp(N); break;		case ISD::UMAX: Res = PromoteIntRes_SimpleIntBinOp(N); break;
[... 101 lines not shown ...]

SDValue DAGTypeLegalizer::PromoteIntRes_Atomic0(AtomicSDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_Atomic0(AtomicSDNode *N) {
EVT ResVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT ResVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDValue Res = DAG.getAtomic(N->getOpcode(), SDLoc(N),		SDValue Res = DAG.getAtomic(N->getOpcode(), SDLoc(N),
N->getMemoryVT(), ResVT,		N->getMemoryVT(), ResVT,
N->getChain(), N->getBasePtr(),		N->getChain(), N->getBasePtr(),
N->getMemOperand(), N->getOrdering(),		N->getMemOperand(), N->getOrdering(),
N->getSynchScope());		N->getSynchScope());
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
return Res;		return Res;
}		}

SDValue DAGTypeLegalizer::PromoteIntRes_Atomic1(AtomicSDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_Atomic1(AtomicSDNode *N) {
SDValue Op2 = GetPromotedInteger(N->getOperand(2));		SDValue Op2 = GetPromotedInteger(N->getOperand(2));
SDValue Res = DAG.getAtomic(N->getOpcode(), SDLoc(N),		SDValue Res = DAG.getAtomic(N->getOpcode(), SDLoc(N),
N->getMemoryVT(),		N->getMemoryVT(),
N->getChain(), N->getBasePtr(),		N->getChain(), N->getBasePtr(),
Op2, N->getMemOperand(), N->getOrdering(),		Op2, N->getMemOperand(), N->getOrdering(),
N->getSynchScope());		N->getSynchScope());
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
return Res;		return Res;
}		}

SDValue DAGTypeLegalizer::PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N,		SDValue DAGTypeLegalizer::PromoteIntRes_AtomicCmpSwap(AtomicSDNode *N,
unsigned ResNo) {		unsigned ResNo) {
if (ResNo == 1) {		if (ResNo == 1) {
[... 268 lines not shown ...]	SDValue DAGTypeLegalizer::PromoteIntRes_LOAD(LoadSDNode *N) {
assert(ISD::isUNINDEXEDLoad(N) && "Indexed load during type legalization!");		assert(ISD::isUNINDEXEDLoad(N) && "Indexed load during type legalization!");
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
ISD::LoadExtType ExtType =		ISD::LoadExtType ExtType =
ISD::isNON_EXTLoad(N) ? ISD::EXTLOAD : N->getExtensionType();		ISD::isNON_EXTLoad(N) ? ISD::EXTLOAD : N->getExtensionType();
SDLoc dl(N);		SDLoc dl(N);
SDValue Res = DAG.getExtLoad(ExtType, dl, NVT, N->getChain(), N->getBasePtr(),		SDValue Res = DAG.getExtLoad(ExtType, dl, NVT, N->getChain(), N->getBasePtr(),
N->getMemoryVT(), N->getMemOperand());		N->getMemoryVT(), N->getMemOperand());

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
return Res;		return Res;
}		}

SDValue DAGTypeLegalizer::PromoteIntRes_MLOAD(MaskedLoadSDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_MLOAD(MaskedLoadSDNode *N) {
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDValue ExtSrc0 = GetPromotedInteger(N->getSrc0());		SDValue ExtSrc0 = GetPromotedInteger(N->getSrc0());

SDValue Mask = N->getMask();		SDValue Mask = N->getMask();
EVT NewMaskVT = getSetCCResultType(NVT);		EVT NewMaskVT = getSetCCResultType(NVT);
if (NewMaskVT != N->getMask().getValueType())		if (NewMaskVT != N->getMask().getValueType())
Mask = PromoteTargetBoolean(Mask, NewMaskVT);		Mask = PromoteTargetBoolean(Mask, NewMaskVT);
SDLoc dl(N);		SDLoc dl(N);

SDValue Res = DAG.getMaskedLoad(NVT, dl, N->getChain(), N->getBasePtr(),		SDValue Res = DAG.getMaskedLoad(NVT, dl, N->getChain(), N->getBasePtr(),
Mask, ExtSrc0, N->getMemoryVT(),		Mask, ExtSrc0, N->getMemoryVT(),
N->getMemOperand(), ISD::SEXTLOAD);		N->getMemOperand(), ISD::SEXTLOAD);
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
		// use the new one.
		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
		return Res;
		}

		SDValue DAGTypeLegalizer::PromoteIntRes_MGATHER(MaskedGatherSDNode *N) {
		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
		SDValue ExtSrc0 = GetPromotedInteger(N->getValue());
		assert(NVT == ExtSrc0.getValueType() &&
		"Gather result type and the passThru agrument type should be the same");

		SDLoc dl(N);
		SDValue Ops[] = {N->getChain(), ExtSrc0, N->getMask(), N->getBasePtr(),
mbodart: Is there any particular reason that the mask operand legalization is handled in a different place for MLOAD and MGATHER? For MLOAD it is processed during PromoteIntRes_MLOAD, while for MGATHER it is processed when legalizing its operands. The MGATHER method seems cleaner, but I'm curious as to why MLOAD does it differently. Also note that for MLOAD, the call to PromoteTargetBoolean is conditioned on whether the type mismatches, while in PromoteIntOp_MGATHER, the call is unconditional. I don't know if these inconsistencies are innocuous from a functionality standpoint, but they certainly make it difficult to understand the requirements.
delena: No reason. Just the old code. I removed the mask promotion and all tests still pass.
		N->getIndex()};
		SDValue Res = DAG.getMaskedGather(DAG.getVTList(NVT, MVT::Other),
		N->getMemoryVT(), dl, Ops,
		N->getMemOperand());
		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
return Res;		return Res;
}		}

/// Promote the overflow flag of an overflowing arithmetic node.		/// Promote the overflow flag of an overflowing arithmetic node.
SDValue DAGTypeLegalizer::PromoteIntRes_Overflow(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_Overflow(SDNode *N) {
// Simply change the return type of the boolean result.		// Simply change the return type of the boolean result.
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(1));		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(1));
EVT ValueVTs[] = { N->getValueType(0), NVT };		EVT ValueVTs[] = { N->getValueType(0), NVT };
SDValue Ops[] = { N->getOperand(0), N->getOperand(1) };		SDValue Ops[] = { N->getOperand(0), N->getOperand(1) };
SDValue Res = DAG.getNode(N->getOpcode(), SDLoc(N),		SDValue Res = DAG.getNode(N->getOpcode(), SDLoc(N),
DAG.getVTList(ValueVTs), Ops);		DAG.getVTList(ValueVTs), Ops);
[370 lines not shown]	bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {
case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N); break;		case ISD::SIGN_EXTEND: Res = PromoteIntOp_SIGN_EXTEND(N); break;
case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N); break;		case ISD::SINT_TO_FP: Res = PromoteIntOp_SINT_TO_FP(N); break;
case ISD::STORE: Res = PromoteIntOp_STORE(cast<StoreSDNode>(N),		case ISD::STORE: Res = PromoteIntOp_STORE(cast<StoreSDNode>(N),
OpNo); break;		OpNo); break;
case ISD::MSTORE: Res = PromoteIntOp_MSTORE(cast<MaskedStoreSDNode>(N),		case ISD::MSTORE: Res = PromoteIntOp_MSTORE(cast<MaskedStoreSDNode>(N),
OpNo); break;		OpNo); break;
case ISD::MLOAD: Res = PromoteIntOp_MLOAD(cast<MaskedLoadSDNode>(N),		case ISD::MLOAD: Res = PromoteIntOp_MLOAD(cast<MaskedLoadSDNode>(N),
OpNo); break;		OpNo); break;
		case ISD::MGATHER: Res = PromoteIntOp_MGATHER(cast<MaskedGatherSDNode>(N),
		OpNo); break;
		case ISD::MSCATTER: Res = PromoteIntOp_MSCATTER(cast<MaskedScatterSDNode>(N),
		OpNo); break;
case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;		case ISD::TRUNCATE: Res = PromoteIntOp_TRUNCATE(N); break;
case ISD::FP16_TO_FP:		case ISD::FP16_TO_FP:
case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;		case ISD::UINT_TO_FP: Res = PromoteIntOp_UINT_TO_FP(N); break;
case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N); break;		case ISD::ZERO_EXTEND: Res = PromoteIntOp_ZERO_EXTEND(N); break;
case ISD::EXTRACT_SUBVECTOR: Res = PromoteIntOp_EXTRACT_SUBVECTOR(N); break;		case ISD::EXTRACT_SUBVECTOR: Res = PromoteIntOp_EXTRACT_SUBVECTOR(N); break;

case ISD::SHL:		case ISD::SHL:
case ISD::SRA:		case ISD::SRA:
[252 lines not shown]	SDValue DAGTypeLegalizer::PromoteIntOp_STORE(StoreSDNode *N, unsigned OpNo){

SDValue Val = GetPromotedInteger(N->getValue()); // Get promoted value.		SDValue Val = GetPromotedInteger(N->getValue()); // Get promoted value.

// Truncate the value and store the result.		// Truncate the value and store the result.
return DAG.getTruncStore(Ch, dl, Val, Ptr,		return DAG.getTruncStore(Ch, dl, Val, Ptr,
N->getMemoryVT(), N->getMemOperand());		N->getMemoryVT(), N->getMemOperand());
}		}

SDValue DAGTypeLegalizer::PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo){		SDValue DAGTypeLegalizer::PromoteIntOp_MSTORE(MaskedStoreSDNode *N,
		unsigned OpNo) {

SDValue DataOp = N->getValue();		SDValue DataOp = N->getValue();
EVT DataVT = DataOp.getValueType();		EVT DataVT = DataOp.getValueType();
SDValue Mask = N->getMask();		SDValue Mask = N->getMask();
EVT MaskVT = Mask.getValueType();		EVT MaskVT = Mask.getValueType();
SDLoc dl(N);		SDLoc dl(N);

bool TruncateStore = false;		bool TruncateStore = false;
if (!TLI.isTypeLegal(DataVT)) {		if (!TLI.isTypeLegal(DataVT)) {
if (getTypeAction(DataVT) == TargetLowering::TypePromoteInteger) {		if (getTypeAction(DataVT) == TargetLowering::TypePromoteInteger) {
DataOp = GetPromotedInteger(DataOp);		DataOp = GetPromotedInteger(DataOp);
if (!TLI.isTypeLegal(MaskVT))		if (!TLI.isTypeLegal(MaskVT))
Mask = PromoteTargetBoolean(Mask, DataOp.getValueType());		Mask = PromoteTargetBoolean(Mask, DataOp.getValueType());
TruncateStore = true;		TruncateStore = true;
}		}
else {		else {
assert(getTypeAction(DataVT) == TargetLowering::TypeWidenVector &&		assert(getTypeAction(DataVT) == TargetLowering::TypeWidenVector &&
"Unexpected data legalization in MSTORE");		"Unexpected data legalization in MSTORE");
DataOp = GetWidenedVector(DataOp);		DataOp = GetWidenedVector(DataOp);
		Mask = WidenTargetBoolean(Mask, DataOp.getValueType(), true);
if (getTypeAction(MaskVT) == TargetLowering::TypeWidenVector)
Mask = GetWidenedVector(Mask);
else {
EVT BoolVT = getSetCCResultType(DataOp.getValueType());

// We can't use ModifyToType() because we should fill the mask with
// zeroes
unsigned WidenNumElts = BoolVT.getVectorNumElements();
unsigned MaskNumElts = MaskVT.getVectorNumElements();

unsigned NumConcat = WidenNumElts / MaskNumElts;
SmallVector<SDValue, 16> Ops(NumConcat);
SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
Ops[0] = Mask;
for (unsigned i = 1; i != NumConcat; ++i)
Ops[i] = ZeroVal;

Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
}
}		}
}		}
else		else
Mask = PromoteTargetBoolean(N->getMask(), DataOp.getValueType());		Mask = PromoteTargetBoolean(Mask, DataOp.getValueType());
return DAG.getMaskedStore(N->getChain(), dl, DataOp, N->getBasePtr(), Mask,		return DAG.getMaskedStore(N->getChain(), dl, DataOp, N->getBasePtr(), Mask,
N->getMemoryVT(), N->getMemOperand(),		N->getMemoryVT(), N->getMemOperand(),
TruncateStore);		TruncateStore);
}		}
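The comment in the old code above says the widened mask must be filled with zeroes. A minimal standalone sketch of why, in plain C++ rather than the SelectionDAG API (`maskedStore` and `widenMaskWithZeroes` are hypothetical names introduced only for illustration): once the data is widened, the extra lanes hold arbitrary values, so the extra mask lanes have to be false or the store would touch memory past the original vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a masked store: lane I is written only when Mask[I]
// is set. This is not LLVM code; it only mimics the semantics.
void maskedStore(std::vector<int> &Mem, const std::vector<int> &Data,
                 const std::vector<bool> &Mask) {
  assert(Data.size() == Mask.size() && "data and mask widths must match");
  assert(Mem.size() >= Data.size() && "memory model too small");
  for (std::size_t I = 0; I < Data.size(); ++I)
    if (Mask[I])
      Mem[I] = Data[I];
}

// Widen a mask to NewWidth lanes; the added lanes are false (zero), so the
// padding lanes of the widened data are never stored.
std::vector<bool> widenMaskWithZeroes(std::vector<bool> Mask,
                                      std::size_t NewWidth) {
  assert(NewWidth >= Mask.size() && "widening cannot shrink the mask");
  Mask.resize(NewWidth, false);
  return Mask;
}
```

With a 3-lane store widened to 4 lanes, the fourth memory slot keeps its old value because the fourth mask lane is zero.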

SDValue DAGTypeLegalizer::PromoteIntOp_MLOAD(MaskedLoadSDNode *N, unsigned OpNo){		SDValue DAGTypeLegalizer::PromoteIntOp_MLOAD(MaskedLoadSDNode *N,
		unsigned OpNo) {
assert(OpNo == 2 && "Only know how to promote the mask!");		assert(OpNo == 2 && "Only know how to promote the mask!");
EVT DataVT = N->getValueType(0);		EVT DataVT = N->getValueType(0);
SDValue Mask = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);		SDValue Mask = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);
SmallVector<SDValue, 4> NewOps(N->op_begin(), N->op_end());		SmallVector<SDValue, 4> NewOps(N->op_begin(), N->op_end());
NewOps[OpNo] = Mask;		NewOps[OpNo] = Mask;
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);		return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
}		}

		SDValue DAGTypeLegalizer::PromoteIntOp_MGATHER(MaskedGatherSDNode *N,
		unsigned OpNo) {

		SmallVector<SDValue, 5> NewOps(N->op_begin(), N->op_end());
		if (OpNo == 2) {
		// The Mask
		EVT DataVT = N->getValueType(0);
		NewOps[OpNo] = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);
		} else
mbodart: This is the first time I've looked at vector type legalization, so there's something I don't quite understand. A masked store (PromoteIntOp_MSTORE) is in some sense just a degenerate case of a masked scatter. So intuitively it would seem PromoteIntOp_MSTORE would have a simpler implementation than PromoteIntOp_MSCATTER. But here it seems the reverse is true. Why does PromoteIntOp_MSTORE try to legalize both the mask and data simultaneously, while for scatter we just do the one operand? Could PromoteIntOp_MSTORE be simplified? If not, what is the essential difference?
delena: I wanted to simplify MLOAD / MSTORE, but it requires additional changes in X86ISelLowering.cpp. I can do this in a separate patch.
mbodart: OK
mbodart: There appears to be an extraneous blank line here.
delena: I did not see any blank line. I'll run dos2unix on all files again.
		NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
		return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
		}

		SDValue DAGTypeLegalizer::PromoteIntOp_MSCATTER(MaskedScatterSDNode *N,
		unsigned OpNo) {
		SmallVector<SDValue, 5> NewOps(N->op_begin(), N->op_end());
		if (OpNo == 2) {
		// The Mask
		EVT DataVT = N->getValue().getValueType();
		NewOps[OpNo] = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);
		} else
		NewOps[OpNo] = GetPromotedInteger(N->getOperand(OpNo));
		return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
		}

SDValue DAGTypeLegalizer::PromoteIntOp_TRUNCATE(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntOp_TRUNCATE(SDNode *N) {
SDValue Op = GetPromotedInteger(N->getOperand(0));		SDValue Op = GetPromotedInteger(N->getOperand(0));
return DAG.getNode(ISD::TRUNCATE, SDLoc(N), N->getValueType(0), Op);		return DAG.getNode(ISD::TRUNCATE, SDLoc(N), N->getValueType(0), Op);
}		}

SDValue DAGTypeLegalizer::PromoteIntOp_UINT_TO_FP(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntOp_UINT_TO_FP(SDNode *N) {
return SDValue(DAG.UpdateNodeOperands(N,		return SDValue(DAG.UpdateNodeOperands(N,
ZExtPromotedInteger(N->getOperand(0))), 0);		ZExtPromotedInteger(N->getOperand(0))), 0);
[840 lines not shown]	if (ExcessBits < NVT.getSizeInBits()) {
// Move high bits to the right position in Hi.		// Move high bits to the right position in Hi.
Hi = DAG.getNode(ExtType == ISD::SEXTLOAD ? ISD::SRA : ISD::SRL, dl, NVT,		Hi = DAG.getNode(ExtType == ISD::SEXTLOAD ? ISD::SRA : ISD::SRL, dl, NVT,
Hi,		Hi,
DAG.getConstant(NVT.getSizeInBits() - ExcessBits, dl,		DAG.getConstant(NVT.getSizeInBits() - ExcessBits, dl,
TLI.getPointerTy(DAG.getDataLayout())));		TLI.getPointerTy(DAG.getDataLayout())));
}		}
}		}

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Ch);		ReplaceValueWith(SDValue(N, 1), Ch);
}		}

void DAGTypeLegalizer::ExpandIntRes_Logical(SDNode *N,		void DAGTypeLegalizer::ExpandIntRes_Logical(SDNode *N,
SDValue &Lo, SDValue &Hi) {		SDValue &Lo, SDValue &Hi) {
SDLoc dl(N);		SDLoc dl(N);
SDValue LL, LH, RL, RH;		SDValue LL, LH, RL, RH;
[1,201 lines not shown]

../lib/CodeGen/SelectionDAG/LegalizeTypes.h

[167 lines not shown]	private:
SDValue JoinIntegers(SDValue Lo, SDValue Hi);		SDValue JoinIntegers(SDValue Lo, SDValue Hi);
SDValue LibCallify(RTLIB::Libcall LC, SDNode *N, bool isSigned);		SDValue LibCallify(RTLIB::Libcall LC, SDNode *N, bool isSigned);

std::pair<SDValue, SDValue> ExpandChainLibCall(RTLIB::Libcall LC,		std::pair<SDValue, SDValue> ExpandChainLibCall(RTLIB::Libcall LC,
SDNode *Node, bool isSigned);		SDNode *Node, bool isSigned);
std::pair<SDValue, SDValue> ExpandAtomic(SDNode *Node);		std::pair<SDValue, SDValue> ExpandAtomic(SDNode *Node);

SDValue PromoteTargetBoolean(SDValue Bool, EVT ValVT);		SDValue PromoteTargetBoolean(SDValue Bool, EVT ValVT);

		/// Modify Bit Vector to match SetCC result type if ValVT.
mbodart: Should that be "of ValVT", not "if"?
		/// The bit vector is widened with zeroes when WithZeroes is true.
		SDValue WidenTargetBoolean(SDValue Bool, EVT ValVT, bool WithZeroes = false);

void ReplaceValueWith(SDValue From, SDValue To);		void ReplaceValueWith(SDValue From, SDValue To);
void SplitInteger(SDValue Op, SDValue &Lo, SDValue &Hi);		void SplitInteger(SDValue Op, SDValue &Lo, SDValue &Hi);
void SplitInteger(SDValue Op, EVT LoVT, EVT HiVT,		void SplitInteger(SDValue Op, EVT LoVT, EVT HiVT,
SDValue &Lo, SDValue &Hi);		SDValue &Lo, SDValue &Hi);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Integer Promotion Support: LegalizeIntegerTypes.cpp		// Integer Promotion Support: LegalizeIntegerTypes.cpp
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
[58 lines not shown]	private:
SDValue PromoteIntRes_CTPOP(SDNode *N);		SDValue PromoteIntRes_CTPOP(SDNode *N);
SDValue PromoteIntRes_CTTZ(SDNode *N);		SDValue PromoteIntRes_CTTZ(SDNode *N);
SDValue PromoteIntRes_EXTRACT_VECTOR_ELT(SDNode *N);		SDValue PromoteIntRes_EXTRACT_VECTOR_ELT(SDNode *N);
SDValue PromoteIntRes_FP_TO_XINT(SDNode *N);		SDValue PromoteIntRes_FP_TO_XINT(SDNode *N);
SDValue PromoteIntRes_FP_TO_FP16(SDNode *N);		SDValue PromoteIntRes_FP_TO_FP16(SDNode *N);
SDValue PromoteIntRes_INT_EXTEND(SDNode *N);		SDValue PromoteIntRes_INT_EXTEND(SDNode *N);
SDValue PromoteIntRes_LOAD(LoadSDNode *N);		SDValue PromoteIntRes_LOAD(LoadSDNode *N);
SDValue PromoteIntRes_MLOAD(MaskedLoadSDNode *N);		SDValue PromoteIntRes_MLOAD(MaskedLoadSDNode *N);
		SDValue PromoteIntRes_MGATHER(MaskedGatherSDNode *N);
SDValue PromoteIntRes_Overflow(SDNode *N);		SDValue PromoteIntRes_Overflow(SDNode *N);
SDValue PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo);		SDValue PromoteIntRes_SADDSUBO(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_SDIV(SDNode *N);		SDValue PromoteIntRes_SDIV(SDNode *N);
SDValue PromoteIntRes_SELECT(SDNode *N);		SDValue PromoteIntRes_SELECT(SDNode *N);
SDValue PromoteIntRes_VSELECT(SDNode *N);		SDValue PromoteIntRes_VSELECT(SDNode *N);
SDValue PromoteIntRes_SELECT_CC(SDNode *N);		SDValue PromoteIntRes_SELECT_CC(SDNode *N);
SDValue PromoteIntRes_SETCC(SDNode *N);		SDValue PromoteIntRes_SETCC(SDNode *N);
SDValue PromoteIntRes_SHL(SDNode *N);		SDValue PromoteIntRes_SHL(SDNode *N);
[30 lines not shown]	private:
SDValue PromoteIntOp_SIGN_EXTEND(SDNode *N);		SDValue PromoteIntOp_SIGN_EXTEND(SDNode *N);
SDValue PromoteIntOp_SINT_TO_FP(SDNode *N);		SDValue PromoteIntOp_SINT_TO_FP(SDNode *N);
SDValue PromoteIntOp_STORE(StoreSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_STORE(StoreSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_TRUNCATE(SDNode *N);		SDValue PromoteIntOp_TRUNCATE(SDNode *N);
SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);		SDValue PromoteIntOp_UINT_TO_FP(SDNode *N);
SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);		SDValue PromoteIntOp_ZERO_EXTEND(SDNode *N);
SDValue PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_MLOAD(MaskedLoadSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MLOAD(MaskedLoadSDNode *N, unsigned OpNo);
		SDValue PromoteIntOp_MSCATTER(MaskedScatterSDNode *N, unsigned OpNo);
		SDValue PromoteIntOp_MGATHER(MaskedGatherSDNode *N, unsigned OpNo);

void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);		void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Integer Expansion Support: LegalizeIntegerTypes.cpp		// Integer Expansion Support: LegalizeIntegerTypes.cpp
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

/// GetExpandedInteger - Given a processed operand Op which was expanded into		/// GetExpandedInteger - Given a processed operand Op which was expanded into
[366 lines not shown]	private:
SDValue WidenVecRes_BITCAST(SDNode* N);		SDValue WidenVecRes_BITCAST(SDNode* N);
SDValue WidenVecRes_BUILD_VECTOR(SDNode* N);		SDValue WidenVecRes_BUILD_VECTOR(SDNode* N);
SDValue WidenVecRes_CONCAT_VECTORS(SDNode* N);		SDValue WidenVecRes_CONCAT_VECTORS(SDNode* N);
SDValue WidenVecRes_CONVERT_RNDSAT(SDNode* N);		SDValue WidenVecRes_CONVERT_RNDSAT(SDNode* N);
SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);		SDValue WidenVecRes_EXTRACT_SUBVECTOR(SDNode* N);
SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);		SDValue WidenVecRes_INSERT_VECTOR_ELT(SDNode* N);
SDValue WidenVecRes_LOAD(SDNode* N);		SDValue WidenVecRes_LOAD(SDNode* N);
SDValue WidenVecRes_MLOAD(MaskedLoadSDNode* N);		SDValue WidenVecRes_MLOAD(MaskedLoadSDNode* N);
		SDValue WidenVecRes_MGATHER(MaskedGatherSDNode* N);
SDValue WidenVecRes_SCALAR_TO_VECTOR(SDNode* N);		SDValue WidenVecRes_SCALAR_TO_VECTOR(SDNode* N);
SDValue WidenVecRes_SELECT(SDNode* N);		SDValue WidenVecRes_SELECT(SDNode* N);
SDValue WidenVecRes_SELECT_CC(SDNode* N);		SDValue WidenVecRes_SELECT_CC(SDNode* N);
SDValue WidenVecRes_SETCC(SDNode* N);		SDValue WidenVecRes_SETCC(SDNode* N);
SDValue WidenVecRes_UNDEF(SDNode *N);		SDValue WidenVecRes_UNDEF(SDNode *N);
SDValue WidenVecRes_VECTOR_SHUFFLE(ShuffleVectorSDNode *N);		SDValue WidenVecRes_VECTOR_SHUFFLE(ShuffleVectorSDNode *N);
SDValue WidenVecRes_VSETCC(SDNode* N);		SDValue WidenVecRes_VSETCC(SDNode* N);

[11 lines not shown]	private:
bool WidenVectorOperand(SDNode *N, unsigned OpNo);		bool WidenVectorOperand(SDNode *N, unsigned OpNo);
SDValue WidenVecOp_BITCAST(SDNode *N);		SDValue WidenVecOp_BITCAST(SDNode *N);
SDValue WidenVecOp_CONCAT_VECTORS(SDNode *N);		SDValue WidenVecOp_CONCAT_VECTORS(SDNode *N);
SDValue WidenVecOp_EXTEND(SDNode *N);		SDValue WidenVecOp_EXTEND(SDNode *N);
SDValue WidenVecOp_EXTRACT_VECTOR_ELT(SDNode *N);		SDValue WidenVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
SDValue WidenVecOp_EXTRACT_SUBVECTOR(SDNode *N);		SDValue WidenVecOp_EXTRACT_SUBVECTOR(SDNode *N);
SDValue WidenVecOp_STORE(SDNode* N);		SDValue WidenVecOp_STORE(SDNode* N);
SDValue WidenVecOp_MSTORE(SDNode* N, unsigned OpNo);		SDValue WidenVecOp_MSTORE(SDNode* N, unsigned OpNo);
		SDValue WidenVecOp_MSCATTER(SDNode* N, unsigned OpNo);
SDValue WidenVecOp_SETCC(SDNode* N);		SDValue WidenVecOp_SETCC(SDNode* N);

SDValue WidenVecOp_Convert(SDNode *N);		SDValue WidenVecOp_Convert(SDNode *N);
SDValue WidenVecOp_FCOPYSIGN(SDNode *N);		SDValue WidenVecOp_FCOPYSIGN(SDNode *N);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Vector Widening Utilities Support: LegalizeVectorTypes.cpp		// Vector Widening Utilities Support: LegalizeVectorTypes.cpp
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
[21 lines not shown]	private:

/// Helper genWidenVectorTruncStores - Helper function to generate a set of		/// Helper genWidenVectorTruncStores - Helper function to generate a set of
/// stores to store a truncate widen vector into non-widen memory		/// stores to store a truncate widen vector into non-widen memory
/// StChain: list of chains for the stores we have generated		/// StChain: list of chains for the stores we have generated
/// ST: store of a widen value		/// ST: store of a widen value
void GenWidenVectorTruncStores(SmallVectorImpl<SDValue> &StChain,		void GenWidenVectorTruncStores(SmallVectorImpl<SDValue> &StChain,
StoreSDNode *ST);		StoreSDNode *ST);

/// Modifies a vector input (widen or narrows) to a vector of NVT. The		/// Modifies a vector input (widen or narrows) to a vector of NVT. The
mbodart: Can you please change the parameter name from WidenVT to NVT, so that it matches this comment and the code in LegalizeVectorTypes.cpp? Also a typo: defalut => default
/// input vector must have the same element type as NVT.		/// input vector must have the same element type as NVT.
SDValue ModifyToType(SDValue InOp, EVT WidenVT);		/// When FillWithZeroes is "on" the vector will be widened with
		/// zeroes.
		/// By default, the vector will be widened with undefined values.
		SDValue ModifyToType(SDValue InOp, EVT NVT, bool FillWithZeroes = false);

mbodart: The implementation of UseExistingVal gives it an ambiguous meaning. If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value. Otherwise, only its first element is replicated. Was that the intent? If so, it seems like a difficult interface to use robustly. An enum with possible values of Zext, Splat or Undef would make the widening interface more clear, assuming the Splat ambiguity is resolved to consistently replicate all original values. But as a further question, when would a larger vector length not be a multiple of a smaller one? Aren't they all powers of 2?
delena:
> The implementation of UseExistingVal gives it an ambiguous meaning. If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value. Otherwise, only its first element is replicated. Was that the intent? If so, it seems like a difficult interface to use robustly.
When I extend a vector of indices, I want to add existing values (a replicated small vector or a replicated first element, it does not matter). The gather/scatter will be faster, as far as I know (I'll check it again). But now I think that maybe this is too X86 specific? Maybe I should fill the indices with "undef" and then replace these "undefs" in the target-specific part?
> An enum with possible values of Zext, Splat or Undef would make the widening interface more clear
I thought about this, but I don't see too many enums in LLVM. All enums have global sense, not per function.
> But as a further question, when would a larger vector length not be a multiple of a smaller one?
The original type may not be a power of 2. It is a type legalizer, so it should be able to deal with any type, for example <3 x i64> -> <4 x i64>.
mbodart: I agree that this optimization is too X86 specific, and CodeGen should simply extend with undef values. If you want to proceed with those X86-specific extensions, please do so in a separate change set. That will simplify this change set, and render the other comments here moot for now. If it turns out that replication is needed, we will want to define its behavior more crisply.
delena: I removed "Splat". I don't need it. "Zext" extension is not target specific, at least for the mask.
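For illustration only, the enum-based interface discussed in this thread could look roughly like the standalone sketch below. It is plain C++17, not SelectionDAG code; `WidenFill`, `Lane` and `widenVector` are hypothetical names that do not exist in LLVM, and the patch itself keeps a `bool FillWithZeroes` parameter instead. An empty `std::optional` models an undefined lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fill policy for the lanes added by widening.
enum class WidenFill { Undef, Zero };

// A lane holds a value, or std::nullopt to model an undefined lane.
using Lane = std::optional<std::int64_t>;

// Widen In to NewNumElts lanes, e.g. <3 x i64> -> <4 x i64>. The original
// lanes are kept; the added lanes are zero or undefined depending on Fill.
std::vector<Lane> widenVector(std::vector<Lane> In, std::size_t NewNumElts,
                              WidenFill Fill) {
  assert(NewNumElts >= In.size() && "widening cannot shrink a vector");
  Lane Pad = (Fill == WidenFill::Zero) ? Lane(0) : Lane(std::nullopt);
  In.resize(NewNumElts, Pad);
  return In;
}
```

Because the new width does not have to be a multiple of the old one, the sketch pads lane by lane rather than concatenating copies of the input.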
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Generic Splitting: LegalizeTypesGeneric.cpp		// Generic Splitting: LegalizeTypesGeneric.cpp
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// Legalization methods which only use that the illegal type is split into two		// Legalization methods which only use that the illegal type is split into two
// not necessarily identical types. As such they can be used for splitting		// not necessarily identical types. As such they can be used for splitting
// vectors and expanding integers and floats.		// vectors and expanding integers and floats.

[64 lines not shown]

../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp

[1,098 lines not shown]
	SDValue DAGTypeLegalizer::PromoteTargetBoolean(SDValue Bool, EVT ValVT) {			SDValue DAGTypeLegalizer::PromoteTargetBoolean(SDValue Bool, EVT ValVT) {
	SDLoc dl(Bool);			SDLoc dl(Bool);
	EVT BoolVT = getSetCCResultType(ValVT);			EVT BoolVT = getSetCCResultType(ValVT);
	ISD::NodeType ExtendCode =			ISD::NodeType ExtendCode =
	TargetLowering::getExtendForContent(TLI.getBooleanContents(ValVT));			TargetLowering::getExtendForContent(TLI.getBooleanContents(ValVT));
	return DAG.getNode(ExtendCode, dl, BoolVT, Bool);			return DAG.getNode(ExtendCode, dl, BoolVT, Bool);
	}			}

				/// WidenTargetBoolean - Widen the given target boolean to a target boolean
				/// of the given type. The boolean vector is promoted (if necessary),
				/// widened and then extended or truncated to match the target boolean
				/// type of the given ValVT.
				SDValue DAGTypeLegalizer::WidenTargetBoolean(SDValue Bool, EVT ValVT,
				bool WithZeroes) {
				SDLoc dl(Bool);
				if (getTypeAction(Bool.getValueType()) == TargetLowering::TypePromoteInteger)
				Bool = GetPromotedInteger(Bool);
mbodart: It seems important here that the value returned by GetPromotedInteger matches the promotion operation (e.g., sign extension) performed by PromoteTargetBoolean. How is that guaranteed?
delena: Fixed! Thank you.
mbodart: Using SExt unconditionally seems incorrect. Don't we want to make the choice of ZExt vs SExt dependent on getBooleanContents? Perhaps we should add getExtendForVectorContent, similar to the existing scalar getExtendForContent.

				EVT WideVT = EVT::getVectorVT(*DAG.getContext(),
				Bool.getValueType().getScalarType(),
				ValVT.getVectorNumElements());
				Bool = ModifyToType(Bool, WideVT, WithZeroes);

				EVT TargetBoolVT = getSetCCResultType(ValVT);
				if (WideVT.bitsGT(TargetBoolVT))
				return DAG.getNode(ISD::TRUNCATE, dl, TargetBoolVT, Bool);
				if (WideVT.bitsLT(TargetBoolVT))
				return DAG.getNode(ISD::SIGN_EXTEND, dl, TargetBoolVT, Bool);
				return Bool;
				}
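The reviewer's point about choosing the extension from the target's boolean contents can be sketched outside LLVM as follows. This is a standalone model: the enum and function names below are stand-ins chosen for illustration, not the LLVM declarations, and treating an "any" extension like a zero extension is an assumption made only to keep the example deterministic.

```cpp
#include <cassert>

// Stand-ins for the target's boolean representation and the extension kinds.
enum class BooleanContent { Undefined, ZeroOrOne, ZeroOrNegativeOne };
enum class ExtendKind { AnyExtend, ZeroExtend, SignExtend };

// Pick the extension that preserves the target's encoding of "true".
ExtendKind extendForContent(BooleanContent Content) {
  switch (Content) {
  case BooleanContent::Undefined:
    return ExtendKind::AnyExtend;
  case BooleanContent::ZeroOrOne:
    return ExtendKind::ZeroExtend;
  case BooleanContent::ZeroOrNegativeOne:
    return ExtendKind::SignExtend;
  }
  return ExtendKind::AnyExtend; // not reached for valid enum values
}

// Extend a 1-bit boolean to a wider integer lane. For AnyExtend the upper
// bits are unspecified; this model picks zero for them.
int extendBool(bool B, ExtendKind Kind) {
  if (!B)
    return 0;
  return Kind == ExtendKind::SignExtend ? -1 : 1;
}
```

Under this model, a target whose vector booleans are 0 or 1 would get a wrong "true" value (-1) from an unconditional sign extension, which is the inconsistency the comment above asks about.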

	/// SplitInteger - Return the lower LoVT bits of Op in Lo and the upper HiVT			/// SplitInteger - Return the lower LoVT bits of Op in Lo and the upper HiVT
	/// bits in Hi.			/// bits in Hi.
	void DAGTypeLegalizer::SplitInteger(SDValue Op,			void DAGTypeLegalizer::SplitInteger(SDValue Op,
	EVT LoVT, EVT HiVT,			EVT LoVT, EVT HiVT,
	SDValue &Lo, SDValue &Hi) {			SDValue &Lo, SDValue &Hi) {
	SDLoc dl(Op);			SDLoc dl(Op);
	assert(LoVT.getSizeInBits() + HiVT.getSizeInBits() ==			assert(LoVT.getSizeInBits() + HiVT.getSizeInBits() ==
	Op.getValueType().getSizeInBits() && "Invalid integer splitting!");			Op.getValueType().getSizeInBits() && "Invalid integer splitting!");
[30 lines not shown]

../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

[225 lines not shown]	SDValue Result = DAG.getLoad(ISD::UNINDEXED,
N->getChain(), N->getBasePtr(),		N->getChain(), N->getBasePtr(),
DAG.getUNDEF(N->getBasePtr().getValueType()),		DAG.getUNDEF(N->getBasePtr().getValueType()),
N->getPointerInfo(),		N->getPointerInfo(),
N->getMemoryVT().getVectorElementType(),		N->getMemoryVT().getVectorElementType(),
N->isVolatile(), N->isNonTemporal(),		N->isVolatile(), N->isNonTemporal(),
N->isInvariant(), N->getOriginalAlignment(),		N->isInvariant(), N->getOriginalAlignment(),
N->getAAInfo());		N->getAAInfo());

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Result.getValue(1));		ReplaceValueWith(SDValue(N, 1), Result.getValue(1));
return Result;		return Result;
}		}

SDValue DAGTypeLegalizer::ScalarizeVecRes_UnaryOp(SDNode *N) {		SDValue DAGTypeLegalizer::ScalarizeVecRes_UnaryOp(SDNode *N) {
// Get the dest type - it doesn't always match the input type, e.g. int_to_fp.		// Get the dest type - it doesn't always match the input type, e.g. int_to_fp.
EVT DestVT = N->getValueType(0).getVectorElementType();		EVT DestVT = N->getValueType(0).getVectorElementType();
[770 lines not shown]	Hi = DAG.getLoad(ISD::UNINDEXED, ExtType, HiVT, dl, Ch, Ptr, Offset,
HiMemVT, isVolatile, isNonTemporal, isInvariant, Alignment,		HiMemVT, isVolatile, isNonTemporal, isInvariant, Alignment,
AAInfo);		AAInfo);

// Build a factor node to remember that this load is independent of the		// Build a factor node to remember that this load is independent of the
// other one.		// other one.
Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),		Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
Hi.getValue(1));		Hi.getValue(1));

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(LD, 1), Ch);		ReplaceValueWith(SDValue(LD, 1), Ch);
}		}

void DAGTypeLegalizer::SplitVecRes_MLOAD(MaskedLoadSDNode *MLD,		void DAGTypeLegalizer::SplitVecRes_MLOAD(MaskedLoadSDNode *MLD,
SDValue &Lo, SDValue &Hi) {		SDValue &Lo, SDValue &Hi) {
EVT LoVT, HiVT;		EVT LoVT, HiVT;
SDLoc dl(MLD);		SDLoc dl(MLD);
[43 lines not shown]	Hi = DAG.getMaskedLoad(HiVT, dl, Ch, Ptr, MaskHi, Src0Hi, HiMemVT, MMO,
ExtType);		ExtType);


// Build a factor node to remember that this load is independent of the		// Build a factor node to remember that this load is independent of the
// other one.		// other one.
Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),		Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
Hi.getValue(1));		Hi.getValue(1));

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(MLD, 1), Ch);		ReplaceValueWith(SDValue(MLD, 1), Ch);

}		}

void DAGTypeLegalizer::SplitVecRes_MGATHER(MaskedGatherSDNode *MGT,		void DAGTypeLegalizer::SplitVecRes_MGATHER(MaskedGatherSDNode *MGT,
SDValue &Lo, SDValue &Hi) {		SDValue &Lo, SDValue &Hi) {
EVT LoVT, HiVT;		EVT LoVT, HiVT;
[31 lines hidden]	void DAGTypeLegalizer::SplitVecRes_MGATHER(MaskedGatherSDNode *MGT,
Hi = DAG.getMaskedGather(DAG.getVTList(HiVT, MVT::Other), HiVT, dl, OpsHi,		Hi = DAG.getMaskedGather(DAG.getVTList(HiVT, MVT::Other), HiVT, dl, OpsHi,
MMO);		MMO);

// Build a factor node to remember that this load is independent of the		// Build a factor node to remember that this load is independent of the
// other one.		// other one.
Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),		Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
Hi.getValue(1));		Hi.getValue(1));

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(MGT, 1), Ch);		ReplaceValueWith(SDValue(MGT, 1), Ch);
}		}


void DAGTypeLegalizer::SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi) {		void DAGTypeLegalizer::SplitVecRes_SETCC(SDNode *N, SDValue &Lo, SDValue &Hi) {
assert(N->getValueType(0).isVector() &&		assert(N->getValueType(0).isVector() &&
N->getOperand(0).getValueType().isVector() &&		N->getOperand(0).getValueType().isVector() &&
[496 lines hidden]	SDValue DAGTypeLegalizer::SplitVecOp_MGATHER(MaskedGatherSDNode *MGT,
SDValue Hi = DAG.getMaskedGather(DAG.getVTList(HiVT, MVT::Other), HiVT, dl,		SDValue Hi = DAG.getMaskedGather(DAG.getVTList(HiVT, MVT::Other), HiVT, dl,
OpsHi, MMO);		OpsHi, MMO);

// Build a factor node to remember that this load is independent of the		// Build a factor node to remember that this load is independent of the
// other one.		// other one.
Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),		Ch = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, Lo.getValue(1),
Hi.getValue(1));		Hi.getValue(1));

// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(MGT, 1), Ch);		ReplaceValueWith(SDValue(MGT, 1), Ch);

SDValue Res = DAG.getNode(ISD::CONCAT_VECTORS, dl, MGT->getValueType(0), Lo,		SDValue Res = DAG.getNode(ISD::CONCAT_VECTORS, dl, MGT->getValueType(0), Lo,
Hi);		Hi);
ReplaceValueWith(SDValue(MGT, 0), Res);		ReplaceValueWith(SDValue(MGT, 0), Res);
return SDValue();		return SDValue();
}		}
[317 lines hidden]	#endif
case ISD::SETCC: Res = WidenVecRes_SETCC(N); break;		case ISD::SETCC: Res = WidenVecRes_SETCC(N); break;
case ISD::UNDEF: Res = WidenVecRes_UNDEF(N); break;		case ISD::UNDEF: Res = WidenVecRes_UNDEF(N); break;
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
Res = WidenVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N));		Res = WidenVecRes_VECTOR_SHUFFLE(cast<ShuffleVectorSDNode>(N));
break;		break;
case ISD::MLOAD:		case ISD::MLOAD:
Res = WidenVecRes_MLOAD(cast<MaskedLoadSDNode>(N));		Res = WidenVecRes_MLOAD(cast<MaskedLoadSDNode>(N));
break;		break;
		case ISD::MGATHER:
		Res = WidenVecRes_MGATHER(cast<MaskedGatherSDNode>(N));
		break;

case ISD::ADD:		case ISD::ADD:
case ISD::AND:		case ISD::AND:
case ISD::MUL:		case ISD::MUL:
case ISD::MULHS:		case ISD::MULHS:
case ISD::MULHU:		case ISD::MULHU:
case ISD::OR:		case ISD::OR:
case ISD::SUB:		case ISD::SUB:
[702 lines hidden]	SDValue DAGTypeLegalizer::WidenVecRes_LOAD(SDNode *N) {
// Modified the chain - switch anything that used the old chain to use		// Modified the chain - switch anything that used the old chain to use
// the new one.		// the new one.
ReplaceValueWith(SDValue(N, 1), NewChain);		ReplaceValueWith(SDValue(N, 1), NewChain);

return Result;		return Result;
}		}

SDValue DAGTypeLegalizer::WidenVecRes_MLOAD(MaskedLoadSDNode *N) {		SDValue DAGTypeLegalizer::WidenVecRes_MLOAD(MaskedLoadSDNode *N) {

EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(),N->getValueType(0));		EVT WideVT = TLI.getTypeToTransformTo(*DAG.getContext(),N->getValueType(0));
SDValue Mask = N->getMask();		SDValue Mask = N->getMask();
EVT MaskVT = Mask.getValueType();
SDValue Src0 = GetWidenedVector(N->getSrc0());		SDValue Src0 = GetWidenedVector(N->getSrc0());
ISD::LoadExtType ExtType = N->getExtensionType();		ISD::LoadExtType ExtType = N->getExtensionType();
SDLoc dl(N);		SDLoc dl(N);

if (getTypeAction(MaskVT) == TargetLowering::TypeWidenVector)		// The mask should be widened as well
Mask = GetWidenedVector(Mask);		Mask = WidenTargetBoolean(Mask, WideVT, true);
else {
EVT BoolVT = getSetCCResultType(WidenVT);

// We can't use ModifyToType() because we should fill the mask with
// zeroes
unsigned WidenNumElts = BoolVT.getVectorNumElements();
unsigned MaskNumElts = MaskVT.getVectorNumElements();

unsigned NumConcat = WidenNumElts / MaskNumElts;
SmallVector<SDValue, 16> Ops(NumConcat);
SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
Ops[0] = Mask;
for (unsigned i = 1; i != NumConcat; ++i)
Ops[i] = ZeroVal;

Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
}

		mbodart [Not Done]: Now that ModifyToType supports zero fill, why aren't we using it here (and in WidenVecOp_MSTORE)?
		delena (author) [Not Done]: I'll simplify MSTORE in a separate patch.
		mbodart [Not Done]: That's fine. But then can you please add a "FIX ME" comment here indicating such a change is desired?
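The thread above depends on the mask being padded with zeroes rather than undef when the vector is widened. The following is a minimal scalar sketch of why that matters, written as standalone C++ and not against the LLVM API. The names `widenMaskWithZeroes` and `maskedLoad` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model, not LLVM API: widen a mask to WideElts lanes,
// padding with zeroes so the extra lanes stay inactive.
std::vector<bool> widenMaskWithZeroes(const std::vector<bool> &Mask,
                                      std::size_t WideElts) {
  assert(WideElts >= Mask.size() && "can only widen");
  std::vector<bool> Wide(Mask);
  Wide.resize(WideElts, false);
  return Wide;
}

// Model of a masked load: lane I reads Mem[I] when Mask[I] is set, and
// otherwise keeps the pass-through value Src0[I].
std::vector<int> maskedLoad(const std::vector<int> &Mem,
                            const std::vector<bool> &Mask,
                            const std::vector<int> &Src0) {
  assert(Mask.size() == Src0.size() && "mask and pass-through must match");
  std::vector<int> Res(Src0);
  for (std::size_t I = 0; I != Mask.size(); ++I)
    if (Mask[I]) {
      assert(I < Mem.size() && "active lane reads past the buffer");
      Res[I] = Mem[I];
    }
  return Res;
}
```

In this model a three-lane load widened to four lanes never touches a fourth memory element, because the padded mask lane is zero. If the padding were undef, the extra lane could be treated as active and read past the original access.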
SDValue Res = DAG.getMaskedLoad(WidenVT, dl, N->getChain(), N->getBasePtr(),		SDValue Res = DAG.getMaskedLoad(WideVT, dl, N->getChain(), N->getBasePtr(),
Mask, Src0, N->getMemoryVT(),		Mask, Src0, N->getMemoryVT(),
N->getMemOperand(), ExtType);		N->getMemOperand(), ExtType);
// Legalized the chain result - switch anything that used the old chain to		// Legalize the chain result - switch anything that used the old chain to
		// use the new one.
		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
		return Res;
		}

		SDValue DAGTypeLegalizer::WidenVecRes_MGATHER(MaskedGatherSDNode *N) {

		EVT WideVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
		SDValue Mask = N->getMask();
		SDValue Src0 = GetWidenedVector(N->getValue());
		unsigned NumElts = WideVT.getVectorNumElements();
		SDLoc dl(N);

		// The mask should be widened as well
		Mask = WidenTargetBoolean(Mask, WideVT, true);

		// Widen the Index operand
		SDValue Index = N->getIndex();
		EVT WideIndexVT = EVT::getVectorVT(*DAG.getContext(),
		Index.getValueType().getScalarType(),
		NumElts);
		Index = ModifyToType(Index, WideIndexVT);
		SDValue Ops[] = { N->getChain(), Src0, Mask, N->getBasePtr(), Index };
		SDValue Res = DAG.getMaskedGather(DAG.getVTList(WideVT, MVT::Other),
		N->getMemoryVT(), dl, Ops,
		N->getMemOperand());

		// Legalize the chain result - switch anything that used the old chain to
// use the new one.		// use the new one.
ReplaceValueWith(SDValue(N, 1), Res.getValue(1));		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
return Res;		return Res;
}		}

SDValue DAGTypeLegalizer::WidenVecRes_SCALAR_TO_VECTOR(SDNode *N) {		SDValue DAGTypeLegalizer::WidenVecRes_SCALAR_TO_VECTOR(SDNode *N) {
EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N),		return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N),
WidenVT, N->getOperand(0));		WidenVT, N->getOperand(0));
}		}

SDValue DAGTypeLegalizer::WidenVecRes_SELECT(SDNode *N) {		SDValue DAGTypeLegalizer::WidenVecRes_SELECT(SDNode *N) {
EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
unsigned WidenNumElts = WidenVT.getVectorNumElements();		unsigned WidenNumElts = WidenVT.getVectorNumElements();

SDValue Cond1 = N->getOperand(0);		SDValue Cond1 = N->getOperand(0);
EVT CondVT = Cond1.getValueType();		EVT CondVT = Cond1.getValueType();
if (CondVT.isVector()) {		if (CondVT.isVector()) {
EVT CondEltVT = CondVT.getVectorElementType();		EVT CondEltVT = CondVT.getVectorElementType();
EVT CondWidenVT = EVT::getVectorVT(*DAG.getContext(),		EVT CondWidenVT = EVT::getVectorVT(*DAG.getContext(),
		mbodart [Not Done]: A simple source comment here to the effect "// Zero extend the mask" would help readers remember the meaning of "true".
CondEltVT, WidenNumElts);		CondEltVT, WidenNumElts);
if (getTypeAction(CondVT) == TargetLowering::TypeWidenVector)		if (getTypeAction(CondVT) == TargetLowering::TypeWidenVector)
Cond1 = GetWidenedVector(Cond1);		Cond1 = GetWidenedVector(Cond1);

// If we have to split the condition there is no point in widening the		// If we have to split the condition there is no point in widening the
// select. This would result in an cycle of widening the select ->		// select. This would result in an cycle of widening the select ->
// widening the condition operand -> splitting the condition operand ->		// widening the condition operand -> splitting the condition operand ->
// splitting the select -> widening the select. Instead split this select		// splitting the select -> widening the select. Instead split this select
// further and widen the resulting type.		// further and widen the resulting type.
if (getTypeAction(CondVT) == TargetLowering::TypeSplitVector) {		if (getTypeAction(CondVT) == TargetLowering::TypeSplitVector) {
SDValue SplitSelect = SplitVecOp_VSELECT(N, 0);		SDValue SplitSelect = SplitVecOp_VSELECT(N, 0);
SDValue Res = ModifyToType(SplitSelect, WidenVT);		SDValue Res = ModifyToType(SplitSelect, WidenVT);
return Res;		return Res;
}		}

		mbodart [Done]: Another instance of "Legalized" => "Legalize".
if (Cond1.getValueType() != CondWidenVT)		if (Cond1.getValueType() != CondWidenVT)
Cond1 = ModifyToType(Cond1, CondWidenVT);		Cond1 = ModifyToType(Cond1, CondWidenVT);
}		}

SDValue InOp1 = GetWidenedVector(N->getOperand(1));		SDValue InOp1 = GetWidenedVector(N->getOperand(1));
SDValue InOp2 = GetWidenedVector(N->getOperand(2));		SDValue InOp2 = GetWidenedVector(N->getOperand(2));
assert(InOp1.getValueType() == WidenVT && InOp2.getValueType() == WidenVT);		assert(InOp1.getValueType() == WidenVT && InOp2.getValueType() == WidenVT);
return DAG.getNode(N->getOpcode(), SDLoc(N),		return DAG.getNode(N->getOpcode(), SDLoc(N),
[110 lines hidden]	#endif
llvm_unreachable("Do not know how to widen this operator's operand!");		llvm_unreachable("Do not know how to widen this operator's operand!");

case ISD::BITCAST: Res = WidenVecOp_BITCAST(N); break;		case ISD::BITCAST: Res = WidenVecOp_BITCAST(N); break;
case ISD::CONCAT_VECTORS: Res = WidenVecOp_CONCAT_VECTORS(N); break;		case ISD::CONCAT_VECTORS: Res = WidenVecOp_CONCAT_VECTORS(N); break;
case ISD::EXTRACT_SUBVECTOR: Res = WidenVecOp_EXTRACT_SUBVECTOR(N); break;		case ISD::EXTRACT_SUBVECTOR: Res = WidenVecOp_EXTRACT_SUBVECTOR(N); break;
case ISD::EXTRACT_VECTOR_ELT: Res = WidenVecOp_EXTRACT_VECTOR_ELT(N); break;		case ISD::EXTRACT_VECTOR_ELT: Res = WidenVecOp_EXTRACT_VECTOR_ELT(N); break;
case ISD::STORE: Res = WidenVecOp_STORE(N); break;		case ISD::STORE: Res = WidenVecOp_STORE(N); break;
case ISD::MSTORE: Res = WidenVecOp_MSTORE(N, OpNo); break;		case ISD::MSTORE: Res = WidenVecOp_MSTORE(N, OpNo); break;
		case ISD::MSCATTER: Res = WidenVecOp_MSCATTER(N, OpNo); break;
case ISD::SETCC: Res = WidenVecOp_SETCC(N); break;		case ISD::SETCC: Res = WidenVecOp_SETCC(N); break;
case ISD::FCOPYSIGN: Res = WidenVecOp_FCOPYSIGN(N); break;		case ISD::FCOPYSIGN: Res = WidenVecOp_FCOPYSIGN(N); break;

case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
Res = WidenVecOp_EXTEND(N);		Res = WidenVecOp_EXTEND(N);
break;		break;
[198 lines hidden]	if (StChain.size() == 1)
return StChain[0];		return StChain[0];
else		else
return DAG.getNode(ISD::TokenFactor, SDLoc(ST), MVT::Other, StChain);		return DAG.getNode(ISD::TokenFactor, SDLoc(ST), MVT::Other, StChain);
}		}

SDValue DAGTypeLegalizer::WidenVecOp_MSTORE(SDNode *N, unsigned OpNo) {		SDValue DAGTypeLegalizer::WidenVecOp_MSTORE(SDNode *N, unsigned OpNo) {
MaskedStoreSDNode *MST = cast<MaskedStoreSDNode>(N);		MaskedStoreSDNode *MST = cast<MaskedStoreSDNode>(N);
SDValue Mask = MST->getMask();		SDValue Mask = MST->getMask();
EVT MaskVT = Mask.getValueType();
SDValue StVal = MST->getValue();		SDValue StVal = MST->getValue();

		assert(OpNo == 3 && "Unexpected operand number");
// Widen the value		// Widen the value
SDValue WideVal = GetWidenedVector(StVal);		SDValue WideVal = GetWidenedVector(StVal);
		EVT WideVT = WideVal.getValueType();
SDLoc dl(N);		SDLoc dl(N);

if (OpNo == 2 || getTypeAction(MaskVT) == TargetLowering::TypeWidenVector)
Mask = GetWidenedVector(Mask);
else {
// The mask should be widened as well		// The mask should be widened as well
EVT BoolVT = getSetCCResultType(WideVal.getValueType());		Mask = WidenTargetBoolean(Mask, WideVT, true);
// We can't use ModifyToType() because we should fill the mask with
// zeroes
unsigned WidenNumElts = BoolVT.getVectorNumElements();
unsigned MaskNumElts = MaskVT.getVectorNumElements();

unsigned NumConcat = WidenNumElts / MaskNumElts;
SmallVector<SDValue, 16> Ops(NumConcat);
SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
Ops[0] = Mask;
for (unsigned i = 1; i != NumConcat; ++i)
Ops[i] = ZeroVal;

Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
}
assert(Mask.getValueType().getVectorNumElements() ==
WideVal.getValueType().getVectorNumElements() &&
"Mask and data vectors should have the same number of elements");
return DAG.getMaskedStore(MST->getChain(), dl, WideVal, MST->getBasePtr(),		return DAG.getMaskedStore(MST->getChain(), dl, WideVal, MST->getBasePtr(),
Mask, MST->getMemoryVT(), MST->getMemOperand(),		Mask, MST->getMemoryVT(), MST->getMemOperand(),
false);		false);
}		}

		SDValue DAGTypeLegalizer::WidenVecOp_MSCATTER(SDNode *N, unsigned OpNo) {
		assert(OpNo == 1 && "Can widen only data operand of mscatter");
		MaskedScatterSDNode *MSC = cast<MaskedScatterSDNode>(N);
		SDValue DataOp = MSC->getValue();
		SDValue Mask = MSC->getMask();

		assert(OpNo == 1 && "Unexpected operand number");
		mbodart [Done]: Redundant assert, already checked a few lines above.
		// Widen the value
		SDValue WideVal = GetWidenedVector(DataOp);
		EVT WideVT = WideVal.getValueType();
		unsigned NumElts = WideVal.getValueType().getVectorNumElements();
		SDLoc dl(N);

		// The mask should be widened as well
		Mask = WidenTargetBoolean(Mask, WideVT, true);

		// Widen index
		SDValue Index = MSC->getIndex();
		EVT WideIndexVT = EVT::getVectorVT(*DAG.getContext(),
		Index.getValueType().getScalarType(),
		NumElts);
		Index = ModifyToType(Index, WideIndexVT);

		SDValue Ops[] = {MSC->getChain(), WideVal, Mask, MSC->getBasePtr(), Index};
		return DAG.getMaskedScatter(DAG.getVTList(MVT::Other),
		MSC->getMemoryVT(), dl, Ops,
		MSC->getMemOperand());
		}

SDValue DAGTypeLegalizer::WidenVecOp_SETCC(SDNode *N) {		SDValue DAGTypeLegalizer::WidenVecOp_SETCC(SDNode *N) {
SDValue InOp0 = GetWidenedVector(N->getOperand(0));		SDValue InOp0 = GetWidenedVector(N->getOperand(0));
SDValue InOp1 = GetWidenedVector(N->getOperand(1));		SDValue InOp1 = GetWidenedVector(N->getOperand(1));
SDLoc dl(N);		SDLoc dl(N);

// WARNING: In this code we widen the compare instruction with garbage.		// WARNING: In this code we widen the compare instruction with garbage.
// This garbage may contain denormal floats which may be slow. Is this a real		// This garbage may contain denormal floats which may be slow. Is this a real
// concern ? Should we zero the unused lanes if this is a float compare ?		// concern ? Should we zero the unused lanes if this is a float compare ?
[447 lines hidden]	StChain.push_back(DAG.getTruncStore(Chain, dl, EOp, NewBasePtr,
ST->getPointerInfo().getWithOffset(Offset),		ST->getPointerInfo().getWithOffset(Offset),
StEltVT, isVolatile, isNonTemporal,		StEltVT, isVolatile, isNonTemporal,
MinAlign(Align, Offset), AAInfo));		MinAlign(Align, Offset), AAInfo));
}		}
}		}

/// Modifies a vector input (widen or narrows) to a vector of NVT. The		/// Modifies a vector input (widen or narrows) to a vector of NVT. The
/// input vector must have the same element type as NVT.		/// input vector must have the same element type as NVT.
SDValue DAGTypeLegalizer::ModifyToType(SDValue InOp, EVT NVT) {		/// FillWithZeroes specifies that the vector should be widened with zeroes.
		SDValue DAGTypeLegalizer::ModifyToType(SDValue InOp, EVT NVT,
		bool FillWithZeroes) {
// Note that InOp might have been widened so it might already have		// Note that InOp might have been widened so it might already have
// the right width or it might need be narrowed.		// the right width or it might need be narrowed.
EVT InVT = InOp.getValueType();		EVT InVT = InOp.getValueType();
assert(InVT.getVectorElementType() == NVT.getVectorElementType() &&		assert(InVT.getVectorElementType() == NVT.getVectorElementType() &&
"input and widen element type must match");		"input and widen element type must match");
SDLoc dl(InOp);		SDLoc dl(InOp);

		mbodart [Done]: I would think we want this moved below the check for "if (InVT == NVT)".
// Check if InOp already has the right width.		// Check if InOp already has the right width.
if (InVT == NVT)		if (InVT == NVT)
return InOp;		return InOp;

unsigned InNumElts = InVT.getVectorNumElements();		unsigned InNumElts = InVT.getVectorNumElements();
unsigned WidenNumElts = NVT.getVectorNumElements();		unsigned WidenNumElts = NVT.getVectorNumElements();
if (WidenNumElts > InNumElts && WidenNumElts % InNumElts == 0) {		if (WidenNumElts > InNumElts && WidenNumElts % InNumElts == 0) {
unsigned NumConcat = WidenNumElts / InNumElts;		unsigned NumConcat = WidenNumElts / InNumElts;
SmallVector<SDValue, 16> Ops(NumConcat);		SmallVector<SDValue, 16> Ops(NumConcat);
SDValue UndefVal = DAG.getUNDEF(InVT);		SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, InVT) :
		DAG.getUNDEF(InVT);
Ops[0] = InOp;		Ops[0] = InOp;
for (unsigned i = 1; i != NumConcat; ++i)		for (unsigned i = 1; i != NumConcat; ++i)
Ops[i] = UndefVal;		Ops[i] = FillVal;

return DAG.getNode(ISD::CONCAT_VECTORS, dl, NVT, Ops);		return DAG.getNode(ISD::CONCAT_VECTORS, dl, NVT, Ops);
}		}

if (WidenNumElts < InNumElts && InNumElts % WidenNumElts)		if (WidenNumElts < InNumElts && InNumElts % WidenNumElts)
return DAG.getNode(		return DAG.getNode(
ISD::EXTRACT_SUBVECTOR, dl, NVT, InOp,		ISD::EXTRACT_SUBVECTOR, dl, NVT, InOp,
DAG.getConstant(0, dl, TLI.getVectorIdxTy(DAG.getDataLayout())));		DAG.getConstant(0, dl, TLI.getVectorIdxTy(DAG.getDataLayout())));

// Fall back to extract and build.		// Fall back to extract and build.
SmallVector<SDValue, 16> Ops(WidenNumElts);		SmallVector<SDValue, 16> Ops(WidenNumElts);
EVT EltVT = NVT.getVectorElementType();		EVT EltVT = NVT.getVectorElementType();
unsigned MinNumElts = std::min(WidenNumElts, InNumElts);		unsigned MinNumElts = std::min(WidenNumElts, InNumElts);
unsigned Idx;		unsigned Idx;
for (Idx = 0; Idx < MinNumElts; ++Idx)		for (Idx = 0; Idx < MinNumElts; ++Idx)
Ops[Idx] = DAG.getNode(		Ops[Idx] = DAG.getNode(
ISD::EXTRACT_VECTOR_ELT, dl, EltVT, InOp,		ISD::EXTRACT_VECTOR_ELT, dl, EltVT, InOp,
DAG.getConstant(Idx, dl, TLI.getVectorIdxTy(DAG.getDataLayout())));		DAG.getConstant(Idx, dl, TLI.getVectorIdxTy(DAG.getDataLayout())));

SDValue UndefVal = DAG.getUNDEF(EltVT);		SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, EltVT) :
		DAG.getUNDEF(EltVT);
for ( ; Idx < WidenNumElts; ++Idx)		for ( ; Idx < WidenNumElts; ++Idx)
Ops[Idx] = UndefVal;		Ops[Idx] = FillVal;
return DAG.getNode(ISD::BUILD_VECTOR, dl, NVT, Ops);		return DAG.getNode(ISD::BUILD_VECTOR, dl, NVT, Ops);
}		}
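The visible effect of the new FillWithZeroes parameter can be modeled outside the DAG. The sketch below is a simplified standalone model, not the patch's implementation: it reproduces only the resulting lane values and ignores which node (CONCAT_VECTORS, EXTRACT_SUBVECTOR or BUILD_VECTOR) the real code picks. The names `Lane` and `modifyToWidth` are hypothetical, and undef is modeled as an empty `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model: a lane is either a concrete value or undef (empty).
using Lane = std::optional<int>;

// Resize In to NumElts lanes. Narrowing keeps the low lanes; widening pads
// with zero when FillWithZeroes is set and with undef otherwise.
std::vector<Lane> modifyToWidth(const std::vector<Lane> &In,
                                std::size_t NumElts,
                                bool FillWithZeroes = false) {
  if (In.size() == NumElts)
    return In; // Already the right width, nothing to build.
  const Lane Fill = FillWithZeroes ? Lane(0) : Lane();
  std::vector<Lane> Out(NumElts, Fill);
  const std::size_t Keep = In.size() < NumElts ? In.size() : NumElts;
  for (std::size_t I = 0; I != Keep; ++I)
    Out[I] = In[I];
  return Out;
}
```

A call such as `modifyToWidth(Mask, 4, /*FillWithZeroes=*/true)` also shows one way to keep the meaning of the boolean argument visible at the call site.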

../lib/Target/X86/X86ISelLowering.cpp


[1,553 lines hidden]	if (!Subtarget->useSoftFloat() && Subtarget->hasAVX512()) {
// Custom lower several nodes.		// Custom lower several nodes.
for (MVT VT : MVT::vector_valuetypes()) {		for (MVT VT : MVT::vector_valuetypes()) {
unsigned EltSize = VT.getVectorElementType().getSizeInBits();		unsigned EltSize = VT.getVectorElementType().getSizeInBits();
if (EltSize == 1) {		if (EltSize == 1) {
setOperationAction(ISD::AND, VT, Legal);		setOperationAction(ISD::AND, VT, Legal);
setOperationAction(ISD::OR, VT, Legal);		setOperationAction(ISD::OR, VT, Legal);
setOperationAction(ISD::XOR, VT, Legal);		setOperationAction(ISD::XOR, VT, Legal);
}		}
if (EltSize >= 32 && VT.getSizeInBits() <= 512) {		if ((VT.is128BitVector() || VT.is256BitVector()) && EltSize >= 32) {
setOperationAction(ISD::MGATHER, VT, Custom);		setOperationAction(ISD::MGATHER, VT, Custom);
setOperationAction(ISD::MSCATTER, VT, Custom);		setOperationAction(ISD::MSCATTER, VT, Custom);
}		}
// Extract subvector is special because the value type		// Extract subvector is special because the value type
// (result) is 256/128-bit but the source is 512-bit wide.		// (result) is 256/128-bit but the source is 512-bit wide.
if (VT.is128BitVector() || VT.is256BitVector()) {		if (VT.is128BitVector() || VT.is256BitVector()) {
setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::EXTRACT_SUBVECTOR, VT, Custom);
}		}
[9 lines hidden]	for (MVT VT : MVT::vector_valuetypes()) {
setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::INSERT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::BUILD_VECTOR, VT, Custom);		setOperationAction(ISD::BUILD_VECTOR, VT, Custom);
setOperationAction(ISD::VSELECT, VT, Legal);		setOperationAction(ISD::VSELECT, VT, Legal);
setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);		setOperationAction(ISD::EXTRACT_VECTOR_ELT, VT, Custom);
setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Custom);		setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Custom);
setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);		setOperationAction(ISD::INSERT_SUBVECTOR, VT, Custom);
setOperationAction(ISD::MLOAD, VT, Legal);		setOperationAction(ISD::MLOAD, VT, Legal);
setOperationAction(ISD::MSTORE, VT, Legal);		setOperationAction(ISD::MSTORE, VT, Legal);
		setOperationAction(ISD::MGATHER, VT, Legal);
		setOperationAction(ISD::MSCATTER, VT, Custom);
		mbodart [Not Done]: Why would 512-bit gathers and scatters have different operation actions here?
		delena (author) [Not Done]: The SCATTER is more problematic than GATHER. The MSCATTER node returns only the chain, but the X86 VPSCATTER instruction zeroes its mask operand, so I should specify the mask as a "return value".
}		}
}		}
for (auto VT : { MVT::v64i8, MVT::v32i16, MVT::v16i32 }) {		for (auto VT : { MVT::v64i8, MVT::v32i16, MVT::v16i32 }) {
setOperationAction(ISD::SELECT, VT, Promote);		setOperationAction(ISD::SELECT, VT, Promote);
AddPromotedToType (ISD::SELECT, VT, MVT::v8i64);		AddPromotedToType (ISD::SELECT, VT, MVT::v8i64);
}		}
}// has AVX-512		}// has AVX-512

[188 lines hidden]
setTargetDAGCombine(ISD::SIGN_EXTEND);		setTargetDAGCombine(ISD::SIGN_EXTEND);
setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);		setTargetDAGCombine(ISD::SIGN_EXTEND_INREG);
setTargetDAGCombine(ISD::SINT_TO_FP);		setTargetDAGCombine(ISD::SINT_TO_FP);
setTargetDAGCombine(ISD::UINT_TO_FP);		setTargetDAGCombine(ISD::UINT_TO_FP);
setTargetDAGCombine(ISD::SETCC);		setTargetDAGCombine(ISD::SETCC);
setTargetDAGCombine(ISD::BUILD_VECTOR);		setTargetDAGCombine(ISD::BUILD_VECTOR);
setTargetDAGCombine(ISD::MUL);		setTargetDAGCombine(ISD::MUL);
setTargetDAGCombine(ISD::XOR);		setTargetDAGCombine(ISD::XOR);
		setTargetDAGCombine(ISD::MSCATTER);
		setTargetDAGCombine(ISD::MGATHER);

computeRegisterProperties(Subtarget->getRegisterInfo());		computeRegisterProperties(Subtarget->getRegisterInfo());

MaxStoresPerMemset = 16; // For @llvm.memset -> sequence of stores		MaxStoresPerMemset = 16; // For @llvm.memset -> sequence of stores
MaxStoresPerMemsetOptSize = 8;		MaxStoresPerMemsetOptSize = 8;
MaxStoresPerMemcpy = 8; // For @llvm.memcpy -> sequence of stores		MaxStoresPerMemcpy = 8; // For @llvm.memcpy -> sequence of stores
MaxStoresPerMemcpyOptSize = 4;		MaxStoresPerMemcpyOptSize = 4;
MaxStoresPerMemmove = 8; // For @llvm.memmove -> sequence of stores		MaxStoresPerMemmove = 8; // For @llvm.memmove -> sequence of stores
[10,064 lines hidden]	static SDValue LowerINSERT_SUBVECTOR(SDValue Op, const X86Subtarget *Subtarget,
}		}

if ((OpVT.is256BitVector() || OpVT.is512BitVector()) &&		if ((OpVT.is256BitVector() || OpVT.is512BitVector()) &&
SubVecVT.is128BitVector())		SubVecVT.is128BitVector())
return Insert128BitVector(Vec, SubVec, IdxVal, DAG, dl);		return Insert128BitVector(Vec, SubVec, IdxVal, DAG, dl);

if (OpVT.is512BitVector() && SubVecVT.is256BitVector())		if (OpVT.is512BitVector() && SubVecVT.is256BitVector())
return Insert256BitVector(Vec, SubVec, IdxVal, DAG, dl);		return Insert256BitVector(Vec, SubVec, IdxVal, DAG, dl);

if (OpVT.getVectorElementType() == MVT::i1)		if (OpVT.getVectorElementType() == MVT::i1)
		mbodart [Not Done]: Unrelated to your change, I'm finding the optimizations of MVT::i1 here confusing as there are no checks for SubVecVT.getVectorNumElements(). The code appears to be making the assumption that when IdxVal is 0, or half the full vector length, then SubVec is exactly half the size of Vec. Why is that not being checked? It's also unclear to me how VSHLI/VSHRI operate on a vector of MVT::i1 elements. When operating on a vector of i32 or i64 elements, I would have thought these instructions perform a bit shift of each individual element (not cross element). But it seems the code here is trying to shift the whole i1 vector left and right, as if it is one big integer. Maybe that's how VSHLI/VSHRI are defined to behave, I don't know. Can you clarify?
		delena (author) [Not Done]: There are special shift instructions for mask vectors, KSHIFTR and KSHIFTL (in AVX-512). When we insert v8i1 into v16i1, the index should be 8 (I'll add an assertion) or 0. If the v16i1 is all zero, it's enough to shift the input vector left and right.
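The reply above can be pictured with ordinary integer shifts. This is a standalone sketch under stated assumptions, not the code in Insert1BitVector: a v16i1 mask is modeled as a 16-bit integer, the v8i1 input sits in the low 8 bits of such a value with garbage in the high 8 bits, and the destination vector is all zero. The helper names `kshiftl`, `kshiftr` and `insertIntoZeroMask` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Whole-mask logical shifts on a 16-bit mask, standing in for the role the
// reply gives to KSHIFTL and KSHIFTR.
std::uint16_t kshiftl(std::uint16_t K, unsigned N) {
  return static_cast<std::uint16_t>(K << N);
}
std::uint16_t kshiftr(std::uint16_t K, unsigned N) {
  return static_cast<std::uint16_t>(K >> N);
}

// Insert the low 8 bits of Sub into an all-zero 16-bit mask at bit Idx.
// The high 8 bits of Sub are treated as garbage and must not leak through.
std::uint16_t insertIntoZeroMask(std::uint16_t Sub, unsigned Idx) {
  assert((Idx == 0 || Idx == 8) && "only half-aligned inserts are modeled");
  // Shifting left drops the garbage bits and zero-fills the low half.
  const std::uint16_t Hi = kshiftl(Sub, 8);
  // For index 0, shift back right so the high half becomes zero instead.
  return Idx == 8 ? Hi : kshiftr(Hi, 8);
}
```

In this model the index-8 case needs one shift and the index-0 case needs a left shift followed by a right shift, which matches the "shift left-right" wording in the reply.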
return Insert1BitVector(Op, DAG);		return Insert1BitVector(Op, DAG);

return SDValue();		return SDValue();
}		}

// ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as		// ConstantPool, JumpTable, GlobalAddress, and ExternalSymbol are lowered as
// their target countpart wrapped in the X86ISD::Wrapper node. Suppose N is		// their target countpart wrapped in the X86ISD::Wrapper node. Suppose N is
// one of the above mentioned nodes. It has to be wrapped because otherwise		// one of the above mentioned nodes. It has to be wrapped because otherwise
[1,344 lines hidden]	SDValue X86TargetLowering::LowerTRUNCATE(SDValue Op, SelectionDAG &DAG) const {
if (Subtarget->hasAVX512()) {		if (Subtarget->hasAVX512()) {
// word to byte only under BWI		// word to byte only under BWI
if (InVT == MVT::v16i16 && !Subtarget->hasBWI()) // v16i16 -> v16i8		if (InVT == MVT::v16i16 && !Subtarget->hasBWI()) // v16i16 -> v16i8
return DAG.getNode(X86ISD::VTRUNC, DL, VT,		return DAG.getNode(X86ISD::VTRUNC, DL, VT,
DAG.getNode(X86ISD::VSEXT, DL, MVT::v16i32, In));		DAG.getNode(X86ISD::VSEXT, DL, MVT::v16i32, In));
return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);		return DAG.getNode(X86ISD::VTRUNC, DL, VT, In);
}		}
if ((VT == MVT::v4i32) && (InVT == MVT::v4i64)) {		if ((VT == MVT::v4i32) && (InVT == MVT::v4i64)) {
		if (In.getOpcode() == ISD::CONCAT_VECTORS && In.getNumOperands() == 2) {
		static const int ShufMask[] = {0, 2, 4, 6};
		return DAG.getVectorShuffle(VT, DL,
		DAG.getBitcast(MVT::v4i32, In.getOperand(0)),
		DAG.getBitcast(MVT::v4i32, In.getOperand(1)),
		ShufMask);
		}
// On AVX2, v4i64 -> v4i32 becomes VPERMD.		// On AVX2, v4i64 -> v4i32 becomes VPERMD.
if (Subtarget->hasInt256()) {		if (Subtarget->hasInt256()) {
static const int ShufMask[] = {0, 2, 4, 6, -1, -1, -1, -1};		static const int ShufMask[] = {0, 2, 4, 6, -1, -1, -1, -1};
In = DAG.getBitcast(MVT::v8i32, In);		In = DAG.getBitcast(MVT::v8i32, In);
In = DAG.getVectorShuffle(MVT::v8i32, DL, In, DAG.getUNDEF(MVT::v8i32),		In = DAG.getVectorShuffle(MVT::v8i32, DL, In, DAG.getUNDEF(MVT::v8i32),
ShufMask);		ShufMask);
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, In,		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, VT, In,
DAG.getIntPtrConstant(0, DL));		DAG.getIntPtrConstant(0, DL));
[6,400 lines hidden]	static SDValue LowerFSINCOS(SDValue Op, const X86Subtarget *Subtarget,
SDValue SinVal = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ArgVT,		SDValue SinVal = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ArgVT,
CallResult.first, DAG.getIntPtrConstant(0, dl));		CallResult.first, DAG.getIntPtrConstant(0, dl));
SDValue CosVal = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ArgVT,		SDValue CosVal = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, ArgVT,
CallResult.first, DAG.getIntPtrConstant(1, dl));		CallResult.first, DAG.getIntPtrConstant(1, dl));
SDVTList Tys = DAG.getVTList(ArgVT, ArgVT);		SDVTList Tys = DAG.getVTList(ArgVT, ArgVT);
return DAG.getNode(ISD::MERGE_VALUES, dl, Tys, SinVal, CosVal);		return DAG.getNode(ISD::MERGE_VALUES, dl, Tys, SinVal, CosVal);
}		}

		/// Widen a vector input to a vector of NVT. The
		mbodart [Not Done]: It's unfortunate that we have to duplicate much of the functionality of the generic type modifier in CodeGen's ModifyToType. Is there a way to unify them? If not, please add a source comment describing how this implementation differs from the one in CodeGen.
		delena (author) [Not Done]: I simplified the X86 version. I call it ExtendToType. It works with legal types only and is optimized for X86.
		/// input vector must have the same element type as NVT.
		static SDValue ExtendToType(SDValue InOp, MVT NVT, SelectionDAG &DAG,
		bool FillWithZeroes = false) {
		// Check if InOp already has the right width.
		MVT InVT = InOp.getSimpleValueType();
		mbodart [Not Done]: Another case where this new Undef creation should probably be moved below the InVT == NVT early return.
		if (InVT == NVT)
		return InOp;

		if (InOp.isUndef())
		return DAG.getUNDEF(NVT);

		assert(InVT.getVectorElementType() == NVT.getVectorElementType() &&
		"input and widen element type must match");

		unsigned InNumElts = InVT.getVectorNumElements();
		unsigned WidenNumElts = NVT.getVectorNumElements();
		assert(WidenNumElts > InNumElts && WidenNumElts % InNumElts == 0 &&
		"Unexpected request for vector widening");

		EVT EltVT = NVT.getVectorElementType();

		SDLoc dl(InOp);
		if (InOp.getOpcode() == ISD::CONCAT_VECTORS &&
		InOp.getNumOperands() == 2) {
		SDValue N1 = InOp.getOperand(1);
		if ((ISD::isBuildVectorAllZeros(N1.getNode()) && FillWithZeroes) ||
		N1.isUndef()) {
		InOp = InOp.getOperand(0);
		InVT = InOp.getSimpleValueType();
		InNumElts = InVT.getVectorNumElements();
		}
		}
		if (ISD::isBuildVectorOfConstantSDNodes(InOp.getNode())) {
		// Special case, because CONCAT_VECTORS with many operands is not
		// converted to the BUILD_VECTOR
		SmallVector<SDValue, 16> Ops;
		for (unsigned i = 0; i < InNumElts; ++i)
		Ops.push_back(InOp.getOperand(i));

		SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, EltVT) :
		DAG.getUNDEF(EltVT);
		for (unsigned i = 0; i < WidenNumElts - InNumElts; ++i)
		Ops.push_back(FillVal);
		return DAG.getNode(ISD::BUILD_VECTOR, dl, NVT, Ops);
		}
		SDValue FillVal = FillWithZeroes ? DAG.getConstant(0, dl, NVT) : DAG.getUNDEF(NVT);
		return DAG.getNode(ISD::INSERT_SUBVECTOR, dl, NVT, FillVal,
		InOp, DAG.getIntPtrConstant(0, dl));
		}
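
The widening rule that ExtendToType applies can be illustrated outside of SelectionDAG. The following is a minimal sketch, not the patch's implementation: `Lane` and `widenLanes` are hypothetical names, `std::nullopt` stands in for an undef lane, and the CONCAT_VECTORS, BUILD_VECTOR and whole-vector-undef special cases are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// One vector lane: a concrete value, or std::nullopt standing in for "undef".
using Lane = std::optional<int>;

// Hypothetical stand-in for ExtendToType: pad `in` up to `widenNumElts` lanes.
// New lanes are zero when fillWithZeroes is set and undef otherwise.
std::vector<Lane> widenLanes(const std::vector<Lane> &in,
                             std::size_t widenNumElts,
                             bool fillWithZeroes = false) {
  if (in.size() == widenNumElts)
    return in; // already the requested width
  // Same precondition as the assert in the patch: strictly wider, and a
  // whole multiple of the input width.
  assert(!in.empty() && widenNumElts > in.size() &&
         widenNumElts % in.size() == 0);
  std::vector<Lane> out(in);
  out.resize(widenNumElts, fillWithZeroes ? Lane(0) : Lane(std::nullopt));
  return out;
}
```

In this model, index and data operands would be widened with the default undef fill, while a mask would be widened with `fillWithZeroes = true` so that the padding lanes remain inactive.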

static SDValue LowerMSCATTER(SDValue Op, const X86Subtarget *Subtarget,		static SDValue LowerMSCATTER(SDValue Op, const X86Subtarget *Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
assert(Subtarget->hasAVX512() &&		assert(Subtarget->hasAVX512() &&
"MGATHER/MSCATTER are supported on AVX-512 arch only");		"MGATHER/MSCATTER are supported on AVX-512 arch only");

		// X86 scatter kills mask register, so its type should be added to
		// the list of return values.
		// If the "scatter" has 2 return values, it is already handled.
		if (Op.getNode()->getNumValues() == 2)
		return Op;

MaskedScatterSDNode *N = cast<MaskedScatterSDNode>(Op.getNode());		MaskedScatterSDNode *N = cast<MaskedScatterSDNode>(Op.getNode());
MVT VT = N->getValue().getSimpleValueType();		SDValue Src = N->getValue();
		MVT VT = Src.getSimpleValueType();
assert(VT.getScalarSizeInBits() >= 32 && "Unsupported scatter op");		assert(VT.getScalarSizeInBits() >= 32 && "Unsupported scatter op");
SDLoc dl(Op);		SDLoc dl(Op);

// X86 scatter kills mask register, so its type should be added to		SDValue NewScatter;
// the list of return values
if (N->getNumValues() == 1) {
SDValue Index = N->getIndex();		SDValue Index = N->getIndex();
		SDValue Mask = N->getMask();
		SDValue Chain = N->getChain();
		SDValue BasePtr = N->getBasePtr();
		MVT MemVT = N->getMemoryVT().getSimpleVT();
		MVT IndexVT = Index.getSimpleValueType();
		MVT MaskVT = Mask.getSimpleValueType();

		if (MemVT.getScalarSizeInBits() < VT.getScalarSizeInBits()) {
		mbodart: It seems odd to me to allow a masked store/scatter where the value element size differs from the memory element size. Is this an unavoidable result of type legalization? What are the semantics of such an operation? If the value is i64 and the memory is i32, apparently we just store the low 32 bits of each i64 element. Is that the expected behavior? Do we ever have to worry about a size difference in the opposite direction, e.g. storing an i32 to an i64?
		delena (Author): This is the case of promotion of v2i32 to v2i64. Only MemVT keeps the original VT. I'm actually redoing the type legalizer's work: the type legalizer promoted v2i32 to v2i64, and I retrieve v2i32 with the shuffle {0, 2, -1, -1} and then widen the result to v4i32. I'll add comments.
		// Promoted data type
		assert((MemVT == MVT::v2i32 && VT == MVT::v2i64) &&
		"Unexpected memory type");
		int ShuffleMask[] = {0, 2, -1, -1};
		Src = DAG.getVectorShuffle(MVT::v4i32, dl, DAG.getBitcast(MVT::v4i32, Src),
		DAG.getUNDEF(MVT::v4i32), ShuffleMask);
		// Now we have 4 elements instead of 2.
		// Expand the index.
		MVT NewIndexVT = MVT::getVectorVT(IndexVT.getScalarType(), 4);
		Index = ExtendToType(Index, NewIndexVT, DAG);

		// Expand the mask with zeroes
		// Mask may be <2 x i64> or <2 x i1> at this moment
		assert((MaskVT == MVT::v2i1 || MaskVT == MVT::v2i64) &&
		"Unexpected mask type");
		MVT ExtMaskVT = MVT::getVectorVT(MaskVT.getScalarType(), 4);
		Mask = ExtendToType(Mask, ExtMaskVT, DAG, true);
		VT = MVT::v4i32;
		}
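
The effect of the bitcast followed by the {0, 2, -1, -1} shuffle in the promoted v2i32 case can be checked with a small standalone model. This is a sketch under stated assumptions: the lane numbering assumes x86's little-endian layout, `std::nullopt` stands in for undef, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

using Lane32 = std::optional<std::uint32_t>; // std::nullopt models "undef"

// Reinterpret two 64-bit lanes as four 32-bit lanes, little-endian as on x86:
// lane 2*i holds the low half of element i, lane 2*i+1 the high half.
std::array<std::uint32_t, 4> bitcastV2I64ToV4I32(
    const std::array<std::uint64_t, 2> &v) {
  return {static_cast<std::uint32_t>(v[0]),
          static_cast<std::uint32_t>(v[0] >> 32),
          static_cast<std::uint32_t>(v[1]),
          static_cast<std::uint32_t>(v[1] >> 32)};
}

// Apply a 4-lane shuffle mask; a negative index yields an undef lane.
std::array<Lane32, 4> shuffleLanes(const std::array<std::uint32_t, 4> &v,
                                   const std::array<int, 4> &mask) {
  std::array<Lane32, 4> out{};
  for (int i = 0; i < 4; ++i) {
    assert(mask[i] < 4);
    if (mask[i] >= 0)
      out[i] = v[mask[i]];
  }
  return out;
}

// Recover the original <2 x i32> payload from its promoted <2 x i64> form.
std::array<Lane32, 4> recoverPromotedV2I32(
    const std::array<std::uint64_t, 2> &promoted) {
  return shuffleLanes(bitcastV2I64ToV4I32(promoted), {0, 2, -1, -1});
}
```

Lanes 0 and 1 of the result carry the low 32 bits of the two promoted elements; lanes 2 and 3 are undef, which is harmless as long as the mask for those lanes is zero-filled.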

		unsigned NumElts = VT.getVectorNumElements();
if (!Subtarget->hasVLX() && !VT.is512BitVector() &&		if (!Subtarget->hasVLX() && !VT.is512BitVector() &&
!Index.getSimpleValueType().is512BitVector())		!Index.getSimpleValueType().is512BitVector()) {
		// AVX512F supports only 512-bit vectors: either the data or the index
		// must be 512 bits wide. If both the index and the data are 256-bit, but
		// the vector contains 8 elements, we just sign-extend the index
		if (IndexVT == MVT::v8i32)
		// Just extend index
		Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);
		else {
		// The minimal number of elts in scatter is 8
		NumElts = 8;
		// Index
		MVT NewIndexVT = MVT::getVectorVT(IndexVT.getScalarType(), NumElts);
		// Use original index here, do not modify the index twice
		Index = ExtendToType(N->getIndex(), NewIndexVT, DAG);
		if (IndexVT.getScalarType() == MVT::i32)
Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);		Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);

SDVTList VTs = DAG.getVTList(N->getMask().getValueType(), MVT::Other);		// Mask
SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),		// At this point we have promoted mask operand
N->getOperand(3), Index };		assert(MaskVT.getScalarSizeInBits() >= 32 && "unexpected mask type");
		MVT ExtMaskVT = MVT::getVectorVT(MaskVT.getScalarType(), NumElts);
SDValue NewScatter = DAG.getMaskedScatter(VTs, VT, dl, Ops, N->getMemOperand());		// Use the original mask here, do not modify the mask twice
		Mask = ExtendToType(N->getMask(), ExtMaskVT, DAG, true);

		// The value that should be stored
		MVT NewVT = MVT::getVectorVT(VT.getScalarType(), NumElts);
		Src = ExtendToType(Src, NewVT, DAG);
		}
		}
		// If the mask is "wide" at this point - truncate it to i1 vector
		MVT BitMaskVT = MVT::getVectorVT(MVT::i1, NumElts);
		Mask = DAG.getNode(ISD::TRUNCATE, dl, BitMaskVT, Mask);
		mbodart: Minor mechanics question: should we first check if Mask's element type is i1, and if so, avoid inserting the TRUNCATE?
		delena (Author): getNode(ISD::TRUNCATE, ...) already does this optimization.

		// The mask is killed by scatter, add it to the values
		SDVTList VTs = DAG.getVTList(BitMaskVT, MVT::Other);
		SDValue Ops[] = {Chain, Src, Mask, BasePtr, Index};
		NewScatter = DAG.getMaskedScatter(VTs, N->getMemoryVT(), dl, Ops,
		N->getMemOperand());
DAG.ReplaceAllUsesWith(Op, SDValue(NewScatter.getNode(), 1));		DAG.ReplaceAllUsesWith(Op, SDValue(NewScatter.getNode(), 1));
return SDValue(NewScatter.getNode(), 0);		return SDValue(NewScatter.getNode(), 0);
}		}
return Op;
}

static SDValue LowerMGATHER(SDValue Op, const X86Subtarget *Subtarget,		static SDValue LowerMGATHER(SDValue Op, const X86Subtarget *Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
assert(Subtarget->hasAVX512() &&		assert(Subtarget->hasAVX512() &&
"MGATHER/MSCATTER are supported on AVX-512 arch only");		"MGATHER/MSCATTER are supported on AVX-512 arch only");

MaskedGatherSDNode *N = cast<MaskedGatherSDNode>(Op.getNode());		MaskedGatherSDNode *N = cast<MaskedGatherSDNode>(Op.getNode());
		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
		SDValue Index = N->getIndex();
		SDValue Mask = N->getMask();
		SDValue Src0 = N->getValue();
		MVT IndexVT = Index.getSimpleValueType();
		MVT MaskVT = Mask.getSimpleValueType();

		unsigned NumElts = VT.getVectorNumElements();
assert(VT.getScalarSizeInBits() >= 32 && "Unsupported gather op");		assert(VT.getScalarSizeInBits() >= 32 && "Unsupported gather op");
SDLoc dl(Op);

SDValue Index = N->getIndex();
if (!Subtarget->hasVLX() && !VT.is512BitVector() &&		if (!Subtarget->hasVLX() && !VT.is512BitVector() &&
!Index.getSimpleValueType().is512BitVector()) {		!Index.getSimpleValueType().is512BitVector()) {
		// AVX512F supports only 512-bit vectors: either the data or the index
		// must be 512 bits wide. If both the index and the data are 256-bit, but
		// the vector contains 8 elements, we just sign-extend the index
		if (NumElts == 8) {
Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);		Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);
SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),		SDValue Ops[] = { N->getOperand(0), N->getOperand(1), N->getOperand(2),
N->getOperand(3), Index };		N->getOperand(3), Index };
DAG.UpdateNodeOperands(N, Ops);		DAG.UpdateNodeOperands(N, Ops);
		return Op;
		}

		// Minimal number of elements in Gather
		mbodart: For a 32-bit target, we can gather 16 elements with a single gather. Though the comment says 8 is the minimum number of elements in a gather, which is true, the code also seems to be treating this as the maximum number. How are 16-element gathers supported?
		delena (Author): We can always sign-extend the indices. VGATHERQPS is supported in 32-bit mode.
		NumElts = 8;
		// Index
		MVT NewIndexVT = MVT::getVectorVT(IndexVT.getScalarType(), NumElts);
		Index = ExtendToType(Index, NewIndexVT, DAG);
		if (IndexVT.getScalarType() == MVT::i32)
		Index = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::v8i64, Index);

		// Mask
		MVT MaskBitVT = MVT::getVectorVT(MVT::i1, NumElts);
		// At this point we have promoted mask operand
		assert(MaskVT.getScalarSizeInBits() >= 32 && "unexpected mask type");
		MVT ExtMaskVT = MVT::getVectorVT(MaskVT.getScalarType(), NumElts);
		Mask = ExtendToType(Mask, ExtMaskVT, DAG, true);
		Mask = DAG.getNode(ISD::TRUNCATE, dl, MaskBitVT, Mask);

		// The pass-thru value
		MVT NewVT = MVT::getVectorVT(VT.getScalarType(), NumElts);
		Src0 = ExtendToType(Src0, NewVT, DAG);

		SDValue Ops[] = { N->getChain(), Src0, Mask, N->getBasePtr(), Index };
		SDValue NewGather = DAG.getMaskedGather(DAG.getVTList(NewVT, MVT::Other),
		N->getMemoryVT(), dl, Ops,
		N->getMemOperand());
		SDValue Extract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, VT,
		NewGather.getValue(0),
		DAG.getIntPtrConstant(0, dl));
		SDValue RetOps[] = {Extract, NewGather.getValue(1)};
		return DAG.getMergeValues(RetOps, dl);
}		}
return Op;		return Op;
}		}
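
The widen-then-extract strategy used for narrow gathers can be sanity-checked with a scalar reference model. The sketch below is an illustration only: `maskedGather` and `gatherViaWidening` are hypothetical helpers, and the zero padding of the index and pass-through operands stands in for the undef lanes that the patch produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference model of a masked gather: lane i loads base[index[i]] when
// mask[i] is set and keeps passThru[i] otherwise.
std::vector<float> maskedGather(const float *base,
                                const std::vector<std::int64_t> &index,
                                const std::vector<bool> &mask,
                                const std::vector<float> &passThru) {
  std::vector<float> out(passThru);
  for (std::size_t i = 0; i < index.size(); ++i)
    if (mask[i])
      out[i] = base[index[i]];
  return out;
}

// Run a narrow gather through a gather of `width` lanes, then keep only the
// original lanes.
std::vector<float> gatherViaWidening(const float *base,
                                     std::vector<std::int64_t> index,
                                     std::vector<bool> mask,
                                     std::vector<float> passThru,
                                     std::size_t width = 8) {
  const std::size_t n = index.size();
  assert(n <= width && mask.size() == n && passThru.size() == n);
  index.resize(width, 0);       // padding lanes are never dereferenced
  mask.resize(width, false);    // zero fill keeps the padding lanes inactive
  passThru.resize(width, 0.0f); // padding lanes are dropped below
  std::vector<float> wide = maskedGather(base, index, mask, passThru);
  wide.resize(n);               // models EXTRACT_SUBVECTOR at index 0
  return wide;
}
```

Because the padded mask lanes are false, the wide gather never touches memory for them, so the first `n` lanes match what the narrow gather would have produced.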

SDValue X86TargetLowering::LowerGC_TRANSITION_START(SDValue Op,		SDValue X86TargetLowering::LowerGC_TRANSITION_START(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
// TODO: Eventually, the lowering of these nodes should be informed by or		// TODO: Eventually, the lowering of these nodes should be informed by or
// deferred to the GC strategy for the function in which they appear. For		// deferred to the GC strategy for the function in which they appear. For
[6,758 lines not shown]	if (auto *Mask = dyn_cast<ConstantSDNode>(N->getOperand(2)))
if (Mask->getZExtValue() == 2 && !isShuffleFoldableLoad(V0)) {		if (Mask->getZExtValue() == 2 && !isShuffleFoldableLoad(V0)) {
SDValue NewMask = DAG.getConstant(1, DL, MVT::i8);		SDValue NewMask = DAG.getConstant(1, DL, MVT::i8);
return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V0, NewMask);		return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V0, NewMask);
}		}

return SDValue();		return SDValue();
}		}

		static SDValue PerformGatherScatterCombine(SDNode *N, SelectionDAG &DAG) {
		SDLoc DL(N);
		// The mask will be truncated anyway. The SIGN_EXTEND_INREG is redundant.
		mbodart: It is hard to interpret this comment, and thus the correctness of this routine, without any additional context. Why is a mask always truncated at this point? How do you know that it is truncated down to the element size of the SIGN_EXTEND_INREG's operand? Please add some supporting comments answering these kinds of questions.
		delena (Author): I added more words here: // Gather and Scatter instructions use k-registers for masks. The type of // the masks is v*i1. So the mask will be truncated anyway. // The SIGN_EXTEND_INREG may be dropped.
		SDValue Mask = N->getOperand(2);
		if (Mask.getOpcode() == ISD::SIGN_EXTEND_INREG) {
		SmallVector<SDValue, 5> NewOps(N->op_begin(), N->op_end());
		NewOps[2] = Mask.getOperand(0);
		DAG.UpdateNodeOperands(N, NewOps);
		}
		return SDValue();
		}
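
The reasoning behind dropping the SIGN_EXTEND_INREG can be modeled on a single 64-bit mask lane: the extension only rewrites the bits above the source width, and the later truncation to i1 reads bit 0 alone. The sketch below uses hypothetical helper names and assumes the mask is eventually truncated to i1, as the k-register discussion above states.

```cpp
#include <cassert>
#include <cstdint>

// Model of SIGN_EXTEND_INREG on one 64-bit lane: keep the low `fromBits` bits
// and replicate bit (fromBits - 1) into all higher bits.
std::uint64_t signExtendInReg(std::uint64_t lane, unsigned fromBits) {
  assert(fromBits >= 1 && fromBits <= 64);
  if (fromBits == 64)
    return lane;
  const std::uint64_t low = (std::uint64_t{1} << fromBits) - 1;
  const bool negative = (lane >> (fromBits - 1)) & 1u;
  return negative ? (lane | ~low) : (lane & low);
}

// Model of TRUNCATE to i1: only bit 0 survives.
bool truncateToI1(std::uint64_t lane) { return (lane & 1u) != 0; }
```

For any source width of at least one bit, `truncateToI1(signExtendInReg(x, w))` equals `truncateToI1(x)`, which is why feeding the un-extended operand to the gather or scatter yields the same k-register mask in this model.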

// Helper function of PerformSETCCCombine. It is to materialize "setb reg"		// Helper function of PerformSETCCCombine. It is to materialize "setb reg"
// as "sbb reg,reg", since it can be extended without zext and produces		// as "sbb reg,reg", since it can be extended without zext and produces
// an all-ones bit which is more useful than 0/1 in some cases.		// an all-ones bit which is more useful than 0/1 in some cases.
static SDValue MaterializeSETB(SDLoc DL, SDValue EFLAGS, SelectionDAG &DAG,		static SDValue MaterializeSETB(SDLoc DL, SDValue EFLAGS, SelectionDAG &DAG,
MVT VT) {		MVT VT) {
if (VT == MVT::i8)		if (VT == MVT::i8)
return DAG.getNode(ISD::AND, DL, VT,		return DAG.getNode(ISD::AND, DL, VT,
DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,		DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,
[424 lines not shown]	SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
case X86ISD::PSHUFLW:		case X86ISD::PSHUFLW:
case X86ISD::MOVSS:		case X86ISD::MOVSS:
case X86ISD::MOVSD:		case X86ISD::MOVSD:
case X86ISD::VPERMILPI:		case X86ISD::VPERMILPI:
case X86ISD::VPERM2X128:		case X86ISD::VPERM2X128:
case ISD::VECTOR_SHUFFLE: return PerformShuffleCombine(N, DAG, DCI,Subtarget);		case ISD::VECTOR_SHUFFLE: return PerformShuffleCombine(N, DAG, DCI,Subtarget);
case ISD::FMA: return PerformFMACombine(N, DAG, Subtarget);		case ISD::FMA: return PerformFMACombine(N, DAG, Subtarget);
case X86ISD::BLENDI: return PerformBLENDICombine(N, DAG);		case X86ISD::BLENDI: return PerformBLENDICombine(N, DAG);
		case ISD::MGATHER:
		case ISD::MSCATTER: return PerformGatherScatterCombine(N, DAG);
}		}

return SDValue();		return SDValue();
}		}

/// isTypeDesirableForOp - Return true if the target has native support for		/// isTypeDesirableForOp - Return true if the target has native support for
/// the specified value type and it is 'desirable' to use the type for the		/// the specified value type and it is 'desirable' to use the type for the
/// given node type. e.g. On x86 i16 is legal, but undesirable since i16		/// given node type. e.g. On x86 i16 is legal, but undesirable since i16
[817 lines not shown]

../test/CodeGen/X86/masked_gather_scatter.ll

	; RUN: llc -mtriple=x86_64-apple-darwin -mcpu=knl < %s | FileCheck %s -check-prefix=KNL			; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512f < %s | FileCheck %s --check-prefix=KNL_64
				; RUN: llc -mtriple=i386-unknown-linux-gnu -mattr=+avx512f < %s | FileCheck %s --check-prefix=KNL_32
				mbodart: Have you tried your changes with gathers/scatters targeting 32-bit X86? Poking around in the X86 test area, it seems all triples use x86_64 for gather/scatter tests.
				delena (Author): I tried it before and saw that it works. I added tests for 32-bit.
				; RUN: llc -mtriple=x86_64-unknown-linux-gnu -mattr=+avx512vl -mattr=+avx512dq < %s | FileCheck %s --check-prefix=SKX
	; RUN: opt -mtriple=x86_64-apple-darwin -codegenprepare -mcpu=corei7-avx -S < %s | FileCheck %s -check-prefix=SCALAR			; RUN: opt -mtriple=x86_64-apple-darwin -codegenprepare -mcpu=corei7-avx -S < %s | FileCheck %s -check-prefix=SCALAR
				mbodart: I would think we also want a test for i386 and avx512vl.
				delena (Author): I added one.


	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	; KNL-LABEL: test1
	; KNL: kxnorw %k1, %k1, %k1
	; KNL: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}

	; SCALAR-LABEL: test1			; SCALAR-LABEL: test1
	; SCALAR: extractelement <16 x float*>			; SCALAR: extractelement <16 x float*>
	; SCALAR-NEXT: load float			; SCALAR-NEXT: load float
	; SCALAR-NEXT: insertelement <16 x float>			; SCALAR-NEXT: insertelement <16 x float>
	; SCALAR-NEXT: extractelement <16 x float*>			; SCALAR-NEXT: extractelement <16 x float*>
	; SCALAR-NEXT: load float			; SCALAR-NEXT: load float

	define <16 x float> @test1(float* %base, <16 x i32> %ind) {			define <16 x float> @test1(float* %base, <16 x i32> %ind) {
				; KNL_64-LABEL: test1:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test1:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vgatherdps (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test1:
				; SKX: # BB#0:
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0			%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0
	%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind			%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}

	declare <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*>, i32, <16 x i1>, <16 x i32>)			declare <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*>, i32, <16 x i1>, <16 x i32>)
	declare <16 x float> @llvm.masked.gather.v16f32(<16 x float*>, i32, <16 x i1>, <16 x float>)			declare <16 x float> @llvm.masked.gather.v16f32(<16 x float*>, i32, <16 x i1>, <16 x float>)
	declare <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> , i32, <8 x i1> , <8 x i32> )			declare <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> , i32, <8 x i1> , <8 x i32> )

	; KNL-LABEL: test2
	; KNL: kmovw %esi, %k1
	; KNL: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}

	; SCALAR-LABEL: test2			; SCALAR-LABEL: test2
	; SCALAR: extractelement <16 x float*>			; SCALAR: extractelement <16 x float*>
	; SCALAR-NEXT: load float			; SCALAR-NEXT: load float
	; SCALAR-NEXT: insertelement <16 x float>			; SCALAR-NEXT: insertelement <16 x float>
	; SCALAR-NEXT: br label %else			; SCALAR-NEXT: br label %else
	; SCALAR: else:			; SCALAR: else:
	; SCALAR-NEXT: %res.phi.else = phi			; SCALAR-NEXT: %res.phi.else = phi
	; SCALAR-NEXT: %Mask1 = extractelement <16 x i1> %imask, i32 1			; SCALAR-NEXT: %Mask1 = extractelement <16 x i1> %imask, i32 1
	; SCALAR-NEXT: %ToLoad1 = icmp eq i1 %Mask1, true			; SCALAR-NEXT: %ToLoad1 = icmp eq i1 %Mask1, true
	; SCALAR-NEXT: br i1 %ToLoad1, label %cond.load1, label %else2			; SCALAR-NEXT: br i1 %ToLoad1, label %cond.load1, label %else2

	define <16 x float> @test2(float* %base, <16 x i32> %ind, i16 %mask) {			define <16 x float> @test2(float* %base, <16 x i32> %ind, i16 %mask) {
				; KNL_64-LABEL: test2:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kmovw %esi, %k1
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test2:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: vgatherdps (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test2:
				; SKX: # BB#0:
				; SKX-NEXT: kmovw %esi, %k1
				; SKX-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0			%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0
	%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind			%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind
	%imask = bitcast i16 %mask to <16 x i1>			%imask = bitcast i16 %mask to <16 x i1>
	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> %imask, <16 x float>undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> %imask, <16 x float>undef)
	ret <16 x float> %res			ret <16 x float> %res
	}			}

	; KNL-LABEL: test3
	; KNL: kmovw %esi, %k1
	; KNL: vpgatherdd (%rdi,%zmm0,4), %zmm1 {%k1}
	define <16 x i32> @test3(i32* %base, <16 x i32> %ind, i16 %mask) {			define <16 x i32> @test3(i32* %base, <16 x i32> %ind, i16 %mask) {
				; KNL_64-LABEL: test3:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kmovw %esi, %k1
				; KNL_64-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test3:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: vpgatherdd (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test3:
				; SKX: # BB#0:
				; SKX-NEXT: kmovw %esi, %k1
				; SKX-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0			%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0
	%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i64> %sext_ind			%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i64> %sext_ind
	%imask = bitcast i16 %mask to <16 x i1>			%imask = bitcast i16 %mask to <16 x i1>
	%res = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>undef)			%res = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>undef)
	ret <16 x i32> %res			ret <16 x i32> %res
	}			}

	; KNL-LABEL: test4
	; KNL: kmovw %esi, %k1
	; KNL: kmovw
	; KNL: vpgatherdd
	; KNL: vpgatherdd

	define <16 x i32> @test4(i32* %base, <16 x i32> %ind, i16 %mask) {			define <16 x i32> @test4(i32* %base, <16 x i32> %ind, i16 %mask) {
				; KNL_64-LABEL: test4:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kmovw %esi, %k1
				; KNL_64-NEXT: kmovw %k1, %k2
				; KNL_64-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm1 {%k2}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm2
				; KNL_64-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm2 {%k1}
				; KNL_64-NEXT: vpaddd %zmm2, %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test4:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: kmovw %k1, %k2
				; KNL_32-NEXT: vpgatherdd (%eax,%zmm0,4), %zmm1 {%k2}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm2
				; KNL_32-NEXT: vpgatherdd (%eax,%zmm0,4), %zmm2 {%k1}
				; KNL_32-NEXT: vpaddd %zmm2, %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test4:
				; SKX: # BB#0:
				; SKX-NEXT: kmovw %esi, %k1
				; SKX-NEXT: kmovw %k1, %k2
				; SKX-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm1 {%k2}
				; SKX-NEXT: vmovaps %zmm1, %zmm2
				; SKX-NEXT: vpgatherdd (%rdi,%zmm0,4), %zmm2 {%k1}
				; SKX-NEXT: vpaddd %zmm2, %zmm1, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0			%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0
	%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer

	%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i32> %ind			%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i32> %ind
	%imask = bitcast i16 %mask to <16 x i1>			%imask = bitcast i16 %mask to <16 x i1>
	%gt1 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>undef)			%gt1 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>undef)
	%gt2 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>%gt1)			%gt2 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %gep.random, i32 4, <16 x i1> %imask, <16 x i32>%gt1)
	%res = add <16 x i32> %gt1, %gt2			%res = add <16 x i32> %gt1, %gt2
	ret <16 x i32> %res			ret <16 x i32> %res
	}			}

	; KNL-LABEL: test5
	; KNL: kmovw %k1, %k2
	; KNL: vpscatterdd {{.*}}%k2
	; KNL: vpscatterdd {{.*}}%k1

	; SCALAR-LABEL: test5			; SCALAR-LABEL: test5
	; SCALAR: %Mask0 = extractelement <16 x i1> %imask, i32 0			; SCALAR: %Mask0 = extractelement <16 x i1> %imask, i32 0
	; SCALAR-NEXT: %ToStore0 = icmp eq i1 %Mask0, true			; SCALAR-NEXT: %ToStore0 = icmp eq i1 %Mask0, true
	; SCALAR-NEXT: br i1 %ToStore0, label %cond.store, label %else			; SCALAR-NEXT: br i1 %ToStore0, label %cond.store, label %else
	; SCALAR: cond.store:			; SCALAR: cond.store:
	; SCALAR-NEXT: %Elt0 = extractelement <16 x i32> %val, i32 0			; SCALAR-NEXT: %Elt0 = extractelement <16 x i32> %val, i32 0
	; SCALAR-NEXT: %Ptr0 = extractelement <16 x i32*> %gep.random, i32 0			; SCALAR-NEXT: %Ptr0 = extractelement <16 x i32*> %gep.random, i32 0
	; SCALAR-NEXT: store i32 %Elt0, i32* %Ptr0, align 4			; SCALAR-NEXT: store i32 %Elt0, i32* %Ptr0, align 4
	; SCALAR-NEXT: br label %else			; SCALAR-NEXT: br label %else
	; SCALAR: else:			; SCALAR: else:
	; SCALAR-NEXT: %Mask1 = extractelement <16 x i1> %imask, i32 1			; SCALAR-NEXT: %Mask1 = extractelement <16 x i1> %imask, i32 1
	; SCALAR-NEXT: %ToStore1 = icmp eq i1 %Mask1, true			; SCALAR-NEXT: %ToStore1 = icmp eq i1 %Mask1, true
	; SCALAR-NEXT: br i1 %ToStore1, label %cond.store1, label %else2			; SCALAR-NEXT: br i1 %ToStore1, label %cond.store1, label %else2

	define void @test5(i32* %base, <16 x i32> %ind, i16 %mask, <16 x i32>%val) {			define void @test5(i32* %base, <16 x i32> %ind, i16 %mask, <16 x i32>%val) {
				; KNL_64-LABEL: test5:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kmovw %esi, %k1
				; KNL_64-NEXT: kmovw %k1, %k2
				; KNL_64-NEXT: vpscatterdd %zmm1, (%rdi,%zmm0,4) {%k2}
				; KNL_64-NEXT: vpscatterdd %zmm1, (%rdi,%zmm0,4) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test5:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: kmovw %k1, %k2
				; KNL_32-NEXT: vpscatterdd %zmm1, (%eax,%zmm0,4) {%k2}
				; KNL_32-NEXT: vpscatterdd %zmm1, (%eax,%zmm0,4) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test5:
				; SKX: # BB#0:
				; SKX-NEXT: kmovw %esi, %k1
				; SKX-NEXT: kmovw %k1, %k2
				; SKX-NEXT: vpscatterdd %zmm1, (%rdi,%zmm0,4) {%k2}
				; SKX-NEXT: vpscatterdd %zmm1, (%rdi,%zmm0,4) {%k1}
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0			%broadcast.splatinsert = insertelement <16 x i32> undef, i32 %base, i32 0
	%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x i32> %broadcast.splatinsert, <16 x i32> undef, <16 x i32> zeroinitializer

	%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i32> %ind			%gep.random = getelementptr i32, <16 x i32*> %broadcast.splat, <16 x i32> %ind
	%imask = bitcast i16 %mask to <16 x i1>			%imask = bitcast i16 %mask to <16 x i1>
	call void @llvm.masked.scatter.v16i32(<16 x i32>%val, <16 x i32*> %gep.random, i32 4, <16 x i1> %imask)			call void @llvm.masked.scatter.v16i32(<16 x i32>%val, <16 x i32*> %gep.random, i32 4, <16 x i1> %imask)
	call void @llvm.masked.scatter.v16i32(<16 x i32>%val, <16 x i32*> %gep.random, i32 4, <16 x i1> %imask)			call void @llvm.masked.scatter.v16i32(<16 x i32>%val, <16 x i32*> %gep.random, i32 4, <16 x i1> %imask)
	ret void			ret void
	}			}

	declare void @llvm.masked.scatter.v8i32(<8 x i32> , <8 x i32*> , i32 , <8 x i1> )			declare void @llvm.masked.scatter.v8i32(<8 x i32> , <8 x i32*> , i32 , <8 x i1> )
	declare void @llvm.masked.scatter.v16i32(<16 x i32> , <16 x i32*> , i32 , <16 x i1> )			declare void @llvm.masked.scatter.v16i32(<16 x i32> , <16 x i32*> , i32 , <16 x i1> )

	; KNL-LABEL: test6
	; KNL: kxnorw %k1, %k1, %k1
	; KNL: kxnorw %k2, %k2, %k2
	; KNL: vpgatherqd (,%zmm{{.}}), %ymm{{.}} {%k2}
	; KNL: vpscatterqd %ymm{{.}}, (,%zmm{{.}}) {%k1}

	; SCALAR-LABEL: test6			; SCALAR-LABEL: test6
	; SCALAR: store i32 %Elt0, i32* %Ptr01, align 4			; SCALAR: store i32 %Elt0, i32* %Ptr01, align 4
	; SCALAR-NEXT: %Elt1 = extractelement <8 x i32> %a1, i32 1			; SCALAR-NEXT: %Elt1 = extractelement <8 x i32> %a1, i32 1
	; SCALAR-NEXT: %Ptr12 = extractelement <8 x i32*> %ptr, i32 1			; SCALAR-NEXT: %Ptr12 = extractelement <8 x i32*> %ptr, i32 1
	; SCALAR-NEXT: store i32 %Elt1, i32* %Ptr12, align 4			; SCALAR-NEXT: store i32 %Elt1, i32* %Ptr12, align 4
	; SCALAR-NEXT: %Elt2 = extractelement <8 x i32> %a1, i32 2			; SCALAR-NEXT: %Elt2 = extractelement <8 x i32> %a1, i32 2
	; SCALAR-NEXT: %Ptr23 = extractelement <8 x i32*> %ptr, i32 2			; SCALAR-NEXT: %Ptr23 = extractelement <8 x i32*> %ptr, i32 2
	; SCALAR-NEXT: store i32 %Elt2, i32* %Ptr23, align 4			; SCALAR-NEXT: store i32 %Elt2, i32* %Ptr23, align 4

	define <8 x i32> @test6(<8 x i32>%a1, <8 x i32*> %ptr) {			define <8 x i32> @test6(<8 x i32>%a1, <8 x i32*> %ptr) {
				; KNL_64-LABEL: test6:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: kxnorw %k2, %k2, %k2
				; KNL_64-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k2}
				; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test6:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm2
				; KNL_32-NEXT: kxnorw %k2, %k2, %k2
				; KNL_32-NEXT: vpgatherqd (,%zmm2), %ymm1 {%k2}
				; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm2) {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test6:
				; SKX: # BB#0:
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: kxnorw %k2, %k2, %k2
				; SKX-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k2}
				; SKX-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq

	%a = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %ptr, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)			%a = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %ptr, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)

	call void @llvm.masked.scatter.v8i32(<8 x i32> %a1, <8 x i32*> %ptr, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)			call void @llvm.masked.scatter.v8i32(<8 x i32> %a1, <8 x i32*> %ptr, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>)
	ret <8 x i32>%a			ret <8 x i32>%a
	}			}

	; In this case the index should be promoted to <8 x i64> for KNL
	; KNL-LABEL: test7
	; KNL: vpmovsxdq %ymm0, %zmm0
	; KNL: kmovw %k1, %k2
	; KNL: vpgatherqd {{.*}} {%k2}
	; KNL: vpgatherqd {{.*}} {%k1}
	define <8 x i32> @test7(i32* %base, <8 x i32> %ind, i8 %mask) {			define <8 x i32> @test7(i32* %base, <8 x i32> %ind, i8 %mask) {
				;
				; KNL_64-LABEL: test7:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: movzbl %sil, %eax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_64-NEXT: kmovw %k1, %k2
				; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm1 {%k2}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm2
				; KNL_64-NEXT: vpgatherqd (%rdi,%zmm0,4), %ymm2 {%k1}
				; KNL_64-NEXT: vpaddd %ymm2, %ymm1, %ymm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test7:
				mbodart: I don't understand how SKX can use 32-bit indices here (i.e., vpgatherdd instead of vpgatherqd) when targeting x86_64. How does lowering of the @llvm.masked.gather.v8i32 calls know that only the low 32 bits of the <8 x i32*> pointer values are needed? Is there some analysis of the insertelement/shufflevector/getelementptr instructions that feed the <8 x i32*> pointers?
				delena (author): The base address is in %rdi (in %edi for 32-bit). %ymm0 holds only the indices. The real address of each element is "base + index * scale" -> %rdi + ymm0[i] * 4.
				mbodart: OK, thanks.
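The addressing described in the reply above can be modelled in a few lines. This is only an illustrative sketch, not code from the patch: the helper names (sign_extend32, gather_addresses, masked_gather) and the dictionary standing in for memory are assumptions made for the example. It shows why a scalar 64-bit base plus 32-bit per-lane indices is enough to form full addresses, so the indices do not have to be widened when the instruction accepts 32-bit indices.

```python
def sign_extend32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def gather_addresses(base, indices, scale):
    """Per-lane effective address: base + sext(index) * scale (64-bit wrap)."""
    return [(base + sign_extend32(i) * scale) & 0xFFFFFFFFFFFFFFFF
            for i in indices]


def masked_gather(memory, base, indices, scale, mask, passthru):
    """Load only the lanes whose mask bit is set; other lanes keep passthru.

    `memory` is a hypothetical dict mapping address -> value.
    """
    addrs = gather_addresses(base, indices, scale)
    return [memory[a] if m else p for a, m, p in zip(addrs, mask, passthru)]


if __name__ == "__main__":
    mem = {0x1000: 10, 0x1004: 20}
    # The third lane is masked off, so its address is never dereferenced.
    print(masked_gather(mem, 0x1000, [0, 1, 7], 4,
                        [True, True, False], [-1, -1, -1]))
```

In this model a masked-off lane never touches memory, which matches the intent of the intrinsic's mask operand.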
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_32-NEXT: kmovw %k1, %k2
				; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm1 {%k2}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm2
				; KNL_32-NEXT: vpgatherqd (%eax,%zmm0,4), %ymm2 {%k1}
				; KNL_32-NEXT: vpaddd %ymm2, %ymm1, %ymm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test7:
				; SKX: # BB#0:
				; SKX-NEXT: kmovb %esi, %k1
				; SKX-NEXT: kmovw %k1, %k2
				; SKX-NEXT: vpgatherdd (%rdi,%ymm0,4), %ymm1 {%k2}
				; SKX-NEXT: vmovaps %zmm1, %zmm2
				; SKX-NEXT: vpgatherdd (%rdi,%ymm0,4), %ymm2 {%k1}
				; SKX-NEXT: vpaddd %ymm2, %ymm1, %ymm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <8 x i32> undef, i32 %base, i32 0			%broadcast.splatinsert = insertelement <8 x i32> undef, i32 %base, i32 0
	%broadcast.splat = shufflevector <8 x i32> %broadcast.splatinsert, <8 x i32> undef, <8 x i32> zeroinitializer			%broadcast.splat = shufflevector <8 x i32> %broadcast.splatinsert, <8 x i32> undef, <8 x i32> zeroinitializer

	%gep.random = getelementptr i32, <8 x i32*> %broadcast.splat, <8 x i32> %ind			%gep.random = getelementptr i32, <8 x i32*> %broadcast.splat, <8 x i32> %ind
	%imask = bitcast i8 %mask to <8 x i1>			%imask = bitcast i8 %mask to <8 x i1>
	%gt1 = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %gep.random, i32 4, <8 x i1> %imask, <8 x i32>undef)			%gt1 = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %gep.random, i32 4, <8 x i1> %imask, <8 x i32>undef)
	%gt2 = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %gep.random, i32 4, <8 x i1> %imask, <8 x i32>%gt1)			%gt2 = call <8 x i32> @llvm.masked.gather.v8i32(<8 x i32*> %gep.random, i32 4, <8 x i1> %imask, <8 x i32>%gt1)
	%res = add <8 x i32> %gt1, %gt2			%res = add <8 x i32> %gt1, %gt2
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}

	; No uniform base in this case, index <8 x i64> contains addresses,			; No uniform base in this case, index <8 x i64> contains addresses,
	; each gather call will be split into two			; each gather call will be split into two
	; KNL-LABEL: test8
	; KNL: kshiftrw $8, %k1, %k2
	; KNL: vpgatherqd
	; KNL: vpgatherqd
	; KNL: vinserti64x4
	; KNL: vpgatherqd
	; KNL: vpgatherqd
	; KNL: vinserti64x4
	define <16 x i32> @test8(<16 x i32*> %ptr.random, <16 x i32> %ind, i16 %mask) {			define <16 x i32> @test8(<16 x i32*> %ptr.random, <16 x i32> %ind, i16 %mask) {
				; KNL_64-LABEL: test8:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kmovw %edi, %k1
				; KNL_64-NEXT: kshiftrw $8, %k1, %k2
				; KNL_64-NEXT: kmovw %k2, %k3
				; KNL_64-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k3}
				; KNL_64-NEXT: kmovw %k1, %k3
				; KNL_64-NEXT: vpgatherqd (,%zmm0), %ymm3 {%k3}
				; KNL_64-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm4
				; KNL_64-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k2}
				; KNL_64-NEXT: vpgatherqd (,%zmm0), %ymm3 {%k1}
				; KNL_64-NEXT: vinserti64x4 $1, %ymm2, %zmm3, %zmm0
				; KNL_64-NEXT: vpaddd %zmm0, %zmm4, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test8:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: kmovw {{[0-9]+}}(%esp), %k1
				; KNL_32-NEXT: kmovw %k1, %k2
				; KNL_32-NEXT: vpgatherdd (,%zmm0), %zmm1 {%k2}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm2
				; KNL_32-NEXT: vpgatherdd (,%zmm0), %zmm2 {%k1}
				; KNL_32-NEXT: vpaddd %zmm2, %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test8:
				; SKX: # BB#0:
				; SKX-NEXT: kmovw %edi, %k1
				; SKX-NEXT: kshiftrw $8, %k1, %k2
				; SKX-NEXT: kmovw %k2, %k3
				; SKX-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k3}
				; SKX-NEXT: kmovw %k1, %k3
				; SKX-NEXT: vpgatherqd (,%zmm0), %ymm3 {%k3}
				; SKX-NEXT: vinserti32x8 $1, %ymm2, %zmm3, %zmm4
				; SKX-NEXT: vpgatherqd (,%zmm1), %ymm2 {%k2}
				; SKX-NEXT: vpgatherqd (,%zmm0), %ymm3 {%k1}
				; SKX-NEXT: vinserti32x8 $1, %ymm2, %zmm3, %zmm0
				; SKX-NEXT: vpaddd %zmm0, %zmm4, %zmm0
				; SKX-NEXT: retq

	%imask = bitcast i16 %mask to <16 x i1>			%imask = bitcast i16 %mask to <16 x i1>
	%gt1 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %ptr.random, i32 4, <16 x i1> %imask, <16 x i32>undef)			%gt1 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %ptr.random, i32 4, <16 x i1> %imask, <16 x i32>undef)
	%gt2 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %ptr.random, i32 4, <16 x i1> %imask, <16 x i32>%gt1)			%gt2 = call <16 x i32> @llvm.masked.gather.v16i32(<16 x i32*> %ptr.random, i32 4, <16 x i1> %imask, <16 x i32>%gt1)
	%res = add <16 x i32> %gt1, %gt2			%res = add <16 x i32> %gt1, %gt2
	ret <16 x i32> %res			ret <16 x i32> %res
	}			}

	%struct.RT = type { i8, [10 x [20 x i32]], i8 }			%struct.RT = type { i8, [10 x [20 x i32]], i8 }
	%struct.ST = type { i32, double, %struct.RT }			%struct.ST = type { i32, double, %struct.RT }

	; Masked gather for aggregate types			; Masked gather for aggregate types
	; Test9 and Test10 should give the same result (scalar and vector indices in GEP)			; Test9 and Test10 should give the same result (scalar and vector indices in GEP)

	; KNL-LABEL: test9
	; KNL: vpbroadcastq %rdi, %zmm
	; KNL: vpmovsxdq
	; KNL: vpbroadcastq
	; KNL: vpmuludq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpgatherqd (,%zmm

	define <8 x i32> @test9(%struct.ST* %base, <8 x i64> %ind1, <8 x i32>%ind5) {			define <8 x i32> @test9(%struct.ST* %base, <8 x i64> %ind1, <8 x i32>%ind5) {
				; KNL_64-LABEL: test9:
				; KNL_64: # BB#0: # %entry
				; KNL_64-NEXT: vpbroadcastq %rdi, %zmm2
				; KNL_64-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_64-NEXT: vpbroadcastq {{.*}}(%rip), %zmm3
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm1, %zmm4
				; KNL_64-NEXT: vpsrlq $32, %zmm1, %zmm1
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm1, %zmm1
				; KNL_64-NEXT: vpsllq $32, %zmm1, %zmm1
				; KNL_64-NEXT: vpaddq %zmm1, %zmm4, %zmm1
				; KNL_64-NEXT: vpbroadcastq {{.*}}(%rip), %zmm3
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm0, %zmm4
				; KNL_64-NEXT: vpsrlq $32, %zmm0, %zmm0
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm0, %zmm0
				; KNL_64-NEXT: vpsllq $32, %zmm0, %zmm0
				; KNL_64-NEXT: vpaddq %zmm0, %zmm4, %zmm0
				; KNL_64-NEXT: vpaddq %zmm0, %zmm2, %zmm0
				; KNL_64-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; KNL_64-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test9:
				; KNL_32: # BB#0: # %entry
				; KNL_32-NEXT: vpbroadcastd {{[0-9]+}}(%esp), %ymm2
				; KNL_32-NEXT: vpbroadcastd .LCPI8_0, %ymm3
				; KNL_32-NEXT: vpmulld %ymm3, %ymm1, %ymm1
				; KNL_32-NEXT: vpmovqd %zmm0, %ymm0
				; KNL_32-NEXT: vpbroadcastd .LCPI8_1, %ymm3
				; KNL_32-NEXT: vpmulld %ymm3, %ymm0, %ymm0
				; KNL_32-NEXT: vpaddd %ymm0, %ymm2, %ymm0
				; KNL_32-NEXT: vpaddd %ymm1, %ymm0, %ymm0
				; KNL_32-NEXT: vpbroadcastd .LCPI8_2, %ymm1
				; KNL_32-NEXT: vpaddd %ymm1, %ymm0, %ymm0
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm1
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test9:
				; SKX: # BB#0: # %entry
				; SKX-NEXT: vpbroadcastq %rdi, %zmm2
				; SKX-NEXT: vpmullq {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; SKX-NEXT: vpaddq %zmm0, %zmm2, %zmm0
				; SKX-NEXT: vpmovsxdq %ymm1, %zmm1
				; SKX-NEXT: vpmullq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; SKX-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; SKX-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; SKX-NEXT: retq
	entry:			entry:
	%broadcast.splatinsert = insertelement <8 x %struct.ST> undef, %struct.ST %base, i32 0			%broadcast.splatinsert = insertelement <8 x %struct.ST> undef, %struct.ST %base, i32 0
	%broadcast.splat = shufflevector <8 x %struct.ST> %broadcast.splatinsert, <8 x %struct.ST> undef, <8 x i32> zeroinitializer			%broadcast.splat = shufflevector <8 x %struct.ST> %broadcast.splatinsert, <8 x %struct.ST> undef, <8 x i32> zeroinitializer

	%arrayidx = getelementptr %struct.ST, <8 x %struct.ST*> %broadcast.splat, <8 x i64> %ind1, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>, <8 x i32><i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <8 x i32> %ind5, <8 x i64> <i64 13, i64 13, i64 13, i64 13, i64 13, i64 13, i64 13, i64 13>			%arrayidx = getelementptr %struct.ST, <8 x %struct.ST*> %broadcast.splat, <8 x i64> %ind1, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>, <8 x i32><i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, <8 x i32> %ind5, <8 x i64> <i64 13, i64 13, i64 13, i64 13, i64 13, i64 13, i64 13, i64 13>
	%res = call <8 x i32 > @llvm.masked.gather.v8i32(<8 x i32*>%arrayidx, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)			%res = call <8 x i32 > @llvm.masked.gather.v8i32(<8 x i32*>%arrayidx, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}

	; KNL-LABEL: test10
	; KNL: vpbroadcastq %rdi, %zmm
	; KNL: vpmovsxdq
	; KNL: vpbroadcastq
	; KNL: vpmuludq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpaddq
	; KNL: vpgatherqd (,%zmm
	define <8 x i32> @test10(%struct.ST* %base, <8 x i64> %i1, <8 x i32>%ind5) {			define <8 x i32> @test10(%struct.ST* %base, <8 x i64> %i1, <8 x i32>%ind5) {
				; KNL_64-LABEL: test10:
				; KNL_64: # BB#0: # %entry
				; KNL_64-NEXT: vpbroadcastq %rdi, %zmm2
				; KNL_64-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_64-NEXT: vpbroadcastq {{.*}}(%rip), %zmm3
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm1, %zmm4
				; KNL_64-NEXT: vpsrlq $32, %zmm1, %zmm1
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm1, %zmm1
				; KNL_64-NEXT: vpsllq $32, %zmm1, %zmm1
				; KNL_64-NEXT: vpaddq %zmm1, %zmm4, %zmm1
				; KNL_64-NEXT: vpbroadcastq {{.*}}(%rip), %zmm3
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm0, %zmm4
				; KNL_64-NEXT: vpsrlq $32, %zmm0, %zmm0
				; KNL_64-NEXT: vpmuludq %zmm3, %zmm0, %zmm0
				; KNL_64-NEXT: vpsllq $32, %zmm0, %zmm0
				; KNL_64-NEXT: vpaddq %zmm0, %zmm4, %zmm0
				; KNL_64-NEXT: vpaddq %zmm0, %zmm2, %zmm0
				; KNL_64-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; KNL_64-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test10:
				; KNL_32: # BB#0: # %entry
				; KNL_32-NEXT: vpbroadcastd {{[0-9]+}}(%esp), %ymm2
				; KNL_32-NEXT: vpbroadcastd .LCPI9_0, %ymm3
				; KNL_32-NEXT: vpmulld %ymm3, %ymm1, %ymm1
				; KNL_32-NEXT: vpmovqd %zmm0, %ymm0
				; KNL_32-NEXT: vpbroadcastd .LCPI9_1, %ymm3
				; KNL_32-NEXT: vpmulld %ymm3, %ymm0, %ymm0
				; KNL_32-NEXT: vpaddd %ymm0, %ymm2, %ymm0
				; KNL_32-NEXT: vpaddd %ymm1, %ymm0, %ymm0
				; KNL_32-NEXT: vpbroadcastd .LCPI9_2, %ymm1
				; KNL_32-NEXT: vpaddd %ymm1, %ymm0, %ymm0
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm1
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test10:
				; SKX: # BB#0: # %entry
				; SKX-NEXT: vpbroadcastq %rdi, %zmm2
				; SKX-NEXT: vpmullq {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; SKX-NEXT: vpaddq %zmm0, %zmm2, %zmm0
				; SKX-NEXT: vpmovsxdq %ymm1, %zmm1
				; SKX-NEXT: vpmullq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; SKX-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; SKX-NEXT: vpaddq {{.*}}(%rip){1to8}, %zmm0, %zmm1
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vpgatherqd (,%zmm1), %ymm0 {%k1}
				; SKX-NEXT: retq
	entry:			entry:
	%broadcast.splatinsert = insertelement <8 x %struct.ST> undef, %struct.ST %base, i32 0			%broadcast.splatinsert = insertelement <8 x %struct.ST> undef, %struct.ST %base, i32 0
	%broadcast.splat = shufflevector <8 x %struct.ST> %broadcast.splatinsert, <8 x %struct.ST> undef, <8 x i32> zeroinitializer			%broadcast.splat = shufflevector <8 x %struct.ST> %broadcast.splatinsert, <8 x %struct.ST> undef, <8 x i32> zeroinitializer

	%arrayidx = getelementptr %struct.ST, <8 x %struct.ST*> %broadcast.splat, <8 x i64> %i1, i32 2, i32 1, <8 x i32> %ind5, i64 13			%arrayidx = getelementptr %struct.ST, <8 x %struct.ST*> %broadcast.splat, <8 x i64> %i1, i32 2, i32 1, <8 x i32> %ind5, i64 13
	%res = call <8 x i32 > @llvm.masked.gather.v8i32(<8 x i32*>%arrayidx, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)			%res = call <8 x i32 > @llvm.masked.gather.v8i32(<8 x i32*>%arrayidx, i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x i32> undef)
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}

	; Splat index in GEP, requires broadcast			; Splat index in GEP, requires broadcast
	; KNL-LABEL: test11
	; KNL: vpbroadcastd %esi, %zmm
	; KNL: vgatherdps (%rdi,%zmm
	define <16 x float> @test11(float* %base, i32 %ind) {			define <16 x float> @test11(float* %base, i32 %ind) {
				; KNL_64-LABEL: test11:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpbroadcastd %esi, %zmm1
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm1,4), %zmm0 {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test11:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpbroadcastd {{[0-9]+}}(%esp), %zmm1
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vgatherdps (%eax,%zmm1,4), %zmm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test11:
				; SKX: # BB#0:
				; SKX-NEXT: vpbroadcastd %esi, %zmm1
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vgatherdps (%rdi,%zmm1,4), %zmm0 {%k1}
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0			%broadcast.splatinsert = insertelement <16 x float> undef, float %base, i32 0
	%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer

	%gep.random = getelementptr float, <16 x float*> %broadcast.splat, i32 %ind			%gep.random = getelementptr float, <16 x float*> %broadcast.splat, i32 %ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}

	; We are checking the uniform base here. It is taken directly from input to vgatherdps			; We are checking the uniform base here. It is taken directly from input to vgatherdps
	; KNL-LABEL: test12
	; KNL: kxnorw %k1, %k1, %k1
	; KNL: vgatherdps (%rdi,%zmm
	define <16 x float> @test12(float* %base, <16 x i32> %ind) {			define <16 x float> @test12(float* %base, <16 x i32> %ind) {
				; KNL_64-LABEL: test12:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: kxnorw %k1, %k1, %k1
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test12:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: kxnorw %k1, %k1, %k1
				; KNL_32-NEXT: vgatherdps (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test12:
				; SKX: # BB#0:
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr float, float *%base, <16 x i64> %sext_ind			%gep.random = getelementptr float, float *%base, <16 x i64> %sext_ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}

	; The same as the previous, but the mask is undefined			; The same as the previous, but the mask is undefined
	; KNL-LABEL: test13
	; KNL-NOT: kxnorw
	; KNL: vgatherdps (%rdi,%zmm
	define <16 x float> @test13(float* %base, <16 x i32> %ind) {			define <16 x float> @test13(float* %base, <16 x i32> %ind) {
				; KNL_64-LABEL: test13:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test13:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vgatherdps (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test13:
				; SKX: # BB#0:
				; SKX-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr float, float *%base, <16 x i64> %sext_ind			%gep.random = getelementptr float, float *%base, <16 x i64> %sext_ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> undef, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> undef, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}

	; The base pointer is not splat, can't find uniform base			; The base pointer is not splat, can't find uniform base
	; KNL-LABEL: test14
	; KNL: vgatherqps (,%zmm0)
	; KNL: vgatherqps (,%zmm0)
	define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {			define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
				; KNL_64-LABEL: test14:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
				; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
				; KNL_64-NEXT: vpbroadcastq %xmm0, %zmm0
				; KNL_64-NEXT: vmovd %esi, %xmm1
				; KNL_64-NEXT: vpbroadcastd %xmm1, %ymm1
				; KNL_64-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_64-NEXT: vpsllq $2, %zmm1, %zmm1
				; KNL_64-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; KNL_64-NEXT: kshiftrw $8, %k0, %k1
				; KNL_64-NEXT: vgatherqps (,%zmm0), %ymm1 {%k1}
				; KNL_64-NEXT: vgatherqps (,%zmm0), %ymm2 {%k1}
				; KNL_64-NEXT: vinsertf64x4 $1, %ymm1, %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test14:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm1
				; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
				; KNL_32-NEXT: vpbroadcastd %xmm0, %zmm0
				; KNL_32-NEXT: vpslld $2, {{[0-9]+}}(%esp){1to16}, %zmm1
				; KNL_32-NEXT: vpaddd %zmm1, %zmm0, %zmm1
				; KNL_32-NEXT: vgatherdps (,%zmm1), %zmm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test14:
				; SKX: # BB#0:
				; SKX-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
				; SKX-NEXT: vinserti64x2 $0, %xmm1, %zmm0, %zmm0
				; SKX-NEXT: vpbroadcastq %xmm0, %zmm0
				; SKX-NEXT: vmovd %esi, %xmm1
				; SKX-NEXT: vpbroadcastd %xmm1, %ymm1
				; SKX-NEXT: vpmovsxdq %ymm1, %zmm1
				; SKX-NEXT: vpsllq $2, %zmm1, %zmm1
				; SKX-NEXT: vpaddq %zmm1, %zmm0, %zmm0
				; SKX-NEXT: kshiftrw $8, %k0, %k1
				; SKX-NEXT: vgatherqps (,%zmm0), %ymm1 {%k1}
				; SKX-NEXT: vgatherqps (,%zmm0), %ymm2 {%k1}
				; SKX-NEXT: vinsertf32x8 $1, %ymm1, %zmm2, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x float> %vec, float %base, i32 1			%broadcast.splatinsert = insertelement <16 x float> %vec, float %base, i32 1
	%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x float> %broadcast.splatinsert, <16 x float> undef, <16 x i32> zeroinitializer

	%gep.random = getelementptr float, <16 x float*> %broadcast.splat, i32 %ind			%gep.random = getelementptr float, <16 x float*> %broadcast.splat, i32 %ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> undef, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> undef, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}

				declare <4 x float> @llvm.masked.gather.v4f32(<4 x float*>, i32, <4 x i1>, <4 x float>)
				declare <4 x double> @llvm.masked.gather.v4f64(<4 x double*>, i32, <4 x i1>, <4 x double>)
				declare <2 x double> @llvm.masked.gather.v2f64(<2 x double*>, i32, <2 x i1>, <2 x double>)

				; Gather smaller than existing instruction
				define <4 x float> @test15(float* %base, <4 x i32> %ind, <4 x i1> %mask) {
				;
				; KNL_64-LABEL: test15:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxor %ymm2, %ymm2, %ymm2
				; KNL_64-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm2[4,5,6,7]
				; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm2
				; KNL_64-NEXT: vpmovsxdq %ymm1, %zmm0
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; KNL_64-NEXT: vptestmq %zmm0, %zmm0, %k1
				; KNL_64-NEXT: vgatherqps (%rdi,%zmm2,4), %ymm0 {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test15:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxor %ymm2, %ymm2, %ymm2
				; KNL_32-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm2[4,5,6,7]
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm2
				; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm0
				; KNL_32-NEXT: vpandq .LCPI14_0, %zmm0, %zmm0
				; KNL_32-NEXT: vptestmq %zmm0, %zmm0, %k1
				; KNL_32-NEXT: vgatherqps (%eax,%zmm2,4), %ymm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test15:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovd2m %xmm1, %k1
				; SKX-NEXT: vgatherdps (%rdi,%xmm0,4), %xmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

				%sext_ind = sext <4 x i32> %ind to <4 x i64>
				%gep.random = getelementptr float, float* %base, <4 x i64> %sext_ind
				%res = call <4 x float> @llvm.masked.gather.v4f32(<4 x float*> %gep.random, i32 4, <4 x i1> %mask, <4 x float> undef)
				ret <4 x float>%res
				}

				; Gather smaller than existing instruction
				define <4 x double> @test16(double* %base, <4 x i32> %ind, <4 x i1> %mask, <4 x double> %src0) {
				;
				; KNL_64-LABEL: test16:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_64-NEXT: vpsrad $31, %xmm1, %xmm1
				; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti64x4 $0, %ymm1, %zmm3, %zmm1
				; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vgatherqpd (%rdi,%zmm0,8), %zmm2 {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test16:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
				; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti64x4 $0, %ymm1, %zmm3, %zmm1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_32-NEXT: vpandq .LCPI15_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vgatherqpd (%eax,%zmm0,8), %zmm2 {%k1}
				; KNL_32-NEXT: vmovaps %zmm2, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test16:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovd2m %xmm1, %k1
				; SKX-NEXT: vgatherdpd (%rdi,%xmm0,8), %ymm2 {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq

				%sext_ind = sext <4 x i32> %ind to <4 x i64>
				%gep.random = getelementptr double, double* %base, <4 x i64> %sext_ind
				%res = call <4 x double> @llvm.masked.gather.v4f64(<4 x double*> %gep.random, i32 4, <4 x i1> %mask, <4 x double> %src0)
				ret <4 x double>%res
				}

				define <2 x double> @test17(double* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x double> %src0) {
				;
				; KNL_64-LABEL: test17:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vgatherqpd (%rdi,%zmm0,8), %zmm2 {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test17:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpandq .LCPI16_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vgatherqpd (%eax,%zmm0,8), %zmm2 {%k1}
				; KNL_32-NEXT: vmovaps %zmm2, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test17:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovq2m %xmm1, %k1
				; SKX-NEXT: vgatherqpd (%rdi,%xmm0,8), %xmm2 {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq

				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr double, double* %base, <2 x i64> %sext_ind
				%res = call <2 x double> @llvm.masked.gather.v2f64(<2 x double*> %gep.random, i32 4, <2 x i1> %mask, <2 x double> %src0)
				ret <2 x double>%res
				}

				declare void @llvm.masked.scatter.v4i32(<4 x i32> , <4 x i32*> , i32 , <4 x i1> )
				declare void @llvm.masked.scatter.v4f64(<4 x double> , <4 x double*> , i32 , <4 x i1> )
				declare void @llvm.masked.scatter.v2i64(<2 x i64> , <2 x i64*> , i32 , <2 x i1> )
				declare void @llvm.masked.scatter.v2i32(<2 x i32> , <2 x i32*> , i32 , <2 x i1> )
				declare void @llvm.masked.scatter.v2f32(<2 x float> , <2 x float*> , i32 , <2 x i1> )

				define void @test18(<4 x i32>%a1, <4 x i32*> %ptr, <4 x i1>%mask) {
				;
				; KNL_64-LABEL: test18:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_64-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm3[4,5,6,7]
				; KNL_64-NEXT: vpmovsxdq %ymm2, %zmm2
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm2, %zmm2
				; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test18:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_32-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm3[4,5,6,7]
				; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_32-NEXT: vpmovsxdq %ymm2, %zmm2
				; KNL_32-NEXT: vpandq .LCPI17_0, %zmm2, %zmm2
				; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test18:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovd2m %xmm2, %k1
				; SKX-NEXT: vpscatterqd %xmm0, (,%ymm1) {%k1}
				; SKX-NEXT: retq
				call void @llvm.masked.scatter.v4i32(<4 x i32> %a1, <4 x i32*> %ptr, i32 4, <4 x i1> %mask)
				ret void
				}

				define void @test19(<4 x double>%a1, double* %ptr, <4 x i1>%mask, <4 x i64> %ind) {
				;
				; KNL_64-LABEL: test19:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_64-NEXT: vpsrad $31, %xmm1, %xmm1
				; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti64x4 $0, %ymm1, %zmm3, %zmm1
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vscatterqpd %zmm0, (%rdi,%zmm2,8) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test19:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_32-NEXT: vpsrad $31, %xmm1, %xmm1
				; KNL_32-NEXT: vpmovsxdq %xmm1, %ymm1
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti64x4 $0, %ymm1, %zmm3, %zmm1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpandq .LCPI18_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vscatterqpd %zmm0, (%eax,%zmm2,8) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test19:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovd2m %xmm1, %k1
				; SKX-NEXT: vscatterqpd %ymm0, (%rdi,%ymm2,8) {%k1}
				; SKX-NEXT: retq
				%gep = getelementptr double, double* %ptr, <4 x i64> %ind
				call void @llvm.masked.scatter.v4f64(<4 x double> %a1, <4 x double*> %gep, i32 8, <4 x i1> %mask)
				ret void
				}

				; Data type requires widening
				define void @test20(<2 x float>%a1, <2 x float*> %ptr, <2 x i1> %mask) {
				;
				; KNL_64-LABEL: test20:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; KNL_64-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2
				; KNL_64-NEXT: vpmovqd %zmm2, %ymm2
				; KNL_64-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_64-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm3[4,5,6,7]
				; KNL_64-NEXT: vpmovsxdq %ymm2, %zmm2
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm2, %zmm2
				; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_64-NEXT: vscatterqps %ymm0, (,%zmm1) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test20:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; KNL_32-NEXT: vinserti128 $1, %xmm3, %ymm2, %ymm2
				; KNL_32-NEXT: vpmovqd %zmm2, %ymm2
				; KNL_32-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_32-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm3[4,5,6,7]
				; KNL_32-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]
				; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_32-NEXT: vpmovsxdq %ymm2, %zmm2
				; KNL_32-NEXT: vpandq .LCPI19_0, %zmm2, %zmm2
				; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_32-NEXT: vscatterqps %ymm0, (,%zmm1) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test20:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovq2m %xmm2, %k0
				; SKX-NEXT: kshiftlw $2, %k0, %k0
				; SKX-NEXT: kshiftrw $2, %k0, %k1
				; SKX-NEXT: vscatterqps %xmm0, (,%ymm1) {%k1}
				; SKX-NEXT: retq
				call void @llvm.masked.scatter.v2f32(<2 x float> %a1, <2 x float*> %ptr, i32 4, <2 x i1> %mask)
				ret void
				}

				; Data type requires promotion
				define void @test21(<2 x i32>%a1, <2 x i32*> %ptr, <2 x i1>%mask) {
				;
				; KNL_64-LABEL: test21:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti32x4 $0, %xmm2, %zmm3, %zmm2
				; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm2, %zmm2
				; KNL_64-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test21:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti32x4 $0, %xmm2, %zmm3, %zmm2
				; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_32-NEXT: vpandq .LCPI20_0, %zmm2, %zmm2
				; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test21:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovq2m %xmm2, %k0
				; SKX-NEXT: kshiftlw $2, %k0, %k0
				; SKX-NEXT: kshiftrw $2, %k0, %k1
				; SKX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; SKX-NEXT: vpscatterqd %xmm0, (,%ymm1) {%k1}
				; SKX-NEXT: retq
				call void @llvm.masked.scatter.v2i32(<2 x i32> %a1, <2 x i32*> %ptr, i32 4, <2 x i1> %mask)
				ret void
				}

				; The result type requires widening
				declare <2 x float> @llvm.masked.gather.v2f32(<2 x float*>, i32, <2 x i1>, <2 x float>)

				define <2 x float> @test22(float* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x float> %src0) {
				;
				;
				; KNL_64-LABEL: test22:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; KNL_64-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1
				; KNL_64-NEXT: vpmovqd %zmm1, %ymm1
				; KNL_64-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_64-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm3[4,5,6,7]
				; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_64-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vgatherqps (%rdi,%zmm0,4), %ymm2 {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test22:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxor %xmm3, %xmm3, %xmm3
				; KNL_32-NEXT: vinserti128 $1, %xmm3, %ymm1, %ymm1
				; KNL_32-NEXT: vpmovqd %zmm1, %ymm1
				; KNL_32-NEXT: vpxor %ymm3, %ymm3, %ymm3
				; KNL_32-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm3[4,5,6,7]
				; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_32-NEXT: vpmovsxdq %ymm1, %zmm1
				; KNL_32-NEXT: vpandq .LCPI21_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vgatherqps (%eax,%zmm0,4), %ymm2 {%k1}
				; KNL_32-NEXT: vmovaps %zmm2, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test22:
				; SKX: # BB#0:
				; SKX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; SKX-NEXT: vpmovq2m %xmm1, %k0
				; SKX-NEXT: kshiftlw $2, %k0, %k0
				; SKX-NEXT: kshiftrw $2, %k0, %k1
				; SKX-NEXT: vgatherdps (%rdi,%xmm0,4), %xmm2 {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr float, float* %base, <2 x i64> %sext_ind
				%res = call <2 x float> @llvm.masked.gather.v2f32(<2 x float*> %gep.random, i32 4, <2 x i1> %mask, <2 x float> %src0)
				ret <2 x float>%res
				}

				declare <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*>, i32, <2 x i1>, <2 x i32>)
				declare <2 x i64> @llvm.masked.gather.v2i64(<2 x i64*>, i32, <2 x i1>, <2 x i64>)

				define <2 x i32> @test23(i32* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x i32> %src0) {
				;
				; KNL_64-LABEL: test23:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vpgatherqq (%rdi,%zmm0,8), %zmm2 {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test23:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpandq .LCPI22_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm2 {%k1}
				; KNL_32-NEXT: vmovaps %zmm2, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test23:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovq2m %xmm1, %k1
				; SKX-NEXT: vpgatherqq (%rdi,%xmm0,8), %xmm2 {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr i32, i32* %base, <2 x i64> %sext_ind
				%res = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> %gep.random, i32 4, <2 x i1> %mask, <2 x i32> %src0)
				ret <2 x i32>%res
				}

				define <2 x i32> @test24(i32* %base, <2 x i32> %ind) {
				;
				;
				; KNL_64-LABEL: test24:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: movb $3, %al
				; KNL_64-NEXT: movzbl %al, %eax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vpgatherqq (%rdi,%zmm0,8), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test24:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpxord %zmm1, %zmm1, %zmm1
				; KNL_32-NEXT: vinserti32x4 $0, .LCPI23_0, %zmm1, %zmm1
				; KNL_32-NEXT: vpandq .LCPI23_1, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test24:
				; SKX: # BB#0:
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vpgatherqq (%rdi,%xmm0,8), %xmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr i32, i32* %base, <2 x i64> %sext_ind
				%res = call <2 x i32> @llvm.masked.gather.v2i32(<2 x i32*> %gep.random, i32 4, <2 x i1> <i1 true, i1 true>, <2 x i32> undef)
				ret <2 x i32>%res
				}

				define <2 x i64> @test25(i64* %base, <2 x i32> %ind, <2 x i1> %mask, <2 x i64> %src0) {
				;
				; KNL_64-LABEL: test25:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_64-NEXT: vpandq {{.*}}(%rip){1to8}, %zmm1, %zmm1
				; KNL_64-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_64-NEXT: vpgatherqq (%rdi,%zmm0,8), %zmm2 {%k1}
				; KNL_64-NEXT: vmovaps %zmm2, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test25:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpxord %zmm3, %zmm3, %zmm3
				; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm3, %zmm1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpandq .LCPI24_0, %zmm1, %zmm1
				; KNL_32-NEXT: vptestmq %zmm1, %zmm1, %k1
				; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm2 {%k1}
				; KNL_32-NEXT: vmovaps %zmm2, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test25:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovq2m %xmm1, %k1
				; SKX-NEXT: vpgatherqq (%rdi,%xmm0,8), %xmm2 {%k1}
				; SKX-NEXT: vmovaps %zmm2, %zmm0
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr i64, i64* %base, <2 x i64> %sext_ind
				%res = call <2 x i64> @llvm.masked.gather.v2i64(<2 x i64*> %gep.random, i32 8, <2 x i1> %mask, <2 x i64> %src0)
				ret <2 x i64>%res
				}

	; KNL-LABEL: test15			define <2 x i64> @test26(i64* %base, <2 x i32> %ind, <2 x i64> %src0) {
	; KNL: kmovw %eax, %k1			;
	; KNL: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}			; KNL_64-LABEL: test26:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: movb $3, %al
				; KNL_64-NEXT: movzbl %al, %eax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vpgatherqq (%rdi,%zmm0,8), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test26:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpxord %zmm2, %zmm2, %zmm2
				; KNL_32-NEXT: vinserti32x4 $0, .LCPI25_0, %zmm2, %zmm2
				; KNL_32-NEXT: vpandq .LCPI25_1, %zmm2, %zmm2
				; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_32-NEXT: vpgatherqq (%eax,%zmm0,8), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test26:
				; SKX: # BB#0:
				; SKX-NEXT: kxnorw %k1, %k1, %k1
				; SKX-NEXT: vpgatherqq (%rdi,%xmm0,8), %xmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr i64, i64* %base, <2 x i64> %sext_ind
				%res = call <2 x i64> @llvm.masked.gather.v2i64(<2 x i64*> %gep.random, i32 8, <2 x i1> <i1 true, i1 true>, <2 x i64> %src0)
				ret <2 x i64>%res
				}

	; SCALAR-LABEL: test15			; Result type requires widening; all-ones mask
				define <2 x float> @test27(float* %base, <2 x i32> %ind) {
				;
				; KNL_64-LABEL: test27:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_64-NEXT: vpmovsxdq %ymm0, %zmm1
				; KNL_64-NEXT: movb $3, %al
				; KNL_64-NEXT: movzbl %al, %eax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vgatherqps (%rdi,%zmm1,4), %ymm0 {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test27:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm1
				; KNL_32-NEXT: vmovdqa {{.*#+}} xmm0 = [1,0,1,0]
				; KNL_32-NEXT: vpxor %xmm2, %xmm2, %xmm2
				; KNL_32-NEXT: vinserti128 $1, %xmm2, %ymm0, %ymm0
				; KNL_32-NEXT: vpmovqd %zmm0, %ymm0
				; KNL_32-NEXT: vpxor %ymm2, %ymm2, %ymm2
				; KNL_32-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm2[4,5,6,7]
				; KNL_32-NEXT: vpmovsxdq %ymm0, %zmm0
				; KNL_32-NEXT: vpandq .LCPI26_1, %zmm0, %zmm0
				; KNL_32-NEXT: vptestmq %zmm0, %zmm0, %k1
				; KNL_32-NEXT: vgatherqps (%eax,%zmm1,4), %ymm0 {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test27:
				; SKX: # BB#0:
				; SKX-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
				; SKX-NEXT: movb $3, %al
				; SKX-NEXT: kmovb %eax, %k1
				; SKX-NEXT: vgatherdps (%rdi,%xmm1,4), %xmm0 {%k1}
				; SKX-NEXT: retq
				%sext_ind = sext <2 x i32> %ind to <2 x i64>
				%gep.random = getelementptr float, float* %base, <2 x i64> %sext_ind
				%res = call <2 x float> @llvm.masked.gather.v2f32(<2 x float*> %gep.random, i32 4, <2 x i1> <i1 true, i1 true>, <2 x float> undef)
				ret <2 x float>%res
				}

				; Data type requires promotion, mask is all-ones
				define void @test28(<2 x i32>%a1, <2 x i32*> %ptr) {
				;
				;
				; KNL_64-LABEL: test28:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_64-NEXT: movb $3, %al
				; KNL_64-NEXT: movzbl %al, %eax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test28:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; KNL_32-NEXT: vpxord %zmm2, %zmm2, %zmm2
				; KNL_32-NEXT: vinserti32x4 $0, .LCPI27_0, %zmm2, %zmm2
				; KNL_32-NEXT: vpandq .LCPI27_1, %zmm2, %zmm2
				; KNL_32-NEXT: vptestmq %zmm2, %zmm2, %k1
				; KNL_32-NEXT: vpscatterqd %ymm0, (,%zmm1) {%k1}
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test28:
				; SKX: # BB#0:
				; SKX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; SKX-NEXT: movb $3, %al
				; SKX-NEXT: kmovb %eax, %k1
				; SKX-NEXT: vpscatterqd %xmm0, (,%ymm1) {%k1}
				; SKX-NEXT: retq
				call void @llvm.masked.scatter.v2i32(<2 x i32> %a1, <2 x i32*> %ptr, i32 4, <2 x i1> <i1 true, i1 true>)
				ret void
				}


				; SCALAR-LABEL: test29
	; SCALAR: extractelement <16 x float*>			; SCALAR: extractelement <16 x float*>
	; SCALAR-NEXT: load float			; SCALAR-NEXT: load float
	; SCALAR-NEXT: insertelement <16 x float>			; SCALAR-NEXT: insertelement <16 x float>
	; SCALAR-NEXT: extractelement <16 x float*>			; SCALAR-NEXT: extractelement <16 x float*>
	; SCALAR-NEXT: load float			; SCALAR-NEXT: load float

	define <16 x float> @test15(float* %base, <16 x i32> %ind) {			define <16 x float> @test29(float* %base, <16 x i32> %ind) {
				; KNL_64-LABEL: test29:
				; KNL_64: # BB#0:
				; KNL_64-NEXT: movw $44, %ax
				; KNL_64-NEXT: kmovw %eax, %k1
				; KNL_64-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; KNL_64-NEXT: vmovaps %zmm1, %zmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test29:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: movw $44, %cx
				; KNL_32-NEXT: kmovw %ecx, %k1
				; KNL_32-NEXT: vgatherdps (%eax,%zmm0,4), %zmm1 {%k1}
				; KNL_32-NEXT: vmovaps %zmm1, %zmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test29:
				; SKX: # BB#0:
				; SKX-NEXT: movw $44, %ax
				; SKX-NEXT: kmovw %eax, %k1
				; SKX-NEXT: vgatherdps (%rdi,%zmm0,4), %zmm1 {%k1}
				; SKX-NEXT: vmovaps %zmm1, %zmm0
				; SKX-NEXT: retq

	%broadcast.splatinsert = insertelement <16 x float*> undef, float* %base, i32 0			%broadcast.splatinsert = insertelement <16 x float*> undef, float* %base, i32 0
	%broadcast.splat = shufflevector <16 x float*> %broadcast.splatinsert, <16 x float*> undef, <16 x i32> zeroinitializer			%broadcast.splat = shufflevector <16 x float*> %broadcast.splatinsert, <16 x float*> undef, <16 x i32> zeroinitializer

	%sext_ind = sext <16 x i32> %ind to <16 x i64>			%sext_ind = sext <16 x i32> %ind to <16 x i64>
	%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind			%gep.random = getelementptr float, <16 x float*> %broadcast.splat, <16 x i64> %sext_ind

	%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 false, i1 false, i1 true, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef)			%res = call <16 x float> @llvm.masked.gather.v16f32(<16 x float*> %gep.random, i32 4, <16 x i1> <i1 false, i1 false, i1 true, i1 true, i1 false, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x float> undef)
	ret <16 x float>%res			ret <16 x float>%res
	}			}
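
Note on the mask constants in the checks above: the immediates loaded into the mask register (`movw $44` in test29, `movb $3` in the all-ones two-lane tests) appear to be the constant `<N x i1>` mask packed with lane 0 in bit 0. The sketch below is only an illustration of that arithmetic; `mask_immediate` is a hypothetical helper name and is not part of the patch.

```python
def mask_immediate(lanes):
    """Pack a list of per-lane booleans into an integer, lane 0 in bit 0."""
    value = 0
    for index, active in enumerate(lanes):
        if active:
            value |= 1 << index
    return value


# Constant mask from test29: only lanes 2, 3 and 5 are true.
test29_mask = [False, False, True, True, False, True] + [False] * 10

print(mask_immediate(test29_mask))   # 4 + 8 + 32
print(mask_immediate([True, True]))  # both lanes of a <2 x i1> mask set
```

Under this reading, lanes 2, 3 and 5 give 4 + 8 + 32 = 44, and an all-true two-lane mask gives 3, which matches the immediates in the KNL_64 and SKX check lines.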

	; Check non-power-of-2 case. It should be scalarized.			; Check non-power-of-2 case. It should be scalarized.
	declare <3 x i32> @llvm.masked.gather.v3i32(<3 x i32*>, i32, <3 x i1>, <3 x i32>)			declare <3 x i32> @llvm.masked.gather.v3i32(<3 x i32*>, i32, <3 x i1>, <3 x i32>)
	; KNL-LABEL: test16			define <3 x i32> @test30(<3 x i32*> %base, <3 x i32> %ind, <3 x i1> %mask, <3 x
				mbodart: Please keep the last <3 x i32> all on the same line.
	; KNL: testb			; KNL_64-LABEL: test30:
	; KNL: je			; KNL_64: # BB#0:
	; KNL: testb			; KNL_64-NEXT: andl $1, %edx
	; KNL: je			; KNL_64-NEXT: kmovw %edx, %k1
	; KNL: testb			; KNL_64-NEXT: andl $1, %esi
	; KNL: je			; KNL_64-NEXT: kmovw %esi, %k2
	define <3 x i32> @test16(<3 x i32*> %base, <3 x i32> %ind, <3 x i1> %mask, <3 x i32> %src0) {			; KNL_64-NEXT: movl %edi, %eax
				; KNL_64-NEXT: andl $1, %eax
				; KNL_64-NEXT: kmovw %eax, %k0
				; KNL_64-NEXT: vpmovsxdq %xmm1, %ymm1
				; KNL_64-NEXT: vpsllq $2, %ymm1, %ymm1
				; KNL_64-NEXT: vpaddq %ymm1, %ymm0, %ymm1
				; KNL_64-NEXT: # implicit-def: %XMM0
				; KNL_64-NEXT: testb $1, %dil
				; KNL_64-NEXT: je .LBB29_2
				; KNL_64-NEXT: # BB#1: # %cond.load
				; KNL_64-NEXT: vmovq %xmm1, %rax
				; KNL_64-NEXT: vmovd (%rax), %xmm0
				; KNL_64-NEXT: .LBB29_2: # %else
				; KNL_64-NEXT: kmovw %k2, %eax
				; KNL_64-NEXT: movl %eax, %ecx
				; KNL_64-NEXT: andl $1, %ecx
				; KNL_64-NEXT: testb %cl, %cl
				; KNL_64-NEXT: je .LBB29_4
				; KNL_64-NEXT: # BB#3: # %cond.load1
				; KNL_64-NEXT: vpextrq $1, %xmm1, %rcx
				; KNL_64-NEXT: vpinsrd $1, (%rcx), %xmm0, %xmm0
				; KNL_64-NEXT: .LBB29_4: # %else2
				; KNL_64-NEXT: kmovw %k1, %ecx
				; KNL_64-NEXT: movl %ecx, %edx
				; KNL_64-NEXT: andl $1, %edx
				; KNL_64-NEXT: testb %dl, %dl
				; KNL_64-NEXT: je .LBB29_6
				; KNL_64-NEXT: # BB#5: # %cond.load4
				; KNL_64-NEXT: vextracti128 $1, %ymm1, %xmm1
				; KNL_64-NEXT: vmovq %xmm1, %rdx
				; KNL_64-NEXT: vpinsrd $2, (%rdx), %xmm0, %xmm0
				; KNL_64-NEXT: .LBB29_6: # %else5
				; KNL_64-NEXT: kmovw %k0, %edx
				; KNL_64-NEXT: vmovd %edx, %xmm1
				; KNL_64-NEXT: vpinsrd $1, %eax, %xmm1, %xmm1
				; KNL_64-NEXT: vpinsrd $2, %ecx, %xmm1, %xmm1
				; KNL_64-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_64-NEXT: vblendvps %xmm1, %xmm0, %xmm2, %xmm0
				; KNL_64-NEXT: retq
				;
				; KNL_32-LABEL: test30:
				; KNL_32: # BB#0:
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: andl $1, %eax
				; KNL_32-NEXT: kmovw %eax, %k1
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: andl $1, %eax
				; KNL_32-NEXT: kmovw %eax, %k2
				; KNL_32-NEXT: movl {{[0-9]+}}(%esp), %eax
				; KNL_32-NEXT: movl %eax, %ecx
				; KNL_32-NEXT: andl $1, %ecx
				; KNL_32-NEXT: kmovw %ecx, %k0
				; KNL_32-NEXT: vpslld $2, %xmm1, %xmm1
				; KNL_32-NEXT: vpaddd %xmm1, %xmm0, %xmm1
				; KNL_32-NEXT: # implicit-def: %XMM0
				; KNL_32-NEXT: testb $1, %al
				; KNL_32-NEXT: je .LBB29_2
				; KNL_32-NEXT: # BB#1: # %cond.load
				; KNL_32-NEXT: vmovd %xmm1, %eax
				; KNL_32-NEXT: vmovd (%eax), %xmm0
				; KNL_32-NEXT: .LBB29_2: # %else
				; KNL_32-NEXT: kmovw %k2, %eax
				; KNL_32-NEXT: movl %eax, %ecx
				; KNL_32-NEXT: andl $1, %ecx
				; KNL_32-NEXT: testb %cl, %cl
				; KNL_32-NEXT: je .LBB29_4
				; KNL_32-NEXT: # BB#3: # %cond.load1
				; KNL_32-NEXT: vpextrd $1, %xmm1, %ecx
				; KNL_32-NEXT: vpinsrd $1, (%ecx), %xmm0, %xmm0
				; KNL_32-NEXT: .LBB29_4: # %else2
				; KNL_32-NEXT: kmovw %k1, %ecx
				; KNL_32-NEXT: movl %ecx, %edx
				; KNL_32-NEXT: andl $1, %edx
				; KNL_32-NEXT: testb %dl, %dl
				; KNL_32-NEXT: je .LBB29_6
				; KNL_32-NEXT: # BB#5: # %cond.load4
				; KNL_32-NEXT: vpextrd $2, %xmm1, %edx
				; KNL_32-NEXT: vpinsrd $2, (%edx), %xmm0, %xmm0
				; KNL_32-NEXT: .LBB29_6: # %else5
				; KNL_32-NEXT: kmovw %k0, %edx
				; KNL_32-NEXT: vmovd %edx, %xmm1
				; KNL_32-NEXT: vpinsrd $1, %eax, %xmm1, %xmm1
				; KNL_32-NEXT: vpinsrd $2, %ecx, %xmm1, %xmm1
				; KNL_32-NEXT: vpslld $31, %xmm1, %xmm1
				; KNL_32-NEXT: vblendvps %xmm1, %xmm0, %xmm2, %xmm0
				; KNL_32-NEXT: retl
				;
				; SKX-LABEL: test30:
				; SKX: # BB#0:
				; SKX-NEXT: vpmovd2m %xmm2, %k1
				; SKX-NEXT: kmovb %k1, -{{[0-9]+}}(%rsp)
				; SKX-NEXT: vpmovsxdq %xmm1, %ymm1
				; SKX-NEXT: vpsllq $2, %ymm1, %ymm1
				; SKX-NEXT: vpaddq %ymm1, %ymm0, %ymm1
				; SKX-NEXT: movb -{{[0-9]+}}(%rsp), %al
				; SKX-NEXT: # implicit-def: %XMM0
				; SKX-NEXT: andb $1, %al
				; SKX-NEXT: je .LBB29_2
				; SKX-NEXT: # BB#1: # %cond.load
				; SKX-NEXT: vmovq %xmm1, %rax
				; SKX-NEXT: vmovd (%rax), %xmm0
				; SKX-NEXT: .LBB29_2: # %else
				; SKX-NEXT: kmovb %k1, -{{[0-9]+}}(%rsp)
				; SKX-NEXT: movb -{{[0-9]+}}(%rsp), %al
				; SKX-NEXT: andb $1, %al
				; SKX-NEXT: je .LBB29_4
				; SKX-NEXT: # BB#3: # %cond.load1
				; SKX-NEXT: vpextrq $1, %xmm1, %rax
				; SKX-NEXT: vpinsrd $1, (%rax), %xmm0, %xmm0
				; SKX-NEXT: .LBB29_4: # %else2
				; SKX-NEXT: kmovb %k1, -{{[0-9]+}}(%rsp)
				; SKX-NEXT: movb -{{[0-9]+}}(%rsp), %al
				; SKX-NEXT: andb $1, %al
				; SKX-NEXT: je .LBB29_6
				; SKX-NEXT: # BB#5: # %cond.load4
				; SKX-NEXT: vextracti128 $1, %ymm1, %xmm1
				; SKX-NEXT: vmovq %xmm1, %rax
				; SKX-NEXT: vpinsrd $2, (%rax), %xmm0, %xmm0
				; SKX-NEXT: .LBB29_6: # %else5
				; SKX-NEXT: vmovdqa32 %xmm0, %xmm3 {%k1}
				; SKX-NEXT: vmovaps %zmm3, %zmm0
				; SKX-NEXT: retq
				i32> %src0) {
	%sext_ind = sext <3 x i32> %ind to <3 x i64>			%sext_ind = sext <3 x i32> %ind to <3 x i64>
	%gep.random = getelementptr i32, <3 x i32*> %base, <3 x i64> %sext_ind			%gep.random = getelementptr i32, <3 x i32*> %base, <3 x i64> %sext_ind
	%res = call <3 x i32> @llvm.masked.gather.v3i32(<3 x i32*> %gep.random, i32 4, <3 x i1> %mask, <3 x i32> %src0)			%res = call <3 x i32> @llvm.masked.gather.v3i32(<3 x i32*> %gep.random, i32 4, <3 x i1> %mask, <3 x i32> %src0)
	ret <3 x i32>%res			ret <3 x i32>%res
	}			}
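
Note on the widening cases (test20, test22, test27): the summary says that a `<2 x float>` gather or scatter needs its data type widened. A simplified way to picture why this is safe is that the extra lanes are masked off, so the wider operation touches the same memory and the original lanes keep their values. The Python model below is a sketch under that assumption only. It is not the legalizer code from this patch, and the function names, the dictionary used as memory, and the use of `None` for undefined padding are all hypothetical.

```python
def masked_gather(memory, addresses, mask, passthru):
    """Per lane: load from memory if the mask bit is set, else keep passthru."""
    return [memory[addr] if active else fallback
            for addr, active, fallback in zip(addresses, mask, passthru)]


def widened_masked_gather(memory, addresses, mask, passthru, wide_lanes):
    """Model a narrow gather as a wider one whose padding lanes are disabled."""
    narrow_lanes = len(mask)
    pad = wide_lanes - narrow_lanes
    wide_addresses = addresses + [None] * pad  # never dereferenced: lane is off
    wide_mask = mask + [False] * pad           # padding lanes are masked off
    wide_passthru = passthru + [None] * pad    # stands in for undef
    wide_result = masked_gather(memory, wide_addresses, wide_mask, wide_passthru)
    return wide_result[:narrow_lanes]          # keep only the original lanes


memory = {0: 1.5, 4: 2.5}
narrow = masked_gather(memory, [0, 4], [True, False], [9.0, 8.0])
wide = widened_masked_gather(memory, [0, 4], [True, False], [9.0, 8.0], 4)
print(narrow, wide)
```

In this model the narrow and widened gathers return the same two lanes, and the padding addresses are never read because their mask bits are false.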

Revision Contents

Diff 41020

../include/llvm/CodeGen/SelectionDAGNodes.h

../lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp

../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

../lib/CodeGen/SelectionDAG/LegalizeTypes.h

../lib/CodeGen/SelectionDAG/LegalizeTypes.cpp

../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

../lib/Target/X86/X86ISelLowering.cpp

../test/CodeGen/X86/masked_gather_scatter.ll
