This is an archive of the discontinued LLVM Phabricator instance.

[DAG] convert vector select-of-constants to logic/math
ClosedPublic

Authored by spatel on Aug 17 2017, 11:44 AM.


Details

Reviewers

hfinkel
nemanjai
craig.topper
efriedma
escha
zvi
DavidKreitzer
aaboud
aivchenk

Commits

rGe404cbff6630: [DAG] convert vector select-of-constants to logic/math
rL311731: [DAG] convert vector select-of-constants to logic/math

Summary

This goes back to a discussion about IR canonicalization. We'd like to preserve and convert more IR to 'select' than we currently do because that's likely the best choice in IR:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/105335.html
...but that's often not true for codegen, so we need to account for this pattern coming in to the backend and transform it to better DAG ops.

Steps in this patch:

1. Add an EVT param to the existing convertSelectOfConstantsToMath() TLI hook to enable this transform more selectively. Other targets will probably want that anyway to distinguish scalars from vectors, but we need it here because these folds cause an infinite loop for AVX512 vectors.
2. Convert vselect to math or logic. Credit to @RKSimon for suggesting the xor/and/xor hack instead of add/sub for the general case (see also https://graphics.stanford.edu/~seander/bithacks.html#MaskedMerge). Bitwise ops should always be more amenable to further folding, so we don't need to add even more special cases for -1/0 constants here.
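
As a rough illustration of the xor/and/xor masked merge credited above, here is a minimal standalone C++ sketch that models a single 32-bit lane. It is not code from the patch: the helper names (sextCond, selectLane, maskedMergeLane, checkMaskedMerge) are hypothetical, and plain uint32_t stands in for a DAG vector element. The constants are the ones used in the general-case PowerPC test in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend a 1-bit condition to a full 32-bit lane mask:
// true -> 0xFFFFFFFF, false -> 0x00000000.
constexpr uint32_t sextCond(bool Cond) { return Cond ? 0xFFFFFFFFu : 0u; }

// Reference semantics of one select lane.
constexpr uint32_t selectLane(bool Cond, uint32_t C1, uint32_t C2) {
  return Cond ? C1 : C2;
}

// Masked merge: xor (and (sext Cond), (C1 ^ C2)), C2.
// When C1 and C2 are constants, C1 ^ C2 can be folded ahead of time,
// leaving one AND and one XOR that depend on the condition.
constexpr uint32_t maskedMergeLane(bool Cond, uint32_t C1, uint32_t C2) {
  return (sextCond(Cond) & (C1 ^ C2)) ^ C2;
}

// Spot-check the identity lane by lane for both condition values.
inline void checkMaskedMerge() {
  const uint32_t C1[4] = {3000u, 1u, 0xFFFFFFFFu, 0u};           // <3000, 1, -1, 0>
  const uint32_t C2[4] = {42u, 0u, 0xFFFFFFFEu, 0xFFFFFFFFu};    // <42, 0, -2, -1>
  for (int I = 0; I != 4; ++I) {
    for (int B = 0; B != 2; ++B) {
      const bool Cond = B != 0;
      assert(maskedMergeLane(Cond, C1[I], C2[I]) ==
             selectLane(Cond, C1[I], C2[I]));
    }
  }
}
```

Calling checkMaskedMerge() from a test driver exercises every lane with both condition values. This only checks the bitwise identity; whether two logic ops beat a select instruction is the uarch question discussed in the comments.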

Try to verify the logic in Alive:
http://rise4fun.com/Alive/yFR

For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs, so it always makes sense to replace that with simpler instructions. I'm not sure if the same is true for PPC.

I.e., if this:
-; CHECK-NEXT: xxsel 34, 51, 50, 34
+; CHECK-NEXT: xxland 0, 34, 50
+; CHECK-NEXT: xxlxor 34, 0, 51

...is not a good optimization in general, we could make the TLI hook distinguish between constants as a further refinement. Another possibility is to convert that back into a select as a machine-instruction-level fold predicated on uarch details?

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Aug 17 2017, 11:44 AM

Herald added subscribers: kbarton, mcrosier. Aug 17 2017, 11:44 AM

For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs

Are you sure?

Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.

Oops...maybe it's not so clear anymore. Page references to the doc dated May 2, 2017:

Bulldozer, Piledriver: 1 uop and 2 cycle latency for 128-bit; 2 uops and 2 cycles for 256-bit. (p. 48, 61)
Ryzen: 1 uop and 1 cycle for 128-bit; 2 uops and 1 cycle for 256-bit. (p. 88)
Skylake: 1 uop and 1 cycle for 128-bit (or just legacy encoded xmm?); 2 uops and 2 cycles for 256-bit (or any vex version?) (p. 239, 243)

So yes, it seems recent uarchs are putting more effort into making blendv a fast op. I think logic ops are still the better choice for default x86 (~SandyBridge). As I suggested in the description, we could re-form a select in machine combiner or some other machine pass if that's the better choice for a particular uarch. WDYT?

Adding some more x86 experts.

If the general case is controversial, another possibility would be to split that off as its own patch. All of the special-case (0, -1, off-by-one) diffs are clear wins?
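
To make the off-by-one special cases concrete, here is a small standalone C++ sketch of the two identities on one 32-bit lane. It is only a model under stated assumptions: the helper names (zextCond, sextCond, foldAddOne, foldSubOne, checkOffByOneFolds) are hypothetical, and uint32_t wraparound stands in for the APInt arithmetic used in the combiner.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a 1-bit condition: true -> 1, false -> 0.
constexpr uint32_t zextCond(bool Cond) { return Cond ? 1u : 0u; }

// Sign-extend a 1-bit condition: true -> 0xFFFFFFFF (-1), false -> 0.
constexpr uint32_t sextCond(bool Cond) { return Cond ? 0xFFFFFFFFu : 0u; }

// select Cond, C+1, C  -->  add (zext Cond), C
constexpr uint32_t foldAddOne(bool Cond, uint32_t C) {
  return zextCond(Cond) + C;
}

// select Cond, C-1, C  -->  add (sext Cond), C
constexpr uint32_t foldSubOne(bool Cond, uint32_t C) {
  return sextCond(Cond) + C;
}

// Compare both folds against a plain select, including wraparound values.
inline void checkOffByOneFolds() {
  const uint32_t Cs[] = {42u, 0u, 0xFFFFFFFEu, 0xFFFFFFFFu, 0x7FFFFFFFu};
  for (uint32_t C : Cs) {
    for (int B = 0; B != 2; ++B) {
      const bool Cond = B != 0;
      assert(foldAddOne(Cond, C) == (Cond ? C + 1u : C));
      assert(foldSubOne(Cond, C) == (Cond ? C - 1u : C));
    }
  }
}
```

The 0/-1 and 1/0 constant pairs are just instances of these two forms (for example, select Cond, 0, -1 is the add-one form with C = -1), which is why only one constant needs to be materialized.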

On Thu, Aug 17, 2017 at 3:52 PM, <escha@apple.com> wrote:
btw, to note, blendv *is* a “fast op” on most arches; what’s one extra uop is the fact that the V form takes 3 inputs. iirc, it’s the same reason that ADC, CMOV, etc can’t be 1 uop (2 inputs + flags). the forms that don’t take 3 inputs are usually 1 uop.

Yeah, my reality has been warped by being on x86 too long. But there's apparently good news if I'm reading Agner's numbers correctly this time: both Ryzen and Skylake can do cmov or blendv in a single cycle - getting wider vector ops to be one uop just requires widening the vector unit implementations to actually match the ISA...no problem, right? :)

That means we can canonicalize to 'select' in IR, and we're mostly done: 'select' IR becomes a (v)select node and gets lowered to the matching select instruction.

We can still detect and optimize the special cases here. Optimizing to bit-logic-ops for weak x86 becomes the uarch-specific MI transform.

Let me remove that "general" case from this patch.
IOW, gcc is doing the wrong thing here for skylake by extending the dependency chain:
https://godbolt.org/g/5WJytM

Patch updated:
Remove the general case fold which actually isn't a win if vselect is fast. We only convert the vselect when there's a clear optimization now.
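
For readers who want to see what "a clear optimization" means here, the following standalone C++ sketch mirrors the shape of the lane scan in the updated patch: skip undef lanes, and fold only if every remaining lane pair differs by exactly one in the same direction. The types and names (Lane, FoldKind, classifyVSelect) are hypothetical stand-ins, not LLVM APIs, and 32-bit unsigned wraparound is assumed in place of APInt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one element of a constant build vector.
struct Lane {
  bool IsUndef;    // true if the element is undef
  uint32_t Value;  // only meaningful when IsUndef is false
};

enum class FoldKind {
  None,     // leave the vselect alone
  AddZext,  // vselect Cond, C+1, C --> add (zext Cond), C
  AddSext   // vselect Cond, C-1, C --> add (sext Cond), C
};

// Decide which fold (if any) applies to the true/false constant vectors.
FoldKind classifyVSelect(const std::vector<Lane> &TrueVec,
                         const std::vector<Lane> &FalseVec) {
  assert(TrueVec.size() == FalseVec.size() && "lane counts must match");
  bool AllAddOne = true;
  bool AllSubOne = true;
  for (std::size_t I = 0; I != TrueVec.size(); ++I) {
    // Undef lanes place no constraint on the result.
    if (TrueVec[I].IsUndef || FalseVec[I].IsUndef)
      continue;
    const uint32_t C1 = TrueVec[I].Value;
    const uint32_t C2 = FalseVec[I].Value;
    if (C1 != C2 + 1u)
      AllAddOne = false;
    if (C1 != C2 - 1u)
      AllSubOne = false;
  }
  // Same priority as the patch: prefer the zext form when both flags hold.
  if (AllAddOne)
    return FoldKind::AddZext;
  if (AllSubOne)
    return FoldKind::AddSext;
  return FoldKind::None;
}
```

With the constant vectors from the PowerPC tests, <43, 1, -1, 0> against <42, 0, -2, -1> classifies as AddZext, the same true vector against <44, 2, 0, 1> classifies as AddSext, and <3000, 1, -1, 0> against <42, 0, -2, -1> falls through to None.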

In D36840#844861, @spatel wrote:

Patch updated:
Remove the general case fold which actually isn't a win if vselect is fast. We only convert the vselect when there's a clear optimization now.

That means we shouldn't need a TLI hook now, but I'd like to make that a follow-up because I'm not sure how other targets will change.

This patch certainly looks good from the PPC perspective as far as I can tell.

In D36840#846755, @nemanjai wrote:

This patch certainly looks good from the PPC perspective as far as I can tell.

Thanks. I saw one potential oddity, so I filed:
https://bugs.llvm.org/show_bug.cgi?id=34246

spatel mentioned this in D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type. Aug 21 2017, 6:25 AM

Ping.

Can someone have a look at the x86 diffs? I can then remove the TLI hook as a follow-up and see what effect this has on other targets. Or I can go straight to that step if people think that's better.

I think the X86 diffs look reasonable.

These changes look good to me.

In D36840#851951, @craig.topper wrote:

I think the X86 diffs look reasonable.

Thanks! So both affected targets are approved. Can I take that as approval of the patch?

aaboud accepted this revision. Aug 24 2017, 3:25 PM

This revision is now accepted and ready to land. Aug 24 2017, 3:25 PM

Closed by commit rL311731: [DAG] convert vector select-of-constants to logic/math (authored by spatel). Aug 24 2017, 4:25 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
llvm/trunk/include/llvm/Target/TargetLowering.h         2 lines
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp     59 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h         2 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.h             4 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp           9 lines
llvm/trunk/test/CodeGen/PowerPC/vselect-constants.ll    80 lines
llvm/trunk/test/CodeGen/X86/vselect-avx.ll              11 lines
llvm/trunk/test/CodeGen/X86/vselect-constants.ll        65 lines
llvm/trunk/test/CodeGen/X86/widen_compare-1.ll          4 lines

Diff 112635

llvm/trunk/include/llvm/Target/TargetLowering.h

@@ virtual bool shouldNormalizeToSelectSequence(LLVMContext &Context, @@
     LegalizeTypeAction Action = getTypeAction(Context, VT);
     return Action != TypeExpandInteger && Action != TypeExpandFloat &&
            Action != TypeSplitVector;
   }

   /// Return true if a select of constants (select Cond, C1, C2) should be
   /// transformed into simple math ops with the condition value. For example:
   /// select Cond, C1, C1-1 --> add (zext Cond), C1-1
-  virtual bool convertSelectOfConstantsToMath() const {
+  virtual bool convertSelectOfConstantsToMath(EVT VT) const {
     return false;
   }

   //===--------------------------------------------------------------------===//
   // TargetLowering Configuration Methods - These methods should be invoked by
   // the derived class constructor to configure this object for the target.
   //
 protected:

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ private: @@

     SDValue XformToShuffleWithZero(SDNode *N);
     SDValue ReassociateOps(unsigned Opc, const SDLoc &DL, SDValue LHS,
                            SDValue RHS);

     SDValue visitShiftByConstant(SDNode *N, ConstantSDNode *Amt);

     SDValue foldSelectOfConstants(SDNode *N);
+    SDValue foldVSelectOfConstants(SDNode *N);
     SDValue foldBinOpIntoSelect(SDNode *BO);
     bool SimplifySelectOps(SDNode *SELECT, SDValue LHS, SDValue RHS);
     SDValue SimplifyBinOpWithSameOpcodeHands(SDNode *N);
     SDValue SimplifySelect(const SDLoc &DL, SDValue N0, SDValue N1, SDValue N2);
     SDValue SimplifySelectCC(const SDLoc &DL, SDValue N0, SDValue N1,
                              SDValue N2, SDValue N3, ISD::CondCode CC,
                              bool NotExtCompare = false);
     SDValue foldSelectCCToShiftAnd(const SDLoc &DL, SDValue N0, SDValue N1,
@@ if (C1->isAllOnesValue() && C2->isNullValue()) { @@
       if (VT != MVT::i1)
         Cond = DAG.getNode(ISD::SIGN_EXTEND, DL, VT, Cond);
       return Cond;
     }

     // For any constants that differ by 1, we can transform the select into an
     // extend and add. Use a target hook because some targets may prefer to
     // transform in the other direction.
-    if (TLI.convertSelectOfConstantsToMath()) {
+    if (TLI.convertSelectOfConstantsToMath(VT)) {
       if (C1->getAPIntValue() - 1 == C2->getAPIntValue()) {
         // select Cond, C1, C1-1 --> add (zext Cond), C1-1
         if (VT != MVT::i1)
           Cond = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, Cond);
         return DAG.getNode(ISD::ADD, DL, VT, Cond, N2);
       }
       if (C1->getAPIntValue() + 1 == C2->getAPIntValue()) {
         // select Cond, C1, C1+1 --> add (sext Cond), C1+1
@@ if (Mask.getOpcode() == ISD::SETCC) { @@
     SDValue LoadRes = DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Lo, Hi);

     SDValue RetOps[] = { LoadRes, Chain };
     return DAG.getMergeValues(RetOps, DL);
   }
   return SDValue();
 }

+/// A vector select of 2 constant vectors can be simplified to math/logic to
+/// avoid a variable select instruction and possibly avoid constant loads.
+SDValue DAGCombiner::foldVSelectOfConstants(SDNode *N) {
+  SDValue Cond = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+  SDValue N2 = N->getOperand(2);
+  EVT VT = N->getValueType(0);
+  if (!Cond.hasOneUse() || Cond.getScalarValueSizeInBits() != 1 ||
+      !TLI.convertSelectOfConstantsToMath(VT) ||
+      !ISD::isBuildVectorOfConstantSDNodes(N1.getNode()) ||
+      !ISD::isBuildVectorOfConstantSDNodes(N2.getNode()))
+    return SDValue();
+
+  // Check if we can use the condition value to increment/decrement a single
+  // constant value. This simplifies a select to an add and removes a constant
+  // load/materialization from the general case.
+  bool AllAddOne = true;
+  bool AllSubOne = true;
+  unsigned Elts = VT.getVectorNumElements();
+  for (unsigned i = 0; i != Elts; ++i) {
+    SDValue N1Elt = N1.getOperand(i);
+    SDValue N2Elt = N2.getOperand(i);
+    if (N1Elt.isUndef() || N2Elt.isUndef())
+      continue;
+
+    const APInt &C1 = cast<ConstantSDNode>(N1Elt)->getAPIntValue();
+    const APInt &C2 = cast<ConstantSDNode>(N2Elt)->getAPIntValue();
+    if (C1 != C2 + 1)
+      AllAddOne = false;
+    if (C1 != C2 - 1)
+      AllSubOne = false;
+  }
+
+  // Further simplifications for the extra-special cases where the constants are
+  // all 0 or all -1 should be implemented as folds of these patterns.
+  SDLoc DL(N);
+  if (AllAddOne || AllSubOne) {
+    // vselect <N x i1> Cond, C+1, C --> add (zext Cond), C
+    // vselect <N x i1> Cond, C-1, C --> add (sext Cond), C
+    auto ExtendOpcode = AllAddOne ? ISD::ZERO_EXTEND : ISD::SIGN_EXTEND;
+    SDValue ExtendedCond = DAG.getNode(ExtendOpcode, DL, VT, Cond);
+    return DAG.getNode(ISD::ADD, DL, VT, ExtendedCond, N2);
+  }
+
+  // The general case for select-of-constants:
+  // vselect <N x i1> Cond, C1, C2 --> xor (and (sext Cond), (C1^C2)), C2
+  // ...but that only makes sense if a vselect is slower than 2 logic ops, so
+  // leave that to a machine-specific pass.
+  return SDValue();
+}

 SDValue DAGCombiner::visitVSELECT(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   SDValue N2 = N->getOperand(2);
   SDLoc DL(N);

   // fold (vselect C, X, X) -> X
   if (N1 == N2)
@@ SDValue DAGCombiner::visitVSELECT(SDNode *N) { @@
   // and addressed.
   if (N1.getOpcode() == ISD::CONCAT_VECTORS &&
       N2.getOpcode() == ISD::CONCAT_VECTORS &&
       ISD::isBuildVectorOfConstantSDNodes(N0.getNode())) {
     if (SDValue CV = ConvertSelectToConcatVector(N, DAG))
       return CV;
   }

+  if (SDValue V = foldVSelectOfConstants(N))
+    return V;
+
   return SDValue();
 }

 SDValue DAGCombiner::visitSELECT_CC(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   SDValue N2 = N->getOperand(2);
   SDValue N3 = N->getOperand(3);
@@ if (N0.getOpcode() == ISD::SETCC) { @@
     // of the appropriate width.
     SDValue ExtTrueVal = (SetCCWidth == 1) ? DAG.getAllOnesConstant(DL, VT)
                                            : TLI.getConstTrueVal(DAG, VT, DL);
     SDValue Zero = DAG.getConstant(0, DL, VT);
     if (SDValue SCC =
             SimplifySelectCC(DL, N00, N01, ExtTrueVal, Zero, CC, true))
       return SCC;

-    if (!VT.isVector() && !TLI.convertSelectOfConstantsToMath()) {
+    if (!VT.isVector() && !TLI.convertSelectOfConstantsToMath(VT)) {
       EVT SetCCVT = getSetCCResultType(N00VT);
       // Don't do this transform for i1 because there's a select transform
       // that would reverse it.
       // TODO: We should not do this transform at all without a target hook
       // because a sext is likely cheaper than a select?
       if (SetCCVT.getScalarSizeInBits() != 1 &&
           (!LegalOperations || TLI.isOperationLegal(ISD::SETCC, N00VT))) {
         SDValue SetCC = DAG.getSetCC(DL, SetCCVT, N00, N01, CC);

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

@@ public: @@

     bool isFPExtFree(EVT VT) const override;

     /// \brief Returns true if it is beneficial to convert a load of a constant
     /// to just the constant itself.
     bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                            Type *Ty) const override;

-    bool convertSelectOfConstantsToMath() const override {
+    bool convertSelectOfConstantsToMath(EVT VT) const override {
       return true;
     }

     bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;

     bool getTgtMemIntrinsic(IntrinsicInfo &Info,
                             const CallInst &I,
                             unsigned Intrinsic) const override;

llvm/trunk/lib/Target/X86/X86ISelLowering.h

@@ bool isScalarFPTypeInSSEReg(EVT VT) const { @@
            (VT == MVT::f32 && X86ScalarSSEf32); // f32 is when SSE1
     }

     /// \brief Returns true if it is beneficial to convert a load of a constant
     /// to just the constant itself.
     bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
                                            Type *Ty) const override;

-    bool convertSelectOfConstantsToMath() const override {
-      return true;
-    }
+    bool convertSelectOfConstantsToMath(EVT VT) const override;

     /// Return true if EXTRACT_SUBVECTOR is cheap for this result type
     /// with this index.
     bool isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                  unsigned Index) const override;

     /// Intel processors have a unified instruction and data cache
     const char * getClearCacheBuiltinName() const override {

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

@@ bool X86TargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm, @@
   assert(Ty->isIntegerTy());

   unsigned BitSize = Ty->getPrimitiveSizeInBits();
   if (BitSize == 0 || BitSize > 64)
     return false;
   return true;
 }

+bool X86TargetLowering::convertSelectOfConstantsToMath(EVT VT) const {
+  // TODO: It might be a win to ease or lift this restriction, but the generic
+  // folds in DAGCombiner conflict with vector folds for an AVX512 target.
+  if (VT.isVector() && Subtarget.hasAVX512())
+    return false;
+
+  return true;
+}
+
 bool X86TargetLowering::isExtractSubvectorCheap(EVT ResVT, EVT SrcVT,
                                                 unsigned Index) const {
   if (!isOperationLegalOrCustom(ISD::EXTRACT_SUBVECTOR, ResVT))
     return false;

   // Mask vectors support all subregister combinations and operations that
   // extract half of vector.
   if (ResVT.getVectorElementType() == MVT::i1)

llvm/trunk/test/CodeGen/PowerPC/vselect-constants.ll

@@ ; CHECK-NEXT: blr @@
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 3000, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_Cplus1_or_C_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_Cplus1_or_C_vec:
 ; CHECK: # BB#0:
-; CHECK-NEXT: vspltisw 3, -16
-; CHECK-NEXT: vspltisw 4, 15
+; CHECK-NEXT: vspltisw 3, 1
 ; CHECK-NEXT: addis 3, 2, .LCPI2_0@toc@ha
-; CHECK-NEXT: addis 4, 2, .LCPI2_1@toc@ha
 ; CHECK-NEXT: addi 3, 3, .LCPI2_0@toc@l
-; CHECK-NEXT: addi 4, 4, .LCPI2_1@toc@l
-; CHECK-NEXT: lvx 18, 0, 3
-; CHECK-NEXT: lvx 19, 0, 4
-; CHECK-NEXT: vsubuwm 3, 4, 3
-; CHECK-NEXT: vslw 2, 2, 3
-; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 51, 50, 34
+; CHECK-NEXT: lvx 19, 0, 3
+; CHECK-NEXT: xxland 34, 34, 35
+; CHECK-NEXT: vadduwm 2, 2, 19
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_Cplus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_Cplus1_or_C_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
 ; CHECK-NEXT: addis 3, 2, .LCPI3_0@toc@ha
-; CHECK-NEXT: addis 4, 2, .LCPI3_1@toc@ha
 ; CHECK-NEXT: addi 3, 3, .LCPI3_0@toc@l
-; CHECK-NEXT: addi 4, 4, .LCPI3_1@toc@l
 ; CHECK-NEXT: lvx 19, 0, 3
-; CHECK-NEXT: lvx 4, 0, 4
-; CHECK-NEXT: xxsel 34, 36, 51, 34
+; CHECK-NEXT: vsubuwm 2, 19, 2
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_Cminus1_or_C_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_Cminus1_or_C_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vspltisw 3, -16
 ; CHECK-NEXT: vspltisw 4, 15
 ; CHECK-NEXT: addis 3, 2, .LCPI4_0@toc@ha
-; CHECK-NEXT: addis 4, 2, .LCPI4_1@toc@ha
 ; CHECK-NEXT: addi 3, 3, .LCPI4_0@toc@l
-; CHECK-NEXT: addi 4, 4, .LCPI4_1@toc@l
-; CHECK-NEXT: lvx 18, 0, 3
-; CHECK-NEXT: lvx 19, 0, 4
+; CHECK-NEXT: lvx 19, 0, 3
 ; CHECK-NEXT: vsubuwm 3, 4, 3
 ; CHECK-NEXT: vslw 2, 2, 3
 ; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 51, 50, 34
+; CHECK-NEXT: vadduwm 2, 2, 19
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_Cminus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_Cminus1_or_C_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
 ; CHECK-NEXT: addis 3, 2, .LCPI5_0@toc@ha
-; CHECK-NEXT: addis 4, 2, .LCPI5_1@toc@ha
 ; CHECK-NEXT: addi 3, 3, .LCPI5_0@toc@l
-; CHECK-NEXT: addi 4, 4, .LCPI5_1@toc@l
 ; CHECK-NEXT: lvx 19, 0, 3
-; CHECK-NEXT: lvx 4, 0, 4
-; CHECK-NEXT: xxsel 34, 36, 51, 34
+; CHECK-NEXT: vadduwm 2, 2, 19
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_minus1_or_0_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_minus1_or_0_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vspltisw 3, -16
 ; CHECK-NEXT: vspltisw 4, 15
-; CHECK-NEXT: vspltisb 19, -1
-; CHECK-NEXT: xxlxor 0, 0, 0
 ; CHECK-NEXT: vsubuwm 3, 4, 3
 ; CHECK-NEXT: vslw 2, 2, 3
 ; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 0, 51, 34
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_minus1_or_0_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_minus1_or_0_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
-; CHECK-NEXT: vspltisb 19, -1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: xxsel 34, 0, 51, 34
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_0_or_minus1_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_0_or_minus1_vec:
 ; CHECK: # BB#0:
-; CHECK-NEXT: vspltisw 3, -16
-; CHECK-NEXT: vspltisw 4, 15
-; CHECK-NEXT: vspltisb 19, -1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: vsubuwm 3, 4, 3
-; CHECK-NEXT: vslw 2, 2, 3
-; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 51, 0, 34
+; CHECK-NEXT: vspltisw 3, 1
+; CHECK-NEXT: vspltisb 4, -1
+; CHECK-NEXT: xxland 34, 34, 35
+; CHECK-NEXT: vadduwm 2, 2, 4
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_0_or_minus1_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_0_or_minus1_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
-; CHECK-NEXT: vspltisb 19, -1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: xxsel 34, 51, 0, 34
+; CHECK-NEXT: xxlnor 34, 34, 34
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_1_or_0_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_1_or_0_vec:
 ; CHECK: # BB#0:
-; CHECK-NEXT: vspltisw 3, -16
-; CHECK-NEXT: vspltisw 4, 15
-; CHECK-NEXT: vspltisw 19, 1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: vsubuwm 3, 4, 3
-; CHECK-NEXT: vslw 2, 2, 3
-; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 0, 51, 34
+; CHECK-NEXT: vspltisw 3, 1
+; CHECK-NEXT: xxland 34, 34, 35
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_1_or_0_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_1_or_0_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
 ; CHECK-NEXT: vspltisw 19, 1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: xxsel 34, 0, 51, 34
+; CHECK-NEXT: xxland 34, 34, 51
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
   ret <4 x i32> %add
 }

 define <4 x i32> @sel_0_or_1_vec(<4 x i1> %cond) {
 ; CHECK-LABEL: sel_0_or_1_vec:
 ; CHECK: # BB#0:
-; CHECK-NEXT: vspltisw 3, -16
-; CHECK-NEXT: vspltisw 4, 15
-; CHECK-NEXT: vspltisw 19, 1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: vsubuwm 3, 4, 3
-; CHECK-NEXT: vslw 2, 2, 3
-; CHECK-NEXT: vsraw 2, 2, 3
-; CHECK-NEXT: xxsel 34, 51, 0, 34
+; CHECK-NEXT: vspltisw 3, 1
+; CHECK-NEXT: xxlandc 34, 35, 34
 ; CHECK-NEXT: blr
   %add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   ret <4 x i32> %add
 }

 define <4 x i32> @cmp_sel_0_or_1_vec(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: cmp_sel_0_or_1_vec:
 ; CHECK: # BB#0:
 ; CHECK-NEXT: vcmpequw 2, 2, 3
 ; CHECK-NEXT: vspltisw 19, 1
-; CHECK-NEXT: xxlxor 0, 0, 0
-; CHECK-NEXT: xxsel 34, 51, 0, 34
+; CHECK-NEXT: xxlnor 0, 34, 34
+; CHECK-NEXT: xxland 34, 0, 51
 ; CHECK-NEXT: blr
   %cond = icmp eq <4 x i32> %x, %y
   %add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   ret <4 x i32> %add
 }

llvm/trunk/test/CodeGen/X86/vselect-avx.ll

	(145 lines not shown)
	; AVX1-LABEL: PR22706:			; AVX1-LABEL: PR22706:
	; AVX1: ## BB#0:			; AVX1: ## BB#0:
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	; AVX1-NEXT: vpsllw $7, %xmm1, %xmm1			; AVX1-NEXT: vpsllw $7, %xmm1, %xmm1
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128]			; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128]
	; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1			; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
	; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3			; AVX1-NEXT: vpxor %xmm3, %xmm3, %xmm3
	; AVX1-NEXT: vpcmpgtb %xmm1, %xmm3, %xmm1			; AVX1-NEXT: vpcmpgtb %xmm1, %xmm3, %xmm1
				; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]
				; AVX1-NEXT: vpaddb %xmm4, %xmm1, %xmm1
	; AVX1-NEXT: vpsllw $7, %xmm0, %xmm0			; AVX1-NEXT: vpsllw $7, %xmm0, %xmm0
	; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0			; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
	; AVX1-NEXT: vpcmpgtb %xmm0, %xmm3, %xmm0			; AVX1-NEXT: vpcmpgtb %xmm0, %xmm3, %xmm0
				; AVX1-NEXT: vpaddb %xmm4, %xmm0, %xmm0
	; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; AVX1-NEXT: vandnps {{.*}}(%rip), %ymm0, %ymm1
	; AVX1-NEXT: vandps {{.*}}(%rip), %ymm0, %ymm0
	; AVX1-NEXT: vorps %ymm1, %ymm0, %ymm0
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: PR22706:			; AVX2-LABEL: PR22706:
	; AVX2: ## BB#0:			; AVX2: ## BB#0:
	; AVX2-NEXT: vpsllw $7, %ymm0, %ymm0			; AVX2-NEXT: vpsllw $7, %ymm0, %ymm0
	; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa {{.*#+}} ymm1 = [2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]			; AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; AVX2-NEXT: vpblendvb %ymm0, {{.*}}(%rip), %ymm1, %ymm0			; AVX2-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
				; AVX2-NEXT: vpaddb {{.*}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	%tmp = select <32 x i1> %x, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>			%tmp = select <32 x i1> %x, <32 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <32 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
	ret <32 x i8> %tmp			ret <32 x i8> %tmp
	}			}

llvm/trunk/test/CodeGen/X86/vselect-constants.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s --check-prefix=ALL --check-prefix=SSE		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=sse2 | FileCheck %s --check-prefix=ALL --check-prefix=SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX

; First, check the generic pattern for any 2 vector constants. Then, check special cases where		; First, check the generic pattern for any 2 vector constants. Then, check special cases where
; the constants are all off-by-one. Finally, check the extra special cases where the constants		; the constants are all off-by-one. Finally, check the extra special cases where the constants
; include 0 or -1.		; include 0 or -1.
; Each minimal select test is repeated with a more typical pattern that includes a compare to		; Each minimal select test is repeated with a more typical pattern that includes a compare to
; generate the condition value.		; generate the condition value.

		; TODO: If we don't have blendv, this can definitely be improved. There's also a selection of
		; chips where it makes sense to transform the general case blendv to 2 bit-ops. That should be
		; a uarch-specific transform. At some point (Ryzen?), the implementation should catch up to the
		; architecture, so blendv is as fast as a single bit-op.

define <4 x i32> @sel_C1_or_C2_vec(<4 x i1> %cond) {		define <4 x i32> @sel_C1_or_C2_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_C1_or_C2_vec:		; SSE-LABEL: sel_C1_or_C2_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pslld $31, %xmm0		; SSE-NEXT: pslld $31, %xmm0
; SSE-NEXT: psrad $31, %xmm0		; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: movdqa %xmm0, %xmm1
; SSE-NEXT: pandn {{.*}}(%rip), %xmm1		; SSE-NEXT: pandn {{.*}}(%rip), %xmm1
; SSE-NEXT: pand {{.*}}(%rip), %xmm0		; SSE-NEXT: pand {{.*}}(%rip), %xmm0
(29 lines not shown)	; AVX-NEXT: retq
%cond = icmp eq <4 x i32> %x, %y		%cond = icmp eq <4 x i32> %x, %y
%add = select <4 x i1> %cond, <4 x i32> <i32 3000, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>		%add = select <4 x i1> %cond, <4 x i32> <i32 3000, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @sel_Cplus1_or_C_vec(<4 x i1> %cond) {		define <4 x i32> @sel_Cplus1_or_C_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_Cplus1_or_C_vec:		; SSE-LABEL: sel_Cplus1_or_C_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pslld $31, %xmm0
; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: movdqa %xmm0, %xmm1
; SSE-NEXT: pandn {{.*}}(%rip), %xmm1
; SSE-NEXT: pand {{.*}}(%rip), %xmm0		; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: por %xmm1, %xmm0		; SSE-NEXT: paddd {{.*}}(%rip), %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sel_Cplus1_or_C_vec:		; AVX-LABEL: sel_Cplus1_or_C_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpslld $31, %xmm0, %xmm0		; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [42,0,4294967294,4294967295]		; AVX-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vblendvps %xmm0, {{.*}}(%rip), %xmm1, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>		%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @cmp_sel_Cplus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {		define <4 x i32> @cmp_sel_Cplus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {
; SSE-LABEL: cmp_sel_Cplus1_or_C_vec:		; SSE-LABEL: cmp_sel_Cplus1_or_C_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pcmpeqd %xmm1, %xmm0		; SSE-NEXT: pcmpeqd %xmm1, %xmm0
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: movdqa {{.*#+}} xmm1 = [42,0,4294967294,4294967295]
; SSE-NEXT: pandn {{.*}}(%rip), %xmm1		; SSE-NEXT: psubd %xmm0, %xmm1
; SSE-NEXT: pand {{.*}}(%rip), %xmm0		; SSE-NEXT: movdqa %xmm1, %xmm0
; SSE-NEXT: por %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: cmp_sel_Cplus1_or_C_vec:		; AVX-LABEL: cmp_sel_Cplus1_or_C_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0
; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [42,0,4294967294,4294967295]		; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [42,0,4294967294,4294967295]
; AVX-NEXT: vblendvps %xmm0, {{.*}}(%rip), %xmm1, %xmm0		; AVX-NEXT: vpsubd %xmm0, %xmm1, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%cond = icmp eq <4 x i32> %x, %y		%cond = icmp eq <4 x i32> %x, %y
%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>		%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 42, i32 0, i32 -2, i32 -1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @sel_Cminus1_or_C_vec(<4 x i1> %cond) {		define <4 x i32> @sel_Cminus1_or_C_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_Cminus1_or_C_vec:		; SSE-LABEL: sel_Cminus1_or_C_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pslld $31, %xmm0		; SSE-NEXT: pslld $31, %xmm0
; SSE-NEXT: psrad $31, %xmm0		; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: paddd {{.*}}(%rip), %xmm0
; SSE-NEXT: pandn {{.*}}(%rip), %xmm1
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: por %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sel_Cminus1_or_C_vec:		; AVX-LABEL: sel_Cminus1_or_C_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpslld $31, %xmm0, %xmm0		; AVX-NEXT: vpslld $31, %xmm0, %xmm0
; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [44,2,0,1]		; AVX-NEXT: vpsrad $31, %xmm0, %xmm0
; AVX-NEXT: vblendvps %xmm0, {{.*}}(%rip), %xmm1, %xmm0		; AVX-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>		%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @cmp_sel_Cminus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {		define <4 x i32> @cmp_sel_Cminus1_or_C_vec(<4 x i32> %x, <4 x i32> %y) {
; SSE-LABEL: cmp_sel_Cminus1_or_C_vec:		; SSE-LABEL: cmp_sel_Cminus1_or_C_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pcmpeqd %xmm1, %xmm0		; SSE-NEXT: pcmpeqd %xmm1, %xmm0
; SSE-NEXT: movdqa %xmm0, %xmm1		; SSE-NEXT: paddd {{.*}}(%rip), %xmm0
; SSE-NEXT: pandn {{.*}}(%rip), %xmm1
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: por %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: cmp_sel_Cminus1_or_C_vec:		; AVX-LABEL: cmp_sel_Cminus1_or_C_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0
; AVX-NEXT: vmovaps {{.*#+}} xmm1 = [44,2,0,1]		; AVX-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vblendvps %xmm0, {{.*}}(%rip), %xmm1, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%cond = icmp eq <4 x i32> %x, %y		%cond = icmp eq <4 x i32> %x, %y
%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>		%add = select <4 x i1> %cond, <4 x i32> <i32 43, i32 1, i32 -1, i32 0>, <4 x i32> <i32 44, i32 2, i32 0, i32 1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @sel_minus1_or_0_vec(<4 x i1> %cond) {		define <4 x i32> @sel_minus1_or_0_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_minus1_or_0_vec:		; SSE-LABEL: sel_minus1_or_0_vec:
(24 lines not shown)	; AVX-NEXT: retq
%cond = icmp eq <4 x i32> %x, %y		%cond = icmp eq <4 x i32> %x, %y
%add = select <4 x i1> %cond, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>		%add = select <4 x i1> %cond, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @sel_0_or_minus1_vec(<4 x i1> %cond) {		define <4 x i32> @sel_0_or_minus1_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_0_or_minus1_vec:		; SSE-LABEL: sel_0_or_minus1_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pslld $31, %xmm0		; SSE-NEXT: pand {{.*}}(%rip), %xmm0
; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: pcmpeqd %xmm1, %xmm1		; SSE-NEXT: pcmpeqd %xmm1, %xmm1
; SSE-NEXT: pxor %xmm1, %xmm0		; SSE-NEXT: paddd %xmm1, %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sel_0_or_minus1_vec:		; AVX-LABEL: sel_0_or_minus1_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpslld $31, %xmm0, %xmm0		; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1		; AVX-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
; AVX-NEXT: vpcmpeqd %xmm2, %xmm2, %xmm2		; AVX-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>		%add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @cmp_sel_0_or_minus1_vec(<4 x i32> %x, <4 x i32> %y) {		define <4 x i32> @cmp_sel_0_or_minus1_vec(<4 x i32> %x, <4 x i32> %y) {
; SSE-LABEL: cmp_sel_0_or_minus1_vec:		; SSE-LABEL: cmp_sel_0_or_minus1_vec:
; SSE: # BB#0:		; SSE: # BB#0:
(42 lines not shown)	; AVX-NEXT: retq
%cond = icmp eq <4 x i32> %x, %y		%cond = icmp eq <4 x i32> %x, %y
%add = select <4 x i1> %cond, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>		%add = select <4 x i1> %cond, <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 0, i32 0, i32 0, i32 0>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @sel_0_or_1_vec(<4 x i1> %cond) {		define <4 x i32> @sel_0_or_1_vec(<4 x i1> %cond) {
; SSE-LABEL: sel_0_or_1_vec:		; SSE-LABEL: sel_0_or_1_vec:
; SSE: # BB#0:		; SSE: # BB#0:
; SSE-NEXT: pslld $31, %xmm0		; SSE-NEXT: andnps {{.*}}(%rip), %xmm0
; SSE-NEXT: psrad $31, %xmm0
; SSE-NEXT: pandn {{.*}}(%rip), %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sel_0_or_1_vec:		; AVX-LABEL: sel_0_or_1_vec:
; AVX: # BB#0:		; AVX: # BB#0:
; AVX-NEXT: vpslld $31, %xmm0, %xmm0		; AVX-NEXT: vandnps {{.*}}(%rip), %xmm0, %xmm0
; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
; AVX-NEXT: vmovaps {{.*#+}} xmm2 = [1,1,1,1]
; AVX-NEXT: vblendvps %xmm0, %xmm1, %xmm2, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>		%add = select <4 x i1> %cond, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
ret <4 x i32> %add		ret <4 x i32> %add
}		}

define <4 x i32> @cmp_sel_0_or_1_vec(<4 x i32> %x, <4 x i32> %y) {		define <4 x i32> @cmp_sel_0_or_1_vec(<4 x i32> %x, <4 x i32> %y) {
; SSE-LABEL: cmp_sel_0_or_1_vec:		; SSE-LABEL: cmp_sel_0_or_1_vec:
; SSE: # BB#0:		; SSE: # BB#0:
(14 lines not shown)

llvm/trunk/test/CodeGen/X86/widen_compare-1.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse4.2 | FileCheck %s --check-prefix=X64

	; compare v2i16			; compare v2i16

	define <2 x i16> @compare_v2i64_to_v2i16(<2 x i16>* %src) nounwind {			define <2 x i16> @compare_v2i64_to_v2i16(<2 x i16>* %src) nounwind {
	; X86-LABEL: compare_v2i64_to_v2i16:			; X86-LABEL: compare_v2i64_to_v2i16:
	; X86: # BB#0:			; X86: # BB#0:
	; X86-NEXT: movaps {{.*#+}} xmm0 = [65535,0,65535,0]			; X86-NEXT: pcmpeqd %xmm0, %xmm0
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: compare_v2i64_to_v2i16:			; X64-LABEL: compare_v2i64_to_v2i16:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: movaps {{.*#+}} xmm0 = [65535,65535]			; X64-NEXT: pcmpeqd %xmm0, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%val = load <2 x i16>, <2 x i16>* %src, align 4			%val = load <2 x i16>, <2 x i16>* %src, align 4
	%cmp = icmp uge <2 x i16> %val, %val			%cmp = icmp uge <2 x i16> %val, %val
	%sel = select <2 x i1> %cmp, <2 x i16> <i16 -1, i16 -1>, <2 x i16> zeroinitializer			%sel = select <2 x i1> %cmp, <2 x i16> <i16 -1, i16 -1>, <2 x i16> zeroinitializer
	ret <2 x i16> %sel			ret <2 x i16> %sel
	}			}
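
For readers checking the updated CHECK lines by hand, the folds these tests exercise can be modeled lane by lane. The following is only a rough Python sketch and not the DAGCombiner code: the helper names (u32, zext, sext, fold_cplus1, fold_cminus1, fold_general) are hypothetical, lanes are treated as unsigned 32-bit values, and the condition is modeled as a list of booleans. The constants are the ones used in vselect-constants.ll.

```python
from itertools import product

MASK32 = 0xFFFFFFFF  # model each <4 x i32> lane as an unsigned 32-bit value


def u32(x):
    """Wrap a Python int to 32 bits (two's complement)."""
    return x & MASK32


def zext(c):
    """Zero-extend an i1 lane: true -> 1, false -> 0."""
    return 1 if c else 0


def sext(c):
    """Sign-extend an i1 lane: true -> all ones (-1), false -> 0."""
    return MASK32 if c else 0


def select(cond, t, f):
    """Reference semantics of a vector select."""
    return [u32(tv) if c else u32(fv) for c, tv, fv in zip(cond, t, f)]


def fold_cplus1(cond, c):
    """select cond, C+1, C  ->  C + zext(cond)."""
    return [u32(cv + zext(b)) for b, cv in zip(cond, c)]


def fold_cminus1(cond, c):
    """select cond, C-1, C  ->  C + sext(cond)."""
    return [u32(cv + sext(b)) for b, cv in zip(cond, c)]


def fold_general(cond, t, f):
    """select cond, T, F  ->  ((T ^ F) & sext(cond)) ^ F  (masked merge)."""
    return [((u32(tv) ^ u32(fv)) & sext(b)) ^ u32(fv)
            for b, tv, fv in zip(cond, t, f)]


# Exhaustively compare each fold with the reference select for all 16
# possible <4 x i1> conditions, using the constants from the tests.
for cond in product([False, True], repeat=4):
    assert fold_cplus1(cond, [42, 0, -2, -1]) == \
        select(cond, [43, 1, -1, 0], [42, 0, -2, -1])
    assert fold_cminus1(cond, [44, 2, 0, 1]) == \
        select(cond, [43, 1, -1, 0], [44, 2, 0, 1])
    assert fold_general(cond, [3000, 1, -1, 0], [42, 0, -2, -1]) == \
        select(cond, [3000, 1, -1, 0], [42, 0, -2, -1])
```

Since a vector compare already produces an all-ones or all-zeros lane, C + zext(cond) can equally be written as C - sext(cond), which appears to be what the psubd/vpsubd lines in cmp_sel_Cplus1_or_C_vec reflect.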