This is an archive of the discontinued LLVM Phabricator instance.

[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl
ClosedPublic

Authored by RKSimon on Nov 22 2021, 3:07 AM.

Details

Reviewers

spatel
lebedev.ri
nemanjai
t.p.northover
dmgreen
efriedma
uweigand
craig.topper

Commits

rG63b1e58f0738: [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
rG3cf4a2c6203b: [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl

Summary

If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.
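As a minimal illustration of the idea (a sketch, not the patch's code): a left rotate differs from a left shift only in the low `amt` bits, which are filled from the wrapped-around high bits. If those bits are not demanded, the two are interchangeable. The helper names `rotl32` and `rotlMatchesShl` are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by amt, where 0 < amt < 32.
inline uint32_t rotl32(uint32_t x, unsigned amt) {
  assert(amt > 0 && amt < 32);
  return (x << amt) | (x >> (32 - amt));
}

// Hypothetical helper: true if rotl(x, amt) and shl(x, amt) agree on
// every bit that is set in `demanded`.
inline bool rotlMatchesShl(uint32_t x, unsigned amt, uint32_t demanded) {
  return (rotl32(x, amt) & demanded) == ((x << amt) & demanded);
}
```

For example, `rotlMatchesShl(0xDEADBEEF, 8, 0xFFFFFF00)` holds because the low 8 bits are not demanded, while the same call with a demanded mask of `0xFFFFFFFF` does not.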

For the ARM rev16 pattern, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.
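The equivalence behind the rev16 pattern can be sketched as follows, assuming plain 32-bit integers stand in for DAG values: when the upper 16 bits of `x` are zero, `srl(bswap(x), 16)` and `rotr(bswap(x), 16)` produce the same result, so either form can select rev16. The function names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Byte-swap a 32-bit value.
inline uint32_t bswap32(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
         ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Rotate a 32-bit value right by amt, where 0 < amt < 32.
inline uint32_t rotr32(uint32_t x, unsigned amt) {
  assert(amt > 0 && amt < 32);
  return (x >> amt) | (x << (32 - amt));
}

// Models the REV16 selection pattern: rotr(bswap(x), 16).
inline uint32_t rev16Model(uint32_t x) { return rotr32(bswap32(x), 16); }

// The shift form that the new patterns match: srl(bswap(x), 16).
inline uint32_t srlBswap16(uint32_t x) { return bswap32(x) >> 16; }
```

With `x = 0x0000ABCD` both forms give `0x0000CDAB`. With `x = 0x1234ABCD` they differ (`0x3412CDAB` versus `0x0000CDAB`), which is why the pattern needs a check that the top 16 bits are zero.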

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

RKSimon created this revision. Nov 22 2021, 3:07 AM

Herald added subscribers: steven.zhang, pengfei, hiraditya, kristof.beyls. Nov 22 2021, 3:07 AM

RKSimon requested review of this revision. Nov 22 2021, 3:07 AM

Herald added a project: Restricted Project. Nov 22 2021, 3:07 AM

Harbormaster completed remote builds in B135374: Diff 388844. Nov 22 2021, 3:45 AM

I don't understand the changes to the SystemZ test cases. The new behavior seems to be simply wrong?

Note that the tests you change are specifically constructed to require bits from both arms of the rotate pattern, so they cannot be optimized to a simple shift - if this optimization does that in these cases, something seems to be wrong here.

lebedev.ri requested changes to this revision. Nov 22 2021, 9:51 AM

lebedev.ri added inline comments.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1749–1755	Are you sure this is right? https://alive2.llvm.org/ce/z/EJDzlC

This revision now requires changes to proceed. Nov 22 2021, 9:51 AM

Ran out of time; will look at this later.

lebedev.ri added inline comments. Nov 22 2021, 2:44 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1749–1755	Perhaps this is what you had in mind:
* https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
* https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
* https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
* https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)
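The four conditions above can be checked by brute force on 8-bit values. This is only a sketch that mirrors the stated rules (it is not the Alive2 proof and not the patch's code); all helper names are invented for the example, and `revAmt` is taken to be `8 - amt`.

```cpp
#include <cassert>
#include <cstdint>

inline uint8_t rotl8(uint8_t x, unsigned amt) {
  assert(amt > 0 && amt < 8);
  return static_cast<uint8_t>((x << amt) | (x >> (8 - amt)));
}

inline uint8_t rotr8(uint8_t x, unsigned amt) {
  assert(amt > 0 && amt < 8);
  return static_cast<uint8_t>((x >> amt) | (x << (8 - amt)));
}

// Number of zero bits below the lowest set bit (8 if m == 0).
inline unsigned countTrailingZeros8(uint8_t m) {
  unsigned n = 0;
  while (n < 8 && !((m >> n) & 1)) ++n;
  return n;
}

// Number of zero bits above the highest set bit (8 if m == 0).
inline unsigned countLeadingZeros8(uint8_t m) {
  unsigned n = 0;
  while (n < 8 && !((m << n) & 0x80)) ++n;
  return n;
}

// Exhaustively checks all four rewrite rules for every amount, demanded
// mask, and input value. Returns false on the first mismatch.
inline bool allFourRulesHold() {
  for (unsigned amt = 1; amt < 8; ++amt) {
    unsigned revAmt = 8 - amt;
    for (unsigned d = 0; d < 256; ++d) {
      uint8_t dem = static_cast<uint8_t>(d);
      unsigned tz = countTrailingZeros8(dem);
      unsigned lz = countLeadingZeros8(dem);
      for (unsigned v = 0; v < 256; ++v) {
        uint8_t x = static_cast<uint8_t>(v);
        uint8_t shlAmt = static_cast<uint8_t>(x << amt);
        uint8_t shlRev = static_cast<uint8_t>(x << revAmt);
        uint8_t srlAmt = static_cast<uint8_t>(x >> amt);
        uint8_t srlRev = static_cast<uint8_t>(x >> revAmt);
        if (tz >= amt && (rotl8(x, amt) & dem) != (shlAmt & dem)) return false;
        if (tz >= revAmt && (rotr8(x, amt) & dem) != (shlRev & dem)) return false;
        if (lz >= amt && (rotr8(x, amt) & dem) != (srlAmt & dem)) return false;
        if (lz >= revAmt && (rotl8(x, amt) & dem) != (srlRev & dem)) return false;
      }
    }
  }
  return true;
}
```

Without the zero-count conditions the rewrites are not valid in general: for instance `rotl8(0x80, 3)` is `0x04`, whereas `0x80 << 3` truncated to 8 bits is `0x00`.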

craig.topper added inline comments. Nov 22 2021, 4:41 PM

llvm/lib/Target/ARM/ARMISelLowering.cpp
17052	Can this be done in isel pattern? There's an existing top16Zero PatFrag that calls MaskedValueIsZero in ARMInstrThumb2.td

dmgreen added inline comments. Nov 23 2021, 1:45 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
17052	That does sound good if it will work. It would be a good way to keep the same pattern working, and I would say a tablegen pattern is preferable to a new node type. I noticed that llvm.bswap.i16 would no longer generate a rev16, which would be a shame to see. The same thing didn't seem to happen on AArch64 though, it was still fine. I'm not entirely sure what the difference was.

Fixed dumb typo in original variant (sorry, I was focussing too much on the SSE codegen and nothing else)

Added a PatFrag to handle the ARM rev16 instruction pattern.

RKSimon added inline comments. Nov 23 2021, 2:12 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
1749–1755	Thanks @lebedev.ri !
llvm/lib/Target/ARM/ARMISelLowering.cpp
17052	Sorry - I missed these comments before doing my own version, which handles the bswap_upperzero specific case.

LGTM, thank you.

This revision is now accepted and ready to land. Nov 23 2021, 3:22 AM

Harbormaster completed remote builds in B135578: Diff 389127. Nov 23 2021, 5:10 AM

dmgreen added inline comments. Nov 23 2021, 6:18 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
17052	Thanks. This seems OK, although reusing top16Zero might be a little cleaner. We probably need patterns for tREV16 and t2REV16 too (and some tests to go with them!). I'll update the existing tests with some more triples. I think the patterns should be the same.

Moved the existing top16Zero PatLeaf to ARMInstrInfo.td and reused it for the ARMv6/Thumb/Thumb2 rev16 patterns.

Thanks! LGTM

Harbormaster completed remote builds in B135723: Diff 389322. Nov 23 2021, 4:38 PM

This revision was landed with ongoing or failed builds. Nov 24 2021, 3:28 AM

Closed by commit rG3cf4a2c6203b: [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (authored by RKSimon).

This revision was automatically updated to reflect the committed changes.

RKSimon added a commit: rG3cf4a2c6203b: [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl.

bkramer added a reverting change: rGd32787230d52: Revert "[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl". Nov 24 2021, 5:45 AM

RKSimon mentioned this in rGf93520349695: [AArch64] Add regression test for D114354. Nov 24 2021, 8:48 AM

RKSimon added a commit: rG63b1e58f0738: [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED). Nov 25 2021, 3:14 AM

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

SelectionDAG/

TargetLowering.cpp

29 lines

Target/

ARM/

12 lines

8 lines

2 lines

10 lines

test/

CodeGen/

X86/

18 lines

48 lines

64 lines

2 lines

Diff 389451

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Show First 20 Lines • Show All 1,719 Lines • ▼ Show 20 Lines	if (isPowerOf2_32(BitWidth)) {
return true;		return true;
}		}
break;		break;
}		}
case ISD::ROTL:		case ISD::ROTL:
case ISD::ROTR: {		case ISD::ROTR: {
SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
SDValue Op1 = Op.getOperand(1);		SDValue Op1 = Op.getOperand(1);
		bool IsROTL = (Op.getOpcode() == ISD::ROTL);

// If we're rotating an 0/-1 value, then it stays an 0/-1 value.		// If we're rotating an 0/-1 value, then it stays an 0/-1 value.
if (BitWidth == TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1))		if (BitWidth == TLO.DAG.ComputeNumSignBits(Op0, DemandedElts, Depth + 1))
return TLO.CombineTo(Op, Op0);		return TLO.CombineTo(Op, Op0);

		if (ConstantSDNode *SA = isConstOrConstSplat(Op1, DemandedElts)) {
		unsigned Amt = SA->getAPIntValue().urem(BitWidth);
		unsigned RevAmt = BitWidth - Amt;

		// rotl: (Op0 << Amt) | (Op0 >> (BW - Amt))
		// rotr: (Op0 << (BW - Amt)) | (Op0 >> Amt)
		APInt Demanded0 = DemandedBits.rotr(IsROTL ? Amt : RevAmt);
		if (SimplifyDemandedBits(Op0, Demanded0, DemandedElts, Known2, TLO,
		Depth + 1))
		return true;

		// rot*(x, 0) --> x
		if (Amt == 0)
		return TLO.CombineTo(Op, Op0);

		// See if we don't demand either half of the rotated bits.
		if ((!TLO.LegalOperations() || isOperationLegal(ISD::SHL, VT)) &&
		DemandedBits.countTrailingZeros() >= (IsROTL ? Amt : RevAmt)) {
		Op1 = TLO.DAG.getConstant(IsROTL ? Amt : RevAmt, dl, Op1.getValueType());
		return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SHL, dl, VT, Op0, Op1));
		}
		if ((!TLO.LegalOperations() || isOperationLegal(ISD::SRL, VT)) &&
		DemandedBits.countLeadingZeros() >= (IsROTL ? RevAmt : Amt)) {
		Op1 = TLO.DAG.getConstant(IsROTL ? RevAmt : Amt, dl, Op1.getValueType());
		return TLO.CombineTo(Op, TLO.DAG.getNode(ISD::SRL, dl, VT, Op0, Op1));
		}
		}

// For pow-2 bitwidths we only demand the bottom modulo amt bits.		// For pow-2 bitwidths we only demand the bottom modulo amt bits.
if (isPowerOf2_32(BitWidth)) {		if (isPowerOf2_32(BitWidth)) {
APInt DemandedAmtBits(Op1.getScalarValueSizeInBits(), BitWidth - 1);		APInt DemandedAmtBits(Op1.getScalarValueSizeInBits(), BitWidth - 1);
if (SimplifyDemandedBits(Op1, DemandedAmtBits, DemandedElts, Known2, TLO,		if (SimplifyDemandedBits(Op1, DemandedAmtBits, DemandedElts, Known2, TLO,
Depth + 1))		Depth + 1))
return true;		return true;
}		}
break;		break;
▲ Show 20 Lines • Show All 7,241 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMISelLowering.cpp

Show First 20 Lines • Show All 17,043 Lines • ▼ Show 20 Lines	SDValue ARMTargetLowering::PerformIntrinsicCombine(SDNode *N,

case Intrinsic::arm_mve_addv: {		case Intrinsic::arm_mve_addv: {
// Turn this intrinsic straight into the appropriate ARMISD::VADDV node,		// Turn this intrinsic straight into the appropriate ARMISD::VADDV node,
// which allow PerformADDVecReduce to turn it into VADDLV when possible.		// which allow PerformADDVecReduce to turn it into VADDLV when possible.
bool Unsigned = cast<ConstantSDNode>(N->getOperand(2))->getZExtValue();		bool Unsigned = cast<ConstantSDNode>(N->getOperand(2))->getZExtValue();
unsigned Opc = Unsigned ? ARMISD::VADDVu : ARMISD::VADDVs;		unsigned Opc = Unsigned ? ARMISD::VADDVu : ARMISD::VADDVs;
return DAG.getNode(Opc, SDLoc(N), N->getVTList(), N->getOperand(1));		return DAG.getNode(Opc, SDLoc(N), N->getVTList(), N->getOperand(1));
}		}

case Intrinsic::arm_mve_addlv:		case Intrinsic::arm_mve_addlv:
case Intrinsic::arm_mve_addlv_predicated: {		case Intrinsic::arm_mve_addlv_predicated: {
// Same for these, but ARMISD::VADDLV has to be followed by a BUILD_PAIR		// Same for these, but ARMISD::VADDLV has to be followed by a BUILD_PAIR
// which recombines the two outputs into an i64		// which recombines the two outputs into an i64
bool Unsigned = cast<ConstantSDNode>(N->getOperand(2))->getZExtValue();		bool Unsigned = cast<ConstantSDNode>(N->getOperand(2))->getZExtValue();
unsigned Opc = IntNo == Intrinsic::arm_mve_addlv ?		unsigned Opc = IntNo == Intrinsic::arm_mve_addlv ?
(Unsigned ? ARMISD::VADDLVu : ARMISD::VADDLVs) :		(Unsigned ? ARMISD::VADDLVu : ARMISD::VADDLVs) :
(Unsigned ? ARMISD::VADDLVpu : ARMISD::VADDLVps);		(Unsigned ? ARMISD::VADDLVpu : ARMISD::VADDLVps);
Show All 18 Lines
/// combining instead of DAG legalizing because the build_vectors for 64-bit		/// combining instead of DAG legalizing because the build_vectors for 64-bit
/// vector element shift counts are generally not legal, and it is hard to see		/// vector element shift counts are generally not legal, and it is hard to see
/// their values after they get legalized to loads from a constant pool.		/// their values after they get legalized to loads from a constant pool.
static SDValue PerformShiftCombine(SDNode *N,		static SDValue PerformShiftCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const ARMSubtarget *ST) {		const ARMSubtarget *ST) {
SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
if (N->getOpcode() == ISD::SRL && VT == MVT::i32 && ST->hasV6Ops()) {
// Canonicalize (srl (bswap x), 16) to (rotr (bswap x), 16) if the high
// 16-bits of x is zero. This optimizes rev + lsr 16 to rev16.
SDValue N1 = N->getOperand(1);
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(N1)) {
SDValue N0 = N->getOperand(0);
if (C->getZExtValue() == 16 && N0.getOpcode() == ISD::BSWAP &&
DAG.MaskedValueIsZero(N0.getOperand(0),
APInt::getHighBitsSet(32, 16)))
return DAG.getNode(ISD::ROTR, SDLoc(N), VT, N0, N1);
}
}

if (ST->isThumb1Only() && N->getOpcode() == ISD::SHL && VT == MVT::i32 &&		if (ST->isThumb1Only() && N->getOpcode() == ISD::SHL && VT == MVT::i32 &&
N->getOperand(0)->getOpcode() == ISD::AND &&		N->getOperand(0)->getOpcode() == ISD::AND &&
N->getOperand(0)->hasOneUse()) {		N->getOperand(0)->hasOneUse()) {
if (DCI.isBeforeLegalize() || DCI.isCalledByLegalizer())		if (DCI.isBeforeLegalize() || DCI.isCalledByLegalizer())
return SDValue();		return SDValue();
// Look for the pattern (shl (and x, AndMask), ShiftAmt). This doesn't		// Look for the pattern (shl (and x, AndMask), ShiftAmt). This doesn't
// usually show up because instcombine prefers to canonicalize it to		// usually show up because instcombine prefers to canonicalize it to
▲ Show 20 Lines • Show All 4,278 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrInfo.td

Show First 20 Lines • Show All 414 Lines • ▼ Show 20 Lines	return CurDAG->getTargetConstant((uint32_t)N->getZExtValue() >> 16, SDLoc(N),
MVT::i32);		MVT::i32);
}]>;		}]>;

def lo16AllZero : PatLeaf<(i32 imm), [{		def lo16AllZero : PatLeaf<(i32 imm), [{
// Returns true if all low 16-bits are 0.		// Returns true if all low 16-bits are 0.
return (((uint32_t)N->getZExtValue()) & 0xFFFFUL) == 0;		return (((uint32_t)N->getZExtValue()) & 0xFFFFUL) == 0;
}], hi16>;		}], hi16>;

		// top16Zero - answer true if the upper 16 bits of $src are 0, false otherwise
		def top16Zero: PatLeaf<(i32 GPR:$src), [{
		return !SDValue(N,0)->getValueType(0).isVector() &&
		CurDAG->MaskedValueIsZero(SDValue(N,0), APInt::getHighBitsSet(32, 16));
		}]>;

class BinOpFrag<dag res> : PatFrag<(ops node:$LHS, node:$RHS), res>;		class BinOpFrag<dag res> : PatFrag<(ops node:$LHS, node:$RHS), res>;
class UnOpFrag <dag res> : PatFrag<(ops node:$Src), res>;		class UnOpFrag <dag res> : PatFrag<(ops node:$Src), res>;

// An 'and' node with a single use.		// An 'and' node with a single use.
def and_su : PatFrag<(ops node:$lhs, node:$rhs), (and node:$lhs, node:$rhs), [{		def and_su : PatFrag<(ops node:$lhs, node:$rhs), (and node:$lhs, node:$rhs), [{
return N->hasOneUse();		return N->hasOneUse();
}]>;		}]>;

▲ Show 20 Lines • Show All 4,312 Lines • ▼ Show 20 Lines	def REV16 : AMiscA1I<0b01101011, 0b1011, (outs GPR:$Rd), (ins GPR:$Rm),
[(set GPR:$Rd, (rotr (bswap GPR:$Rm), (i32 16)))]>,		[(set GPR:$Rd, (rotr (bswap GPR:$Rm), (i32 16)))]>,
Requires<[IsARM, HasV6]>,		Requires<[IsARM, HasV6]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

def : ARMV6Pat<(srl (bswap (extloadi16 addrmode3:$addr)), (i32 16)),		def : ARMV6Pat<(srl (bswap (extloadi16 addrmode3:$addr)), (i32 16)),
(REV16 (LDRH addrmode3:$addr))>;		(REV16 (LDRH addrmode3:$addr))>;
def : ARMV6Pat<(truncstorei16 (srl (bswap GPR:$Rn), (i32 16)), addrmode3:$addr),		def : ARMV6Pat<(truncstorei16 (srl (bswap GPR:$Rn), (i32 16)), addrmode3:$addr),
(STRH (REV16 GPR:$Rn), addrmode3:$addr)>;		(STRH (REV16 GPR:$Rn), addrmode3:$addr)>;
		def : ARMV6Pat<(srl (bswap top16Zero:$Rn), (i32 16)),
		(REV16 GPR:$Rn)>;

let AddedComplexity = 5 in		let AddedComplexity = 5 in
def REVSH : AMiscA1I<0b01101111, 0b1011, (outs GPR:$Rd), (ins GPR:$Rm),		def REVSH : AMiscA1I<0b01101111, 0b1011, (outs GPR:$Rd), (ins GPR:$Rm),
IIC_iUNAr, "revsh", "\t$Rd, $Rm",		IIC_iUNAr, "revsh", "\t$Rd, $Rm",
[(set GPR:$Rd, (sra (bswap GPR:$Rm), (i32 16)))]>,		[(set GPR:$Rd, (sra (bswap GPR:$Rm), (i32 16)))]>,
Requires<[IsARM, HasV6]>,		Requires<[IsARM, HasV6]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

▲ Show 20 Lines • Show All 1,701 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrThumb.td

	Show First 20 Lines • Show All 1,570 Lines • ▼ Show 20 Lines
	def : T1Pat<(ARMcmpZ tGPR:$Rn, tGPR:$Rm),			def : T1Pat<(ARMcmpZ tGPR:$Rn, tGPR:$Rm),
	(tCMPr tGPR:$Rn, tGPR:$Rm)>;			(tCMPr tGPR:$Rn, tGPR:$Rm)>;

	// Bswap 16 with load/store			// Bswap 16 with load/store
	def : T1Pat<(srl (bswap (extloadi16 t_addrmode_is2:$addr)), (i32 16)),			def : T1Pat<(srl (bswap (extloadi16 t_addrmode_is2:$addr)), (i32 16)),
	(tREV16 (tLDRHi t_addrmode_is2:$addr))>;			(tREV16 (tLDRHi t_addrmode_is2:$addr))>;
	def : T1Pat<(srl (bswap (extloadi16 t_addrmode_rr:$addr)), (i32 16)),			def : T1Pat<(srl (bswap (extloadi16 t_addrmode_rr:$addr)), (i32 16)),
	(tREV16 (tLDRHr t_addrmode_rr:$addr))>;			(tREV16 (tLDRHr t_addrmode_rr:$addr))>;
				def : T1Pat<(srl (bswap top16Zero:$Rn), (i32 16)),
				(tREV16 tGPR:$Rn)>;
	def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),			def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),
	t_addrmode_is2:$addr),			t_addrmode_is2:$addr),
	(tSTRHi(tREV16 tGPR:$Rn), t_addrmode_is2:$addr)>;			(tSTRHi(tREV16 tGPR:$Rn), t_addrmode_is2:$addr)>;
	def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),			def : T1Pat<(truncstorei16 (srl (bswap tGPR:$Rn), (i32 16)),
	t_addrmode_rr:$addr),			t_addrmode_rr:$addr),
	(tSTRHr (tREV16 tGPR:$Rn), t_addrmode_rr:$addr)>;			(tSTRHr (tREV16 tGPR:$Rn), t_addrmode_rr:$addr)>;

	// ConstantPool			// ConstantPool
	▲ Show 20 Lines • Show All 201 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMInstrThumb2.td

Show First 20 Lines • Show All 2,920 Lines • ▼ Show 20 Lines
defm t2MVN : T2I_un_irs <0b0011, "mvn",		defm t2MVN : T2I_un_irs <0b0011, "mvn",
IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,		IIC_iMVNi, IIC_iMVNr, IIC_iMVNsi,
not, 1, 1, 1>;		not, 1, 1, 1>;

let AddedComplexity = 1 in		let AddedComplexity = 1 in
def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm),		def : T2Pat<(and rGPR:$src, t2_so_imm_not:$imm),
(t2BICri rGPR:$src, t2_so_imm_not:$imm)>;		(t2BICri rGPR:$src, t2_so_imm_not:$imm)>;

// top16Zero - answer true if the upper 16 bits of $src are 0, false otherwise
def top16Zero: PatLeaf<(i32 rGPR:$src), [{
return !SDValue(N,0)->getValueType(0).isVector() &&
CurDAG->MaskedValueIsZero(SDValue(N,0), APInt::getHighBitsSet(32, 16));
}]>;

// so_imm_notSext is needed instead of so_imm_not, as the value of imm		// so_imm_notSext is needed instead of so_imm_not, as the value of imm
// will match the extended, not the original bitWidth for $src.		// will match the extended, not the original bitWidth for $src.
def : T2Pat<(and top16Zero:$src, t2_so_imm_notSext:$imm),		def : T2Pat<(and top16Zero:$src, t2_so_imm_notSext:$imm),
(t2BICri rGPR:$src, t2_so_imm_notSext:$imm)>;		(t2BICri rGPR:$src, t2_so_imm_notSext:$imm)>;


// FIXME: Disable this pattern on Darwin to workaround an assembler bug.		// FIXME: Disable this pattern on Darwin to workaround an assembler bug.
def : T2Pat<(or rGPR:$src, t2_so_imm_not:$imm),		def : T2Pat<(or rGPR:$src, t2_so_imm_not:$imm),
(t2ORNri rGPR:$src, t2_so_imm_not:$imm)>,		(t2ORNri rGPR:$src, t2_so_imm_not:$imm)>,
Requires<[IsThumb2]>;		Requires<[IsThumb2]>;

def : T2Pat<(t2_so_imm_not:$src),		def : T2Pat<(t2_so_imm_not:$src),
(t2MVNi t2_so_imm_not:$src)>;		(t2MVNi t2_so_imm_not:$src)>;

▲ Show 20 Lines • Show All 329 Lines • ▼ Show 20 Lines	def t2REV : T2I_misc<0b01, 0b00, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr,
"rev", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (bswap rGPR:$Rm))]>,		"rev", ".w\t$Rd, $Rm", [(set rGPR:$Rd, (bswap rGPR:$Rm))]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

def t2REV16 : T2I_misc<0b01, 0b01, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr,		def t2REV16 : T2I_misc<0b01, 0b01, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr,
"rev16", ".w\t$Rd, $Rm",		"rev16", ".w\t$Rd, $Rm",
[(set rGPR:$Rd, (rotr (bswap rGPR:$Rm), (i32 16)))]>,		[(set rGPR:$Rd, (rotr (bswap rGPR:$Rm), (i32 16)))]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

		def : T2Pat<(srl (bswap top16Zero:$Rn), (i32 16)),
		(t2REV16 rGPR:$Rn)>;

def t2REVSH : T2I_misc<0b01, 0b11, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr,		def t2REVSH : T2I_misc<0b01, 0b11, (outs rGPR:$Rd), (ins rGPR:$Rm), IIC_iUNAr,
"revsh", ".w\t$Rd, $Rm",		"revsh", ".w\t$Rd, $Rm",
[(set rGPR:$Rd, (sra (bswap rGPR:$Rm), (i32 16)))]>,		[(set rGPR:$Rd, (sra (bswap rGPR:$Rm), (i32 16)))]>,
Sched<[WriteALU]>;		Sched<[WriteALU]>;

def : T2Pat<(or (sra (shl rGPR:$Rm, (i32 24)), (i32 16)),		def : T2Pat<(or (sra (shl rGPR:$Rm, (i32 24)), (i32 16)),
(and (srl rGPR:$Rm, (i32 8)), 0xFF)),		(and (srl rGPR:$Rm, (i32 8)), 0xFF)),
(t2REVSH rGPR:$Rm)>;		(t2REVSH rGPR:$Rm)>;
▲ Show 20 Lines • Show All 2,345 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/rotate_vec.ll

Show First 20 Lines • Show All 105 Lines • ▼ Show 20 Lines	; CHECK-NEXT: retq
%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> %y)		%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> %y)
ret <4 x i32> %2		ret <4 x i32> %2
}		}

define <4 x i32> @rot_v4i32_mask_ashr0(<4 x i32> %a0) {		define <4 x i32> @rot_v4i32_mask_ashr0(<4 x i32> %a0) {
; XOPAVX1-LABEL: rot_v4i32_mask_ashr0:		; XOPAVX1-LABEL: rot_v4i32_mask_ashr0:
; XOPAVX1: # %bb.0:		; XOPAVX1: # %bb.0:
; XOPAVX1-NEXT: vpshad {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX1-NEXT: vpshad {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX1-NEXT: vprotd $1, %xmm0, %xmm0		; XOPAVX1-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; XOPAVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX1-NEXT: retq		; XOPAVX1-NEXT: retq
;		;
; XOPAVX2-LABEL: rot_v4i32_mask_ashr0:		; XOPAVX2-LABEL: rot_v4i32_mask_ashr0:
; XOPAVX2: # %bb.0:		; XOPAVX2: # %bb.0:
; XOPAVX2-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX2-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX2-NEXT: vprotd $1, %xmm0, %xmm0		; XOPAVX2-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX2-NEXT: retq		; XOPAVX2-NEXT: retq
;		;
; AVX512-LABEL: rot_v4i32_mask_ashr0:		; AVX512-LABEL: rot_v4i32_mask_ashr0:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX512-NEXT: vprold $1, %xmm0, %xmm0		; AVX512-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>		%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>
%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)		%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
%3 = ashr <4 x i32> %2, <i32 1, i32 2, i32 3, i32 4>		%3 = ashr <4 x i32> %2, <i32 1, i32 2, i32 3, i32 4>
%4 = and <4 x i32> %3, <i32 -32768, i32 -65536, i32 -32768, i32 -65536>		%4 = and <4 x i32> %3, <i32 -32768, i32 -65536, i32 -32768, i32 -65536>
ret <4 x i32> %4		ret <4 x i32> %4
}		}

define <4 x i32> @rot_v4i32_mask_ashr1(<4 x i32> %a0) {		define <4 x i32> @rot_v4i32_mask_ashr1(<4 x i32> %a0) {
; XOPAVX1-LABEL: rot_v4i32_mask_ashr1:		; XOPAVX1-LABEL: rot_v4i32_mask_ashr1:
; XOPAVX1: # %bb.0:		; XOPAVX1: # %bb.0:
; XOPAVX1-NEXT: vpsrad $25, %xmm0, %xmm0		; XOPAVX1-NEXT: vpshad {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX1-NEXT: vprotd $1, %xmm0, %xmm0		; XOPAVX1-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; XOPAVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]		; XOPAVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
; XOPAVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX1-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX1-NEXT: retq		; XOPAVX1-NEXT: retq
;		;
; XOPAVX2-LABEL: rot_v4i32_mask_ashr1:		; XOPAVX2-LABEL: rot_v4i32_mask_ashr1:
; XOPAVX2: # %bb.0:		; XOPAVX2: # %bb.0:
; XOPAVX2-NEXT: vpsrad $25, %xmm0, %xmm0		; XOPAVX2-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX2-NEXT: vprotd $1, %xmm0, %xmm0		; XOPAVX2-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; XOPAVX2-NEXT: vpbroadcastd %xmm0, %xmm0		; XOPAVX2-NEXT: vpbroadcastd %xmm0, %xmm0
; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; XOPAVX2-NEXT: retq		; XOPAVX2-NEXT: retq
;		;
; AVX512-LABEL: rot_v4i32_mask_ashr1:		; AVX512-LABEL: rot_v4i32_mask_ashr1:
; AVX512: # %bb.0:		; AVX512: # %bb.0:
; AVX512-NEXT: vpsrad $25, %xmm0, %xmm0		; AVX512-NEXT: vpsravd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX512-NEXT: vprold $1, %xmm0, %xmm0		; AVX512-NEXT: vpaddd %xmm0, %xmm0, %xmm0
; AVX512-NEXT: vpbroadcastd %xmm0, %xmm0		; AVX512-NEXT: vpbroadcastd %xmm0, %xmm0
; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX512-NEXT: retq		; AVX512-NEXT: retq
%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>		%1 = ashr <4 x i32> %a0, <i32 25, i32 26, i32 27, i32 28>
%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 2, i32 3, i32 4>)		%2 = call <4 x i32> @llvm.fshl.v4i32(<4 x i32> %1, <4 x i32> %1, <4 x i32> <i32 1, i32 2, i32 3, i32 4>)
%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> zeroinitializer		%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> zeroinitializer
%4 = ashr <4 x i32> %3, <i32 1, i32 2, i32 3, i32 4>		%4 = ashr <4 x i32> %3, <i32 1, i32 2, i32 3, i32 4>
%5 = and <4 x i32> %4, <i32 -4096, i32 -8192, i32 -4096, i32 -8192>		%5 = and <4 x i32> %4, <i32 -4096, i32 -8192, i32 -4096, i32 -8192>
ret <4 x i32> %5		ret <4 x i32> %5
}		}

declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)		declare <4 x i32> @llvm.fshl.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)
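The updated CHECK lines above replace a rotate by 1 (`vprotd $1` / `vprold $1`) with `vpaddd %xmm0, %xmm0, %xmm0`. This is consistent with the rotate becoming a shift left by 1, presumably emitted as `x + x`, because the later `ashr` and `and` never look at bit 0 of the rotate result. A scalar sketch of that per-lane reasoning, with made-up helper names:

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by exactly one bit.
inline uint32_t rotl32By1(uint32_t x) { return (x << 1) | (x >> 31); }

// Shift left by one, written as an addition (wraps on overflow).
inline uint32_t addSelf(uint32_t x) { return x + x; }

// True if a and b agree on every bit except bit 0.
inline bool agreeAboveBit0(uint32_t a, uint32_t b) {
  return ((a ^ b) & ~1u) == 0;
}
```

For `x = 0x80000001`, `rotl32By1` gives `0x00000003` and `addSelf` gives `0x00000002`: the values differ only in bit 0, so any consumer that ignores bit 0 cannot tell them apart.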

llvm/test/CodeGen/X86/vector-rotate-128.ll

	Show First 20 Lines • Show All 2,069 Lines • ▼ Show 20 Lines
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: splatconstant_rotate_mask_v2i64:			; AVX-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vpsrlq $49, %xmm0, %xmm0			; AVX-NEXT: vpsrlq $49, %xmm0, %xmm0
	; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512F-LABEL: splatconstant_rotate_mask_v2i64:			; AVX512-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512F: # %bb.0:			; AVX512: # %bb.0:
	; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; AVX512-NEXT: vpsrlq $49, %xmm0, %xmm0
	; AVX512F-NEXT: vprolq $15, %zmm0, %zmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; AVX512-NEXT: retq
	; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq
	;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vprolq $15, %xmm0, %xmm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512VL-NEXT: retq
	;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512BW-NEXT: vprolq $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512BW-NEXT: vzeroupper
	; AVX512BW-NEXT: retq
	;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vprolq $15, %xmm0, %xmm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512VLBW-NEXT: retq
	;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; AVX512VBMI2-NEXT: vprolq $15, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512VBMI2-NEXT: vzeroupper
	; AVX512VBMI2-NEXT: retq
	;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v2i64:
	; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vprolq $15, %xmm0, %xmm0
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOP-LABEL: splatconstant_rotate_mask_v2i64:			; XOP-LABEL: splatconstant_rotate_mask_v2i64:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vprotq $15, %xmm0, %xmm0			; XOP-NEXT: vpsrlq $49, %xmm0, %xmm0
	; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0			; XOP-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
	; XOP-NEXT: retq			; XOP-NEXT: retq
	;			;
	; X86-SSE2-LABEL: splatconstant_rotate_mask_v2i64:			; X86-SSE2-LABEL: splatconstant_rotate_mask_v2i64:
	; X86-SSE2: # %bb.0:			; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: psrlq $49, %xmm0			; X86-SSE2-NEXT: psrlq $49, %xmm0
	; X86-SSE2-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE2-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	▲ Show 20 Lines • Show All 413 Lines • Show Last 20 Lines
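In this hunk the 64-bit `vprolq $15` / `vprotq $15` becomes `vpsrlq $49`, matching the "rol -> lshr by revamt" rule with `revAmt = 64 - 15 = 49`. The actual mask lives in the constant pool and is not visible here, so the sketch below assumes a mask of `0x7FFF` purely for illustration; the function names are also invented.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by amt, where 0 < amt < 64.
inline uint64_t rotl64(uint64_t x, unsigned amt) {
  assert(amt > 0 && amt < 64);
  return (x << amt) | (x >> (64 - amt));
}

// rotl by 15 followed by a mask (the original form).
inline uint64_t maskedRotl15(uint64_t x, uint64_t mask) {
  return rotl64(x, 15) & mask;
}

// Logical shift right by 49 followed by the same mask (the simplified form).
inline uint64_t maskedSrl49(uint64_t x, uint64_t mask) {
  return (x >> 49) & mask;
}
```

With the assumed mask `0x7FFF` (at least 49 leading zeros) both forms agree for any input. With `0xFFFF` they can differ, e.g. for `x = 0x8000000000000001` the rotate form yields `0xC000` and the shift form `0x4000`.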

llvm/test/CodeGen/X86/vector-rotate-256.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=AVX2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefix=AVX512F			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefixes=AVX512,AVX512F
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefix=AVX512VL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vl | FileCheck %s --check-prefixes=AVX512,AVX512VL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=AVX512,AVX512BW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512vl | FileCheck %s --check-prefix=AVX512VLBW			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512bw,+avx512vl | FileCheck %s --check-prefixes=AVX512,AVX512VLBW
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vbmi2 | FileCheck %s --check-prefix=AVX512VBMI2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vbmi2 | FileCheck %s --check-prefixes=AVX512,AVX512VBMI2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vbmi2,+avx512vl | FileCheck %s --check-prefix=AVX512VLVBMI2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f,+avx512vbmi2,+avx512vl | FileCheck %s --check-prefixes=AVX512,AVX512VLVBMI2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+xop,+avx | FileCheck %s --check-prefix=XOPAVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+xop,+avx | FileCheck %s --check-prefix=XOPAVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+xop,+avx2 | FileCheck %s --check-prefix=XOPAVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+xop,+avx2 | FileCheck %s --check-prefix=XOPAVX2

	;			;
	; Variable Rotates			; Variable Rotates
	;			;

	define <4 x i64> @var_rotate_v4i64(<4 x i64> %a, <4 x i64> %b) nounwind {			define <4 x i64> @var_rotate_v4i64(<4 x i64> %a, <4 x i64> %b) nounwind {
	[... 1,776 lines not shown ...]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: splatconstant_rotate_mask_v4i64:			; AVX2-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpsrlq $49, %ymm0, %ymm0			; AVX2-NEXT: vpsrlq $49, %ymm0, %ymm0
	; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512F-LABEL: splatconstant_rotate_mask_v4i64:			; AVX512-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512F: # %bb.0:			; AVX512: # %bb.0:
	; AVX512F-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0			; AVX512-NEXT: vpsrlq $49, %ymm0, %ymm0
	; AVX512F-NEXT: vprolq $15, %zmm0, %zmm0			; AVX512-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512F-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; AVX512-NEXT: retq
	; AVX512F-NEXT: retq
	;
	; AVX512VL-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vprolq $15, %ymm0, %ymm0
	; AVX512VL-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512VL-NEXT: retq
	;
	; AVX512BW-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512BW: # %bb.0:
	; AVX512BW-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512BW-NEXT: vprolq $15, %zmm0, %zmm0
	; AVX512BW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512BW-NEXT: retq
	;
	; AVX512VLBW-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512VLBW: # %bb.0:
	; AVX512VLBW-NEXT: vprolq $15, %ymm0, %ymm0
	; AVX512VLBW-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512VLBW-NEXT: retq
	;
	; AVX512VBMI2-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512VBMI2: # %bb.0:
	; AVX512VBMI2-NEXT: # kill: def $ymm0 killed $ymm0 def $zmm0
	; AVX512VBMI2-NEXT: vprolq $15, %zmm0, %zmm0
	; AVX512VBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512VBMI2-NEXT: retq
	;
	; AVX512VLVBMI2-LABEL: splatconstant_rotate_mask_v4i64:
	; AVX512VLVBMI2: # %bb.0:
	; AVX512VLVBMI2-NEXT: vprolq $15, %ymm0, %ymm0
	; AVX512VLVBMI2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; AVX512VLVBMI2-NEXT: retq
	;			;
	; XOPAVX1-LABEL: splatconstant_rotate_mask_v4i64:			; XOPAVX1-LABEL: splatconstant_rotate_mask_v4i64:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:
	; XOPAVX1-NEXT: vprotq $15, %xmm0, %xmm1			; XOPAVX1-NEXT: vpsrlq $49, %xmm0, %xmm1
	; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0			; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
	; XOPAVX1-NEXT: vprotq $15, %xmm0, %xmm0			; XOPAVX1-NEXT: vpsrlq $49, %xmm0, %xmm0
	; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0			; XOPAVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; XOPAVX1-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; XOPAVX1-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; XOPAVX1-NEXT: retq			; XOPAVX1-NEXT: retq
	;			;
	; XOPAVX2-LABEL: splatconstant_rotate_mask_v4i64:			; XOPAVX2-LABEL: splatconstant_rotate_mask_v4i64:
	; XOPAVX2: # %bb.0:			; XOPAVX2: # %bb.0:
	; XOPAVX2-NEXT: vprotq $15, %xmm0, %xmm1			; XOPAVX2-NEXT: vpsrlq $49, %ymm0, %ymm0
	; XOPAVX2-NEXT: vextracti128 $1, %ymm0, %xmm0
	; XOPAVX2-NEXT: vprotq $15, %xmm0, %xmm0
	; XOPAVX2-NEXT: vinserti128 $1, %xmm0, %ymm1, %ymm0
	; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; XOPAVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; XOPAVX2-NEXT: retq			; XOPAVX2-NEXT: retq
	%shl = shl <4 x i64> %a, <i64 15, i64 15, i64 15, i64 15>			%shl = shl <4 x i64> %a, <i64 15, i64 15, i64 15, i64 15>
	%lshr = lshr <4 x i64> %a, <i64 49, i64 49, i64 49, i64 49>			%lshr = lshr <4 x i64> %a, <i64 49, i64 49, i64 49, i64 49>
	%rmask = and <4 x i64> %lshr, <i64 255, i64 127, i64 127, i64 255>			%rmask = and <4 x i64> %lshr, <i64 255, i64 127, i64 127, i64 255>
	%lmask = and <4 x i64> %shl, <i64 33, i64 65, i64 129, i64 257>			%lmask = and <4 x i64> %shl, <i64 33, i64 65, i64 129, i64 257>
	%or = or <4 x i64> %lmask, %rmask			%or = or <4 x i64> %lmask, %rmask
	ret <4 x i64> %or			ret <4 x i64> %or

llvm/test/CodeGen/X86/vector-rotate-512.ll


	;			;
	; Masked Uniform Constant Rotates			; Masked Uniform Constant Rotates
	;			;

	define <8 x i64> @splatconstant_rotate_mask_v8i64(<8 x i64> %a) nounwind {			define <8 x i64> @splatconstant_rotate_mask_v8i64(<8 x i64> %a) nounwind {
	; AVX512-LABEL: splatconstant_rotate_mask_v8i64:			; AVX512-LABEL: splatconstant_rotate_mask_v8i64:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vprolq $15, %zmm0, %zmm0			; AVX512-NEXT: vpsrlq $49, %zmm0, %zmm0
	; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0			; AVX512-NEXT: vpandq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %zmm0, %zmm0
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%shl = shl <8 x i64> %a, <i64 15, i64 15, i64 15, i64 15, i64 15, i64 15, i64 15, i64 15>			%shl = shl <8 x i64> %a, <i64 15, i64 15, i64 15, i64 15, i64 15, i64 15, i64 15, i64 15>
	%lshr = lshr <8 x i64> %a, <i64 49, i64 49, i64 49, i64 49, i64 49, i64 49, i64 49, i64 49>			%lshr = lshr <8 x i64> %a, <i64 49, i64 49, i64 49, i64 49, i64 49, i64 49, i64 49, i64 49>
	%rmask = and <8 x i64> %lshr, <i64 255, i64 127, i64 127, i64 255, i64 255, i64 127, i64 127, i64 255>			%rmask = and <8 x i64> %lshr, <i64 255, i64 127, i64 127, i64 255, i64 255, i64 127, i64 127, i64 255>
	%lmask = and <8 x i64> %shl, <i64 33, i64 65, i64 129, i64 257, i64 33, i64 65, i64 129, i64 257>			%lmask = and <8 x i64> %shl, <i64 33, i64 65, i64 129, i64 257, i64 33, i64 65, i64 129, i64 257>
	%or = or <8 x i64> %lmask, %rmask			%or = or <8 x i64> %lmask, %rmask
	ret <8 x i64> %or			ret <8 x i64> %or
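
Why the splatconstant_rotate_mask checks change from a rotate by 15 to a right shift by 49: every mask constant in these tests (33 through 257) is below 2^15, so only bits that come from the lshr half of the rotate are demanded. The following is a minimal scalar sketch of that identity, not the TargetLowering.cpp code itself; the helper names (rotl64, onlyDemandsLowBits, maskedRotl, maskedSrl) are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by a constant amount in [1, 63], written as the same
// shl/lshr/or pattern that appears in the test IR.
constexpr std::uint64_t rotl64(std::uint64_t x, unsigned amt) {
  return (x << amt) | (x >> (64 - amt));
}

// Hypothetical helper: true if the mask keeps only bits below `amt`,
// i.e. only bits produced by the lshr half of rotl(x, amt).
constexpr bool onlyDemandsLowBits(std::uint64_t mask, unsigned amt) {
  return (mask >> amt) == 0;
}

// The pattern before the fold: rotate by 15, then mask.
constexpr std::uint64_t maskedRotl(std::uint64_t x, std::uint64_t mask) {
  return rotl64(x, 15) & mask;
}

// The pattern after the fold: logical shift right by 64 - 15 = 49, then mask.
constexpr std::uint64_t maskedSrl(std::uint64_t x, std::uint64_t mask) {
  return (x >> 49) & mask;
}

// With a low-bits-only mask the two forms agree.
static_assert(onlyDemandsLowBits(255, 15), "255 fits below bit 15");
static_assert(maskedRotl(0xFEDCBA9876543210ULL, 255) ==
                  maskedSrl(0xFEDCBA9876543210ULL, 255),
              "rotl and srl agree when only low bits are demanded");

// A mask bit at or above bit 15 demands the shl half, so the forms differ.
static_assert(!onlyDemandsLowBits(1ULL << 15, 15), "bit 15 comes from shl");
static_assert(maskedRotl(1, 1ULL << 15) != maskedSrl(1, 1ULL << 15),
              "the fold would be wrong for this mask");
```

The last two assertions show the case raised in the review discussion: when bits from both halves of the rotate are demanded, replacing the rotate with a single shift changes the result.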