This is an archive of the discontinued LLVM Phabricator instance.

Differential D20897

[AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
ClosedPublic

Authored by igorb on Jun 2 2016, 12:37 AM.


Details

Reviewers

RKSimon
AsafBadouh
delena

Commits

rGe59165ca63d3: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic…
rL273138: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic…

Summary

[AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
The behavior of the Intel SRA intrinsic differs from that of the LLVM ashr instruction.
LLVM ashr instruction: If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined.
Intel SRA: If the unsigned integer value specified in the respective data element of the second source operand is greater than 15 (for words), 31 (for doublewords), or 63 (for a quadword), then the destination data element is filled with the corresponding sign bit of the source element.
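
The Intel rule quoted in the summary can be modelled per element in plain C++. This is only a sketch for illustration: the name `psravElement` is hypothetical and not part of LLVM or the patch, and C++14 or later is assumed. Clamping the count to (element width - 1) yields the sign fill, because shifting by width - 1 already leaves only copies of the sign bit.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not an LLVM API): models one element of the Intel
// variable arithmetic right shift for a signed element type T.
template <typename T>
constexpr T psravElement(T value, typename std::make_unsigned<T>::type count) {
  static_assert(std::is_signed<T>::value, "element type must be signed");
  using U = typename std::make_unsigned<T>::type;
  // Largest in-range count: 15 for words, 31 for doublewords, 63 for quadwords.
  constexpr U maxShift = std::numeric_limits<U>::digits - 1;
  // Out-of-range counts behave like a shift by maxShift (pure sign fill).
  const U shift = count > maxShift ? maxShift : count;
  // Shift the complement for negative inputs so that only non-negative
  // values are ever shifted; this avoids relying on how the compiler
  // shifts negative numbers.
  return value < 0 ? static_cast<T>(~(~value >> shift))
                   : static_cast<T>(value >> shift);
}

// In-range counts behave like an ordinary arithmetic shift.
static_assert(psravElement<std::int32_t>(2, 1) == 1, "in-range shift");
// Out-of-range counts produce the sign bit in every position.
static_assert(psravElement<std::int32_t>(-12, 35) == -1, "negative sign fill");
static_assert(psravElement<std::int32_t>(23, 52) == 0, "positive sign fill");
static_assert(psravElement<std::int16_t>(-12, 16) == -1, "word element");
static_assert(psravElement<std::int64_t>(-9, 90) == -1, "quadword element");
```

An LLVM `ashr` with the same out-of-range counts has no defined result, which is why the two cannot be treated as interchangeable for constant operands.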

Diff Detail

Repository: rL LLVM

Event Timeline

igorb updated this revision to Diff 59341. Jun 2 2016, 12:37 AM

igorb retitled this revision to [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.

igorb updated this object.

igorb added reviewers: delena, AsafBadouh.

igorb set the repository for this revision to rL LLVM.

igorb added a subscriber: llvm-commits.

This looks very similar in aim to D19675 (although we barely have any support for AVX512 intrinsics in InstCombine)

In D20897#446667, @RKSimon wrote:

This looks very similar in aim to D19675 (although we barely have any support for AVX512 intrinsics in InstCombine)

Simon,

Do you see any problem with the current patch?

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

In this case constant folding is done before intrinsic simplification only to match the intrinsic's behavior, so that the general ISD::SRA can be used.

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

I agree. But today we are moving towards generating IR from clang whenever possible. For this specific intrinsic (SAR), the LLVM IR specification and the Intel intrinsic specification differ for out-of-range constants. You can generate a constant if both arguments are constants, but replacing the intrinsic with a generic IR instruction is incorrect in this case.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

We should generate optimal code if we can. So we should probably fold constants whenever possible.

IMO, once we are visiting the SAR intrinsic in LowerINTRINSIC_WO_CHAIN, we should try to fold constants there. This applies to all intrinsics where the IR and architecture specifications do not match.

In D20897#449204, @delena wrote:

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

I agree. But today we are moving towards generating IR from clang whenever possible. For this specific intrinsic (SAR), the LLVM IR specification and the Intel intrinsic specification differ for out-of-range constants. You can generate a constant if both arguments are constants, but replacing the intrinsic with a generic IR instruction is incorrect in this case.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

We should generate optimal code if we can. So we should probably fold constants whenever possible.

IMO, once we are visiting the SAR intrinsic in LowerINTRINSIC_WO_CHAIN, we should try to fold constants there. This applies to all intrinsics where the IR and architecture specifications do not match.

Are there many examples of where these SAR intrinsics can only be folded at lowering? And is it just SAR or are SRL/SHL shifts likely as well? What prevented them from being folded earlier in instcombine? I'm happier now with the idea of adding it to LowerINTRINSIC_WO_CHAIN - I just want to know it will be useful.

I think we need this patch for correctness in case we get this intrinsic with a constant index in the back end (for example, if InstCombiner doesn't run, as in -O0 compilation).

igorb added a reviewer: RKSimon. Jun 9 2016, 2:18 AM

In D20897#453183, @igorb wrote:

I think we need this patch for correctness in case we get this intrinsic with a constant index in the back end (for example, if InstCombiner doesn't run, as in -O0 compilation).

Except this isn't a correctness issue, it's an optimization, no? The code will run fine at -O0 or higher as it will lower to the vpsrav intrinsic, which supports the out-of-range shift value and will give the correct result.

If this was in combineINTRINSIC_WO_CHAIN (which we recently removed as it wasn't being used....) I think we could accept this but only if we have a use case that InstCombine isn't catching.

In D20897#453507, @RKSimon wrote:

Except this isn't a correctness issue, it's an optimization, no? The code will run fine at -O0 or higher as it will lower to the vpsrav intrinsic, which supports the out-of-range shift value and will give the correct result.

No, I believe it is a correctness issue. The vpsrav intrinsic lowering with a constant out-of-range shift value is incorrect.

For example

define <4 x i32> @test_x86_avx2_psrav_d_fold(<4 x i32> %a0, <4 x i32> %a1) {
  %res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> <i32 2, i32 9, i32 -12, i32 23>, <4 x i32> <i32 1, i32 18, i32 35, i32 52>)
  ret <4 x i32> %res
}
declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone

Without the patch we get an incorrect result:

movl    $1, %eax
movd    %eax, %xmm0
retq

With the patch:

 .LCPI0_0:
.long   1                       # 0x1
.long   0                       # 0x0
.long   4294967295              # 0xffffffff
.long   0                       # 0x0

movaps  .LCPI0_0(%rip), %xmm0   # xmm0 = [1,0,4294967295,0]
retq
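
The constant pool emitted with the patch can be checked by hand. The sketch below folds the example's operands the way the patch's SRAFoldConstant helper is written to (clamp each count to bit width - 1, then shift arithmetically), but in standalone C++ rather than with APInt. The name `foldPsravD` is hypothetical and exists only for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-element fold: clamp the count to 31,
// then perform an arithmetic right shift on each 32-bit element.
template <std::size_t N>
std::array<std::int32_t, N>
foldPsravD(const std::array<std::int32_t, N> &values,
           const std::array<std::uint32_t, N> &counts) {
  std::array<std::int32_t, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    // Counts above 31 are treated as 31, which leaves only sign bits.
    const std::uint32_t shift = std::min<std::uint32_t>(counts[i], 31u);
    const std::int32_t v = values[i];
    // Shift the complement for negative inputs so the result is an
    // arithmetic shift regardless of how the compiler shifts negatives.
    result[i] = v < 0 ? ~(~v >> shift) : (v >> shift);
  }
  return result;
}
```

Folding `<2, 9, -12, 23>` by `<1, 18, 35, 52>` this way gives `1, 0, -1, 0`; read as unsigned 32-bit values that is `[1,0,4294967295,0]`, the constant loaded by the `movaps` in the patched output.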

No, I believe it is a correctness issue. The vpsrav intrinsic lowering with a constant out-of-range shift value is incorrect.

Thanks for the example - the issue you describe seems to be that the x86 variable shift intrinsics are being mapped to ISD::SRA - I think this is the problem you are encountering?

lib/Target/X86/X86IntrinsicsInfo.h, line 326:
These variable shift intrinsics should NOT use the ISD::SRA opcode but need a X86ISD::VSRAV opcode instead - similarly for the logical left/right intrinsics.

Update patch according to comments.
Thanks for review.

LGTM

This revision is now accepted and ready to land. Jun 15 2016, 11:14 PM

Thanks Igor - do you think the logical shift equivalents need to be changed as well? They also have defined behaviour for out of range shifts.

In D20897#461603, @RKSimon wrote:

Thanks Igor - do you think the logical shift equivalents need to be changed as well? They also have defined behaviour for out of range shifts.

I am not sure. The out-of-range shift behaviour is undefined, but the "undefined result" is implemented to always be 0 in:
APInt APInt::lshr(unsigned shiftAmt)
APInt LLVM_ATTRIBUTE_UNUSED_RESULT shl(unsigned shiftAmt)

I will add a few tests to ensure that this behavior doesn't change.
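
For comparison with the arithmetic case, here is a minimal model of the logical variable shifts, assuming the hardware zeroes an element whose count is out of range (the "defined behaviour" mentioned above, and the same zero result the comment attributes to APInt). The names `psrlvElement` and `psllvElement` are hypothetical and only cover 32-bit elements.

```cpp
#include <cstdint>

// Hypothetical model of one 32-bit element of a variable logical right
// shift: counts above 31 produce 0 instead of an undefined value.
constexpr std::uint32_t psrlvElement(std::uint32_t value, std::uint32_t count) {
  return count > 31 ? 0u : value >> count;
}

// Same idea for the variable logical left shift.
constexpr std::uint32_t psllvElement(std::uint32_t value, std::uint32_t count) {
  return count > 31 ? 0u : value << count;
}

static_assert(psrlvElement(0x80000000u, 31) == 1u, "in-range right shift");
static_assert(psrlvElement(0x80000000u, 32) == 0u, "out-of-range right shift");
static_assert(psllvElement(1u, 31) == 0x80000000u, "in-range left shift");
static_assert(psllvElement(1u, 32) == 0u, "out-of-range left shift");
```

Unlike the arithmetic shift, the out-of-range result here does not depend on the sign of the input, so a fold that returns 0 happens to agree with it.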

Closed by commit rL273138: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic… (authored by ibreger). Jun 20 2016, 12:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp	59 lines
lib/Target/X86/X86IntrinsicsInfo.h	25 lines
test/CodeGen/X86/avx2-intrinsics-x86.ll	29 lines
test/CodeGen/X86/avx512bw-intrinsics.ll	16 lines
test/CodeGen/X86/avx512vl-intrinsics.ll	22 lines

Diff 59341

lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 17,065 Lines • ▼ Show 20 Lines	static SDValue recoverFramePointer(SelectionDAG &DAG, const Function *Fn,
int RegNodeSize = getSEHRegistrationNodeSize(Fn);		int RegNodeSize = getSEHRegistrationNodeSize(Fn);
// RegNodeBase = EntryEBP - RegNodeSize		// RegNodeBase = EntryEBP - RegNodeSize
// ParentFP = RegNodeBase - ParentFrameOffset		// ParentFP = RegNodeBase - ParentFrameOffset
SDValue RegNodeBase = DAG.getNode(ISD::SUB, dl, PtrVT, EntryEBP,		SDValue RegNodeBase = DAG.getNode(ISD::SUB, dl, PtrVT, EntryEBP,
DAG.getConstant(RegNodeSize, dl, PtrVT));		DAG.getConstant(RegNodeSize, dl, PtrVT));
return DAG.getNode(ISD::SUB, dl, PtrVT, RegNodeBase, ParentFrameOffset);		return DAG.getNode(ISD::SUB, dl, PtrVT, RegNodeBase, ParentFrameOffset);
}		}

		static SDValue SRAFoldConstant(SDLoc DL, EVT VT, SDValue Cst1, SDValue Cst2,
		SelectionDAG &DAG) {

		// For vectors extract each constant element so we can constant
		// fold them individually.
		BuildVectorSDNode *BV1 = dyn_cast<BuildVectorSDNode>(Cst1.getNode());
		BuildVectorSDNode *BV2 = dyn_cast<BuildVectorSDNode>(Cst2.getNode());
		if (!BV1 || !BV2)
		return SDValue();

		assert(BV1->getNumOperands() == BV2->getNumOperands() && "Out of sync!");

		EVT SVT = VT.getScalarType();
		SmallVector<SDValue, 4> Outputs;
		for (unsigned I = 0, E = BV1->getNumOperands(); I != E; ++I) {
		ConstantSDNode *V1 = dyn_cast<ConstantSDNode>(BV1->getOperand(I));
		ConstantSDNode *V2 = dyn_cast<ConstantSDNode>(BV2->getOperand(I));
		if (!V1 || !V2) // Not a constant, bail.
		return SDValue();

		if (V1->isOpaque() || V2->isOpaque())
		return SDValue();

		if (V1->getValueType(0) != SVT || V2->getValueType(0) != SVT)
		return SDValue();

		// Fold one vector element.
		const APInt &C1 = V1->getAPIntValue();
		const APInt &C2 = V2->getAPIntValue();
		unsigned shiftAmt = C2.getLimitedValue(C1.getBitWidth() - 1);
		APInt Val = C1.ashr(shiftAmt);
		Outputs.push_back(DAG.getConstant(Val, DL, SVT));
		}

		// Build a big vector out of the scalar elements we generated.
		return DAG.getBuildVector(VT, SDLoc(), Outputs);
		}

static SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, const X86Subtarget &Subtarget,		static SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, const X86Subtarget &Subtarget,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
SDLoc dl(Op);		SDLoc dl(Op);
unsigned IntNo = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();		unsigned IntNo = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
const IntrinsicData* IntrData = getIntrinsicWithoutChain(IntNo);		const IntrinsicData* IntrData = getIntrinsicWithoutChain(IntNo);
if (IntrData) {		if (IntrData) {
switch(IntrData->Type) {		switch(IntrData->Type) {
▲ Show 20 Lines • Show All 596 Lines • ▼ Show 20 Lines	case BRCST32x2_TO_VEC: {
//bitcast Src to packed 64		//bitcast Src to packed 64
MVT ScalarVT = VT.getScalarType() == MVT::i32 ? MVT::i64 : MVT::f64;		MVT ScalarVT = VT.getScalarType() == MVT::i32 ? MVT::i64 : MVT::f64;
MVT BitcastVT = MVT::getVectorVT(ScalarVT, Src.getValueSizeInBits()/64);		MVT BitcastVT = MVT::getVectorVT(ScalarVT, Src.getValueSizeInBits()/64);
Src = DAG.getBitcast(BitcastVT, Src);		Src = DAG.getBitcast(BitcastVT, Src);

return getVectorMaskingNode(DAG.getNode(IntrData->Opc0, dl, VT, Src),		return getVectorMaskingNode(DAG.getNode(IntrData->Opc0, dl, VT, Src),
Mask, PassThru, Subtarget, DAG);		Mask, PassThru, Subtarget, DAG);
}		}
		case INTR_SRA_MASK:
		case INTR_SRA: {
		SDValue Src1 = Op.getOperand(1);
		SDValue Src2 = Op.getOperand(2);
		//From SPEC: If the value specified in the respective data element of
		// count is greater than element size then the destination data element
		// are filled with the corresponding sign bit of the source element.
		// This behavior is different from LLVM SRA (in such case the res is undef).
		// Perform Constant folding before SRA creation.
		SDValue SRA = SRAFoldConstant(dl, VT, Src1, Src2, DAG);

		if(!SRA.getNode())
		SRA = DAG.getNode(IntrData->Opc0, dl, VT, Src1, Src2);

		if (IntrData->Type == INTR_SRA)
		return SRA;

		return getVectorMaskingNode(SRA, Op.getOperand(4), Op.getOperand(3),
		Subtarget, DAG);
		}

default:		default:
break;		break;
}		}
}		}

switch (IntNo) {		switch (IntNo) {
default: return SDValue(); // Don't custom lower most intrinsics.		default: return SDValue(); // Don't custom lower most intrinsics.


lib/Target/X86/X86IntrinsicsInfo.h

Show All 29 Lines	enum IntrinsicType {
FMA_OP_MASK, FMA_OP_MASKZ, FMA_OP_MASK3,		FMA_OP_MASK, FMA_OP_MASKZ, FMA_OP_MASK3,
FMA_OP_SCALAR_MASK, FMA_OP_SCALAR_MASKZ, FMA_OP_SCALAR_MASK3,		FMA_OP_SCALAR_MASK, FMA_OP_SCALAR_MASKZ, FMA_OP_SCALAR_MASK3,
VPERM_2OP_MASK, VPERM_3OP_MASK, VPERM_3OP_MASKZ, INTR_TYPE_SCALAR_MASK,		VPERM_2OP_MASK, VPERM_3OP_MASK, VPERM_3OP_MASKZ, INTR_TYPE_SCALAR_MASK,
INTR_TYPE_SCALAR_MASK_RM, INTR_TYPE_3OP_SCALAR_MASK_RM,		INTR_TYPE_SCALAR_MASK_RM, INTR_TYPE_3OP_SCALAR_MASK_RM,
COMPRESS_EXPAND_IN_REG, COMPRESS_TO_MEM, BRCST_SUBVEC_TO_VEC, BRCST32x2_TO_VEC,		COMPRESS_EXPAND_IN_REG, COMPRESS_TO_MEM, BRCST_SUBVEC_TO_VEC, BRCST32x2_TO_VEC,
TRUNCATE_TO_MEM_VI8, TRUNCATE_TO_MEM_VI16, TRUNCATE_TO_MEM_VI32,		TRUNCATE_TO_MEM_VI8, TRUNCATE_TO_MEM_VI16, TRUNCATE_TO_MEM_VI32,
EXPAND_FROM_MEM, STOREANT, BLEND, INSERT_SUBVEC,		EXPAND_FROM_MEM, STOREANT, BLEND, INSERT_SUBVEC,
TERLOG_OP_MASK, TERLOG_OP_MASKZ, BROADCASTM, KUNPCK, FIXUPIMM, FIXUPIMM_MASKZ, FIXUPIMMS,		TERLOG_OP_MASK, TERLOG_OP_MASKZ, BROADCASTM, KUNPCK, FIXUPIMM, FIXUPIMM_MASKZ, FIXUPIMMS,
FIXUPIMMS_MASKZ, CONVERT_MASK_TO_VEC, CONVERT_TO_MASK		FIXUPIMMS_MASKZ, CONVERT_MASK_TO_VEC, CONVERT_TO_MASK,
		INTR_SRA_MASK, INTR_SRA
};		};

struct IntrinsicData {		struct IntrinsicData {

unsigned Id;		unsigned Id;
IntrinsicType Type;		IntrinsicType Type;
unsigned Opc0;		unsigned Opc0;
unsigned Opc1;		unsigned Opc1;
▲ Show 20 Lines • Show All 269 Lines • ▼ Show 20 Lines	static const IntrinsicData IntrinsicsWithoutChain[] = {
X86_INTRINSIC_DATA(avx2_psllv_d, INTR_TYPE_2OP, ISD::SHL, 0),		X86_INTRINSIC_DATA(avx2_psllv_d, INTR_TYPE_2OP, ISD::SHL, 0),
X86_INTRINSIC_DATA(avx2_psllv_d_256, INTR_TYPE_2OP, ISD::SHL, 0),		X86_INTRINSIC_DATA(avx2_psllv_d_256, INTR_TYPE_2OP, ISD::SHL, 0),
X86_INTRINSIC_DATA(avx2_psllv_q, INTR_TYPE_2OP, ISD::SHL, 0),		X86_INTRINSIC_DATA(avx2_psllv_q, INTR_TYPE_2OP, ISD::SHL, 0),
X86_INTRINSIC_DATA(avx2_psllv_q_256, INTR_TYPE_2OP, ISD::SHL, 0),		X86_INTRINSIC_DATA(avx2_psllv_q_256, INTR_TYPE_2OP, ISD::SHL, 0),
X86_INTRINSIC_DATA(avx2_psra_d, INTR_TYPE_2OP, X86ISD::VSRA, 0),		X86_INTRINSIC_DATA(avx2_psra_d, INTR_TYPE_2OP, X86ISD::VSRA, 0),
X86_INTRINSIC_DATA(avx2_psra_w, INTR_TYPE_2OP, X86ISD::VSRA, 0),		X86_INTRINSIC_DATA(avx2_psra_w, INTR_TYPE_2OP, X86ISD::VSRA, 0),
X86_INTRINSIC_DATA(avx2_psrai_d, VSHIFT, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx2_psrai_d, VSHIFT, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx2_psrai_w, VSHIFT, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx2_psrai_w, VSHIFT, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx2_psrav_d, INTR_TYPE_2OP, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx2_psrav_d, INTR_SRA, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx2_psrav_d_256, INTR_TYPE_2OP, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx2_psrav_d_256, INTR_SRA, ISD::SRA, 0),
		RKSimon: These variable shift intrinsics should NOT use the ISD::SRA opcode but need a X86ISD::VSRAV opcode instead - similarly for the logical left/right intrinsics.
X86_INTRINSIC_DATA(avx2_psrl_d, INTR_TYPE_2OP, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx2_psrl_d, INTR_TYPE_2OP, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx2_psrl_q, INTR_TYPE_2OP, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx2_psrl_q, INTR_TYPE_2OP, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx2_psrl_w, INTR_TYPE_2OP, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx2_psrl_w, INTR_TYPE_2OP, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx2_psrli_d, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx2_psrli_d, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx2_psrli_q, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx2_psrli_q, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx2_psrli_w, VSHIFT, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx2_psrli_w, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx2_psrlv_d, INTR_TYPE_2OP, ISD::SRL, 0),		X86_INTRINSIC_DATA(avx2_psrlv_d, INTR_TYPE_2OP, ISD::SRL, 0),
X86_INTRINSIC_DATA(avx2_psrlv_d_256, INTR_TYPE_2OP, ISD::SRL, 0),		X86_INTRINSIC_DATA(avx2_psrlv_d_256, INTR_TYPE_2OP, ISD::SRL, 0),
▲ Show 20 Lines • Show All 1,095 Lines • ▼ Show 20 Lines
X86_INTRINSIC_DATA(avx512_mask_psra_w_128, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_w_128, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psra_w_256, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_w_256, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psra_w_512, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_w_512, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psra_wi_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_wi_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx512_mask_psra_wi_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_wi_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx512_mask_psra_wi_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx512_mask_psra_wi_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrai_d, VSHIFT_MASK, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx512_mask_psrai_d, VSHIFT_MASK, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrai_q, VSHIFT_MASK, X86ISD::VSRAI, 0),		X86_INTRINSIC_DATA(avx512_mask_psrai_q, VSHIFT_MASK, X86ISD::VSRAI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav_d, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav_d, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav_q, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav_q, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav_q_128, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav_q_128, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav_q_256, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav_q_256, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav16_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav16_hi, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav32_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav32_hi, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav4_si, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav4_si, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav8_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav8_hi, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrav8_si, INTR_TYPE_2OP_MASK, ISD::SRA, 0),		X86_INTRINSIC_DATA(avx512_mask_psrav8_si, INTR_SRA_MASK, ISD::SRA, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_d, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_d, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_d_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_d_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_d_256, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_d_256, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_di_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_di_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_di_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_di_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_di_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_di_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_q, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_q, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
X86_INTRINSIC_DATA(avx512_mask_psrl_q_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),		X86_INTRINSIC_DATA(avx512_mask_psrl_q_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),

test/CodeGen/X86/avx2-intrinsics-x86.ll

	Show First 20 Lines • Show All 1,451 Lines • ▼ Show 20 Lines
	;			;
	; AVX512VL-LABEL: test_x86_avx2_psrav_d:			; AVX512VL-LABEL: test_x86_avx2_psrav_d:
	; AVX512VL: ## BB#0:			; AVX512VL: ## BB#0:
	; AVX512VL-NEXT: vpsravd %xmm1, %xmm0, %xmm0			; AVX512VL-NEXT: vpsravd %xmm1, %xmm0, %xmm0
	; AVX512VL-NEXT: retl			; AVX512VL-NEXT: retl
	%res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> %a0, <4 x i32> %a1) ; <<4 x i32>> [#uses=1]			%res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> %a0, <4 x i32> %a1) ; <<4 x i32>> [#uses=1]
	ret <4 x i32> %res			ret <4 x i32> %res
	}			}
	declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone

				define <4 x i32> @test_x86_avx2_psrav_d_fold(<4 x i32> %a0, <4 x i32> %a1) {
				; AVX2-LABEL: test_x86_avx2_psrav_d_fold:
				; AVX2: ## BB#0:
				; AVX2-NEXT: vmovaps {{.*#+}} xmm0 = [1,0,4294967295,0]
				; AVX2-NEXT: retl
				;
				; AVX512VL-LABEL: test_x86_avx2_psrav_d_fold:
				; AVX512VL: ## BB#0:
				; AVX512VL-NEXT: vmovdqa32 {{.*#+}} xmm0 = [1,0,4294967295,0]
				; AVX512VL-NEXT: retl
				%res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> <i32 2, i32 9, i32 -12, i32 23>, <4 x i32> <i32 1, i32 18, i32 35, i32 52>)
				ret <4 x i32> %res
				}
				declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone

	define <8 x i32> @test_x86_avx2_psrav_d_256(<8 x i32> %a0, <8 x i32> %a1) {			define <8 x i32> @test_x86_avx2_psrav_d_256(<8 x i32> %a0, <8 x i32> %a1) {
	; AVX2-LABEL: test_x86_avx2_psrav_d_256:			; AVX2-LABEL: test_x86_avx2_psrav_d_256:
	; AVX2: ## BB#0:			; AVX2: ## BB#0:
	; AVX2-NEXT: vpsravd %ymm1, %ymm0, %ymm0			; AVX2-NEXT: vpsravd %ymm1, %ymm0, %ymm0
	; AVX2-NEXT: retl			; AVX2-NEXT: retl
	;			;
	; AVX512VL-LABEL: test_x86_avx2_psrav_d_256:			; AVX512VL-LABEL: test_x86_avx2_psrav_d_256:
	; AVX512VL: ## BB#0:			; AVX512VL: ## BB#0:
	; AVX512VL-NEXT: vpsravd %ymm1, %ymm0, %ymm0			; AVX512VL-NEXT: vpsravd %ymm1, %ymm0, %ymm0
	; AVX512VL-NEXT: retl			; AVX512VL-NEXT: retl
	%res = call <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32> %a0, <8 x i32> %a1) ; <<8 x i32>> [#uses=1]			%res = call <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32> %a0, <8 x i32> %a1) ; <<8 x i32>> [#uses=1]
	ret <8 x i32> %res			ret <8 x i32> %res
	}			}

				define <8 x i32> @test_x86_avx2_psrav_d_256_fold(<8 x i32> %a0, <8 x i32> %a1) {
				; AVX2-LABEL: test_x86_avx2_psrav_d_256_fold:
				; AVX2: ## BB#0:
				; AVX2-NEXT: vmovaps {{.*#+}} ymm0 = [1,0,4294967295,0,4294967295,0,4294967295,0]
				; AVX2-NEXT: retl
				;
				; AVX512VL-LABEL: test_x86_avx2_psrav_d_256_fold:
				; AVX512VL: ## BB#0:
				; AVX512VL-NEXT: vmovdqa32 {{.*#+}} ymm0 = [1,0,4294967295,0,4294967295,0,4294967295,0]
				; AVX512VL-NEXT: retl
				%res = call <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32> <i32 2, i32 9, i32 -12, i32 23, i32 -26, i32 37, i32 -40, i32 51>, <8 x i32> <i32 1, i32 18, i32 35, i32 52, i32 69, i32 15, i32 32, i32 49>)
				ret <8 x i32> %res
				}
	declare <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32>, <8 x i32>) nounwind readnone			declare <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32>, <8 x i32>) nounwind readnone

	define <2 x double> @test_x86_avx2_gather_d_pd(<2 x double> %a0, i8* %a1, <4 x i32> %idx, <2 x double> %mask) {			define <2 x double> @test_x86_avx2_gather_d_pd(<2 x double> %a0, i8* %a1, <4 x i32> %idx, <2 x double> %mask) {
	; AVX2-LABEL: test_x86_avx2_gather_d_pd:			; AVX2-LABEL: test_x86_avx2_gather_d_pd:
	; AVX2: ## BB#0:			; AVX2: ## BB#0:
	; AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax			; AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax
	; AVX2-NEXT: vgatherdpd %xmm2, (%eax,%xmm1,2), %xmm0			; AVX2-NEXT: vgatherdpd %xmm2, (%eax,%xmm1,2), %xmm0
	; AVX2-NEXT: retl			; AVX2-NEXT: retl

test/CodeGen/X86/avx512bw-intrinsics.ll

Show First 20 Lines • Show All 3,101 Lines • ▼ Show 20 Lines	; AVX512F-32-NEXT: retl
%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3)		%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3)
%res1 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> zeroinitializer, i32 %x3)		%res1 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> zeroinitializer, i32 %x3)
%res2 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 -1)		%res2 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 -1)
%res3 = add <32 x i16> %res, %res1		%res3 = add <32 x i16> %res, %res1
%res4 = add <32 x i16> %res3, %res2		%res4 = add <32 x i16> %res3, %res2
ret <32 x i16> %res4		ret <32 x i16> %res4
}		}

		define <32 x i16>@test_int_x86_avx512_mask_psrav32_hi_fold(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3) {
		; AVX512BW-LABEL: test_int_x86_avx512_mask_psrav32_hi_fold:
		; AVX512BW: ## BB#0:
		; AVX512BW-NEXT: vmovdqu16 {{.*#+}} zmm0 = [1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0]
		; AVX512BW-NEXT: retq
		;
		; AVX512F-32-LABEL: test_int_x86_avx512_mask_psrav32_hi_fold:
		; AVX512F-32: # BB#0:
		; AVX512F-32-NEXT: vmovdqu16 {{.*#+}} zmm0 = [1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0,1,0,65535,0,65535,0,65535,0]
		; AVX512F-32-NEXT: retl
		%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> <i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51>,
		<32 x i16> <i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49>,
		<32 x i16> zeroinitializer, i32 -1)
		ret <32 x i16> %res
		}

declare <32 x i16> @llvm.x86.avx512.mask.psll.w.512(<32 x i16>, <8 x i16>, <32 x i16>, i32)		declare <32 x i16> @llvm.x86.avx512.mask.psll.w.512(<32 x i16>, <8 x i16>, <32 x i16>, i32)

define <32 x i16>@test_int_x86_avx512_mask_psll_w_512(<32 x i16> %x0, <8 x i16> %x1, <32 x i16> %x2, i32 %x3) {		define <32 x i16>@test_int_x86_avx512_mask_psll_w_512(<32 x i16> %x0, <8 x i16> %x1, <32 x i16> %x2, i32 %x3) {
; AVX512BW-LABEL: test_int_x86_avx512_mask_psll_w_512:		; AVX512BW-LABEL: test_int_x86_avx512_mask_psll_w_512:
; AVX512BW: ## BB#0:		; AVX512BW: ## BB#0:
; AVX512BW-NEXT: kmovd %edi, %k1		; AVX512BW-NEXT: kmovd %edi, %k1
; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm2 {%k1}		; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm3 {%k1} {z}		; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm3 {%k1} {z}

test/CodeGen/X86/avx512vl-intrinsics.ll


Show First 20 Lines • Show All 8,057 Lines • ▼ Show 20 Lines	; CHECK-NEXT: retq ## encoding: [0xc3]
%res = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 %x3)		%res = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 %x3)
%res1 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> zeroinitializer, i8 %x3)		%res1 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> zeroinitializer, i8 %x3)
%res2 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 -1)		%res2 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 -1)
%res3 = add <8 x i32> %res, %res1		%res3 = add <8 x i32> %res, %res1
%res4 = add <8 x i32> %res3, %res2		%res4 = add <8 x i32> %res3, %res2
ret <8 x i32> %res4		ret <8 x i32> %res4
}		}

		define <8 x i32>@test_int_x86_avx512_mask_psrav8_si_fold() {
		; CHECK-LABEL: test_int_x86_avx512_mask_psrav8_si_fold:
		; CHECK: ## BB#0:
		; CHECK-NEXT: vmovdqa32 {{.*#+}} ymm0 = [1,0,4294967295,0,4294967295,0,4294967295,0]
		; CHECK-NEXT: ## encoding: [0x62,0xf1,0x7d,0x28,0x6f,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI520_0-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: retq ## encoding: [0xc3]
		%res = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> <i32 2, i32 9, i32 -12, i32 23, i32 -26, i32 37, i32 -40, i32 51>, <8 x i32> <i32 1, i32 18, i32 35, i32 52, i32 69, i32 15, i32 32, i32 49>, <8 x i32> zeroinitializer, i8 -1)
		ret <8 x i32> %res
		}

declare <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64>, <2 x i64>, <2 x i64>, i8)		declare <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64>, <2 x i64>, <2 x i64>, i8)

define <2 x i64>@test_int_x86_avx512_mask_psrav_q_128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3) {		define <2 x i64>@test_int_x86_avx512_mask_psrav_q_128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3) {
; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_128:		; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_128:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]		; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x09,0x46,0xd1]		; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x09,0x46,0xd1]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0x89,0x46,0xd9]		; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0x89,0x46,0xd9]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm0 ## encoding: [0x62,0xf2,0xfd,0x08,0x46,0xc1]		; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm0 ## encoding: [0x62,0xf2,0xfd,0x08,0x46,0xc1]
; CHECK-NEXT: vpaddq %xmm3, %xmm2, %xmm1 ## encoding: [0x62,0xf1,0xed,0x08,0xd4,0xcb]		; CHECK-NEXT: vpaddq %xmm3, %xmm2, %xmm1 ## encoding: [0x62,0xf1,0xed,0x08,0xd4,0xcb]
; CHECK-NEXT: vpaddq %xmm0, %xmm1, %xmm0 ## encoding: [0x62,0xf1,0xf5,0x08,0xd4,0xc0]		; CHECK-NEXT: vpaddq %xmm0, %xmm1, %xmm0 ## encoding: [0x62,0xf1,0xf5,0x08,0xd4,0xc0]
; CHECK-NEXT: retq ## encoding: [0xc3]		; CHECK-NEXT: retq ## encoding: [0xc3]
%res = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3)		%res = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3)
%res1 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> zeroinitializer, i8 %x3)		%res1 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> zeroinitializer, i8 %x3)
%res2 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 -1)		%res2 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 -1)
%res3 = add <2 x i64> %res, %res1		%res3 = add <2 x i64> %res, %res1
%res4 = add <2 x i64> %res3, %res2		%res4 = add <2 x i64> %res3, %res2
ret <2 x i64> %res4		ret <2 x i64> %res4
}		}

		define <2 x i64>@test_int_x86_avx512_mask_psrav_q_128_fold(i8 %x3) {
		; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_128_fold:
		; CHECK: ## BB#0:
		; CHECK-NEXT: vmovdqa64 {{.*#+}} xmm0 = [1,18446744073709551615]
		; CHECK-NEXT: ## encoding: [0x62,0xf1,0xfd,0x08,0x6f,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI522_0-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: retq ## encoding: [0xc3]
		%res = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> <i64 2, i64 -9>, <2 x i64> <i64 1, i64 90>, <2 x i64> zeroinitializer, i8 -1)
		ret <2 x i64> %res
		}

declare <4 x i64> @llvm.x86.avx512.mask.psrav.q.256(<4 x i64>, <4 x i64>, <4 x i64>, i8)		declare <4 x i64> @llvm.x86.avx512.mask.psrav.q.256(<4 x i64>, <4 x i64>, <4 x i64>, i8)

define <4 x i64>@test_int_x86_avx512_mask_psrav_q_256(<4 x i64> %x0, <4 x i64> %x1, <4 x i64> %x2, i8 %x3) {		define <4 x i64>@test_int_x86_avx512_mask_psrav_q_256(<4 x i64> %x0, <4 x i64> %x1, <4 x i64> %x2, i8 %x3) {
; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_256:		; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_256:
; CHECK: ## BB#0:		; CHECK: ## BB#0:
; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]		; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x29,0x46,0xd1]		; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x29,0x46,0xd1]
; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0xa9,0x46,0xd9]		; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0xa9,0x46,0xd9]