This is an archive of the discontinued LLVM Phabricator instance.

Differential D20897

[AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
ClosedPublic

Authored by igorb on Jun 2 2016, 12:37 AM.

Details

Reviewers

RKSimon
AsafBadouh
delena

Commits

rGe59165ca63d3: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic…
rL273138: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic…

Summary

[AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering.
The behavior of the Intel SRA intrinsic differs from that of the LLVM ashr instruction.
LLVM ashr instruction: If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined.
Intel SRA: If the unsigned integer value specified in the respective data element of the second source operand is greater than 15 (for words), 31 (for doublewords), or 63 (for quadwords), then the destination data element is filled with the corresponding sign bit of the source element.
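
To make the difference concrete, the per-element behaviour quoted above can be modelled in a few lines of C++. This is only a sketch under stated assumptions, not code from the patch: sravLane is a hypothetical helper name, and the in-range case is assumed to be an ordinary arithmetic shift.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical model (not LLVM code) of one lane of the Intel variable
// arithmetic right shift, for int16_t, int32_t, or int64_t elements.
template <typename T>
T sravLane(T value, std::uint64_t count) {
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "element type must be a signed integer");
  // digits excludes the sign bit: 15 for words, 31 for doublewords,
  // 63 for quadwords, matching the limits quoted in the summary.
  constexpr std::uint64_t maxCount = std::numeric_limits<T>::digits;
  if (count > maxCount)
    return value < 0 ? static_cast<T>(-1) : static_cast<T>(0);  // sign fill
  if (value >= 0)
    return static_cast<T>(value >> count);
  // For negative inputs, shift the (non-negative) complement and flip back.
  // This gives an arithmetic shift without relying on how the compiler
  // treats right shifts of negative values.
  return static_cast<T>(~(~value >> count));
}
```

With this model, an out-of-range count such as 35 on a 32-bit lane yields -1 for a negative input and 0 for a non-negative one, whereas the corresponding ashr would be undefined.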

Diff Detail

Repository: rL LLVM

Event Timeline

igorb updated this revision to Diff 59341. Jun 2 2016, 12:37 AM

igorb retitled this revision from to [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic intrinsic lowering..

igorb updated this object.

igorb added reviewers: delena, AsafBadouh.

igorb set the repository for this revision to rL LLVM.

igorb added a subscriber: llvm-commits.

This looks very similar in aim to D19675 (although we barely have any support for AVX512 intrinsics in InstCombine)

In D20897#446667, @RKSimon wrote:

This looks very similar in aim to D19675 (although we barely have any support for AVX512 intrinsics in InstCombine)

Simon,

Do you see any problem with the current patch?

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

In this case, constant folding is done before intrinsic simplification only to match the intrinsic's behavior, so that the general ISD::SRA can be used.

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

I agree. But today we are moving toward generating IR from clang whenever possible. For this specific intrinsic (SAR), the LLVM IR specification and the Intel intrinsic specification differ for out-of-range constants. You can generate a constant if both arguments are constants, but replacing the intrinsic with a generic IR instruction is incorrect in this case.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

We should generate optimal code if we can. So we should probably fold constants whenever possible.

IMO, once we are visiting the SAR intrinsic in LowerINTRINSIC_WO_CHAIN, we should try to fold constants there. This applies to all intrinsics where the IR and architecture specifications do not match.

In D20897#449204, @delena wrote:

In D20897#447281, @RKSimon wrote:

I just think we're better off handling simplification of intrinsics as early as possible - in this case in InstCombiner::visitCallInst instead of waiting until lowering.

I agree. But today we are moving toward generating IR from clang whenever possible. For this specific intrinsic (SAR), the LLVM IR specification and the Intel intrinsic specification differ for out-of-range constants. You can generate a constant if both arguments are constants, but replacing the intrinsic with a generic IR instruction is incorrect in this case.

Also, should we be adding constant folding (or other optimizations) to LowerINTRINSIC_WO_CHAIN ? I understood that is for cleanup + canonicalization only.

We should generate optimal code if we can. So we should probably fold constants whenever possible.

IMO, once we are visiting the SAR intrinsic in LowerINTRINSIC_WO_CHAIN, we should try to fold constants there. This applies to all intrinsics where the IR and architecture specifications do not match.

Are there many examples where these SAR intrinsics can only be folded at lowering? And is it just SAR, or are SRL/SHL shifts likely as well? What prevented them from being folded earlier in InstCombine? I'm happier now with the idea of adding it to LowerINTRINSIC_WO_CHAIN - I just want to know it will be useful.

I think we need this patch for correctness in case we get this intrinsic with a constant index in the back end (for example, if InstCombiner doesn't run, as in -O0 compilation).

igorb added a reviewer: RKSimon. Jun 9 2016, 2:18 AM

In D20897#453183, @igorb wrote:

I think we need this patch for correctness in case we get this intrinsic with a constant index in the back end (for example, if InstCombiner doesn't run, as in -O0 compilation).

Except this isn't a correctness issue, it's an optimization, no? The code will run fine at -O0 or higher as it will lower to the vpsrav intrinsic, which supports the out-of-range shift value and will give the correct result.

If this was in combineINTRINSIC_WO_CHAIN (which we recently removed as it wasn't being used....) I think we could accept this but only if we have a use case that InstCombine isn't catching.

In D20897#453507, @RKSimon wrote:

Except this isn't a correctness issue, it's an optimization, no? The code will run fine at -O0 or higher as it will lower to the vpsrav intrinsic, which supports the out-of-range shift value and will give the correct result.

No, I believe it is a correctness issue. Lowering of the vpsrav intrinsic with a constant out-of-range shift value is incorrect.

For example:

define <4 x i32> @test_x86_avx2_psrav_d_fold(<4 x i32> %a0, <4 x i32> %a1) {
  %res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> <i32 2, i32 9, i32 -12, i32 23>, <4 x i32> <i32 1, i32 18, i32 35, i32 52>)
  ret <4 x i32> %res
}
declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone

Without the patch we get an incorrect result:

movl    $1, %eax
movd    %eax, %xmm0
retq

With the patch:

 .LCPI0_0:
.long   1                       # 0x1
.long   0                       # 0x0
.long   4294967295              # 0xffffffff
.long   0                       # 0x0

movaps  .LCPI0_0(%rip), %xmm0   # xmm0 = [1,0,4294967295,0]
retq
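
The constant in the patched output can be checked by hand against the Intel semantics quoted in the summary: only the first two lanes have in-range counts, and the other two are filled with the sign bit. The C++ sketch below is a reference model written for this check, not code from the patch; psravD and V4i32 are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<std::int32_t, 4>;

// Hypothetical reference model (not LLVM code) of llvm.x86.avx2.psrav.d,
// following the Intel semantics: counts above 31 fill the lane with the
// sign bit of the source element.
V4i32 psravD(const V4i32 &src, const std::array<std::uint32_t, 4> &counts) {
  V4i32 out{};
  for (std::size_t i = 0; i < src.size(); ++i) {
    // Clamping the count to 31 reproduces the sign-fill result, because a
    // 32-bit arithmetic shift by 31 leaves only copies of the sign bit.
    std::uint32_t c = counts[i] > 31 ? 31 : counts[i];
    std::int32_t v = src[i];
    // Negative values are shifted through their complement so the result
    // does not depend on the compiler's handling of signed right shifts.
    out[i] = v >= 0 ? (v >> c) : ~(~v >> c);
  }
  return out;
}
```

Applied to <2, 9, -12, 23> with counts <1, 18, 35, 52>, this model gives <1, 0, -1, 0>, which is [1,0,4294967295,0] when printed as unsigned values and agrees with the constant pool emitted with the patch.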

No, I believe it is a correctness issue. Lowering of the vpsrav intrinsic with a constant out-of-range shift value is incorrect.

Thanks for the example - the issue you describe seems to be that the x86 variable shift intrinsics are being mapped to ISD::SRA - I think this is the problem you are encountering?

lib/Target/X86/X86IntrinsicsInfo.h
Line 326 (On Diff #59341): These variable shift intrinsics should NOT use the ISD::SRA opcode but need an X86ISD::VSRAV opcode instead - similarly for the logical left/right intrinsics.

Updated the patch according to the comments.
Thanks for the review.

LGTM

This revision is now accepted and ready to land. Jun 15 2016, 11:14 PM

Thanks Igor - do you think the logical shift equivalents need to be changed as well? They also have defined behaviour for out of range shifts.

In D20897#461603, @RKSimon wrote:

Thanks Igor - do you think the logical shift equivalents need to be changed as well? They also have defined behaviour for out of range shifts.

I am not sure. The behaviour of out-of-range shifts is undefined, but the "undefined results" are implemented to always be 0:
APInt APInt::lshr(unsigned shiftAmt)
APInt LLVM_ATTRIBUTE_UNUSED_RESULT shl(unsigned shiftAmt)

I will add a few tests to ensure that this behavior doesn't change.
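
For comparison with the arithmetic case, the logical variable shifts discussed here zero the lane when the count is out of range: Intel's description of VPSLLVD/VPSRLVD gives 0 for counts above 31. A minimal C++ sketch of one 32-bit lane follows; srlvLane32 and sllvLane32 are hypothetical names, not LLVM functions.

```cpp
#include <cstdint>

// Hypothetical models (not LLVM code) of one 32-bit lane of the variable
// logical shifts. Out-of-range counts produce 0 rather than an undefined
// value, so no sign-fill handling is needed.
inline std::uint32_t srlvLane32(std::uint32_t value, std::uint32_t count) {
  return count > 31 ? 0u : value >> count;
}

inline std::uint32_t sllvLane32(std::uint32_t value, std::uint32_t count) {
  return count > 31 ? 0u : value << count;
}
```

This is why folding the logical shifts to 0 for out-of-range counts happens to match the hardware, while the arithmetic shift needs the sign-fill rule.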

Closed by commit rL273138: [AVX512] [AVX512/AVX][Intrinsics] Fix Variable Bit Shift Right Arithmetic… (authored by ibreger). Jun 20 2016, 12:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

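Path                                                 Size
llvm/trunk/lib/Target/X86/X86ISelLowering.h          5 lines
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp        1 line
llvm/trunk/lib/Target/X86/X86InstrAVX512.td          5 lines
llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td   2 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp           2 lines
llvm/trunk/lib/Target/X86/X86InstrSSE.td             2 lines
llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h        22 lines
llvm/trunk/test/CodeGen/X86/avx2-intrinsics-x86.ll   33 lines
llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll   18 lines
llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll   26 lines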

Diff 61235

llvm/trunk/lib/Target/X86/X86ISelLowering.h

  enum NodeType : unsigned {
  ...
  CVT2MASK,

  // 128-bit vector logical left / right shift
  VSHLDQ, VSRLDQ,

  // Vector shift elements
  VSHL, VSRL, VSRA,

+ // Vector variable shift right arithmetic.
+ // Unlike ISD::SRA, in case shift count greater then element size
+ // use sign bit to fill destination data element.
+ VSRAV,
+
  // Vector shift elements by immediate
  VSHLI, VSRLI, VSRAI,

  // Bit rotate by immediate
  VROTLI, VROTRI,

  // Vector packed double/float comparison.
  CMPP,

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

  const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
  ...
  case X86ISD::VSHLDQ: return "X86ISD::VSHLDQ";
  case X86ISD::VSRLDQ: return "X86ISD::VSRLDQ";
  case X86ISD::VSHL: return "X86ISD::VSHL";
  case X86ISD::VSRL: return "X86ISD::VSRL";
  case X86ISD::VSRA: return "X86ISD::VSRA";
  case X86ISD::VSHLI: return "X86ISD::VSHLI";
  case X86ISD::VSRLI: return "X86ISD::VSRLI";
  case X86ISD::VSRAI: return "X86ISD::VSRAI";
+ case X86ISD::VSRAV: return "X86ISD::VSRAV";
  case X86ISD::VROTLI: return "X86ISD::VROTLI";
  case X86ISD::VROTRI: return "X86ISD::VROTRI";
  case X86ISD::VPPERM: return "X86ISD::VPPERM";
  case X86ISD::CMPP: return "X86ISD::CMPP";
  case X86ISD::PCMPEQ: return "X86ISD::PCMPEQ";
  case X86ISD::PCMPGT: return "X86ISD::PCMPGT";
  case X86ISD::PCMPEQM: return "X86ISD::PCMPEQM";
  case X86ISD::PCMPGTM: return "X86ISD::PCMPGTM";

llvm/trunk/lib/Target/X86/X86InstrAVX512.td

  multiclass avx512_var_shift_w<bits<8> opc, string OpcodeStr,
  ...
  defm WZ128: avx512_var_shift<opc, OpcodeStr, OpNode, v8i16x_info>,
  EVEX_V128, VEX_W;
  }
  }

  defm VPSLLV : avx512_var_shift_types<0x47, "vpsllv", shl>,
  avx512_var_shift_w<0x12, "vpsllvw", shl>,
  avx512_var_shift_w_lowering<avx512vl_i16_info, shl>;

  defm VPSRAV : avx512_var_shift_types<0x46, "vpsrav", sra>,
  avx512_var_shift_w<0x11, "vpsravw", sra>,
  avx512_var_shift_w_lowering<avx512vl_i16_info, sra>;
+ let isCodeGenOnly = 1 in
+ defm VPSRAV_Int : avx512_var_shift_types<0x46, "vpsrav", X86vsrav>,
+ avx512_var_shift_w<0x11, "vpsravw", X86vsrav>;

  defm VPSRLV : avx512_var_shift_types<0x45, "vpsrlv", srl>,
  avx512_var_shift_w<0x10, "vpsrlvw", srl>,
  avx512_var_shift_w_lowering<avx512vl_i16_info, srl>;
  defm VPRORV : avx512_var_shift_types<0x14, "vprorv", rotr>;
  defm VPROLV : avx512_var_shift_types<0x15, "vprolv", rotl>;

  //===-------------------------------------------------------------------===//
  // 1-src variable permutation VPERMW/D/Q

llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td

  def X86vshl : SDNode<"X86ISD::VSHL",
  ...
  SDTCisVec<2>]>>;
  def X86vsrl : SDNode<"X86ISD::VSRL",
  SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
  SDTCisVec<2>]>>;
  def X86vsra : SDNode<"X86ISD::VSRA",
  SDTypeProfile<1, 2, [SDTCisVec<0>, SDTCisSameAs<0,1>,
  SDTCisVec<2>]>>;

+ def X86vsrav : SDNode<"X86ISD::VSRAV" , SDTIntShiftOp>;
+
  def X86vshli : SDNode<"X86ISD::VSHLI", SDTIntShiftOp>;
  def X86vsrli : SDNode<"X86ISD::VSRLI", SDTIntShiftOp>;
  def X86vsrai : SDNode<"X86ISD::VSRAI", SDTIntShiftOp>;

  def X86vrotli : SDNode<"X86ISD::VROTLI", SDTIntShiftOp>;
  def X86vrotri : SDNode<"X86ISD::VROTRI", SDTIntShiftOp>;

  def X86vprot : SDNode<"X86ISD::VPROT",

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

  static const X86MemoryFoldTableEntry MemoryFoldTable2[] = {
  ...
  { X86::VPSLLVDrr, X86::VPSLLVDrm, 0 },
  { X86::VPSLLVDYrr, X86::VPSLLVDYrm, 0 },
  { X86::VPSLLVQrr, X86::VPSLLVQrm, 0 },
  { X86::VPSLLVQYrr, X86::VPSLLVQYrm, 0 },
  { X86::VPSRADYrr, X86::VPSRADYrm, 0 },
  { X86::VPSRAWYrr, X86::VPSRAWYrm, 0 },
  { X86::VPSRAVDrr, X86::VPSRAVDrm, 0 },
  { X86::VPSRAVDYrr, X86::VPSRAVDYrm, 0 },
+ { X86::VPSRAVD_Intrr, X86::VPSRAVD_Intrm, 0 },
+ { X86::VPSRAVD_IntYrr, X86::VPSRAVD_IntYrm, 0 },
  { X86::VPSRLDYrr, X86::VPSRLDYrm, 0 },
  { X86::VPSRLQYrr, X86::VPSRLQYrm, 0 },
  { X86::VPSRLWYrr, X86::VPSRLWYrm, 0 },
  { X86::VPSRLVDrr, X86::VPSRLVDrm, 0 },
  { X86::VPSRLVDYrr, X86::VPSRLVDYrm, 0 },
  { X86::VPSRLVQrr, X86::VPSRLVQrm, 0 },
  { X86::VPSRLVQYrr, X86::VPSRLVQYrm, 0 },
  { X86::VPSUBBYrr, X86::VPSUBBYrm, 0 },

llvm/trunk/lib/Target/X86/X86InstrSSE.td

  }

  let Predicates = [HasAVX2, NoVLX] in {
  defm VPSLLVD : avx2_var_shift<0x47, "vpsllvd", shl, v4i32, v8i32>;
  defm VPSLLVQ : avx2_var_shift<0x47, "vpsllvq", shl, v2i64, v4i64>, VEX_W;
  defm VPSRLVD : avx2_var_shift<0x45, "vpsrlvd", srl, v4i32, v8i32>;
  defm VPSRLVQ : avx2_var_shift<0x45, "vpsrlvq", srl, v2i64, v4i64>, VEX_W;
  defm VPSRAVD : avx2_var_shift<0x46, "vpsravd", sra, v4i32, v8i32>;
+ let isCodeGenOnly = 1 in
+ defm VPSRAVD_Int : avx2_var_shift<0x46, "vpsravd", X86vsrav, v4i32, v8i32>;
  }
  //===----------------------------------------------------------------------===//
  // VGATHER - GATHER Operations
  multiclass avx2_gather<bits<8> opc, string OpcodeStr, RegisterClass RC256,
  X86MemOperand memop128, X86MemOperand memop256> {
  def rm : AVX28I<opc, MRMSrcMem, (outs VR128:$dst, VR128:$mask_wb),
  (ins VR128:$src1, memop128:$src2, VR128:$mask),
  !strconcat(OpcodeStr,

llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h

  static const IntrinsicData IntrinsicsWithoutChain[] = {
  ...
  X86_INTRINSIC_DATA(avx2_psllv_d, INTR_TYPE_2OP, ISD::SHL, 0),
  X86_INTRINSIC_DATA(avx2_psllv_d_256, INTR_TYPE_2OP, ISD::SHL, 0),
  X86_INTRINSIC_DATA(avx2_psllv_q, INTR_TYPE_2OP, ISD::SHL, 0),
  X86_INTRINSIC_DATA(avx2_psllv_q_256, INTR_TYPE_2OP, ISD::SHL, 0),
  X86_INTRINSIC_DATA(avx2_psra_d, INTR_TYPE_2OP, X86ISD::VSRA, 0),
  X86_INTRINSIC_DATA(avx2_psra_w, INTR_TYPE_2OP, X86ISD::VSRA, 0),
  X86_INTRINSIC_DATA(avx2_psrai_d, VSHIFT, X86ISD::VSRAI, 0),
  X86_INTRINSIC_DATA(avx2_psrai_w, VSHIFT, X86ISD::VSRAI, 0),
- X86_INTRINSIC_DATA(avx2_psrav_d, INTR_TYPE_2OP, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx2_psrav_d_256, INTR_TYPE_2OP, ISD::SRA, 0),
+ X86_INTRINSIC_DATA(avx2_psrav_d, INTR_TYPE_2OP, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx2_psrav_d_256, INTR_TYPE_2OP, X86ISD::VSRAV, 0),
  X86_INTRINSIC_DATA(avx2_psrl_d, INTR_TYPE_2OP, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx2_psrl_q, INTR_TYPE_2OP, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx2_psrl_w, INTR_TYPE_2OP, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx2_psrli_d, VSHIFT, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx2_psrli_q, VSHIFT, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx2_psrli_w, VSHIFT, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx2_psrlv_d, INTR_TYPE_2OP, ISD::SRL, 0),
  X86_INTRINSIC_DATA(avx2_psrlv_d_256, INTR_TYPE_2OP, ISD::SRL, 0),
  ...
  X86_INTRINSIC_DATA(avx512_mask_psra_w_128, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
  X86_INTRINSIC_DATA(avx512_mask_psra_w_256, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
  X86_INTRINSIC_DATA(avx512_mask_psra_w_512, INTR_TYPE_2OP_MASK, X86ISD::VSRA, 0),
  X86_INTRINSIC_DATA(avx512_mask_psra_wi_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psra_wi_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psra_wi_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRAI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrai_d, VSHIFT_MASK, X86ISD::VSRAI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrai_q, VSHIFT_MASK, X86ISD::VSRAI, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav_d, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav_q, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav_q_128, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav_q_256, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav16_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav32_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav4_si, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav8_hi, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
- X86_INTRINSIC_DATA(avx512_mask_psrav8_si, INTR_TYPE_2OP_MASK, ISD::SRA, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav_d, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav_q, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav_q_128, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav_q_256, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav16_hi, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav32_hi, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav4_si, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav8_hi, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
+ X86_INTRINSIC_DATA(avx512_mask_psrav8_si, INTR_TYPE_2OP_MASK, X86ISD::VSRAV, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_d, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_d_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_d_256, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_di_128, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_di_256, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_di_512, INTR_TYPE_2OP_IMM8_MASK, X86ISD::VSRLI, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_q, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),
  X86_INTRINSIC_DATA(avx512_mask_psrl_q_128, INTR_TYPE_2OP_MASK, X86ISD::VSRL, 0),

llvm/trunk/test/CodeGen/X86/avx2-intrinsics-x86.ll

  ;
  ; AVX512VL-LABEL: test_x86_avx2_psrav_d:
  ; AVX512VL: ## BB#0:
  ; AVX512VL-NEXT: vpsravd %xmm1, %xmm0, %xmm0
  ; AVX512VL-NEXT: retl
  %res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> %a0, <4 x i32> %a1) ; <<4 x i32>> [#uses=1]
  ret <4 x i32> %res
  }
- declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone

+ define <4 x i32> @test_x86_avx2_psrav_d_const(<4 x i32> %a0, <4 x i32> %a1) {
+ ; AVX2-LABEL: test_x86_avx2_psrav_d_const:
+ ; AVX2: ## BB#0:
+ ; AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = [2,9,4294967284,23]
+ ; AVX2-NEXT: vpsravd LCPI90_1, %xmm0, %xmm0
+ ; AVX2-NEXT: retl
+ ;
+ ; AVX512VL-LABEL: test_x86_avx2_psrav_d_const:
+ ; AVX512VL: ## BB#0:
+ ; AVX512VL-NEXT: vmovdqa32 {{.*#+}} xmm0 = [2,9,4294967284,23]
+ ; AVX512VL-NEXT: vpsravd LCPI90_1, %xmm0, %xmm0
+ ; AVX512VL-NEXT: retl
+ %res = call <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32> <i32 2, i32 9, i32 -12, i32 23>, <4 x i32> <i32 1, i32 18, i32 35, i32 52>)
+ ret <4 x i32> %res
+ }
+ declare <4 x i32> @llvm.x86.avx2.psrav.d(<4 x i32>, <4 x i32>) nounwind readnone
+
  define <8 x i32> @test_x86_avx2_psrav_d_256(<8 x i32> %a0, <8 x i32> %a1) {
  ; AVX2-LABEL: test_x86_avx2_psrav_d_256:
  ; AVX2: ## BB#0:
  ; AVX2-NEXT: vpsravd %ymm1, %ymm0, %ymm0
  ; AVX2-NEXT: retl
  ;
  ; AVX512VL-LABEL: test_x86_avx2_psrav_d_256:
  ; AVX512VL: ## BB#0:
  ; AVX512VL-NEXT: vpsravd %ymm1, %ymm0, %ymm0
  ; AVX512VL-NEXT: retl
  %res = call <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32> %a0, <8 x i32> %a1) ; <<8 x i32>> [#uses=1]
  ret <8 x i32> %res
  }
+
+ define <8 x i32> @test_x86_avx2_psrav_d_256_const(<8 x i32> %a0, <8 x i32> %a1) {
+ ; AVX2-LABEL: test_x86_avx2_psrav_d_256_const:
+ ; AVX2: ## BB#0:
+ ; AVX2-NEXT: vmovdqa {{.*#+}} ymm0 = [2,9,4294967284,23,4294967270,37,4294967256,51]
+ ; AVX2-NEXT: vpsravd LCPI92_1, %ymm0, %ymm0
+ ; AVX2-NEXT: retl
+ ;
+ ; AVX512VL-LABEL: test_x86_avx2_psrav_d_256_const:
+ ; AVX512VL: ## BB#0:
+ ; AVX512VL-NEXT: vmovdqa32 {{.*#+}} ymm0 = [2,9,4294967284,23,4294967270,37,4294967256,51]
+ ; AVX512VL-NEXT: vpsravd LCPI92_1, %ymm0, %ymm0
+ ; AVX512VL-NEXT: retl
+ %res = call <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32> <i32 2, i32 9, i32 -12, i32 23, i32 -26, i32 37, i32 -40, i32 51>, <8 x i32> <i32 1, i32 18, i32 35, i32 52, i32 69, i32 15, i32 32, i32 49>)
+ ret <8 x i32> %res
+ }
  declare <8 x i32> @llvm.x86.avx2.psrav.d.256(<8 x i32>, <8 x i32>) nounwind readnone

  define <2 x double> @test_x86_avx2_gather_d_pd(<2 x double> %a0, i8* %a1, <4 x i32> %idx, <2 x double> %mask) {
  ; AVX2-LABEL: test_x86_avx2_gather_d_pd:
  ; AVX2: ## BB#0:
  ; AVX2-NEXT: movl {{[0-9]+}}(%esp), %eax
  ; AVX2-NEXT: vgatherdpd %xmm2, (%eax,%xmm1,2), %xmm0
  ; AVX2-NEXT: retl

llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll

Show First 20 Lines • Show All 2,929 Lines • ▼ Show 20 Lines	; AVX512F-32-NEXT: retl
%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3)		%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3)
%res1 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> zeroinitializer, i32 %x3)		%res1 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> zeroinitializer, i32 %x3)
%res2 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 -1)		%res2 = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 -1)
%res3 = add <32 x i16> %res, %res1		%res3 = add <32 x i16> %res, %res1
%res4 = add <32 x i16> %res3, %res2		%res4 = add <32 x i16> %res3, %res2
ret <32 x i16> %res4		ret <32 x i16> %res4
}		}

		define <32 x i16>@test_int_x86_avx512_mask_psrav32_hi_const(<32 x i16> %x0, <32 x i16> %x1, <32 x i16> %x2, i32 %x3) {
		; AVX512BW-LABEL: test_int_x86_avx512_mask_psrav32_hi_const:
		; AVX512BW: ## BB#0:
		; AVX512BW-NEXT: vmovdqu16 {{.*#+}} zmm0 = [2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51]
		; AVX512BW-NEXT: vpsravw {{.*}}(%rip), %zmm0, %zmm0
		; AVX512BW-NEXT: retq
		;
		; AVX512F-32-LABEL: test_int_x86_avx512_mask_psrav32_hi_const:
		; AVX512F-32: # BB#0:
		; AVX512F-32-NEXT: vmovdqu16 {{.*#+}} zmm0 = [2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51,2,9,65524,23,65510,37,65496,51]
		; AVX512F-32-NEXT: vpsravw {{\.LCPI.*}}, %zmm0, %zmm0
		; AVX512F-32-NEXT: retl
		%res = call <32 x i16> @llvm.x86.avx512.mask.psrav32.hi(<32 x i16> <i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51, i16 2, i16 9, i16 -12, i16 23, i16 -26, i16 37, i16 -40, i16 51>,
		<32 x i16> <i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49, i16 1, i16 10, i16 35, i16 52, i16 69, i16 9, i16 16, i16 49>,
		<32 x i16> zeroinitializer, i32 -1)
		ret <32 x i16> %res
		}

declare <32 x i16> @llvm.x86.avx512.mask.psll.w.512(<32 x i16>, <8 x i16>, <32 x i16>, i32)

define <32 x i16>@test_int_x86_avx512_mask_psll_w_512(<32 x i16> %x0, <8 x i16> %x1, <32 x i16> %x2, i32 %x3) {
; AVX512BW-LABEL: test_int_x86_avx512_mask_psll_w_512:
; AVX512BW: ## BB#0:
; AVX512BW-NEXT: kmovd %edi, %k1
; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm2 {%k1}
; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm3 {%k1} {z}

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll


; CHECK-NEXT: retq ## encoding: [0xc3]
%res = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 %x3)
%res1 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> zeroinitializer, i8 %x3)
%res2 = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> %x0, <8 x i32> %x1, <8 x i32> %x2, i8 -1)
%res3 = add <8 x i32> %res, %res1
%res4 = add <8 x i32> %res3, %res2
ret <8 x i32> %res4
}

		define <8 x i32>@test_int_x86_avx512_mask_psrav8_si_const() {
		; CHECK-LABEL: test_int_x86_avx512_mask_psrav8_si_const:
		; CHECK: ## BB#0:
		; CHECK-NEXT: vmovdqa32 {{.*#+}} ymm0 = [2,9,4294967284,23,4294967270,37,4294967256,51]
		; CHECK-NEXT: ## encoding: [0x62,0xf1,0x7d,0x28,0x6f,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI510_0-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: vpsravd {{.*}}(%rip), %ymm0, %ymm0 ## encoding: [0x62,0xf2,0x7d,0x28,0x46,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI510_1-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: retq ## encoding: [0xc3]
		%res = call <8 x i32> @llvm.x86.avx512.mask.psrav8.si(<8 x i32> <i32 2, i32 9, i32 -12, i32 23, i32 -26, i32 37, i32 -40, i32 51>, <8 x i32> <i32 1, i32 18, i32 35, i32 52, i32 69, i32 15, i32 32, i32 49>, <8 x i32> zeroinitializer, i8 -1)
		ret <8 x i32> %res
		}

declare <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64>, <2 x i64>, <2 x i64>, i8)

define <2 x i64>@test_int_x86_avx512_mask_psrav_q_128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3) {
; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_128:
; CHECK: ## BB#0:
; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x09,0x46,0xd1]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0x89,0x46,0xd9]
; CHECK-NEXT: vpsravq %xmm1, %xmm0, %xmm0 ## encoding: [0x62,0xf2,0xfd,0x08,0x46,0xc1]
; CHECK-NEXT: vpaddq %xmm3, %xmm2, %xmm1 ## encoding: [0x62,0xf1,0xed,0x08,0xd4,0xcb]
; CHECK-NEXT: vpaddq %xmm0, %xmm1, %xmm0 ## encoding: [0x62,0xf1,0xf5,0x08,0xd4,0xc0]
; CHECK-NEXT: retq ## encoding: [0xc3]
%res = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 %x3)
%res1 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> zeroinitializer, i8 %x3)
%res2 = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> %x0, <2 x i64> %x1, <2 x i64> %x2, i8 -1)
%res3 = add <2 x i64> %res, %res1
%res4 = add <2 x i64> %res3, %res2
ret <2 x i64> %res4
}

		define <2 x i64>@test_int_x86_avx512_mask_psrav_q_128_const(i8 %x3) {
		; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_128_const:
		; CHECK: ## BB#0:
		; CHECK-NEXT: vmovdqa64 {{.*#+}} xmm0 = [2,18446744073709551607]
		; CHECK-NEXT: ## encoding: [0x62,0xf1,0xfd,0x08,0x6f,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI512_0-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: vpsravq {{.*}}(%rip), %xmm0, %xmm0 ## encoding: [0x62,0xf2,0xfd,0x08,0x46,0x05,A,A,A,A]
		; CHECK-NEXT: ## fixup A - offset: 6, value: LCPI512_1-4, kind: reloc_riprel_4byte
		; CHECK-NEXT: retq ## encoding: [0xc3]
		%res = call <2 x i64> @llvm.x86.avx512.mask.psrav.q.128(<2 x i64> <i64 2, i64 -9>, <2 x i64> <i64 1, i64 90>, <2 x i64> zeroinitializer, i8 -1)
		ret <2 x i64> %res
		}
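
The constant tests write negative lanes as signed values in the IR (`i16 -12`, `i32 -12`, `i64 -9`), while the CHECK lines show the same lanes as unsigned numbers (65524, 4294967284, 18446744073709551607). The short sketch below shows that these are the same two's-complement bit patterns at each lane width. The helper name `as_unsigned` is hypothetical and chosen only for illustration.

```python
def as_unsigned(value, bits):
    """Two's-complement bit pattern of a signed lane, read as an unsigned integer."""
    return value & ((1 << bits) - 1)


# (lane width, signed constant from the IR) pairs taken from the three constant tests.
for bits, signed in [(16, -12), (32, -12), (64, -9)]:
    print(bits, signed, as_unsigned(signed, bits))
```

This is handy when reading the expected output: a lane printed as 18446744073709551607 in the `vmovdqa64` comment is simply the `i64 -9` operand from the call.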

declare <4 x i64> @llvm.x86.avx512.mask.psrav.q.256(<4 x i64>, <4 x i64>, <4 x i64>, i8)

define <4 x i64>@test_int_x86_avx512_mask_psrav_q_256(<4 x i64> %x0, <4 x i64> %x1, <4 x i64> %x2, i8 %x3) {
; CHECK-LABEL: test_int_x86_avx512_mask_psrav_q_256:
; CHECK: ## BB#0:
; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm2 {%k1} ## encoding: [0x62,0xf2,0xfd,0x29,0x46,0xd1]
; CHECK-NEXT: vpsravq %ymm1, %ymm0, %ymm3 {%k1} {z} ## encoding: [0x62,0xf2,0xfd,0xa9,0x46,0xd9]

Revision Contents

Diff 61235

llvm/trunk/lib/Target/X86/X86ISelLowering.h

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

llvm/trunk/lib/Target/X86/X86InstrAVX512.td

llvm/trunk/lib/Target/X86/X86InstrFragmentsSIMD.td

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

llvm/trunk/lib/Target/X86/X86InstrSSE.td

llvm/trunk/lib/Target/X86/X86IntrinsicsInfo.h

llvm/trunk/test/CodeGen/X86/avx2-intrinsics-x86.ll

llvm/trunk/test/CodeGen/X86/avx512bw-intrinsics.ll

llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics.ll
