This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.
Closed · Public

Authored by craig.topper on Nov 6 2016, 9:39 AM.

Details

Reviewers

RKSimon
zvi
delena

Commits

rG9d25c5e2fa36: [AVX-512] Add unmasked version of shift by immediate and shift by single…
rL286711: [AVX-512] Add unmasked version of shift by immediate and shift by single…

Summary

This is the first step towards being able to add the avx512 shift-by-immediate intrinsics to InstCombineCalls, where we already support the sse2 and avx2 intrinsics. We need the unmasked versions so we can avoid having to teach InstCombineCalls that it would sometimes need to insert selects. Instead, we'll just add the selects around the new intrinsics in the frontend.
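To make "adding the selects around the new intrinsics in the frontend" concrete, here is a scalar C++ model of the pattern the new tests exercise (`bitcast i16 %mask to <16 x i1>` followed by `select`). It is a sketch under assumptions, not LLVM or Clang code: `V16i32`, `pslli_d`, `select_mask`, `mask_pslli_d` and `maskz_pslli_d` are hypothetical names, and the model assumes that bit i of the mask controls lane i and that VPSLLD produces zero for counts above 31.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model of a <16 x i32> vector (one 512-bit ZMM register).
using V16i32 = std::array<uint32_t, 16>;

// Model of the unmasked shift-left-by-immediate (VPSLLD zmm, zmm, imm8):
// every lane is shifted by the same count; counts >= 32 give zero lanes.
V16i32 pslli_d(const V16i32& a, uint32_t count) {
  V16i32 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = count < 32 ? a[i] << count : 0u;
  return r;
}

// Model of: %m = bitcast i16 %mask to <16 x i1>
//           select <16 x i1> %m, <16 x i32> %res, <16 x i32> %other
// Bit i of the mask picks lane i of `res`; a clear bit keeps `other`.
V16i32 select_mask(uint16_t mask, const V16i32& res, const V16i32& other) {
  V16i32 r{};
  for (int i = 0; i < 16; ++i)
    r[i] = ((mask >> i) & 1u) ? res[i] : other[i];
  return r;
}

// Merge-masking ({%k1}): inactive lanes come from the passthru operand.
V16i32 mask_pslli_d(const V16i32& a, uint32_t count,
                    const V16i32& passthru, uint16_t mask) {
  return select_mask(mask, pslli_d(a, count), passthru);
}

// Zero-masking ({%k1} {z}): inactive lanes become zero.
V16i32 maskz_pslli_d(const V16i32& a, uint32_t count, uint16_t mask) {
  return select_mask(mask, pslli_d(a, count), V16i32{});
}
```

The point of the design is that the shift itself never needs to know about masking; the mask/maskz variants are just two different `select` operands around the same unmasked call.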

This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
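The sketch below illustrates what a non-constant count means for the 64-bit arithmetic shifts added here: `llvm.x86.avx512.psrai.q.512` takes an `i32` count, while `llvm.x86.avx512.psra.q.512` takes it from an XMM register. It assumes the documented x86 behaviour that the register form reads its count from the low 64 bits of the XMM operand and that counts above 63 fill each lane with its sign bit. Routing a runtime `i32` through the register form is shown only as one plausible way to lower a non-constant count. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8i64 = std::array<int64_t, 8>;  // hypothetical model of <8 x i64> (ZMM)
using V2i64 = std::array<int64_t, 2>;  // hypothetical model of <2 x i64> (XMM)

// Arithmetic right shift of every lane by the same unsigned count.
// Counts above 63 act like a shift by 63: each lane becomes 0 or -1
// depending on its sign bit.
V8i64 sra_lanes(const V8i64& a, uint64_t count) {
  const unsigned c = count > 63 ? 63u : static_cast<unsigned>(count);
  V8i64 r{};
  for (int i = 0; i < 8; ++i)
    r[i] = a[i] >> c;  // arithmetic for signed values (guaranteed since C++20)
  return r;
}

// Model of llvm.x86.avx512.psra.q.512: the count is the low 64 bits of the XMM operand;
// the upper element is ignored.
V8i64 psra_q_512(const V8i64& a, const V2i64& count) {
  return sra_lanes(a, static_cast<uint64_t>(count[0]));
}

// Model of llvm.x86.avx512.psrai.q.512 with a runtime (non-constant) i32 count:
// the count is zero-extended into the low element of an XMM value and the
// register form is used.
V8i64 psrai_q_512(const V8i64& a, uint32_t count) {
  return psra_q_512(a, V2i64{static_cast<int64_t>(count), 0});
}
```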

Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.

Event Timeline

craig.topper updated this revision to Diff 76989. · Nov 6 2016, 9:39 AM

craig.topper retitled this revision to [AVX-512] Add unmasked version of shift by immediate and shift by single element in XMM.

craig.topper updated this object.

craig.topper added reviewers: RKSimon, delena, zvi.

craig.topper updated this object.

craig.topper added a subscriber: llvm-commits.

Are we testing for the lowering to mask/maskz with the separate select instruction anywhere?

No, I can add it. Or it will be implicitly tested once I autoupgrade the original intrinsics to these intrinsics plus select.

In D26333#588337, @craig.topper wrote:

No, I can add it. Or it will be implicitly tested once I autoupgrade the original intrinsics to these intrinsics plus select.

Adding an explicit test now would be better, I think. That also made me realise that if/when we do start removing upgrades, we might be in danger of losing such test coverage if we're not careful!

Add tests for masking the new intrinsics.

Do you mean InstCombine on IR-2-IR or DAG combine?

LGTM

In D26333#591505, @delena wrote:

Do you mean InstCombine on IR-2-IR or DAG combine?

InstCombineCalls.cpp has code to convert SSE/AVX2 vector shifts to generic shifts if we can guarantee that the shift values are in range. With this change, Craig will be able to add support for the AVX512 equivalents without having to add mask/maskz support.
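As a rough illustration of the range check described above, the following hypothetical C++ helper decides how an x86 shift by a given count could be replaced with generic IR. It assumes the x86 semantics (logical shifts by at least the element width give zero, arithmetic shifts saturate at width - 1) and the LangRef rule that an IR `shl`/`lshr`/`ashr` by an amount at least the bit width is poison, which is why an out-of-range count cannot simply be passed through. The enum and function names are made up, this is not the code in InstCombineCalls.cpp, and for simplicity it keeps the intrinsic whenever the count is not a known constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative only: not the logic in InstCombineCalls.cpp.
enum class ShiftKind { Logical, Arithmetic };

enum class Rewrite {
  KeepIntrinsic,        // count unknown at compile time: leave the x86 call alone
  GenericShift,         // count < bit width: shl/lshr/ashr by the same amount
  ArithmeticByWidthM1,  // ashr by (bit width - 1) reproduces the x86 sign fill
  AllZeros,             // logical x86 shift by >= bit width yields zero
};

// Decide how an x86 vector shift with the given count could be expressed in
// generic IR. IR shifts by >= bit width are poison, so such counts cannot be
// forwarded unchanged.
Rewrite classify(ShiftKind kind, std::optional<uint64_t> count,
                 unsigned bitWidth) {
  if (!count)
    return Rewrite::KeepIntrinsic;
  if (*count < bitWidth)
    return Rewrite::GenericShift;
  return kind == ShiftKind::Arithmetic ? Rewrite::ArithmeticByWidthM1
                                       : Rewrite::AllZeros;
}
```

Because the masked intrinsics carry extra passthru and mask operands, a combine working on them would also have to emit the equivalent `select`; with the unmasked intrinsics that select already lives in the IR produced by the frontend.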

This revision is now accepted and ready to land. · Nov 10 2016, 5:57 AM

Please wait! I disagree with adding a bunch of unmasked intrinsics in addition to the masked ones. If you want to create IR in InstCombineCalls, you can do this for the masked intrinsics as well. I'm afraid that we will need to duplicate hundreds of intrinsics. I want to get the other Intel developers' opinion before this is committed.

This revision now requires changes to proceed. · Nov 10 2016, 6:52 AM

Elena, this is the first patch of a four-step plan:

1. Add new unmasked intrinsics.
2. Add support for the new unmasked intrinsics to InstCombineCalls.
3. Switch clang to the new unmasked intrinsics.
4. Remove the masked intrinsics and autoupgrade them to the unmasked intrinsic plus select.

So in the end there won't be duplicate intrinsics, but there will be quite a few autoupgrade cases.
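Step 4 of the plan pairs each old masked intrinsic with one of the new unmasked ones. The hypothetical helper below sketches only the renaming half for the shift intrinsics visible in this diff (for example `llvm.x86.avx512.mask.psrl.wi.512` becomes `llvm.x86.avx512.psrli.w.512`); the upgraded call would then be wrapped in a `select` on the old mask and passthru operands, as in the new tests. The naming rule is inferred from the definitions in this patch and is an assumption; LLVM's real autoupgrade code may be organised differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map an old masked shift intrinsic name to the new
// unmasked name, e.g.
//   llvm.x86.avx512.mask.psrl.w.512  -> llvm.x86.avx512.psrl.w.512
//   llvm.x86.avx512.mask.psrl.wi.512 -> llvm.x86.avx512.psrli.w.512
// Returns an empty string if the name does not match the expected pattern.
std::string upgradedName(const std::string& old) {
  const std::string prefix = "llvm.x86.avx512.mask.";
  if (old.compare(0, prefix.size(), prefix) != 0)
    return "";
  // Remainder looks like "<op>.<elt>[i].<width>", e.g. "psrl.wi.512".
  const std::string rest = old.substr(prefix.size());
  const auto dot1 = rest.find('.');
  if (dot1 == std::string::npos)
    return "";
  const auto dot2 = rest.find('.', dot1 + 1);
  if (dot2 == std::string::npos)
    return "";
  std::string op = rest.substr(0, dot1);
  std::string elt = rest.substr(dot1 + 1, dot2 - dot1 - 1);
  const std::string width = rest.substr(dot2 + 1);
  if (op != "psll" && op != "psrl" && op != "psra")
    return "";  // only the shift intrinsics are covered by this sketch
  // An "i" suffix on the element marks the shift-by-immediate form; the new
  // names move it onto the opcode instead ("psrl.wi" -> "psrli.w").
  if (elt.size() == 2 && elt[1] == 'i') {
    op += 'i';
    elt.resize(1);
  }
  return "llvm.x86.avx512." + op + "." + elt + "." + width;
}
```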

Is the reason for the temporary duplication the need to make changes in two repositories, LLVM and Clang? If yes, Elena, how can this be done differently?

We could certainly teach InstCombine to handle the masking for the existing intrinsics. I was just going for consistency and not spreading understanding of masking IR to another place.

One concern I have about masking in general is that for a lot of legacy instructions we have unmasked builtins, and I've been wrapping them with selects in IR in the frontend. So the middle-end optimizers see the selects and can potentially optimize them through constant folding and the like. But for 512-bit intrinsics we don't have the selects in IR; instead we have intrinsics that don't expose the same opportunities to the middle-end optimizers.
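A small illustration of the optimization opportunity mentioned above: when the mask is a compile-time constant, a `select` in IR can be simplified by generic folds, while an opaque masked intrinsic call cannot. The hypothetical helper below classifies a constant mask for a vector with `lanes` elements; it is a sketch of the idea, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a masking select whose mask is a known
// constant: select (mask as <lanes x i1>), %shifted, %passthru
enum class SelectFold { UseResult, UsePassthru, KeepSelect };

SelectFold foldConstantMask(uint64_t mask, unsigned lanes) {
  // Only the low `lanes` bits of the mask correspond to vector elements.
  const uint64_t laneBits =
      lanes >= 64 ? ~uint64_t{0} : ((uint64_t{1} << lanes) - 1);
  const uint64_t live = mask & laneBits;
  if (live == laneBits)
    return SelectFold::UseResult;    // every lane takes the shifted value
  if (live == 0)
    return SelectFold::UsePassthru;  // no lane uses the shift, so it becomes dead
  return SelectFold::KeepSelect;     // mixed mask: a real blend is still needed
}
```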

Right now I just want to replace this subset. I'll have to review more of InstCombineCalls for other cases.

delena accepted this revision. · Nov 11 2016, 12:04 AM

delena edited edge metadata.

This revision is now accepted and ready to land. · Nov 11 2016, 12:04 AM

Closed by commit rL286711: [AVX-512] Add unmasked version of shift by immediate and shift by single… (authored by ctopper). · Nov 11 2016, 9:38 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
include/llvm/IR/IntrinsicsX86.td            70 lines
lib/Target/X86/X86IntrinsicsInfo.h          22 lines
test/CodeGen/X86/avx512-intrinsics.ll       410 lines
test/CodeGen/X86/avx512bw-intrinsics.ll     312 lines
test/CodeGen/X86/avx512vl-intrinsics.ll     144 lines

Diff 77450

include/llvm/IR/IntrinsicsX86.td

… (1,806 lines not shown)
def int_x86_avx2_psrli_q : GCCBuiltin<"__builtin_ia32_psrlqi256">,
llvm_i32_ty], [IntrNoMem]>;
def int_x86_avx2_psrai_w : GCCBuiltin<"__builtin_ia32_psrawi256">,
Intrinsic<[llvm_v16i16_ty], [llvm_v16i16_ty,
llvm_i32_ty], [IntrNoMem]>;
def int_x86_avx2_psrai_d : GCCBuiltin<"__builtin_ia32_psradi256">,
Intrinsic<[llvm_v8i32_ty], [llvm_v8i32_ty,
llvm_i32_ty], [IntrNoMem]>;

		def int_x86_avx512_psra_q_128 : GCCBuiltin<"__builtin_ia32_psraq128">,
		Intrinsic<[llvm_v2i64_ty], [llvm_v2i64_ty,
		llvm_v2i64_ty], [IntrNoMem]>;
		def int_x86_avx512_psra_q_256 : GCCBuiltin<"__builtin_ia32_psraq256">,
		Intrinsic<[llvm_v4i64_ty], [llvm_v4i64_ty,
		llvm_v2i64_ty], [IntrNoMem]>;

		def int_x86_avx512_psrai_q_128 : GCCBuiltin<"__builtin_ia32_psraqi128">,
		Intrinsic<[llvm_v2i64_ty], [llvm_v2i64_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrai_q_256 : GCCBuiltin<"__builtin_ia32_psraqi256">,
		Intrinsic<[llvm_v4i64_ty], [llvm_v4i64_ty,
		llvm_i32_ty], [IntrNoMem]>;

		def int_x86_avx512_psll_w_512 : GCCBuiltin<"__builtin_ia32_psllw512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_v8i16_ty], [IntrNoMem]>;
		def int_x86_avx512_psll_d_512 : GCCBuiltin<"__builtin_ia32_pslld512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_v4i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psll_q_512 : GCCBuiltin<"__builtin_ia32_psllq512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_v2i64_ty], [IntrNoMem]>;
		def int_x86_avx512_psrl_w_512 : GCCBuiltin<"__builtin_ia32_psrlw512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_v8i16_ty], [IntrNoMem]>;
		def int_x86_avx512_psrl_d_512 : GCCBuiltin<"__builtin_ia32_psrld512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_v4i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrl_q_512 : GCCBuiltin<"__builtin_ia32_psrlq512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_v2i64_ty], [IntrNoMem]>;
		def int_x86_avx512_psra_w_512 : GCCBuiltin<"__builtin_ia32_psraw512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_v8i16_ty], [IntrNoMem]>;
		def int_x86_avx512_psra_d_512 : GCCBuiltin<"__builtin_ia32_psrad512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_v4i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psra_q_512 : GCCBuiltin<"__builtin_ia32_psraq512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_v2i64_ty], [IntrNoMem]>;

		def int_x86_avx512_pslli_w_512 : GCCBuiltin<"__builtin_ia32_psllwi512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_pslli_d_512 : GCCBuiltin<"__builtin_ia32_pslldi512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_pslli_q_512 : GCCBuiltin<"__builtin_ia32_psllqi512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrli_w_512 : GCCBuiltin<"__builtin_ia32_psrlwi512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrli_d_512 : GCCBuiltin<"__builtin_ia32_psrldi512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrli_q_512 : GCCBuiltin<"__builtin_ia32_psrlqi512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrai_w_512 : GCCBuiltin<"__builtin_ia32_psrawi512">,
		Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrai_d_512 : GCCBuiltin<"__builtin_ia32_psradi512">,
		Intrinsic<[llvm_v16i32_ty], [llvm_v16i32_ty,
		llvm_i32_ty], [IntrNoMem]>;
		def int_x86_avx512_psrai_q_512 : GCCBuiltin<"__builtin_ia32_psraqi512">,
		Intrinsic<[llvm_v8i64_ty], [llvm_v8i64_ty,
		llvm_i32_ty], [IntrNoMem]>;

def int_x86_avx512_mask_psrl_w_512 : GCCBuiltin<"__builtin_ia32_psrlw512_mask">,
Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
llvm_v8i16_ty, llvm_v32i16_ty, llvm_i32_ty], [IntrNoMem]>;
def int_x86_avx512_mask_psrl_wi_512 : GCCBuiltin<"__builtin_ia32_psrlwi512_mask">,
Intrinsic<[llvm_v32i16_ty], [llvm_v32i16_ty,
llvm_i32_ty, llvm_v32i16_ty, llvm_i32_ty], [IntrNoMem]>;

def int_x86_avx512_mask_psra_w_512 : GCCBuiltin<"__builtin_ia32_psraw512_mask">,
… (4,958 lines not shown)

lib/Target/X86/X86IntrinsicsInfo.h

… (1,522 lines not shown)
X86_INTRINSIC_DATA(avx512_maskz_vpmadd52h_uq_512, FMA_OP_MASKZ,
X86ISD::VPMADD52H, 0),
X86_INTRINSIC_DATA(avx512_maskz_vpmadd52l_uq_128, FMA_OP_MASKZ,
X86ISD::VPMADD52L, 0),
X86_INTRINSIC_DATA(avx512_maskz_vpmadd52l_uq_256, FMA_OP_MASKZ,
X86ISD::VPMADD52L, 0),
X86_INTRINSIC_DATA(avx512_maskz_vpmadd52l_uq_512, FMA_OP_MASKZ,
X86ISD::VPMADD52L, 0),
X86_INTRINSIC_DATA(avx512_psad_bw_512, INTR_TYPE_2OP, X86ISD::PSADBW, 0),
		X86_INTRINSIC_DATA(avx512_psll_d_512, INTR_TYPE_2OP, X86ISD::VSHL, 0),
		X86_INTRINSIC_DATA(avx512_psll_q_512, INTR_TYPE_2OP, X86ISD::VSHL, 0),
		X86_INTRINSIC_DATA(avx512_psll_w_512, INTR_TYPE_2OP, X86ISD::VSHL, 0),
		X86_INTRINSIC_DATA(avx512_pslli_d_512, VSHIFT, X86ISD::VSHLI, 0),
		X86_INTRINSIC_DATA(avx512_pslli_q_512, VSHIFT, X86ISD::VSHLI, 0),
		X86_INTRINSIC_DATA(avx512_pslli_w_512, VSHIFT, X86ISD::VSHLI, 0),
		X86_INTRINSIC_DATA(avx512_psra_d_512, INTR_TYPE_2OP, X86ISD::VSRA, 0),
		X86_INTRINSIC_DATA(avx512_psra_q_128, INTR_TYPE_2OP, X86ISD::VSRA, 0),
		X86_INTRINSIC_DATA(avx512_psra_q_256, INTR_TYPE_2OP, X86ISD::VSRA, 0),
		X86_INTRINSIC_DATA(avx512_psra_q_512, INTR_TYPE_2OP, X86ISD::VSRA, 0),
		X86_INTRINSIC_DATA(avx512_psra_w_512, INTR_TYPE_2OP, X86ISD::VSRA, 0),
		X86_INTRINSIC_DATA(avx512_psrai_d_512, VSHIFT, X86ISD::VSRAI, 0),
		X86_INTRINSIC_DATA(avx512_psrai_q_128, VSHIFT, X86ISD::VSRAI, 0),
		X86_INTRINSIC_DATA(avx512_psrai_q_256, VSHIFT, X86ISD::VSRAI, 0),
		X86_INTRINSIC_DATA(avx512_psrai_q_512, VSHIFT, X86ISD::VSRAI, 0),
		X86_INTRINSIC_DATA(avx512_psrai_w_512, VSHIFT, X86ISD::VSRAI, 0),
		X86_INTRINSIC_DATA(avx512_psrl_d_512, INTR_TYPE_2OP, X86ISD::VSRL, 0),
		X86_INTRINSIC_DATA(avx512_psrl_q_512, INTR_TYPE_2OP, X86ISD::VSRL, 0),
		X86_INTRINSIC_DATA(avx512_psrl_w_512, INTR_TYPE_2OP, X86ISD::VSRL, 0),
		X86_INTRINSIC_DATA(avx512_psrli_d_512, VSHIFT, X86ISD::VSRLI, 0),
		X86_INTRINSIC_DATA(avx512_psrli_q_512, VSHIFT, X86ISD::VSRLI, 0),
		X86_INTRINSIC_DATA(avx512_psrli_w_512, VSHIFT, X86ISD::VSRLI, 0),
X86_INTRINSIC_DATA(avx512_ptestm_b_128, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_b_256, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_b_512, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_d_128, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_d_256, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_d_512, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_q_128, CMP_MASK, X86ISD::TESTM, 0),
X86_INTRINSIC_DATA(avx512_ptestm_q_256, CMP_MASK, X86ISD::TESTM, 0),
… (227 lines not shown)

test/CodeGen/X86/avx512-intrinsics.ll

… (5,638 lines not shown)
; CHECK-NEXT: vfmadd213ss (%rdi), %xmm0, %xmm1 {%k1} {z}
; CHECK-NEXT: vmovaps %xmm1, %xmm0
; CHECK-NEXT: retq
%q = load float, float* %ptr_b
%vecinit.i = insertelement <4 x float> undef, float %q, i32 0
%res = call <4 x float> @llvm.x86.avx512.maskz.vfmadd.ss(<4 x float> %x0, <4 x float> %x1, <4 x float> %vecinit.i, i8 0, i32 4)
ret < 4 x float> %res
}

				define <16 x i32> @test_x86_avx512_psll_d_512(<16 x i32> %a0, <4 x i32> %a1) {
				; CHECK-LABEL: test_x86_avx512_psll_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpslld %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psll.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_psll_d_512(<16 x i32> %a0, <4 x i32> %a1, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psll_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpslld %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psll.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_psll_d_512(<16 x i32> %a0, <4 x i32> %a1, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psll_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpslld %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psll.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.psll.d.512(<16 x i32>, <4 x i32>) nounwind readnone


				define <8 x i64> @test_x86_avx512_psll_q_512(<8 x i64> %a0, <2 x i64> %a1) {
				; CHECK-LABEL: test_x86_avx512_psll_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsllq %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psll.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_psll_q_512(<8 x i64> %a0, <2 x i64> %a1, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psll_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsllq %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psll.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_psll_q_512(<8 x i64> %a0, <2 x i64> %a1, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psll_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsllq %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psll.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.psll.q.512(<8 x i64>, <2 x i64>) nounwind readnone


				define <16 x i32> @test_x86_avx512_pslli_d_512(<16 x i32> %a0) {
				; CHECK-LABEL: test_x86_avx512_pslli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpslld $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.pslli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_pslli_d_512(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_pslli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpslld $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.pslli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_pslli_d_512(<16 x i32> %a0, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_pslli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpslld $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.pslli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.pslli.d.512(<16 x i32>, i32) nounwind readnone


				define <8 x i64> @test_x86_avx512_pslli_q_512(<8 x i64> %a0) {
				; CHECK-LABEL: test_x86_avx512_pslli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsllq $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.pslli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_pslli_q_512(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_pslli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsllq $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.pslli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_pslli_q_512(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_pslli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsllq $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.pslli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.pslli.q.512(<8 x i64>, i32) nounwind readnone


				define <8 x i64> @test_x86_avx512_psra_q_512(<8 x i64> %a0, <2 x i64> %a1) {
				; CHECK-LABEL: test_x86_avx512_psra_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psra.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_psra_q_512(<8 x i64> %a0, <2 x i64> %a1, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psra_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsraq %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psra.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_psra_q_512(<8 x i64> %a0, <2 x i64> %a1, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psra_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsraq %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psra.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.psra.q.512(<8 x i64>, <2 x i64>) nounwind readnone


				define <16 x i32> @test_x86_avx512_psra_d_512(<16 x i32> %a0, <4 x i32> %a1) {
				; CHECK-LABEL: test_x86_avx512_psra_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrad %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psra.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_psra_d_512(<16 x i32> %a0, <4 x i32> %a1, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psra_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrad %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psra.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_psra_d_512(<16 x i32> %a0, <4 x i32> %a1, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psra_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrad %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psra.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.psra.d.512(<16 x i32>, <4 x i32>) nounwind readnone



				define <8 x i64> @test_x86_avx512_psrai_q_512(<8 x i64> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrai_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrai.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_psrai_q_512(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrai_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsraq $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrai.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_psrai_q_512(<8 x i64> %a0, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrai_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsraq $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrai.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.psrai.q.512(<8 x i64>, i32) nounwind readnone


				define <16 x i32> @test_x86_avx512_psrai_d_512(<16 x i32> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrai_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrad $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrai.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_psrai_d_512(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrai_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrad $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrai.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_psrai_d_512(<16 x i32> %a0, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrai_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrad $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrai.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.psrai.d.512(<16 x i32>, i32) nounwind readnone



				define <16 x i32> @test_x86_avx512_psrl_d_512(<16 x i32> %a0, <4 x i32> %a1) {
				; CHECK-LABEL: test_x86_avx512_psrl_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrld %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrl.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_psrl_d_512(<16 x i32> %a0, <4 x i32> %a1, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrl_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrld %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrl.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_psrl_d_512(<16 x i32> %a0, <4 x i32> %a1, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrl_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrld %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrl.d.512(<16 x i32> %a0, <4 x i32> %a1) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.psrl.d.512(<16 x i32>, <4 x i32>) nounwind readnone


				define <8 x i64> @test_x86_avx512_psrl_q_512(<8 x i64> %a0, <2 x i64> %a1) {
				; CHECK-LABEL: test_x86_avx512_psrl_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrlq %xmm1, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrl.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_psrl_q_512(<8 x i64> %a0, <2 x i64> %a1, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrl_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrlq %xmm1, %zmm0, %zmm2 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm2, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrl.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_psrl_q_512(<8 x i64> %a0, <2 x i64> %a1, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrl_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrlq %xmm1, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrl.q.512(<8 x i64> %a0, <2 x i64> %a1) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.psrl.q.512(<8 x i64>, <2 x i64>) nounwind readnone


				define <16 x i32> @test_x86_avx512_psrli_d_512(<16 x i32> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrld $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				ret <16 x i32> %res
				}
				define <16 x i32> @test_x86_avx512_mask_psrli_d_512(<16 x i32> %a0, <16 x i32> %passthru, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrld $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> %passthru
				ret <16 x i32> %res2
				}
				define <16 x i32> @test_x86_avx512_maskz_psrli_d_512(<16 x i32> %a0, i16 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrli_d_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrld $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <16 x i32> @llvm.x86.avx512.psrli.d.512(<16 x i32> %a0, i32 7) ; <<16 x i32>> [#uses=1]
				%mask.cast = bitcast i16 %mask to <16 x i1>
				%res2 = select <16 x i1> %mask.cast, <16 x i32> %res, <16 x i32> zeroinitializer
				ret <16 x i32> %res2
				}
				declare <16 x i32> @llvm.x86.avx512.psrli.d.512(<16 x i32>, i32) nounwind readnone
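The merge-masking variant differs from zero masking only in the false operand of the select. Below is a minimal sketch with hypothetical names. It assumes bit i of the i16 mask controls 32-bit lane i, which is the bit order the k registers use. In the CHECK lines the backend implements this by shifting under `{%k1}` directly into %zmm1, which already holds %passthru. It then copies %zmm1 to %zmm0 for the return value.

```python
# Hypothetical reference model of test_x86_avx512_mask_psrli_d_512 (not LLVM code).
MASK32 = (1 << 32) - 1


def psrli_d(a0, count):
    """Unmasked llvm.x86.avx512.psrli.d.512: logical right shift of each
    32-bit lane; a count above 31 clears the lane."""
    if count > 31:
        return [0] * len(a0)
    return [(x & MASK32) >> count for x in a0]


def select_merge(mask, res, passthru):
    """select (bitcast i16 %mask to <16 x i1>), %res, %passthru:
    lanes whose mask bit is clear keep the passthru value."""
    return [r if (mask >> i) & 1 else p
            for i, (r, p) in enumerate(zip(res, passthru))]


def mask_psrli_d_512(a0, passthru, mask):
    # The test shifts by the constant 7.
    return select_merge(mask, psrli_d(a0, 7), passthru)
```

With a mask of 0xFFFF, the model reduces to the unmasked test.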


				define <8 x i64> @test_x86_avx512_psrli_q_512(<8 x i64> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsrlq $7, %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				ret <8 x i64> %res
				}
				define <8 x i64> @test_x86_avx512_mask_psrli_q_512(<8 x i64> %a0, <8 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrlq $7, %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> %passthru
				ret <8 x i64> %res2
				}
				define <8 x i64> @test_x86_avx512_maskz_psrli_q_512(<8 x i64> %a0, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrli_q_512:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpsrlq $7, %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%res = call <8 x i64> @llvm.x86.avx512.psrli.q.512(<8 x i64> %a0, i32 7) ; <<8 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%res2 = select <8 x i1> %mask.cast, <8 x i64> %res, <8 x i64> zeroinitializer
				ret <8 x i64> %res2
				}
				declare <8 x i64> @llvm.x86.avx512.psrli.q.512(<8 x i64>, i32) nounwind readnone

test/CodeGen/X86/avx512bw-intrinsics.ll

	; AVX512F-32-NEXT: retl
	%res = call <32 x i16> @llvm.x86.avx512.mask.pbroadcast.w.gpr.512(i16 %x0, <32 x i16> %x1, i32 -1)
	%res1 = call <32 x i16> @llvm.x86.avx512.mask.pbroadcast.w.gpr.512(i16 %x0, <32 x i16> %x1, i32 %mask)
	%res2 = call <32 x i16> @llvm.x86.avx512.mask.pbroadcast.w.gpr.512(i16 %x0, <32 x i16> zeroinitializer, i32 %mask)
	%res3 = add <32 x i16> %res, %res1
	%res4 = add <32 x i16> %res2, %res3
	ret <32 x i16> %res4
	}


				define <32 x i16> @test_x86_avx512_psll_w_512(<32 x i16> %a0, <8 x i16> %a1) {
				; AVX512BW-LABEL: test_x86_avx512_psll_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_psll_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsllw %xmm1, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psll.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_psll_w_512(<32 x i16> %a0, <8 x i16> %a1, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_psll_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_psll_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsllw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psll.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_psll_w_512(<32 x i16> %a0, <8 x i16> %a1, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_psll_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsllw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_psll_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsllw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psll.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.psll.w.512(<32 x i16>, <8 x i16>) nounwind readnone
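The AVX512F-32 prefix checks the 32-bit run. There, %mask arrives on the stack, so the kmovd lines match the stack offset with the FileCheck regex block `{{[0-9]+}}` instead of hard-coding it. The Python sketch below roughly approximates that matching. It is a simplified, hypothetical helper, not FileCheck itself. It assumes FileCheck's default behavior: runs of horizontal whitespace are collapsed, text outside `{{...}}` is matched literally, and the pattern may match anywhere in the line.

```python
import re


def check_to_regex(pattern):
    """Hypothetical, simplified FileCheck pattern translator: literal text is
    escaped and the body of each {{...}} is spliced in as a regular expression."""
    pattern = re.sub(r"[ \t]+", " ", pattern.strip())
    pieces, pos = [], 0
    for m in re.finditer(r"\{\{(.*?)\}\}", pattern):
        pieces.append(re.escape(pattern[pos:m.start()]))
        pieces.append("(?:" + m.group(1) + ")")
        pos = m.end()
    pieces.append(re.escape(pattern[pos:]))
    return "".join(pieces)


def check_matches(pattern, output_line):
    """True if the pattern occurs anywhere in the whitespace-collapsed line."""
    line = re.sub(r"[ \t]+", " ", output_line)
    return re.search(check_to_regex(pattern), line) is not None
```

Because matching is a substring search, a pattern that ends in `{%k1}` would also accept a line that continues with `{z}`. The register operands and the surrounding CHECK-NEXT lines are what separate the merge-masked output from the zero-masked output.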


				define <32 x i16> @test_x86_avx512_pslli_w_512(<32 x i16> %a0) {
				; AVX512BW-LABEL: test_x86_avx512_pslli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsllw $7, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_pslli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsllw $7, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.pslli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_pslli_w_512(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_pslli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsllw $7, %zmm0, %zmm1 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_pslli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsllw $7, %zmm0, %zmm1 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.pslli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_pslli_w_512(<32 x i16> %a0, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_pslli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsllw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_pslli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsllw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.pslli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.pslli.w.512(<32 x i16>, i32) nounwind readnone


				define <32 x i16> @test_x86_avx512_psra_w_512(<32 x i16> %a0, <8 x i16> %a1) {
				; AVX512BW-LABEL: test_x86_avx512_psra_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsraw %xmm1, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_psra_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsraw %xmm1, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psra.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_psra_w_512(<32 x i16> %a0, <8 x i16> %a1, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_psra_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsraw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_psra_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsraw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psra.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_psra_w_512(<32 x i16> %a0, <8 x i16> %a1, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_psra_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsraw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_psra_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsraw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psra.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.psra.w.512(<32 x i16>, <8 x i16>) nounwind readnone
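In the logical shifts, a count larger than the element width clears the lane. The arithmetic word shift behaves differently: every bit becomes a copy of the sign bit, which gives the same result as shifting by 15. A rough Python model of the unmasked llvm.x86.avx512.psra.w.512 follows, with hypothetical names. It assumes, as for the other XMM-count forms, that the count is the whole low 64 bits of the `<8 x i16>` operand. Elements 0 to 3 of %a1 therefore form the count, and elements 4 to 7 are ignored.

```python
def to_signed16(x):
    """Reinterpret the low 16 bits of x as a signed i16."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def psra_w(a0, a1):
    """Hypothetical model of llvm.x86.avx512.psra.w.512.
    a0: i16 lanes; a1: the <8 x i16> count operand."""
    # The count is the low 64 bits of a1: elements 0..3, little-endian.
    count = 0
    for i in range(4):
        count |= (a1[i] & 0xFFFF) << (16 * i)
    # Counts above 15 behave like 15: every bit becomes the sign bit.
    count = min(count, 15)
    # Python's >> on negative ints is an arithmetic (sign-filling) shift.
    return [(to_signed16(x) >> count) & 0xFFFF for x in a0]
```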


				define <32 x i16> @test_x86_avx512_psrai_w_512(<32 x i16> %a0) {
				; AVX512BW-LABEL: test_x86_avx512_psrai_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsraw $7, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_psrai_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsraw $7, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrai.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_psrai_w_512(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_psrai_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsraw $7, %zmm0, %zmm1 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_psrai_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsraw $7, %zmm0, %zmm1 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrai.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_psrai_w_512(<32 x i16> %a0, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_psrai_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsraw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_psrai_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsraw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrai.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.psrai.w.512(<32 x i16>, i32) nounwind readnone


				define <32 x i16> @test_x86_avx512_psrl_w_512(<32 x i16> %a0, <8 x i16> %a1) {
				; AVX512BW-LABEL: test_x86_avx512_psrl_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsrlw %xmm1, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_psrl_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsrlw %xmm1, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrl.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_psrl_w_512(<32 x i16> %a0, <8 x i16> %a1, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_psrl_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsrlw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_psrl_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsrlw %xmm1, %zmm0, %zmm2 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm2, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrl.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_psrl_w_512(<32 x i16> %a0, <8 x i16> %a1, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_psrl_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsrlw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_psrl_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsrlw %xmm1, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrl.w.512(<32 x i16> %a0, <8 x i16> %a1) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.psrl.w.512(<32 x i16>, <8 x i16>) nounwind readnone


				define <32 x i16> @test_x86_avx512_psrli_w_512(<32 x i16> %a0) {
				; AVX512BW-LABEL: test_x86_avx512_psrli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_psrli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: vpsrlw $7, %zmm0, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				ret <32 x i16> %res
				}
				define <32 x i16> @test_x86_avx512_mask_psrli_w_512(<32 x i16> %a0, <32 x i16> %passthru, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_mask_psrli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm1 {%k1}
				; AVX512BW-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_mask_psrli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsrlw $7, %zmm0, %zmm1 {%k1}
				; AVX512F-32-NEXT: vmovdqa64 %zmm1, %zmm0
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> %passthru
				ret <32 x i16> %res2
				}
				define <32 x i16> @test_x86_avx512_maskz_psrli_w_512(<32 x i16> %a0, i32 %mask) {
				; AVX512BW-LABEL: test_x86_avx512_maskz_psrli_w_512:
				; AVX512BW: ## BB#0:
				; AVX512BW-NEXT: kmovd %edi, %k1
				; AVX512BW-NEXT: vpsrlw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512BW-NEXT: retq
				;
				; AVX512F-32-LABEL: test_x86_avx512_maskz_psrli_w_512:
				; AVX512F-32: # BB#0:
				; AVX512F-32-NEXT: kmovd {{[0-9]+}}(%esp), %k1
				; AVX512F-32-NEXT: vpsrlw $7, %zmm0, %zmm0 {%k1} {z}
				; AVX512F-32-NEXT: retl
				%res = call <32 x i16> @llvm.x86.avx512.psrli.w.512(<32 x i16> %a0, i32 7) ; <<32 x i16>> [#uses=1]
				%mask.cast = bitcast i32 %mask to <32 x i1>
				%res2 = select <32 x i1> %mask.cast, <32 x i16> %res, <32 x i16> zeroinitializer
				ret <32 x i16> %res2
				}
				declare <32 x i16> @llvm.x86.avx512.psrli.w.512(<32 x i16>, i32) nounwind readnone

test/CodeGen/X86/avx512vl-intrinsics.ll

	; CHECK-NEXT: retq ## encoding: [0xc3]
	%res = call <2 x i64> @llvm.x86.avx512.mask.pbroadcast.q.gpr.128(i64 %x0, <2 x i64> %x1,i8 -1)
	%res1 = call <2 x i64> @llvm.x86.avx512.mask.pbroadcast.q.gpr.128(i64 %x0, <2 x i64> %x1,i8 %mask)
	%res2 = call <2 x i64> @llvm.x86.avx512.mask.pbroadcast.q.gpr.128(i64 %x0, <2 x i64> zeroinitializer,i8 %mask)
	%res3 = add <2 x i64> %res, %res1
	%res4 = add <2 x i64> %res2, %res3
	ret <2 x i64> %res4
	}


				define <2 x i64> @test_x86_avx512_psra_q_128(<2 x i64> %a0, <2 x i64> %a1) {
				; CHECK-LABEL: test_x86_avx512_psra_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq %xmm1, %xmm0, %xmm0 ## encoding: [0x62,0xf1,0xfd,0x08,0xe2,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psra.q.128(<2 x i64> %a0, <2 x i64> %a1) ; <<2 x i64>> [#uses=1]
				ret <2 x i64> %res
				}
				define <2 x i64> @test_x86_avx512_mask_psra_q_128(<2 x i64> %a0, <2 x i64> %a1, <2 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psra_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq %xmm1, %xmm0, %xmm2 {%k1} ## encoding: [0x62,0xf1,0xfd,0x09,0xe2,0xd1]
				; CHECK-NEXT: vmovdqa64 %xmm2, %xmm0 ## encoding: [0x62,0xf1,0xfd,0x08,0x6f,0xc2]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psra.q.128(<2 x i64> %a0, <2 x i64> %a1) ; <<2 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%res2 = select <2 x i1> %mask.extract, <2 x i64> %res, <2 x i64> %passthru
				ret <2 x i64> %res2
				}
				define <2 x i64> @test_x86_avx512_maskz_psra_q_128(<2 x i64> %a0, <2 x i64> %a1, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psra_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq %xmm1, %xmm0, %xmm0 {%k1} {z} ## encoding: [0x62,0xf1,0xfd,0x89,0xe2,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psra.q.128(<2 x i64> %a0, <2 x i64> %a1) ; <<2 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%res2 = select <2 x i1> %mask.extract, <2 x i64> %res, <2 x i64> zeroinitializer
				ret <2 x i64> %res2
				}
				declare <2 x i64> @llvm.x86.avx512.psra.q.128(<2 x i64>, <2 x i64>) nounwind readnone
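The 128-bit and 256-bit tests still take an i8 mask, but a `<2 x i64>` or `<4 x i64>` select needs only two or four mask bits. The IR therefore bitcasts the mask to `<8 x i1>` and uses shufflevector to keep the low elements. Below is a minimal Python model of that extraction and the select that follows, with hypothetical names. It assumes element i of the bitcast vector is bit i of the mask, as with the k registers.

```python
def extract_mask(mask_i8, num_lanes):
    """bitcast i8 %mask to <8 x i1>, then shufflevector keeping
    elements 0 .. num_lanes-1; the upper mask bits are discarded."""
    bits = [(mask_i8 >> i) & 1 for i in range(8)]
    return bits[:num_lanes]


def select(mask_bits, res, other):
    """select <N x i1> %mask.extract, %res, %other (passthru or zero)."""
    return [r if m else o for m, r, o in zip(mask_bits, res, other)]
```

On the hardware side, a 128-bit quadword operation under `{%k1}` consults only bits 0 and 1 of k1. The upper bits of %mask therefore have no effect in either the IR or the generated code.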


				define <4 x i64> @test_x86_avx512_psra_q_256(<4 x i64> %a0, <2 x i64> %a1) {
				; CHECK-LABEL: test_x86_avx512_psra_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq %xmm1, %ymm0, %ymm0 ## encoding: [0x62,0xf1,0xfd,0x28,0xe2,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psra.q.256(<4 x i64> %a0, <2 x i64> %a1) ; <<4 x i64>> [#uses=1]
				ret <4 x i64> %res
				}
				define <4 x i64> @test_x86_avx512_mask_psra_q_256(<4 x i64> %a0, <2 x i64> %a1, <4 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psra_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq %xmm1, %ymm0, %ymm2 {%k1} ## encoding: [0x62,0xf1,0xfd,0x29,0xe2,0xd1]
				; CHECK-NEXT: vmovdqa64 %ymm2, %ymm0 ## encoding: [0x62,0xf1,0xfd,0x28,0x6f,0xc2]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psra.q.256(<4 x i64> %a0, <2 x i64> %a1) ; <<4 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%res2 = select <4 x i1> %mask.extract, <4 x i64> %res, <4 x i64> %passthru
				ret <4 x i64> %res2
				}
				define <4 x i64> @test_x86_avx512_maskz_psra_q_256(<4 x i64> %a0, <2 x i64> %a1, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psra_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq %xmm1, %ymm0, %ymm0 {%k1} {z} ## encoding: [0x62,0xf1,0xfd,0xa9,0xe2,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psra.q.256(<4 x i64> %a0, <2 x i64> %a1) ; <<4 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%res2 = select <4 x i1> %mask.extract, <4 x i64> %res, <4 x i64> zeroinitializer
				ret <4 x i64> %res2
				}
				declare <4 x i64> @llvm.x86.avx512.psra.q.256(<4 x i64>, <2 x i64>) nounwind readnone
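The encoding comments in these AVX512VL tests are EVEX encodings: 0x62, then three payload bytes P0, P1 and P2, then the opcode and ModRM byte. The masking and vector-length differences between the 128-bit and 256-bit tests all sit in P2. Bit 7 is z, bits 6:5 are L'L, bit 4 is b, and bits 2:0 select the k register. Below is a small decoder sketch, checked against bytes copied from the CHECK lines in these tests. The function name is hypothetical, and it handles only the b = 0 case that these encodings use.

```python
def decode_evex_p2(encoding):
    """Decode EVEX payload byte P2 (encoding[3]).
    Only valid when EVEX.b is 0: with b set on a register form,
    L'L encodes rounding control rather than vector length."""
    if encoding[0] != 0x62:
        raise ValueError("not an EVEX instruction")
    p2 = encoding[3]
    if (p2 >> 4) & 1:
        raise ValueError("EVEX.b set: not handled by this sketch")
    lengths = {0b00: 128, 0b01: 256, 0b10: 512}
    return {
        "zeroing": bool(p2 >> 7),                   # {z}
        "vector_bits": lengths[(p2 >> 5) & 0b11],   # xmm / ymm / zmm
        "mask_reg": p2 & 0b111,                     # 0 means no write mask
    }
```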


				define <2 x i64> @test_x86_avx512_psrai_q_128(<2 x i64> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrai_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq $7, %xmm0, %xmm0 ## encoding: [0x62,0xf1,0xfd,0x08,0x72,0xe0,0x07]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psrai.q.128(<2 x i64> %a0, i32 7) ; <<2 x i64>> [#uses=1]
				ret <2 x i64> %res
				}
				define <2 x i64> @test_x86_avx512_mask_psrai_q_128(<2 x i64> %a0, <2 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrai_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq $7, %xmm0, %xmm1 {%k1} ## encoding: [0x62,0xf1,0xf5,0x09,0x72,0xe0,0x07]
				; CHECK-NEXT: vmovdqa64 %xmm1, %xmm0 ## encoding: [0x62,0xf1,0xfd,0x08,0x6f,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psrai.q.128(<2 x i64> %a0, i32 7) ; <<2 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%res2 = select <2 x i1> %mask.extract, <2 x i64> %res, <2 x i64> %passthru
				ret <2 x i64> %res2
				}
				define <2 x i64> @test_x86_avx512_maskz_psrai_q_128(<2 x i64> %a0, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrai_q_128:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq $7, %xmm0, %xmm0 {%k1} {z} ## encoding: [0x62,0xf1,0xfd,0x89,0x72,0xe0,0x07]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <2 x i64> @llvm.x86.avx512.psrai.q.128(<2 x i64> %a0, i32 7) ; <<2 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
				%res2 = select <2 x i1> %mask.extract, <2 x i64> %res, <2 x i64> zeroinitializer
				ret <2 x i64> %res2
				}
				declare <2 x i64> @llvm.x86.avx512.psrai.q.128(<2 x i64>, i32) nounwind readnone
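One detail in the immediate-count encodings: in the merge-masked test, the third encoding byte (EVEX P1) is 0xf5 rather than 0xfd. For the shift-by-immediate form (opcode 0x72, with ModRM.reg = 4 as an opcode extension), the destination register is carried in EVEX.vvvv, stored inverted. The source register sits in ModRM.rm. The register-count form works the other way: vvvv names the source, and ModRM.reg names the destination. The hypothetical sketch below decodes these fields. It ignores the P0 register-extension bits, which is safe here because every register is below 8.

```python
def decode_shift_imm(encoding):
    """Decode an EVEX 0x72 /r ib shift-by-immediate register encoding."""
    if encoding[0] != 0x62:
        raise ValueError("not an EVEX instruction")
    p1, p2 = encoding[2], encoding[3]
    opcode, modrm, imm = encoding[4], encoding[5], encoding[6]
    if modrm >> 6 != 0b11:
        raise ValueError("memory operand forms not handled")
    vvvv = ((p1 >> 3) & 0xF) ^ 0xF      # stored as one's complement
    v_high = ((p2 >> 3) & 1) ^ 1        # EVEX.V' extends vvvv to 5 bits
    return {
        "W": p1 >> 7,                            # W1 selects the quadword form
        "opcode": opcode,
        "opcode_ext": (modrm >> 3) & 0b111,      # /4: arithmetic right shift
        "dest": (v_high << 4) | vvvv,
        "src": modrm & 0b111,
        "imm": imm,
    }
```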


				define <4 x i64> @test_x86_avx512_psrai_q_256(<4 x i64> %a0) {
				; CHECK-LABEL: test_x86_avx512_psrai_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpsraq $7, %ymm0, %ymm0 ## encoding: [0x62,0xf1,0xfd,0x28,0x72,0xe0,0x07]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psrai.q.256(<4 x i64> %a0, i32 7) ; <<4 x i64>> [#uses=1]
				ret <4 x i64> %res
				}
				define <4 x i64> @test_x86_avx512_mask_psrai_q_256(<4 x i64> %a0, <4 x i64> %passthru, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_mask_psrai_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq $7, %ymm0, %ymm1 {%k1} ## encoding: [0x62,0xf1,0xf5,0x29,0x72,0xe0,0x07]
				; CHECK-NEXT: vmovdqa64 %ymm1, %ymm0 ## encoding: [0x62,0xf1,0xfd,0x28,0x6f,0xc1]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psrai.q.256(<4 x i64> %a0, i32 7) ; <<4 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%res2 = select <4 x i1> %mask.extract, <4 x i64> %res, <4 x i64> %passthru
				ret <4 x i64> %res2
				}
				define <4 x i64> @test_x86_avx512_maskz_psrai_q_256(<4 x i64> %a0, i8 %mask) {
				; CHECK-LABEL: test_x86_avx512_maskz_psrai_q_256:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1 ## encoding: [0xc5,0xf8,0x92,0xcf]
				; CHECK-NEXT: vpsraq $7, %ymm0, %ymm0 {%k1} {z} ## encoding: [0x62,0xf1,0xfd,0xa9,0x72,0xe0,0x07]
				; CHECK-NEXT: retq ## encoding: [0xc3]
				%res = call <4 x i64> @llvm.x86.avx512.psrai.q.256(<4 x i64> %a0, i32 7) ; <<4 x i64>> [#uses=1]
				%mask.cast = bitcast i8 %mask to <8 x i1>
				%mask.extract = shufflevector <8 x i1> %mask.cast, <8 x i1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%res2 = select <4 x i1> %mask.extract, <4 x i64> %res, <4 x i64> zeroinitializer
				ret <4 x i64> %res2
				}
				declare <4 x i64> @llvm.x86.avx512.psrai.q.256(<4 x i64>, i32) nounwind readnone