This is an archive of the discontinued LLVM Phabricator instance.

Differential D147541

[X86] Add InstFixup for masked `unpck{l|h}pd` -> masked `shufpd`
Closed, Public

Authored by goldstein.w.n on Apr 4 2023, 9:05 AM.


Details

Reviewers

pengfei
RKSimon

Commits

rGfd347ceac490: [X86] Add InstFixup for masked `unpck{l|h}pd` -> masked `shufpd`

Summary

This is a follow-up to D147507, which removed the prior transformation to
shufps. That transformation was incorrect because the mask applies to
64-bit double elements, not 32-bit float elements. Using shufpd as the
replacement, however, preserves the mask semantics and has the same
benefits as shufps.
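To make the element-width argument concrete, here is a minimal scalar sketch of the 128-bit forms only. It is an illustration, not LLVM code: the helper names (`unpcklpd`, `shufpd`, `shufps`, `maskz`, `asFloats`, `asDoubles`) are hypothetical, and zero-masking is modeled as "element i survives iff bit i of k is set", with the element size taken from the instruction (64-bit for `pd`, 32-bit for `ps`).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

using V2d = std::array<double, 2>; // one 128-bit vector viewed as 2 x f64
using V4f = std::array<float, 4>;  // the same 128 bits viewed as 4 x f32

// unpck{l,h}pd: interleave the low (or high) doubles of a and b.
V2d unpcklpd(const V2d &a, const V2d &b) { return {a[0], b[0]}; }
V2d unpckhpd(const V2d &a, const V2d &b) { return {a[1], b[1]}; }

// shufpd (128-bit): imm bit 0 selects a[0]/a[1], imm bit 1 selects b[0]/b[1].
V2d shufpd(const V2d &a, const V2d &b, unsigned imm) {
  return {a[imm & 1u], b[(imm >> 1) & 1u]};
}

// shufps (128-bit): two 2-bit selectors into a, then two into b.
V4f shufps(const V4f &a, const V4f &b, unsigned imm) {
  return {a[imm & 3u], a[(imm >> 2) & 3u], b[(imm >> 4) & 3u],
          b[(imm >> 6) & 3u]};
}

// Zero-masking model: keep element i iff bit i of k is set, else write zero.
template <typename T, std::size_t N>
std::array<T, N> maskz(unsigned k, const std::array<T, N> &v) {
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    if (k & (1u << i))
      r[i] = v[i];
  return r;
}

// Reinterpret the same 16 bytes as floats or doubles (no value conversion).
V4f asFloats(const V2d &v) {
  V4f r;
  std::memcpy(r.data(), v.data(), sizeof r);
  return r;
}
V2d asDoubles(const V4f &v) {
  V2d r;
  std::memcpy(r.data(), v.data(), sizeof r);
  return r;
}
```

In this model, `shufpd` with immediate 0x00 matches `unpcklpd` and 0xff matches `unpckhpd` under every mask value, because both instructions mask per 64-bit element. Unmasked `shufps` with 0x44 produces the same bits as `unpcklpd`, but once the same k value is applied per 32-bit element, the results diverge.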

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

goldstein.w.n created this revision. Apr 4 2023, 9:05 AM

Herald added a project: Restricted Project. Apr 4 2023, 9:05 AM

Herald added subscribers: pengfei, hiraditya.

goldstein.w.n requested review of this revision. Apr 4 2023, 9:05 AM

Herald added a project: Restricted Project. Apr 4 2023, 9:05 AM

Herald added a subscriber: llvm-commits.

goldstein.w.n added reviewers: pengfei, RKSimon. Apr 4 2023, 9:06 AM

goldstein.w.n mentioned this in D147507: [X86] Disable masked UNPCKLPD/UNPCKHPD -> SHUFPS transformation.

Harbormaster completed remote builds in B223599: Diff 510833. Apr 4 2023, 10:23 AM

LGTM.

llvm/lib/Target/X86/X86FixupInstTuning.cpp
250–255	Not sure if we should use `VSHUFPD` for them too.

This revision is now accepted and ready to land. Apr 4 2023, 7:41 PM

LGTM

llvm/lib/Target/X86/X86FixupInstTuning.cpp
250–255	+1

goldstein.w.n marked 2 inline comments as done. Apr 5 2023, 10:52 PM

shufps -> shufpd

This revision was landed with ongoing or failed builds. Apr 5 2023, 11:37 PM

Closed by commit rGfd347ceac490: [X86] Add InstFixup for masked `unpck{l|h}pd` -> masked `shufpd` (authored by goldstein.w.n).

This revision was automatically updated to reflect the committed changes.

goldstein.w.n added a commit: rGfd347ceac490: [X86] Add InstFixup for masked `unpck{l|h}pd` -> masked `shufpd`.

Harbormaster completed remote builds in B223923: Diff 511282. Apr 6 2023, 12:26 AM

skatkov added a subscriber: skatkov. Apr 7 2023, 1:52 AM

skatkov added inline comments.

llvm/lib/Target/X86/X86FixupInstTuning.cpp
157	Hello Noah, we've got a downstream miscompile from fuzzer testing with this patch. Specifically, reverting the constants 0x00 -> 0x44 and 0xFF -> 0xee fixes the issue. We do not have an IR reproducer for this problem at the moment, but maybe you would be able to spot the problem from the source code? At least I see that this comment states that the r,r case should use the old constants and the r,r,k case the new ones. However, it looks like the patch updates more cases. Is that correct? This is just a wild guess, as I'm not familiar with this code.

pengfei added inline comments. Apr 7 2023, 2:16 AM

llvm/lib/Target/X86/X86FixupInstTuning.cpp
240–247	I guess the problem may come from here.

If it helps, the bad and good compilations differ in only one line:

vshufps	$0, %xmm0, %xmm1, %xmm0         # xmm0 = xmm1[0,0],xmm0[0,0]

vshufps	$68, %xmm0, %xmm1, %xmm0        # xmm0 = xmm1[0,1],xmm0[0,1]

So the problem probably comes from a case that has not been updated but uses a lambda that has been updated?
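The two immediates in the lines quoted above can be decoded by hand. The sketch below, with hypothetical helper names `decodeShufpsImm` and `describeShufps`, splits a 128-bit `shufps` immediate into its four 2-bit selectors and renders them in the same style as the assembly comments. It assumes the usual `shufps` encoding, where the low two selectors index the first source and the high two index the second.

```cpp
#include <array>
#include <cassert>
#include <string>

// Split an 8-bit shufps immediate into four 2-bit element selectors.
// Selectors 0 and 1 index the first source, selectors 2 and 3 the second.
std::array<unsigned, 4> decodeShufpsImm(unsigned imm) {
  return {imm & 3u, (imm >> 2) & 3u, (imm >> 4) & 3u, (imm >> 6) & 3u};
}

// Render the selection as "src1[i,j],src2[k,l]", mirroring the style of the
// comments the assembler printer attaches to shuffle instructions.
std::string describeShufps(unsigned imm, const std::string &src1,
                           const std::string &src2) {
  const std::array<unsigned, 4> idx = decodeShufpsImm(imm);
  return src1 + "[" + std::to_string(idx[0]) + "," + std::to_string(idx[1]) +
         "]," + src2 + "[" + std::to_string(idx[2]) + "," +
         std::to_string(idx[3]) + "]";
}
```

Decoded this way, immediate 68 (0x44) selects float elements 0 and 1 from each source, which is the low double of each, while immediate 0 broadcasts float element 0 of each source. An immediate of 0x00 is therefore only meaningful as an `unpcklpd` replacement when the opcode is `shufpd`, not `shufps`.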

I put up D147775 as an example, but I think we can modify it together with D147728.

@pengfei D147775 works, thanks!

In D147541#4251040, @aleksandr.popov wrote:

@pengfei D147775 works, thanks!

Thanks for the confirmation! Then I'd like to land it as a quick fix.

pengfei mentioned this in rG2ffdfb5f9dff: [X86] Fix problem in D147541. Apr 7 2023, 7:02 AM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86FixupInstTuning.cpp	32 lines
llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll	304 lines

Diff 510833

llvm/lib/Target/X86/X86FixupInstTuning.cpp

(148 preceding lines not shown; context: auto ProcessVPERMILPSmi = [&](unsigned NewOpc) -> bool {)
// `vpshufd` saves a byte of code size.		// `vpshufd` saves a byte of code size.
if (!ST->hasNoDomainDelayShuffle() &&		if (!ST->hasNoDomainDelayShuffle() &&
!NewOpcPreferable(NewOpc, /*ReplaceInTie*/ false))		!NewOpcPreferable(NewOpc, /*ReplaceInTie*/ false))
return false;		return false;
MI.setDesc(TII->get(NewOpc));		MI.setDesc(TII->get(NewOpc));
return true;		return true;
};		};

// `vunpcklpd/vmovlhps r, r` -> `vshufps r, r, 0x44`		// `vunpcklpd/vmovlhps r, r` -> `vshufps r, r, 0x44`
// `vunpckhpd/vmovlhps r, r` -> `vshufps r, r, 0xee`		// `vunpckhpd/vmovlhps r, r` -> `vshufps r, r, 0xee`
		// `vunpcklpd r, r, k` -> `vshufpd r, r, 0x00`
		// `vunpckhpd r, r, k` -> `vshufpd r, r, 0xff`
// iff `vshufps` is faster than `vunpck{l|h}pd`. Otherwise stick with		// iff `vshufps` is faster than `vunpck{l|h}pd`. Otherwise stick with
// `vunpck{l|h}pd` as it uses less code size.		// `vunpck{l|h}pd` as it uses less code size.
// TODO: Look into using `{VP}UNPCK{L|H}QDQ{...}` instead of `{V}SHUF{...}PS`		// TODO: Look into using `{VP}UNPCK{L|H}QDQ{...}` instead of `{V}SHUF{...}PS`
// as the replacement. `{VP}UNPCK{L|H}QDQ{...}` has no codesize cost.		// as the replacement. `{VP}UNPCK{L|H}QDQ{...}` has no codesize cost.
auto ProcessUNPCKPD = [&](unsigned NewOpc, unsigned MaskImm) -> bool {		auto ProcessUNPCKPD = [&](unsigned NewOpc, unsigned MaskImm) -> bool {
if (!NewOpcPreferable(NewOpc, /*ReplaceInTie*/ false))		if (!NewOpcPreferable(NewOpc, /*ReplaceInTie*/ false))
return false;		return false;

MI.setDesc(TII->get(NewOpc));		MI.setDesc(TII->get(NewOpc));
MI.addOperand(MachineOperand::CreateImm(MaskImm));		MI.addOperand(MachineOperand::CreateImm(MaskImm));
return true;		return true;
};		};
auto ProcessUNPCKLPDrr = [&](unsigned NewOpc) -> bool {		auto ProcessUNPCKLPDrr = [&](unsigned NewOpc) -> bool {
return ProcessUNPCKPD(NewOpc, 0x44);		return ProcessUNPCKPD(NewOpc, 0x44);
};		};
auto ProcessUNPCKHPDrr = [&](unsigned NewOpc) -> bool {		auto ProcessUNPCKHPDrr = [&](unsigned NewOpc) -> bool {
return ProcessUNPCKPD(NewOpc, 0xee);		return ProcessUNPCKPD(NewOpc, 0xee);
};		};
		auto ProcessUNPCKLPDrrk = [&](unsigned NewOpc) -> bool {
		return ProcessUNPCKPD(NewOpc, 0x00);
		};
		auto ProcessUNPCKHPDrrk = [&](unsigned NewOpc) -> bool {
		return ProcessUNPCKPD(NewOpc, 0xff);
		};

switch (Opc) {		switch (Opc) {
case X86::VPERMILPSri:		case X86::VPERMILPSri:
return ProcessVPERMILPSri(X86::VSHUFPSrri);		return ProcessVPERMILPSri(X86::VSHUFPSrri);
case X86::VPERMILPSYri:		case X86::VPERMILPSYri:
return ProcessVPERMILPSri(X86::VSHUFPSYrri);		return ProcessVPERMILPSri(X86::VSHUFPSYrri);
case X86::VPERMILPSZ128ri:		case X86::VPERMILPSZ128ri:
return ProcessVPERMILPSri(X86::VSHUFPSZ128rri);		return ProcessVPERMILPSri(X86::VSHUFPSZ128rri);
(39 lines not shown; context: case X86::VPERMILPSZmik:)
return ProcessVPERMILPSmi(X86::VPSHUFDZmik);		return ProcessVPERMILPSmi(X86::VPSHUFDZmik);

// TODO: {V}UNPCK{L|H}PD{...} is probably safe to transform to		// TODO: {V}UNPCK{L|H}PD{...} is probably safe to transform to
// `{VP}UNPCK{L|H}QDQ{...}` which gets the same perf benefit as		// `{VP}UNPCK{L|H}QDQ{...}` which gets the same perf benefit as
// `{V}SHUF{...}PS` but 1) without increasing code size and 2) can also		// `{V}SHUF{...}PS` but 1) without increasing code size and 2) can also
// handle the `mr` case. ICL doesn't have a domain penalty for replacing		// handle the `mr` case. ICL doesn't have a domain penalty for replacing
// float unpck -> int unpck, but at this time, I haven't verified the set of		// float unpck -> int unpck, but at this time, I haven't verified the set of
// processors where its safe.		// processors where its safe.
case X86::MOVLHPSrr:		case X86::MOVLHPSrr:
case X86::UNPCKLPDrr:		case X86::UNPCKLPDrr:
return ProcessUNPCKLPDrr(X86::SHUFPSrri);		return ProcessUNPCKLPDrr(X86::SHUFPSrri);
case X86::VMOVLHPSrr:		case X86::VMOVLHPSrr:
case X86::VUNPCKLPDrr:		case X86::VUNPCKLPDrr:
return ProcessUNPCKLPDrr(X86::VSHUFPSrri);		return ProcessUNPCKLPDrr(X86::VSHUFPSrri);
case X86::VUNPCKLPDYrr:		case X86::VUNPCKLPDYrr:
return ProcessUNPCKLPDrr(X86::VSHUFPSYrri);		return ProcessUNPCKLPDrr(X86::VSHUFPSYrri);
// VMOVLHPS is always 128 bits.		// VMOVLHPS is always 128 bits.
case X86::VMOVLHPSZrr:		case X86::VMOVLHPSZrr:
case X86::VUNPCKLPDZ128rr:		case X86::VUNPCKLPDZ128rr:
return ProcessUNPCKLPDrr(X86::VSHUFPSZ128rri);		return ProcessUNPCKLPDrr(X86::VSHUFPSZ128rri);
case X86::VUNPCKLPDZ256rr:		case X86::VUNPCKLPDZ256rr:
return ProcessUNPCKLPDrr(X86::VSHUFPSZ256rri);		return ProcessUNPCKLPDrr(X86::VSHUFPSZ256rri);
case X86::VUNPCKLPDZrr:		case X86::VUNPCKLPDZrr:
return ProcessUNPCKLPDrr(X86::VSHUFPSZrri);		return ProcessUNPCKLPDrr(X86::VSHUFPSZrri);
		case X86::VUNPCKLPDZ128rrk:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZ128rrik);
		case X86::VUNPCKLPDZ256rrk:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZ256rrik);
		case X86::VUNPCKLPDZrrk:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZrrik);
		case X86::VUNPCKLPDZ128rrkz:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZ128rrikz);
		case X86::VUNPCKLPDZ256rrkz:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZ256rrikz);
		case X86::VUNPCKLPDZrrkz:
		return ProcessUNPCKLPDrrk(X86::VSHUFPDZrrikz);
case X86::UNPCKHPDrr:		case X86::UNPCKHPDrr:
return ProcessUNPCKHPDrr(X86::SHUFPSrri);		return ProcessUNPCKHPDrr(X86::SHUFPSrri);
case X86::VUNPCKHPDrr:		case X86::VUNPCKHPDrr:
return ProcessUNPCKHPDrr(X86::VSHUFPSrri);		return ProcessUNPCKHPDrr(X86::VSHUFPSrri);
case X86::VUNPCKHPDYrr:		case X86::VUNPCKHPDYrr:
return ProcessUNPCKHPDrr(X86::VSHUFPSYrri);		return ProcessUNPCKHPDrr(X86::VSHUFPSYrri);
case X86::VUNPCKHPDZ128rr:		case X86::VUNPCKHPDZ128rr:
return ProcessUNPCKHPDrr(X86::VSHUFPSZ128rri);		return ProcessUNPCKHPDrr(X86::VSHUFPSZ128rri);
case X86::VUNPCKHPDZ256rr:		case X86::VUNPCKHPDZ256rr:
return ProcessUNPCKHPDrr(X86::VSHUFPSZ256rri);		return ProcessUNPCKHPDrr(X86::VSHUFPSZ256rri);
case X86::VUNPCKHPDZrr:		case X86::VUNPCKHPDZrr:
return ProcessUNPCKHPDrr(X86::VSHUFPSZrri);		return ProcessUNPCKHPDrr(X86::VSHUFPSZrri);
		case X86::VUNPCKHPDZ128rrk:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZ128rrik);
		case X86::VUNPCKHPDZ256rrk:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZ256rrik);
		case X86::VUNPCKHPDZrrk:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZrrik);
		case X86::VUNPCKHPDZ128rrkz:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZ128rrikz);
		case X86::VUNPCKHPDZ256rrkz:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZ256rrikz);
		case X86::VUNPCKHPDZrrkz:
		return ProcessUNPCKHPDrrk(X86::VSHUFPDZrrikz);
default:		default:
return false;		return false;
}		}
}		}

bool X86FixupInstTuningPass::runOnMachineFunction(MachineFunction &MF) {		bool X86FixupInstTuningPass::runOnMachineFunction(MachineFunction &MF) {
LLVM_DEBUG(dbgs() << "Start X86FixupInstTuning\n";);		LLVM_DEBUG(dbgs() << "Start X86FixupInstTuning\n";);
bool Changed = false;		bool Changed = false;
(15 following lines not shown)

llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll

(158 preceding lines not shown)
	; CHECK-NEXT: vunpckhpd {{.*#+}} zmm0 {%k1} {z} = zmm0[1],zmm1[1],zmm0[3],zmm1[3],zmm0[5],zmm1[5],zmm0[7],zmm1[7]			; CHECK-NEXT: vunpckhpd {{.*#+}} zmm0 {%k1} {z} = zmm0[1],zmm1[1],zmm0[3],zmm1[3],zmm0[5],zmm1[5],zmm0[7],zmm1[7]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mask = bitcast i8 %mask_int to <8 x i1>			%mask = bitcast i8 %mask_int to <8 x i1>
	%shufp = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			%shufp = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	%res = select <8 x i1> %mask, <8 x double> %shufp, <8 x double> zeroinitializer			%res = select <8 x i1> %mask, <8 x double> %shufp, <8 x double> zeroinitializer
	ret <8 x double> %res			ret <8 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <4 x double> @transform_VUNPCKLPDYrrkz(<4 x double> %a, <4 x double> %b, i4 %mask_int) nounwind {			define <4 x double> @transform_VUNPCKLPDYrrkz(<4 x double> %a, <4 x double> %b, i4 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDYrrkz:			; CHECK-SKX-LABEL: transform_VUNPCKLPDYrrkz:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpcklpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]			; CHECK-SKX-NEXT: vunpcklpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKLPDYrrkz:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKLPDYrrkz:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpcklpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKLPDYrrkz:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpcklpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKLPDYrrkz:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpcklpd {{.*#+}} ymm0 {%k1} {z} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i4 %mask_int to <4 x i1>			%mask = bitcast i4 %mask_int to <4 x i1>
	%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> zeroinitializer			%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> zeroinitializer
	ret <4 x double> %res			ret <4 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <4 x double> @transform_VUNPCKHPDYrrkz(<4 x double> %a, <4 x double> %b, i4 %mask_int) nounwind {			define <4 x double> @transform_VUNPCKHPDYrrkz(<4 x double> %a, <4 x double> %b, i4 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKHPDYrrkz:			; CHECK-SKX-LABEL: transform_VUNPCKHPDYrrkz:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpckhpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]			; CHECK-SKX-NEXT: vunpckhpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKHPDYrrkz:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKHPDYrrkz:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpckhpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKHPDYrrkz:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpckhpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKHPDYrrkz:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpckhpd {{.*#+}} ymm0 {%k1} {z} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i4 %mask_int to <4 x i1>			%mask = bitcast i4 %mask_int to <4 x i1>
	%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> zeroinitializer			%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> zeroinitializer
	ret <4 x double> %res			ret <4 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <2 x double> @transform_VUNPCKLPDrrkz(<2 x double> %a, <2 x double> %b, i2 %mask_int) nounwind {			define <2 x double> @transform_VUNPCKLPDrrkz(<2 x double> %a, <2 x double> %b, i2 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDrrkz:			; CHECK-SKX-LABEL: transform_VUNPCKLPDrrkz:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpcklpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]			; CHECK-SKX-NEXT: vunpcklpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKLPDrrkz:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKLPDrrkz:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpcklpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKLPDrrkz:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpcklpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKLPDrrkz:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpcklpd {{.*#+}} xmm0 {%k1} {z} = xmm0[0],xmm1[0]
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i2 %mask_int to <2 x i1>			%mask = bitcast i2 %mask_int to <2 x i1>
	%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>			%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>
	%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> zeroinitializer			%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> zeroinitializer
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <2 x double> @transform_VUNPCKHPDrrkz(<2 x double> %a, <2 x double> %b, i2 %mask_int) nounwind {			define <2 x double> @transform_VUNPCKHPDrrkz(<2 x double> %a, <2 x double> %b, i2 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKHPDrrkz:			; CHECK-SKX-LABEL: transform_VUNPCKHPDrrkz:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]			; CHECK-SKX-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKHPDrrkz:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKHPDrrkz:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKHPDrrkz:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKHPDrrkz:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpckhpd {{.*#+}} xmm0 {%k1} {z} = xmm0[1],xmm1[1]
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i2 %mask_int to <2 x i1>			%mask = bitcast i2 %mask_int to <2 x i1>
	%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 1, i32 3>			%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 1, i32 3>
	%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> zeroinitializer			%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> zeroinitializer
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <8 x double> @transform_VUNPCKLPDZrrk(<8 x double> %a, <8 x double> %b, <8 x double> %c, i8 %mask_int) nounwind {			define <8 x double> @transform_VUNPCKLPDZrrk(<8 x double> %a, <8 x double> %b, <8 x double> %c, i8 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDZrrk:			; CHECK-LABEL: transform_VUNPCKLPDZrrk:
(16 lines not shown)
	; CHECK-NEXT: vmovapd %zmm2, %zmm0			; CHECK-NEXT: vmovapd %zmm2, %zmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%mask = bitcast i8 %mask_int to <8 x i1>			%mask = bitcast i8 %mask_int to <8 x i1>
	%shufp = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>			%shufp = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 5, i32 13, i32 7, i32 15>
	%res = select <8 x i1> %mask, <8 x double> %shufp, <8 x double> %c			%res = select <8 x i1> %mask, <8 x double> %shufp, <8 x double> %c
	ret <8 x double> %res			ret <8 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <4 x double> @transform_VUNPCKLPDYrrk(<4 x double> %a, <4 x double> %b, <4 x double> %c, i4 %mask_int) nounwind {			define <4 x double> @transform_VUNPCKLPDYrrk(<4 x double> %a, <4 x double> %b, <4 x double> %c, i4 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDYrrk:			; CHECK-SKX-LABEL: transform_VUNPCKLPDYrrk:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpcklpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]			; CHECK-SKX-NEXT: vunpcklpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
	; CHECK-NEXT: vmovapd %ymm2, %ymm0			; CHECK-SKX-NEXT: vmovapd %ymm2, %ymm0
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKLPDYrrk:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-ICX-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKLPDYrrk:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpcklpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-V4-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKLPDYrrk:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpcklpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-AVX512-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKLPDYrrk:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpcklpd {{.*#+}} ymm2 {%k1} = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
				; CHECK-ZNVER4-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i4 %mask_int to <4 x i1>			%mask = bitcast i4 %mask_int to <4 x i1>
	%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>			%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 0, i32 4, i32 2, i32 6>
	%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> %c			%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> %c
	ret <4 x double> %res			ret <4 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <4 x double> @transform_VUNPCKHPDYrrk(<4 x double> %a, <4 x double> %b, <4 x double> %c, i4 %mask_int) nounwind {			define <4 x double> @transform_VUNPCKHPDYrrk(<4 x double> %a, <4 x double> %b, <4 x double> %c, i4 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKHPDYrrk:			; CHECK-SKX-LABEL: transform_VUNPCKHPDYrrk:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpckhpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]			; CHECK-SKX-NEXT: vunpckhpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
	; CHECK-NEXT: vmovapd %ymm2, %ymm0			; CHECK-SKX-NEXT: vmovapd %ymm2, %ymm0
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKHPDYrrk:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-ICX-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKHPDYrrk:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpckhpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-V4-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKHPDYrrk:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpckhpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-AVX512-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKHPDYrrk:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpckhpd {{.*#+}} ymm2 {%k1} = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
				; CHECK-ZNVER4-NEXT: vmovapd %ymm2, %ymm0
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i4 %mask_int to <4 x i1>			%mask = bitcast i4 %mask_int to <4 x i1>
	%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%shufp = shufflevector <4 x double> %a, <4 x double> %b, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
	%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> %c			%res = select <4 x i1> %mask, <4 x double> %shufp, <4 x double> %c
	ret <4 x double> %res			ret <4 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <2 x double> @transform_VUNPCKLPDrrk(<2 x double> %a, <2 x double> %b, <2 x double> %c, i2 %mask_int) nounwind {			define <2 x double> @transform_VUNPCKLPDrrk(<2 x double> %a, <2 x double> %b, <2 x double> %c, i2 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDrrk:			; CHECK-SKX-LABEL: transform_VUNPCKLPDrrk:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpcklpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]			; CHECK-SKX-NEXT: vunpcklpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]
	; CHECK-NEXT: vmovapd %xmm2, %xmm0			; CHECK-SKX-NEXT: vmovapd %xmm2, %xmm0
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKLPDrrk:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]
				; CHECK-ICX-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKLPDrrk:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpcklpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]
				; CHECK-V4-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKLPDrrk:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpcklpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]
				; CHECK-AVX512-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKLPDrrk:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpcklpd {{.*#+}} xmm2 {%k1} = xmm0[0],xmm1[0]
				; CHECK-ZNVER4-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i2 %mask_int to <2 x i1>			%mask = bitcast i2 %mask_int to <2 x i1>
	%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>			%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 0, i32 2>
	%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> %c			%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> %c
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	; Check that masked vunpcklpd will not be transformed into vshufps.
	define <2 x double> @transform_VUNPCKHPDrrk(<2 x double> %a, <2 x double> %b, <2 x double> %c, i2 %mask_int) nounwind {			define <2 x double> @transform_VUNPCKHPDrrk(<2 x double> %a, <2 x double> %b, <2 x double> %c, i2 %mask_int) nounwind {
	; CHECK-LABEL: transform_VUNPCKHPDrrk:			; CHECK-SKX-LABEL: transform_VUNPCKHPDrrk:
	; CHECK: # %bb.0:			; CHECK-SKX: # %bb.0:
	; CHECK-NEXT: kmovd %edi, %k1			; CHECK-SKX-NEXT: kmovd %edi, %k1
	; CHECK-NEXT: vunpckhpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]			; CHECK-SKX-NEXT: vunpckhpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]
	; CHECK-NEXT: vmovapd %xmm2, %xmm0			; CHECK-SKX-NEXT: vmovapd %xmm2, %xmm0
	; CHECK-NEXT: retq			; CHECK-SKX-NEXT: retq
				;
				; CHECK-ICX-LABEL: transform_VUNPCKHPDrrk:
				; CHECK-ICX: # %bb.0:
				; CHECK-ICX-NEXT: kmovd %edi, %k1
				; CHECK-ICX-NEXT: vshufpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]
				; CHECK-ICX-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-ICX-NEXT: retq
				;
				; CHECK-V4-LABEL: transform_VUNPCKHPDrrk:
				; CHECK-V4: # %bb.0:
				; CHECK-V4-NEXT: kmovd %edi, %k1
				; CHECK-V4-NEXT: vunpckhpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]
				; CHECK-V4-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-V4-NEXT: retq
				;
				; CHECK-AVX512-LABEL: transform_VUNPCKHPDrrk:
				; CHECK-AVX512: # %bb.0:
				; CHECK-AVX512-NEXT: kmovd %edi, %k1
				; CHECK-AVX512-NEXT: vunpckhpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]
				; CHECK-AVX512-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-AVX512-NEXT: retq
				;
				; CHECK-ZNVER4-LABEL: transform_VUNPCKHPDrrk:
				; CHECK-ZNVER4: # %bb.0:
				; CHECK-ZNVER4-NEXT: kmovd %edi, %k1
				; CHECK-ZNVER4-NEXT: vunpckhpd {{.*#+}} xmm2 {%k1} = xmm0[1],xmm1[1]
				; CHECK-ZNVER4-NEXT: vmovapd %xmm2, %xmm0
				; CHECK-ZNVER4-NEXT: retq
	%mask = bitcast i2 %mask_int to <2 x i1>			%mask = bitcast i2 %mask_int to <2 x i1>
	%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 1, i32 3>			%shufp = shufflevector <2 x double> %a, <2 x double> %b, <2 x i32> <i32 1, i32 3>
	%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> %c			%res = select <2 x i1> %mask, <2 x double> %shufp, <2 x double> %c
	ret <2 x double> %res			ret <2 x double> %res
	}			}

	define <16 x float> @transform_VUNPCKLPDZrm(<16 x float> %a, ptr %pb) nounwind {			define <16 x float> @transform_VUNPCKLPDZrm(<16 x float> %a, ptr %pb) nounwind {
	; CHECK-LABEL: transform_VUNPCKLPDZrm:			; CHECK-LABEL: transform_VUNPCKLPDZrm:
(219 following lines not shown)