This is an archive of the discontinued LLVM Phabricator instance.

[X86] Adding vpopcntd and vpopcntq instructions
ClosedPublic

Authored by oren_ben_simhon on May 14 2017, 5:09 AM.


Details

Reviewers

craig.topper
igorb
m_zuckerman
RKSimon

Commits

rG7bf27f03f2cb: [X86] Adding vpopcntd and vpopcntq instructions
rL303858: [X86] Adding vpopcntd and vpopcntq instructions

Summary

VPOPCNTDQ is a new feature set that Intel published here.

This patch is the LLVM side of adding two new intrinsic-based instructions (vpopcntd and vpopcntq).
Note that this patch does not include pattern matching (automatic generation) of the instructions; that will be introduced in a future review.

Diff Detail

Repository: rL LLVM

Event Timeline

oren_ben_simhon created this revision. May 14 2017, 5:09 AM

oren_ben_simhon added a reviewer: m_zuckerman.

Please can you rebase against trunk latest?

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

lib/Target/X86/X86InstrAVX512.td
8695	You should be able to make 128/512 vector lowering legal as well using the same pattern technique as we do for VPABS and VPLZCNT on NoVLX targets
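
The widening technique mentioned in this comment can be modelled outside TableGen. The following is only a sketch under stated assumptions: `popcntZmm` and `popcntXmmViaZmm` are hypothetical helper names, a 512-bit register is modelled as 16 independent 32-bit lanes, and `UpperFill` stands in for the undefined upper lanes produced by IMPLICIT_DEF. It illustrates why placing a narrow vector in the low lanes, running the wide instruction, and extracting the low lanes gives the same answer whatever the upper lanes hold.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Per-lane population count, standing in for one lane of VPOPCNTD.
static uint32_t popcount32(uint32_t X) {
  uint32_t Count = 0;
  for (; X != 0; X &= X - 1) // Clears the lowest set bit on each iteration.
    ++Count;
  return Count;
}

// Hypothetical model of the 512-bit instruction: 16 independent i32 lanes.
static std::array<uint32_t, 16>
popcntZmm(const std::array<uint32_t, 16> &Src) {
  std::array<uint32_t, 16> Dst{};
  for (std::size_t I = 0; I < Src.size(); ++I)
    Dst[I] = popcount32(Src[I]);
  return Dst;
}

// Model of the widening pattern: put the 128-bit value in the low lanes of a
// 512-bit register (INSERT_SUBREG into IMPLICIT_DEF), run the wide
// instruction, then read back the low lanes (EXTRACT_SUBREG).
static std::array<uint32_t, 4>
popcntXmmViaZmm(const std::array<uint32_t, 4> &Src, uint32_t UpperFill) {
  std::array<uint32_t, 16> Wide;
  Wide.fill(UpperFill); // Arbitrary contents for the undefined upper lanes.
  for (std::size_t I = 0; I < Src.size(); ++I)
    Wide[I] = Src[I];

  const std::array<uint32_t, 16> WideResult = popcntZmm(Wide);

  std::array<uint32_t, 4> Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = WideResult[I];
  return Result;
}
```

Because the operation is strictly per lane, the low lanes of the result never depend on the upper lanes, which is what makes the undefined upper part harmless in this model.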

craig.topper added inline comments. May 14 2017, 9:08 AM

lib/Target/X86/X86InstrInfo.cpp
7061	I don't think this belongs here. These instructions are ones that update the EFLAGS Z flag based on their output being 0. That's not true of VPOPCNTD/Q.
lib/Target/X86/X86InstrInfo.td
818	I don't see NoVPOPCNTDQ being used anywhere so we probably shouldn't add it.

Can you add vpopcnt command lines to test/CodeGen/X86/vector-tzcnt-*.ll as well. I believe we create ctpop nodes as part of cttz lowering that are currently expanding to a lookup table implementation.
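
As background for why cttz tests exercise ctpop: one well-known identity expresses count-trailing-zeros through a population count, cttz(x) = ctpop(~x & (x - 1)). The sketch below checks that identity on scalars; the function names are illustrative and are not LLVM APIs, and whether a given target takes this path depends on its lowering choices.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count used as a stand-in for a ctpop node.
static uint32_t popcount32(uint32_t X) {
  uint32_t Count = 0;
  for (; X != 0; X &= X - 1) // Clears the lowest set bit on each iteration.
    ++Count;
  return Count;
}

// cttz expressed through ctpop: ~X & (X - 1) keeps exactly the bits below
// the lowest set bit of X, so counting them gives the trailing-zero count.
// For X == 0 the mask is all ones and the result is 32.
static uint32_t cttzViaCtpop(uint32_t X) {
  return popcount32(~X & (X - 1));
}

// Straightforward reference: scan from bit 0 until a set bit is found.
static uint32_t cttzReference(uint32_t X) {
  uint32_t N = 0;
  while (N < 32 && ((X >> N) & 1u) == 0)
    ++N;
  return N;
}
```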

Disassembler tests?

Implemented comments posted until 05/15 (Thanks Simon and Craig)

Herald added a subscriber: krytarowski. May 15 2017, 12:35 PM

RKSimon added inline comments. May 15 2017, 12:44 PM

lib/Target/X86/X86InstrAVX512.td
8693	The doc you reference doesn't refer to VLX versions of VPOPCNT - just zmm versions. So shouldn't the NoVLX predicate be dropped?
lib/Target/X86/X86InstrInfo.cpp
496–1035	Please revert these whitespace changes.

Please notice that clang-format reformatted some lists that I modified in the file lib/Target/X86/X86InstrInfo.cpp.
It caused major cosmetic changes which introduce many diffs in that file.

In D33169#754377, @RKSimon wrote:

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required, I would appreciate an example.

In D33169#754316, @RKSimon wrote:

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

Since AVX512 is missing from the cost table, and since I need to investigate this subject further, I would prefer to make these changes in a separate patch.

oren_ben_simhon added inline comments. May 15 2017, 12:52 PM

lib/Target/X86/X86InstrInfo.cpp
496–1035	Shouldn't we follow clang-format formatting?

Reverted clang-format for lib/Target/X86/X86InstrInfo.cpp and removed NoVLX predicate (Thanks Simon)

Do we have a generic ctpop test like we do for tzcnt and lzcnt? If so should we just add command lines to that instead of a new intrinsic test?

lib/Support/Host.cpp
1401	I think we're trying to keep the checks in order by bit position here. Can you move this up?
lib/Target/X86/X86InstrInfo.cpp
7042	I think this is also an EFLAGS-related piece of code, so the vector pop br shouldn't be here.

In D33169#755457, @craig.topper wrote:

Do we have a generic ctpop test like we do for tzcnt and lzcnt? If so should we just add command lines to that instead of a new intrinsic test?

llvm\test\CodeGen\X86\vector-popcnt-*.ll - there wasn't any need to add avx512 to anything less than 512 but we should probably add avx512vpopcntdq tests to all three.

A possible addition would be to custom lower i8/i16 vectors with a trunc(popcnt(zext)) pattern.
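
A scalar sketch of the trunc(popcnt(zext)) idea, assuming a 32-bit lane as the wide type (the helper names are made up for illustration and this is not the eventual lowering code): zero-extension adds no set bits, and a count of at most 8 always fits back into 8 bits, so the round trip is lossless.

```cpp
#include <cassert>
#include <cstdint>

// Population count on a 32-bit lane, standing in for VPOPCNTD.
static uint32_t popcount32(uint32_t X) {
  uint32_t Count = 0;
  for (; X != 0; X &= X - 1) // Clears the lowest set bit on each iteration.
    ++Count;
  return Count;
}

// trunc(popcnt(zext(X))) for an i8 element.
static uint8_t popcount8ViaWide(uint8_t X) {
  const uint32_t Wide = X;                  // zext i8 -> i32
  const uint32_t Count = popcount32(Wide);  // popcnt on the wide lane
  return static_cast<uint8_t>(Count);       // trunc i32 -> i8, Count <= 8
}

// Reference: count the eight bits of the narrow value directly.
static uint8_t popcount8Reference(uint8_t X) {
  uint8_t Count = 0;
  for (unsigned Bit = 0; Bit < 8; ++Bit)
    Count = static_cast<uint8_t>(Count + ((X >> Bit) & 1u));
  return Count;
}
```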

In D33169#754377, @RKSimon wrote:

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required, I would appreciate an example.

I don't think the MC tests actually use the disassembler code to try and get back to the instruction - @craig.topper can you confirm?

In D33169#754316, @RKSimon wrote:

Add cost-model support in X86TTIImpl::getIntrinsicInstrCost

Since AVX512 is missing from the cost table, and since I need to investigate this subject further, I would prefer to make these changes in a separate patch.

OK.

lib/Target/X86/X86InstrInfo.cpp
7042	+1 - I don't think this is going to work for vectors, but you can try later if you want.
test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2	Add --show-mc-encoding ?

In D33169#754377, @RKSimon wrote:

Disassembler tests?

I believe that the test test/MC/X86/x86-64-avx512vpopcntdq.s covers the required tests.
If you think additional tests are required, I would appreciate an example.

I don't think the MC tests actually use the disassembler code to try and get back to the instruction - @craig.topper can you confirm?

Yeah the .s MC tests don't check the disassembler. That will have to be done from something like test/MC/Disassembler/X86/avx-512.txt

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2	But should we even have this test file, or just use the popcount tests, since these aren't x86-specific intrinsics anymore?

craig.topper added inline comments. May 15 2017, 2:33 PM

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2	Never mind, I guess we need this for mask testing? I assume the generic test doesn't cover it.

RKSimon added inline comments. May 17 2017, 6:04 AM

lib/Target/X86/X86InstrInfo.cpp
881	Why TB_NO_REVERSE? This is typically only used for instructions where the mem size doesn't match the reg size to prevent out of bounds loads.
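
To make the role of the flag concrete, here is a simplified model of a fold table with a no-reverse flag. It is a sketch only: `FoldEntry`, `FoldTables`, `buildFoldTables` and the `FoldNoReverse` flag are hypothetical names, and opcodes are modelled as strings rather than LLVM enumerators. Every entry is recorded in the register-to-memory direction, but an entry carrying the flag is left out of the memory-to-register direction, so it can never be unfolded into a full-width load.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical flag set; FoldNoReverse plays the role of TB_NO_REVERSE.
enum FoldFlags : unsigned { FoldNone = 0, FoldNoReverse = 1u << 0 };

// One row of the table: register form, memory form, flags.
struct FoldEntry {
  std::string RegOp;
  std::string MemOp;
  unsigned Flags;
};

struct FoldTables {
  std::map<std::string, std::string> RegToMem; // used when folding a load
  std::map<std::string, std::string> MemToReg; // used when unfolding
};

static FoldTables buildFoldTables(const std::vector<FoldEntry> &Entries) {
  FoldTables Tables;
  for (const FoldEntry &Entry : Entries) {
    // Folding a load into the instruction is always allowed.
    Tables.RegToMem[Entry.RegOp] = Entry.MemOp;
    // Unfolding is skipped when the memory operand is narrower than the
    // register, since a register-sized reload could read out of bounds.
    if ((Entry.Flags & FoldNoReverse) == 0)
      Tables.MemToReg[Entry.MemOp] = Entry.RegOp;
  }
  return Tables;
}
```

For an instruction such as VPOPCNTD, whose memory operand is as wide as its register operand, this model gives no reason to set the flag, which matches the question raised in the comment.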

In D33169#755488, @RKSimon wrote:

A possible addition would be to custom lower i8/i16 vectors with a trunc(popcnt(zext)) pattern.

I agree with you. Would it be OK to create a separate patch for it?

In D33169#754377, @RKSimon wrote:

Disassembler tests?

Thanks for catching that. I added disassembly tests to avx-512.txt.

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
2	I moved the tests to the corresponding vector-popcnt-*.ll files. Also, the tests check for X86 instructions, so they should reside in the X86 directory.
2	Add --show-mc-encoding ? Encoding is not tested in this file. See file test/MC/X86/x86-64-avx512vpopcntdq.s.

Implemented comments posted until 05/16 (Thanks again Simon and Craig)

Removed TB_NO_REVERSE flag (Thanks Simon)

The test files need some attention.

test/CodeGen/X86/vector-popcnt-128.ll
8 ↗	(On Diff #99296)	Re-generate these files, don't manually edit them. Keep to the x86_64-unknown-unknown triple
test/CodeGen/X86/vector-popcnt-256.ll
4 ↗	(On Diff #99296)	Re-generate these files, don't manually edit them. Keep to the x86_64-unknown-unknown triple
test/CodeGen/X86/vector-popcnt-512.ll
4 ↗	(On Diff #99296)	Re-generate these files, don't manually edit them.
198 ↗	(On Diff #99296)	Don't include mask tests here - full coverage is what the -intrinsics test files are for - please re-add it.
test/CodeGen/X86/vector-tzcnt-128.ll
10	Re-generate these files, don't manually edit them.
test/CodeGen/X86/vector-tzcnt-256.ll
6	Re-generate these files, don't manually edit them.
test/CodeGen/X86/vector-tzcnt-512.ll
5	Re-generate these files, don't manually edit them.

Updated the tests (Thanks Simon)

I would appreciate any additional comments.
Please help me finish the review.

Any idea why Phabricator is showing so many unchanged lines from X86InstrInfo.cpp? Have you changed the line endings or something? They aren't appearing in the downloaded diff, FWIW.

A couple of minors, but I think you're almost there.

lib/Target/X86/X86InstrAVX512.td
8692	Don't put Predicate on a new line.
test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll
3	Probably worth testing on the i686-unknown-unknown triple as well. I know it overlaps the MC test coverage, but adding --show-mc-encoding would be trivial since the FileCheck lines are auto-generated; we do this in many other intrinsics test files.

craig.topper added inline comments. May 24 2017, 8:15 AM

lib/Target/X86/X86InstrInfo.cpp
881	These should also be alphabetized with the rest of the instructions in this section.
2310	Alphabetize
2933	Alphabetize

In D33169#763254, @RKSimon wrote:

Any idea why Phabricator is showing so many unchanged lines from X86InstrInfo.cpp? Have you changed the line endings or something? They aren't appearing in the downloaded diff, FWIW.

Probably because I changed the indentation but reverted it back in an updated revision.
There are currently no indentation changes.

Implemented comments posted until 05/25 (Thanks Simon and Craig)

LGTM

This revision is now accepted and ready to land. May 25 2017, 4:11 AM

Closed by commit rL303858: [X86] Adding vpopcntd and vpopcntq instructions (authored by orenb). May 25 2017, 6:45 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Support/Host.cpp (revision 303088)	1 line
lib/Target/X86/X86.td (revision 303088)	3 lines
lib/Target/X86/X86ISelLowering.cpp (revision 303088)	8 lines
lib/Target/X86/X86InstrAVX512.td (revision 303088)	36 lines
lib/Target/X86/X86InstrInfo.cpp (revision 303088)	12 lines
lib/Target/X86/X86InstrInfo.td (revision 303088)	2 lines
lib/Target/X86/X86Subtarget.h (revision 303088)	4 lines
lib/Target/X86/X86Subtarget.cpp (revision 303088)	1 line
test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll (revision 0)	123 lines
test/CodeGen/X86/vector-tzcnt-128.ll (revision 303088)	24 lines
test/CodeGen/X86/vector-tzcnt-256.ll (revision 303088)	23 lines
test/CodeGen/X86/vector-tzcnt-512.ll (revision 303088)	19 lines
test/MC/X86/x86-64-avx512vpopcntdq.s (revision 0)	225 lines

Diff 99054

lib/Support/Host.cpp

@@ bool sys::getHostCPUFeatures(StringMap<bool> &Features) {
   Features["avx512f"] = HasLeaf7 && ((EBX >> 16) & 1) && HasAVX512Save;
   Features["avx512dq"] = HasLeaf7 && ((EBX >> 17) & 1) && HasAVX512Save;
   Features["avx512ifma"] = HasLeaf7 && ((EBX >> 21) & 1) && HasAVX512Save;
   Features["avx512pf"] = HasLeaf7 && ((EBX >> 26) & 1) && HasAVX512Save;
   Features["avx512er"] = HasLeaf7 && ((EBX >> 27) & 1) && HasAVX512Save;
   Features["avx512cd"] = HasLeaf7 && ((EBX >> 28) & 1) && HasAVX512Save;
   Features["avx512bw"] = HasLeaf7 && ((EBX >> 30) & 1) && HasAVX512Save;
   Features["avx512vl"] = HasLeaf7 && ((EBX >> 31) & 1) && HasAVX512Save;
+  Features["avx512vpopcntdq"] = HasLeaf7 && ((EBX >> 14) & 1) && HasAVX512Save;
craig.topper: I think we're trying to keep the checks in order by bit position here. Can you move this up?

   Features["prefetchwt1"] = HasLeaf7 && (ECX & 1);
   Features["avx512vbmi"] = HasLeaf7 && ((ECX >> 1) & 1) && HasAVX512Save;
   // Enable protection keys
   Features["pku"] = HasLeaf7 && ((ECX >> 4) & 1);

   bool HasLeafD = MaxLevel >= 0xd &&
       !getX86CpuIDAndInfoEx(0xd, 0x1, &EAX, &EBX, &ECX, &EDX);
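
The feature checks in this hunk all follow one shape: shift a CPUID leaf 7 register right by the feature's bit position and mask with 1. A minimal stand-alone sketch of that shape follows; `Leaf7Regs`, `testBit` and `hasAvx512Vpopcntdq` are hypothetical names, and the register values are passed in rather than read from the CPU. One caveat worth checking against Intel's CPUID documentation: there AVX512_VPOPCNTDQ is listed as CPUID.(EAX=07H, ECX=0):ECX bit 14, whereas this revision of the diff tests EBX, so the sketch reads ECX.

```cpp
#include <cassert>
#include <cstdint>

// Register values returned by CPUID leaf 7, subleaf 0 (supplied by caller).
struct Leaf7Regs {
  uint32_t EBX;
  uint32_t ECX;
};

// Bit position of AVX512_VPOPCNTDQ in ECX, per Intel's CPUID documentation.
constexpr unsigned Avx512VpopcntdqBit = 14;

// Returns true when the given bit of Reg is set.
static bool testBit(uint32_t Reg, unsigned Bit) {
  return ((Reg >> Bit) & 1u) != 0;
}

// Mirrors the three-way condition used in the hunk: the leaf must exist, the
// feature bit must be set, and the OS must save the AVX-512 register state.
static bool hasAvx512Vpopcntdq(const Leaf7Regs &Regs, bool HasLeaf7,
                               bool HasAVX512Save) {
  return HasLeaf7 && testBit(Regs.ECX, Avx512VpopcntdqBit) && HasAVX512Save;
}
```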

lib/Target/X86/X86.td

@@ def FeatureAVX512 : SubtargetFeature<"avx512f", "X86SSELevel", "AVX512F",
       "Enable AVX-512 instructions",
       [FeatureAVX2]>;
 def FeatureERI : SubtargetFeature<"avx512er", "HasERI", "true",
       "Enable AVX-512 Exponential and Reciprocal Instructions",
       [FeatureAVX512]>;
 def FeatureCDI : SubtargetFeature<"avx512cd", "HasCDI", "true",
       "Enable AVX-512 Conflict Detection Instructions",
       [FeatureAVX512]>;
+def FeatureVPOPCNTDQ : SubtargetFeature<"avx512vpopcntdq", "HasVPOPCNTDQ",
+      "true", "Enable AVX-512 Population Count Instructions",
+      [FeatureAVX512]>;
 def FeaturePFI : SubtargetFeature<"avx512pf", "HasPFI", "true",
       "Enable AVX-512 PreFetch Instructions",
       [FeatureAVX512]>;
 def FeaturePREFETCHWT1 : SubtargetFeature<"prefetchwt1", "HasPFPREFETCHWT1",
       "true",
       "Prefetch with Intent to Write and T1 Hint">;
 def FeatureDQI : SubtargetFeature<"avx512dq", "HasDQI", "true",
       "Enable AVX-512 Doubleword and Quadword Instructions",

lib/Target/X86/X86ISelLowering.cpp


@@ if (!Subtarget.useSoftFloat() && Subtarget.hasAVX512()) {

     if (Subtarget.hasDQI()) {
       // NonVLX sub-targets extend 128/256 vectors to use the 512 version.
       setOperationAction(ISD::MUL, MVT::v2i64, Legal);
       setOperationAction(ISD::MUL, MVT::v4i64, Legal);
       setOperationAction(ISD::MUL, MVT::v8i64, Legal);
     }

+    if (Subtarget.hasVPOPCNTDQ()) {
+      // VPOPCNTDQ sub-targets extend 128/256 vectors to use the avx512
+      // version of popcntd/q.
+      for (auto VT : {MVT::v16i32, MVT::v8i64, MVT::v8i32, MVT::v4i64,
+                      MVT::v4i32, MVT::v2i64})
+        setOperationAction(ISD::CTPOP, VT, Legal);
+    }
+
     // Custom lower several nodes.
     for (auto VT : { MVT::v4i32, MVT::v8i32, MVT::v2i64, MVT::v4i64,
                      MVT::v4f32, MVT::v8f32, MVT::v2f64, MVT::v4f64 }) {
       setOperationAction(ISD::MGATHER, VT, Custom);
       setOperationAction(ISD::MSCATTER, VT, Custom);
     }
     // Extract subvector is special because the value type
     // (result) is 256-bit but the source is 512-bit wide.

lib/Target/X86/X86InstrAVX512.td


@@ let Predicates = [HasCDI, NoVLX] in {
   def : Pat<(v4i32 (ctlz VR128X:$src)),
             (EXTRACT_SUBREG
               (VPLZCNTDZrr
                 (INSERT_SUBREG (v16i32 (IMPLICIT_DEF)), VR128X:$src, sub_xmm)),
               sub_xmm)>;
 }

 //===---------------------------------------------------------------------===//
+// Counts number of ones - VPOPCNTD and VPOPCNTQ
+//===---------------------------------------------------------------------===//
+
+multiclass avx512_unary_rmb_popcnt<bits<8> opc, string OpcodeStr, X86VectorVTInfo VTInfo> {
+  let Predicates = [HasVPOPCNTDQ] in
+  defm Z : avx512_unary_rmb<opc, OpcodeStr, ctpop, VTInfo>, EVEX_V512;
+}
+
+// Use 512bit version to implement 128/256 bit.
+multiclass avx512_unary_lowering<SDNode OpNode, AVX512VLVectorVTInfo _,
+                                 Predicate prd> {
RKSimon: Don't put Predicate on a new line.
+  let Predicates = [prd] in {
RKSimon: The doc you reference doesn't refer to VLX versions of VPOPCNT - just zmm versions. So shouldn't the NoVLX predicate be dropped?
+    def Z256_Alt : Pat<(_.info256.VT(OpNode _.info256.RC:$src1)),
+                       (EXTRACT_SUBREG
RKSimon: You should be able to make 128/512 vector lowering legal as well using the same pattern technique as we do for VPABS and VPLZCNT on NoVLX targets
+                         (!cast<Instruction>(NAME # "Zrr")
+                           (INSERT_SUBREG(_.info512.VT(IMPLICIT_DEF)),
+                                         _.info256.RC:$src1,
+                                         _.info256.SubRegIdx)),
+                         _.info256.SubRegIdx)>;
+
+    def Z128_Alt : Pat<(_.info128.VT(OpNode _.info128.RC:$src1)),
+                       (EXTRACT_SUBREG
+                         (!cast<Instruction>(NAME # "Zrr")
+                           (INSERT_SUBREG(_.info512.VT(IMPLICIT_DEF)),
+                                         _.info128.RC:$src1,
+                                         _.info128.SubRegIdx)),
+                         _.info128.SubRegIdx)>;
+  }
+}
+
+defm VPOPCNTD : avx512_unary_rmb_popcnt<0x55, "vpopcntd", v16i32_info>,
+                avx512_unary_lowering<ctpop, avx512vl_i32_info, HasVPOPCNTDQ>;
+defm VPOPCNTQ : avx512_unary_rmb_popcnt<0x55, "vpopcntq", v8i64_info>,
+                avx512_unary_lowering<ctpop, avx512vl_i64_info, HasVPOPCNTDQ>, VEX_W;
+
+//===---------------------------------------------------------------------===//
 // Replicate Single FP - MOVSHDUP and MOVSLDUP
 //===---------------------------------------------------------------------===//
 multiclass avx512_replicate<bits<8> opc, string OpcodeStr, SDNode OpNode>{
   defm NAME: avx512_unary_rm_vl<opc, OpcodeStr, OpNode, avx512vl_f32_info,
                                 HasAVX512>, XS;
 }

 defm VMOVSHDUP : avx512_replicate<0x16, "vmovshdup", X86Movshdup>;

lib/Target/X86/X86InstrInfo.cpp


@@ static const X86MemoryFoldTableEntry MemoryFoldTable0[] = {
     { X86::VCVTPS2PHYrr, X86::VCVTPS2PHYmr, TB_FOLDED_STORE }
   };

   for (X86MemoryFoldTableEntry Entry : MemoryFoldTable0) {
     AddTableEntry(RegOp2MemOpTable0, MemOp2RegOpTable,
                   Entry.RegOp, Entry.MemOp, TB_INDEX_0 | Entry.Flags);
   }

   static const X86MemoryFoldTableEntry MemoryFoldTable1[] = {
     { X86::BSF16rr, X86::BSF16rm, 0 },
     { X86::BSF32rr, X86::BSF32rm, 0 },
     { X86::BSF64rr, X86::BSF64rm, 0 },
     { X86::BSR16rr, X86::BSR16rm, 0 },
     { X86::BSR32rr, X86::BSR32rm, 0 },
     { X86::BSR64rr, X86::BSR64rm, 0 },
     { X86::CMP16rr, X86::CMP16rm, 0 },
     { X86::CMP32rr, X86::CMP32rm, 0 },
     { X86::CMP64rr, X86::CMP64rm, 0 },
     { X86::CMP8rr, X86::CMP8rm, 0 },
     { X86::CVTSD2SSrr, X86::CVTSD2SSrm, 0 },
     { X86::CVTSI2SD64rr, X86::CVTSI2SD64rm, 0 },
     { X86::CVTSI2SDrr, X86::CVTSI2SDrm, 0 },
     { X86::CVTSI2SS64rr, X86::CVTSI2SS64rm, 0 },
     { X86::CVTSI2SSrr, X86::CVTSI2SSrm, 0 },
     { X86::CVTSS2SDrr, X86::CVTSS2SDrm, 0 },
     { X86::CVTTSD2SI64rr, X86::CVTTSD2SI64rm, 0 },
     { X86::CVTTSD2SIrr, X86::CVTTSD2SIrm, 0 },
     { X86::CVTTSS2SI64rr, X86::CVTTSS2SI64rm, 0 },
     { X86::CVTTSS2SIrr, X86::CVTTSS2SIrm, 0 },
     { X86::IMUL16rri, X86::IMUL16rmi, 0 },
     { X86::IMUL16rri8, X86::IMUL16rmi8, 0 },
     { X86::IMUL32rri, X86::IMUL32rmi, 0 },
     { X86::IMUL32rri8, X86::IMUL32rmi8, 0 },
     { X86::IMUL64rri32, X86::IMUL64rmi32, 0 },
     { X86::IMUL64rri8, X86::IMUL64rmi8, 0 },
     { X86::Int_COMISDrr, X86::Int_COMISDrm, TB_NO_REVERSE },
     { X86::Int_COMISSrr, X86::Int_COMISSrm, TB_NO_REVERSE },
     { X86::CVTSD2SI64rr, X86::CVTSD2SI64rm, TB_NO_REVERSE },
     { X86::CVTSD2SIrr, X86::CVTSD2SIrm, TB_NO_REVERSE },
     { X86::CVTSS2SI64rr, X86::CVTSS2SI64rm, TB_NO_REVERSE },
     { X86::CVTSS2SIrr, X86::CVTSS2SIrm, TB_NO_REVERSE },
     { X86::CVTDQ2PDrr, X86::CVTDQ2PDrm, TB_NO_REVERSE },
     { X86::CVTDQ2PSrr, X86::CVTDQ2PSrm, TB_ALIGN_16 },
     { X86::CVTPD2DQrr, X86::CVTPD2DQrm, TB_ALIGN_16 },
     { X86::CVTPD2PSrr, X86::CVTPD2PSrm, TB_ALIGN_16 },
     { X86::CVTPS2DQrr, X86::CVTPS2DQrm, TB_ALIGN_16 },
     { X86::CVTPS2PDrr, X86::CVTPS2PDrm, TB_NO_REVERSE },
     { X86::CVTTPD2DQrr, X86::CVTTPD2DQrm, TB_ALIGN_16 },
     { X86::CVTTPS2DQrr, X86::CVTTPS2DQrm, TB_ALIGN_16 },
     { X86::Int_CVTTSD2SI64rr,X86::Int_CVTTSD2SI64rm, TB_NO_REVERSE },
     { X86::Int_CVTTSD2SIrr, X86::Int_CVTTSD2SIrm, TB_NO_REVERSE },
     { X86::Int_CVTTSS2SI64rr,X86::Int_CVTTSS2SI64rm, TB_NO_REVERSE },
     { X86::Int_CVTTSS2SIrr, X86::Int_CVTTSS2SIrm, TB_NO_REVERSE },
     { X86::Int_UCOMISDrr, X86::Int_UCOMISDrm, TB_NO_REVERSE },
     { X86::Int_UCOMISSrr, X86::Int_UCOMISSrm, TB_NO_REVERSE },
     { X86::MOV16rr, X86::MOV16rm, 0 },
     { X86::MOV32rr, X86::MOV32rm, 0 },
     { X86::MOV64rr, X86::MOV64rm, 0 },
     { X86::MOV64toPQIrr, X86::MOVQI2PQIrm, 0 },
     { X86::MOV64toSDrr, X86::MOV64toSDrm, 0 },
     { X86::MOV8rr, X86::MOV8rm, 0 },
     { X86::MOVAPDrr, X86::MOVAPDrm, TB_ALIGN_16 },
     { X86::MOVAPSrr, X86::MOVAPSrm, TB_ALIGN_16 },
     { X86::MOVDDUPrr, X86::MOVDDUPrm, TB_NO_REVERSE },
     { X86::MOVDI2PDIrr, X86::MOVDI2PDIrm, 0 },
     { X86::MOVDI2SSrr, X86::MOVDI2SSrm, 0 },
     { X86::MOVDQArr, X86::MOVDQArm, TB_ALIGN_16 },
     { X86::MOVDQUrr, X86::MOVDQUrm, 0 },
     { X86::MOVSHDUPrr, X86::MOVSHDUPrm, TB_ALIGN_16 },
     { X86::MOVSLDUPrr, X86::MOVSLDUPrm, TB_ALIGN_16 },
     { X86::MOVSX16rr8, X86::MOVSX16rm8, 0 },
     { X86::MOVSX32rr16, X86::MOVSX32rm16, 0 },
     { X86::MOVSX32rr8, X86::MOVSX32rm8, 0 },
     { X86::MOVSX64rr16, X86::MOVSX64rm16, 0 },
     { X86::MOVSX64rr32, X86::MOVSX64rm32, 0 },
     { X86::MOVSX64rr8, X86::MOVSX64rm8, 0 },
     { X86::MOVUPDrr, X86::MOVUPDrm, 0 },
     { X86::MOVUPSrr, X86::MOVUPSrm, 0 },
     { X86::MOVZPQILo2PQIrr, X86::MOVQI2PQIrm, TB_NO_REVERSE },
     { X86::MOVZX16rr8, X86::MOVZX16rm8, 0 },
     { X86::MOVZX32rr16, X86::MOVZX32rm16, 0 },
     { X86::MOVZX32_NOREXrr8, X86::MOVZX32_NOREXrm8, 0 },
     { X86::MOVZX32rr8, X86::MOVZX32rm8, 0 },
     { X86::PABSBrr, X86::PABSBrm, TB_ALIGN_16 },
     { X86::PABSDrr, X86::PABSDrm, TB_ALIGN_16 },
     { X86::PABSWrr, X86::PABSWrm, TB_ALIGN_16 },
     { X86::PCMPESTRIrr, X86::PCMPESTRIrm, TB_ALIGN_16 },
     { X86::PCMPESTRM128rr, X86::PCMPESTRM128rm, TB_ALIGN_16 },
     { X86::PCMPISTRIrr, X86::PCMPISTRIrm, TB_ALIGN_16 },
     { X86::PCMPISTRM128rr, X86::PCMPISTRM128rm, TB_ALIGN_16 },
     { X86::PHMINPOSUWrr128, X86::PHMINPOSUWrm128, TB_ALIGN_16 },
     { X86::PMOVSXBDrr, X86::PMOVSXBDrm, TB_NO_REVERSE },
     { X86::PMOVSXBQrr, X86::PMOVSXBQrm, TB_NO_REVERSE },
     { X86::PMOVSXBWrr, X86::PMOVSXBWrm, TB_NO_REVERSE },
     { X86::PMOVSXDQrr, X86::PMOVSXDQrm, TB_NO_REVERSE },
     { X86::PMOVSXWDrr, X86::PMOVSXWDrm, TB_NO_REVERSE },
     { X86::PMOVSXWQrr, X86::PMOVSXWQrm, TB_NO_REVERSE },
     { X86::PMOVZXBDrr, X86::PMOVZXBDrm, TB_NO_REVERSE },
     { X86::PMOVZXBQrr, X86::PMOVZXBQrm, TB_NO_REVERSE },
     { X86::PMOVZXBWrr, X86::PMOVZXBWrm, TB_NO_REVERSE },
     { X86::PMOVZXDQrr, X86::PMOVZXDQrm, TB_NO_REVERSE },
     { X86::PMOVZXWDrr, X86::PMOVZXWDrm, TB_NO_REVERSE },
     { X86::PMOVZXWQrr, X86::PMOVZXWQrm, TB_NO_REVERSE },
     { X86::PSHUFDri, X86::PSHUFDmi, TB_ALIGN_16 },
     { X86::PSHUFHWri, X86::PSHUFHWmi, TB_ALIGN_16 },
     { X86::PSHUFLWri, X86::PSHUFLWmi, TB_ALIGN_16 },
     { X86::PTESTrr, X86::PTESTrm, TB_ALIGN_16 },
     { X86::RCPPSr, X86::RCPPSm, TB_ALIGN_16 },
     { X86::RCPSSr, X86::RCPSSm, 0 },
     { X86::RCPSSr_Int, X86::RCPSSm_Int, TB_NO_REVERSE },
     { X86::ROUNDPDr, X86::ROUNDPDm, TB_ALIGN_16 },
     { X86::ROUNDPSr, X86::ROUNDPSm, TB_ALIGN_16 },
     { X86::ROUNDSDr, X86::ROUNDSDm, 0 },
     { X86::ROUNDSSr, X86::ROUNDSSm, 0 },
     { X86::RSQRTPSr, X86::RSQRTPSm, TB_ALIGN_16 },
     { X86::RSQRTSSr, X86::RSQRTSSm, 0 },
     { X86::RSQRTSSr_Int, X86::RSQRTSSm_Int, TB_NO_REVERSE },
     { X86::SQRTPDr, X86::SQRTPDm, TB_ALIGN_16 },
     { X86::SQRTPSr, X86::SQRTPSm, TB_ALIGN_16 },
     { X86::SQRTSDr, X86::SQRTSDm, 0 },
     { X86::SQRTSDr_Int, X86::SQRTSDm_Int, TB_NO_REVERSE },
     { X86::SQRTSSr, X86::SQRTSSm, 0 },
     { X86::SQRTSSr_Int, X86::SQRTSSm_Int, TB_NO_REVERSE },
     { X86::TEST16rr, X86::TEST16rm, 0 },
     { X86::TEST32rr, X86::TEST32rm, 0 },
     { X86::TEST64rr, X86::TEST64rm, 0 },
     { X86::TEST8rr, X86::TEST8rm, 0 },
     // FIXME: TEST*rr EAX,EAX ---> CMP [mem], 0
     { X86::UCOMISDrr, X86::UCOMISDrm, 0 },
     { X86::UCOMISSrr, X86::UCOMISSrm, 0 },

     // MMX version of foldable instructions
     { X86::MMX_CVTPD2PIirr, X86::MMX_CVTPD2PIirm, 0 },
     { X86::MMX_CVTPI2PDirr, X86::MMX_CVTPI2PDirm, 0 },
     { X86::MMX_CVTPS2PIirr, X86::MMX_CVTPS2PIirm, 0 },
     { X86::MMX_CVTTPD2PIirr, X86::MMX_CVTTPD2PIirm, 0 },
     { X86::MMX_CVTTPS2PIirr, X86::MMX_CVTTPS2PIirm, 0 },
     { X86::MMX_MOVD64to64rr, X86::MMX_MOVQ64rm, 0 },
     { X86::MMX_PABSBrr64, X86::MMX_PABSBrm64, 0 },
     { X86::MMX_PABSDrr64, X86::MMX_PABSDrm64, 0 },
     { X86::MMX_PABSWrr64, X86::MMX_PABSWrm64, 0 },
     { X86::MMX_PSHUFWri, X86::MMX_PSHUFWmi, 0 },

// 3DNow! version of foldable instructions		// 3DNow! version of foldable instructions
{ X86::PF2IDrr, X86::PF2IDrm, 0 },		{ X86::PF2IDrr, X86::PF2IDrm, 0 },
{ X86::PF2IWrr, X86::PF2IWrm, 0 },		{ X86::PF2IWrr, X86::PF2IWrm, 0 },
{ X86::PFRCPrr, X86::PFRCPrm, 0 },		{ X86::PFRCPrr, X86::PFRCPrm, 0 },
{ X86::PFRSQRTrr, X86::PFRSQRTrm, 0 },		{ X86::PFRSQRTrr, X86::PFRSQRTrm, 0 },
{ X86::PI2FDrr, X86::PI2FDrm, 0 },		{ X86::PI2FDrr, X86::PI2FDrm, 0 },
{ X86::PI2FWrr, X86::PI2FWrm, 0 },		{ X86::PI2FWrr, X86::PI2FWrm, 0 },
{ X86::PSWAPDrr, X86::PSWAPDrm, 0 },		{ X86::PSWAPDrr, X86::PSWAPDrm, 0 },

// AVX 128-bit versions of foldable instructions		// AVX 128-bit versions of foldable instructions
{ X86::Int_VCOMISDrr, X86::Int_VCOMISDrm, TB_NO_REVERSE },		{ X86::Int_VCOMISDrr, X86::Int_VCOMISDrm, TB_NO_REVERSE },
{ X86::Int_VCOMISSrr, X86::Int_VCOMISSrm, TB_NO_REVERSE },		{ X86::Int_VCOMISSrr, X86::Int_VCOMISSrm, TB_NO_REVERSE },
{ X86::Int_VUCOMISDrr, X86::Int_VUCOMISDrm, TB_NO_REVERSE },		{ X86::Int_VUCOMISDrr, X86::Int_VUCOMISDrm, TB_NO_REVERSE },
{ X86::Int_VUCOMISSrr, X86::Int_VUCOMISSrm, TB_NO_REVERSE },		{ X86::Int_VUCOMISSrr, X86::Int_VUCOMISSrm, TB_NO_REVERSE },
{ X86::VCVTTSD2SI64rr, X86::VCVTTSD2SI64rm, 0 },		{ X86::VCVTTSD2SI64rr, X86::VCVTTSD2SI64rm, 0 },
{ X86::Int_VCVTTSD2SI64rr,X86::Int_VCVTTSD2SI64rm,TB_NO_REVERSE },		{ X86::Int_VCVTTSD2SI64rr,X86::Int_VCVTTSD2SI64rm,TB_NO_REVERSE },
{ X86::VCVTTSD2SIrr, X86::VCVTTSD2SIrm, 0 },		{ X86::VCVTTSD2SIrr, X86::VCVTTSD2SIrm, 0 },
{ X86::Int_VCVTTSD2SIrr,X86::Int_VCVTTSD2SIrm, TB_NO_REVERSE },		{ X86::Int_VCVTTSD2SIrr,X86::Int_VCVTTSD2SIrm, TB_NO_REVERSE },
{ X86::VCVTTSS2SI64rr, X86::VCVTTSS2SI64rm, 0 },		{ X86::VCVTTSS2SI64rr, X86::VCVTTSS2SI64rm, 0 },
{ X86::Int_VCVTTSS2SI64rr,X86::Int_VCVTTSS2SI64rm,TB_NO_REVERSE },		{ X86::Int_VCVTTSS2SI64rr,X86::Int_VCVTTSS2SI64rm,TB_NO_REVERSE },
{ X86::VCVTTSS2SIrr, X86::VCVTTSS2SIrm, 0 },		{ X86::VCVTTSS2SIrr, X86::VCVTTSS2SIrm, 0 },
{ X86::Int_VCVTTSS2SIrr,X86::Int_VCVTTSS2SIrm, TB_NO_REVERSE },		{ X86::Int_VCVTTSS2SIrr,X86::Int_VCVTTSS2SIrm, TB_NO_REVERSE },
{ X86::VCVTSD2SI64rr, X86::VCVTSD2SI64rm, TB_NO_REVERSE },		{ X86::VCVTSD2SI64rr, X86::VCVTSD2SI64rm, TB_NO_REVERSE },
{ X86::VCVTSD2SIrr, X86::VCVTSD2SIrm, TB_NO_REVERSE },		{ X86::VCVTSD2SIrr, X86::VCVTSD2SIrm, TB_NO_REVERSE },
{ X86::VCVTSS2SI64rr, X86::VCVTSS2SI64rm, TB_NO_REVERSE },		{ X86::VCVTSS2SI64rr, X86::VCVTSS2SI64rm, TB_NO_REVERSE },
{ X86::VCVTSS2SIrr, X86::VCVTSS2SIrm, TB_NO_REVERSE },		{ X86::VCVTSS2SIrr, X86::VCVTSS2SIrm, TB_NO_REVERSE },
{ X86::VCVTDQ2PDrr, X86::VCVTDQ2PDrm, TB_NO_REVERSE },		{ X86::VCVTDQ2PDrr, X86::VCVTDQ2PDrm, TB_NO_REVERSE },
{ X86::VCVTDQ2PSrr, X86::VCVTDQ2PSrm, 0 },		{ X86::VCVTDQ2PSrr, X86::VCVTDQ2PSrm, 0 },
{ X86::VCVTPD2DQrr, X86::VCVTPD2DQrm, 0 },		{ X86::VCVTPD2DQrr, X86::VCVTPD2DQrm, 0 },
{ X86::VCVTPD2PSrr, X86::VCVTPD2PSrm, 0 },		{ X86::VCVTPD2PSrr, X86::VCVTPD2PSrm, 0 },
{ X86::VCVTPS2DQrr, X86::VCVTPS2DQrm, 0 },		{ X86::VCVTPS2DQrr, X86::VCVTPS2DQrm, 0 },
{ X86::VCVTPS2PDrr, X86::VCVTPS2PDrm, TB_NO_REVERSE },		{ X86::VCVTPS2PDrr, X86::VCVTPS2PDrm, TB_NO_REVERSE },
{ X86::VCVTTPD2DQrr, X86::VCVTTPD2DQrm, 0 },		{ X86::VCVTTPD2DQrr, X86::VCVTTPD2DQrm, 0 },
{ X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },		{ X86::VCVTTPS2DQrr, X86::VCVTTPS2DQrm, 0 },
{ X86::VMOV64toPQIrr, X86::VMOVQI2PQIrm, 0 },		{ X86::VMOV64toPQIrr, X86::VMOVQI2PQIrm, 0 },
{ X86::VMOV64toSDrr, X86::VMOV64toSDrm, 0 },		{ X86::VMOV64toSDrr, X86::VMOV64toSDrm, 0 },
{ X86::VMOVAPDrr, X86::VMOVAPDrm, TB_ALIGN_16 },		{ X86::VMOVAPDrr, X86::VMOVAPDrm, TB_ALIGN_16 },
{ X86::VMOVAPSrr, X86::VMOVAPSrm, TB_ALIGN_16 },		{ X86::VMOVAPSrr, X86::VMOVAPSrm, TB_ALIGN_16 },
{ X86::VMOVDDUPrr, X86::VMOVDDUPrm, TB_NO_REVERSE },		{ X86::VMOVDDUPrr, X86::VMOVDDUPrm, TB_NO_REVERSE },
{ X86::VMOVDI2PDIrr, X86::VMOVDI2PDIrm, 0 },		{ X86::VMOVDI2PDIrr, X86::VMOVDI2PDIrm, 0 },
{ X86::VMOVDI2SSrr, X86::VMOVDI2SSrm, 0 },		{ X86::VMOVDI2SSrr, X86::VMOVDI2SSrm, 0 },
{ X86::VMOVDQArr, X86::VMOVDQArm, TB_ALIGN_16 },		{ X86::VMOVDQArr, X86::VMOVDQArm, TB_ALIGN_16 },
{ X86::VMOVDQUrr, X86::VMOVDQUrm, 0 },		{ X86::VMOVDQUrr, X86::VMOVDQUrm, 0 },
{ X86::VMOVSLDUPrr, X86::VMOVSLDUPrm, 0 },		{ X86::VMOVSLDUPrr, X86::VMOVSLDUPrm, 0 },
{ X86::VMOVSHDUPrr, X86::VMOVSHDUPrm, 0 },		{ X86::VMOVSHDUPrr, X86::VMOVSHDUPrm, 0 },
{ X86::VMOVUPDrr, X86::VMOVUPDrm, 0 },		{ X86::VMOVUPDrr, X86::VMOVUPDrm, 0 },
{ X86::VMOVUPSrr, X86::VMOVUPSrm, 0 },		{ X86::VMOVUPSrr, X86::VMOVUPSrm, 0 },
{ X86::VMOVZPQILo2PQIrr,X86::VMOVQI2PQIrm, TB_NO_REVERSE },		{ X86::VMOVZPQILo2PQIrr,X86::VMOVQI2PQIrm, TB_NO_REVERSE },
{ X86::VPABSBrr, X86::VPABSBrm, 0 },		{ X86::VPABSBrr, X86::VPABSBrm, 0 },
{ X86::VPABSDrr, X86::VPABSDrm, 0 },		{ X86::VPABSDrr, X86::VPABSDrm, 0 },
{ X86::VPABSWrr, X86::VPABSWrm, 0 },		{ X86::VPABSWrr, X86::VPABSWrm, 0 },
{ X86::VPCMPESTRIrr, X86::VPCMPESTRIrm, 0 },		{ X86::VPCMPESTRIrr, X86::VPCMPESTRIrm, 0 },
{ X86::VPCMPESTRM128rr, X86::VPCMPESTRM128rm, 0 },		{ X86::VPCMPESTRM128rr, X86::VPCMPESTRM128rm, 0 },
{ X86::VPCMPISTRIrr, X86::VPCMPISTRIrm, 0 },		{ X86::VPCMPISTRIrr, X86::VPCMPISTRIrm, 0 },
{ X86::VPCMPISTRM128rr, X86::VPCMPISTRM128rm, 0 },		{ X86::VPCMPISTRM128rr, X86::VPCMPISTRM128rm, 0 },
{ X86::VPHMINPOSUWrr128, X86::VPHMINPOSUWrm128, 0 },		{ X86::VPHMINPOSUWrr128, X86::VPHMINPOSUWrm128, 0 },
{ X86::VPERMILPDri, X86::VPERMILPDmi, 0 },		{ X86::VPERMILPDri, X86::VPERMILPDmi, 0 },
{ X86::VPERMILPSri, X86::VPERMILPSmi, 0 },		{ X86::VPERMILPSri, X86::VPERMILPSmi, 0 },
{ X86::VPMOVSXBDrr, X86::VPMOVSXBDrm, TB_NO_REVERSE },		{ X86::VPMOVSXBDrr, X86::VPMOVSXBDrm, TB_NO_REVERSE },
{ X86::VPMOVSXBQrr, X86::VPMOVSXBQrm, TB_NO_REVERSE },		{ X86::VPMOVSXBQrr, X86::VPMOVSXBQrm, TB_NO_REVERSE },
{ X86::VPMOVSXBWrr, X86::VPMOVSXBWrm, TB_NO_REVERSE },		{ X86::VPMOVSXBWrr, X86::VPMOVSXBWrm, TB_NO_REVERSE },
{ X86::VPMOVSXDQrr, X86::VPMOVSXDQrm, TB_NO_REVERSE },		{ X86::VPMOVSXDQrr, X86::VPMOVSXDQrm, TB_NO_REVERSE },
{ X86::VPMOVSXWDrr, X86::VPMOVSXWDrm, TB_NO_REVERSE },		{ X86::VPMOVSXWDrr, X86::VPMOVSXWDrm, TB_NO_REVERSE },
{ X86::VPMOVSXWQrr, X86::VPMOVSXWQrm, TB_NO_REVERSE },		{ X86::VPMOVSXWQrr, X86::VPMOVSXWQrm, TB_NO_REVERSE },
{ X86::VPMOVZXBDrr, X86::VPMOVZXBDrm, TB_NO_REVERSE },		{ X86::VPMOVZXBDrr, X86::VPMOVZXBDrm, TB_NO_REVERSE },
{ X86::VPMOVZXBQrr, X86::VPMOVZXBQrm, TB_NO_REVERSE },		{ X86::VPMOVZXBQrr, X86::VPMOVZXBQrm, TB_NO_REVERSE },
{ X86::VPMOVZXBWrr, X86::VPMOVZXBWrm, TB_NO_REVERSE },		{ X86::VPMOVZXBWrr, X86::VPMOVZXBWrm, TB_NO_REVERSE },
{ X86::VPMOVZXDQrr, X86::VPMOVZXDQrm, TB_NO_REVERSE },		{ X86::VPMOVZXDQrr, X86::VPMOVZXDQrm, TB_NO_REVERSE },
{ X86::VPMOVZXWDrr, X86::VPMOVZXWDrm, TB_NO_REVERSE },		{ X86::VPMOVZXWDrr, X86::VPMOVZXWDrm, TB_NO_REVERSE },
{ X86::VPMOVZXWQrr, X86::VPMOVZXWQrm, TB_NO_REVERSE },		{ X86::VPMOVZXWQrr, X86::VPMOVZXWQrm, TB_NO_REVERSE },
{ X86::VPSHUFDri, X86::VPSHUFDmi, 0 },		{ X86::VPSHUFDri, X86::VPSHUFDmi, 0 },
{ X86::VPSHUFHWri, X86::VPSHUFHWmi, 0 },		{ X86::VPSHUFHWri, X86::VPSHUFHWmi, 0 },
{ X86::VPSHUFLWri, X86::VPSHUFLWmi, 0 },		{ X86::VPSHUFLWri, X86::VPSHUFLWmi, 0 },
{ X86::VPTESTrr, X86::VPTESTrm, 0 },		{ X86::VPTESTrr, X86::VPTESTrm, 0 },
{ X86::VRCPPSr, X86::VRCPPSm, 0 },		{ X86::VRCPPSr, X86::VRCPPSm, 0 },
{ X86::VROUNDPDr, X86::VROUNDPDm, 0 },		{ X86::VROUNDPDr, X86::VROUNDPDm, 0 },
{ X86::VROUNDPSr, X86::VROUNDPSm, 0 },		{ X86::VROUNDPSr, X86::VROUNDPSm, 0 },
{ X86::VRSQRTPSr, X86::VRSQRTPSm, 0 },		{ X86::VRSQRTPSr, X86::VRSQRTPSm, 0 },
{ X86::VSQRTPDr, X86::VSQRTPDm, 0 },		{ X86::VSQRTPDr, X86::VSQRTPDm, 0 },
{ X86::VSQRTPSr, X86::VSQRTPSm, 0 },		{ X86::VSQRTPSr, X86::VSQRTPSm, 0 },
{ X86::VTESTPDrr, X86::VTESTPDrm, 0 },		{ X86::VTESTPDrr, X86::VTESTPDrm, 0 },
{ X86::VTESTPSrr, X86::VTESTPSrm, 0 },		{ X86::VTESTPSrr, X86::VTESTPSrm, 0 },
{ X86::VUCOMISDrr, X86::VUCOMISDrm, 0 },		{ X86::VUCOMISDrr, X86::VUCOMISDrm, 0 },
{ X86::VUCOMISSrr, X86::VUCOMISSrm, 0 },		{ X86::VUCOMISSrr, X86::VUCOMISSrm, 0 },

// AVX 256-bit foldable instructions		// AVX 256-bit foldable instructions
{ X86::VCVTDQ2PDYrr, X86::VCVTDQ2PDYrm, TB_NO_REVERSE },		{ X86::VCVTDQ2PDYrr, X86::VCVTDQ2PDYrm, TB_NO_REVERSE },
{ X86::VCVTDQ2PSYrr, X86::VCVTDQ2PSYrm, 0 },		{ X86::VCVTDQ2PSYrr, X86::VCVTDQ2PSYrm, 0 },
{ X86::VCVTPD2DQYrr, X86::VCVTPD2DQYrm, 0 },		{ X86::VCVTPD2DQYrr, X86::VCVTPD2DQYrm, 0 },
{ X86::VCVTPD2PSYrr, X86::VCVTPD2PSYrm, 0 },		{ X86::VCVTPD2PSYrr, X86::VCVTPD2PSYrm, 0 },
{ X86::VCVTPS2DQYrr, X86::VCVTPS2DQYrm, 0 },		{ X86::VCVTPS2DQYrr, X86::VCVTPS2DQYrm, 0 },
{ X86::VCVTPS2PDYrr, X86::VCVTPS2PDYrm, TB_NO_REVERSE },		{ X86::VCVTPS2PDYrr, X86::VCVTPS2PDYrm, TB_NO_REVERSE },
{ X86::VCVTTPD2DQYrr, X86::VCVTTPD2DQYrm, 0 },		{ X86::VCVTTPD2DQYrr, X86::VCVTTPD2DQYrm, 0 },
{ X86::VCVTTPS2DQYrr, X86::VCVTTPS2DQYrm, 0 },		{ X86::VCVTTPS2DQYrr, X86::VCVTTPS2DQYrm, 0 },
{ X86::VMOVAPDYrr, X86::VMOVAPDYrm, TB_ALIGN_32 },		{ X86::VMOVAPDYrr, X86::VMOVAPDYrm, TB_ALIGN_32 },
{ X86::VMOVAPSYrr, X86::VMOVAPSYrm, TB_ALIGN_32 },		{ X86::VMOVAPSYrr, X86::VMOVAPSYrm, TB_ALIGN_32 },
{ X86::VMOVDDUPYrr, X86::VMOVDDUPYrm, 0 },		{ X86::VMOVDDUPYrr, X86::VMOVDDUPYrm, 0 },
{ X86::VMOVDQAYrr, X86::VMOVDQAYrm, TB_ALIGN_32 },		{ X86::VMOVDQAYrr, X86::VMOVDQAYrm, TB_ALIGN_32 },
{ X86::VMOVDQUYrr, X86::VMOVDQUYrm, 0 },		{ X86::VMOVDQUYrr, X86::VMOVDQUYrm, 0 },
{ X86::VMOVSLDUPYrr, X86::VMOVSLDUPYrm, 0 },		{ X86::VMOVSLDUPYrr, X86::VMOVSLDUPYrm, 0 },
{ X86::VMOVSHDUPYrr, X86::VMOVSHDUPYrm, 0 },		{ X86::VMOVSHDUPYrr, X86::VMOVSHDUPYrm, 0 },
{ X86::VMOVUPDYrr, X86::VMOVUPDYrm, 0 },		{ X86::VMOVUPDYrr, X86::VMOVUPDYrm, 0 },
{ X86::VMOVUPSYrr, X86::VMOVUPSYrm, 0 },		{ X86::VMOVUPSYrr, X86::VMOVUPSYrm, 0 },
{ X86::VPERMILPDYri, X86::VPERMILPDYmi, 0 },		{ X86::VPERMILPDYri, X86::VPERMILPDYmi, 0 },
{ X86::VPERMILPSYri, X86::VPERMILPSYmi, 0 },		{ X86::VPERMILPSYri, X86::VPERMILPSYmi, 0 },
{ X86::VPTESTYrr, X86::VPTESTYrm, 0 },		{ X86::VPTESTYrr, X86::VPTESTYrm, 0 },
{ X86::VRCPPSYr, X86::VRCPPSYm, 0 },		{ X86::VRCPPSYr, X86::VRCPPSYm, 0 },
{ X86::VROUNDYPDr, X86::VROUNDYPDm, 0 },		{ X86::VROUNDYPDr, X86::VROUNDYPDm, 0 },
{ X86::VROUNDYPSr, X86::VROUNDYPSm, 0 },		{ X86::VROUNDYPSr, X86::VROUNDYPSm, 0 },
{ X86::VRSQRTPSYr, X86::VRSQRTPSYm, 0 },		{ X86::VRSQRTPSYr, X86::VRSQRTPSYm, 0 },
{ X86::VSQRTPDYr, X86::VSQRTPDYm, 0 },		{ X86::VSQRTPDYr, X86::VSQRTPDYm, 0 },
{ X86::VSQRTPSYr, X86::VSQRTPSYm, 0 },		{ X86::VSQRTPSYr, X86::VSQRTPSYm, 0 },
{ X86::VTESTPDYrr, X86::VTESTPDYrm, 0 },		{ X86::VTESTPDYrr, X86::VTESTPDYrm, 0 },
{ X86::VTESTPSYrr, X86::VTESTPSYrm, 0 },		{ X86::VTESTPSYrr, X86::VTESTPSYrm, 0 },

// AVX2 foldable instructions		// AVX2 foldable instructions

// VBROADCASTS{SD}rr register instructions were an AVX2 addition while the		// VBROADCASTS{SD}rr register instructions were an AVX2 addition while the
// VBROADCASTS{SD}rm memory instructions were available from AVX1.		// VBROADCASTS{SD}rm memory instructions were available from AVX1.
// TB_NO_REVERSE prevents unfolding from introducing an illegal instruction		// TB_NO_REVERSE prevents unfolding from introducing an illegal instruction
// on AVX1 targets. The VPBROADCAST instructions are all AVX2 instructions		// on AVX1 targets. The VPBROADCAST instructions are all AVX2 instructions
// so they don't need an equivalent limitation.		// so they don't need an equivalent limitation.
{ X86::VBROADCASTSSrr, X86::VBROADCASTSSrm, TB_NO_REVERSE },		{ X86::VBROADCASTSSrr, X86::VBROADCASTSSrm, TB_NO_REVERSE },
{ X86::VBROADCASTSSYrr, X86::VBROADCASTSSYrm, TB_NO_REVERSE },		{ X86::VBROADCASTSSYrr, X86::VBROADCASTSSYrm, TB_NO_REVERSE },
{ X86::VBROADCASTSDYrr, X86::VBROADCASTSDYrm, TB_NO_REVERSE },		{ X86::VBROADCASTSDYrr, X86::VBROADCASTSDYrm, TB_NO_REVERSE },
{ X86::VPABSBYrr, X86::VPABSBYrm, 0 },		{ X86::VPABSBYrr, X86::VPABSBYrm, 0 },
{ X86::VPABSDYrr, X86::VPABSDYrm, 0 },		{ X86::VPABSDYrr, X86::VPABSDYrm, 0 },
{ X86::VPABSWYrr, X86::VPABSWYrm, 0 },		{ X86::VPABSWYrr, X86::VPABSWYrm, 0 },
{ X86::VPBROADCASTBrr, X86::VPBROADCASTBrm, TB_NO_REVERSE },		{ X86::VPBROADCASTBrr, X86::VPBROADCASTBrm, TB_NO_REVERSE },
{ X86::VPBROADCASTBYrr, X86::VPBROADCASTBYrm, TB_NO_REVERSE },		{ X86::VPBROADCASTBYrr, X86::VPBROADCASTBYrm, TB_NO_REVERSE },
{ X86::VPBROADCASTDrr, X86::VPBROADCASTDrm, TB_NO_REVERSE },		{ X86::VPBROADCASTDrr, X86::VPBROADCASTDrm, TB_NO_REVERSE },
{ X86::VPBROADCASTDYrr, X86::VPBROADCASTDYrm, TB_NO_REVERSE },		{ X86::VPBROADCASTDYrr, X86::VPBROADCASTDYrm, TB_NO_REVERSE },
{ X86::VPBROADCASTQrr, X86::VPBROADCASTQrm, TB_NO_REVERSE },		{ X86::VPBROADCASTQrr, X86::VPBROADCASTQrm, TB_NO_REVERSE },
{ X86::VPBROADCASTQYrr, X86::VPBROADCASTQYrm, TB_NO_REVERSE },		{ X86::VPBROADCASTQYrr, X86::VPBROADCASTQYrm, TB_NO_REVERSE },
{ X86::VPBROADCASTWrr, X86::VPBROADCASTWrm, TB_NO_REVERSE },		{ X86::VPBROADCASTWrr, X86::VPBROADCASTWrm, TB_NO_REVERSE },
{ X86::VPBROADCASTWYrr, X86::VPBROADCASTWYrm, TB_NO_REVERSE },		{ X86::VPBROADCASTWYrr, X86::VPBROADCASTWYrm, TB_NO_REVERSE },
{ X86::VPERMPDYri, X86::VPERMPDYmi, 0 },		{ X86::VPERMPDYri, X86::VPERMPDYmi, 0 },
{ X86::VPERMQYri, X86::VPERMQYmi, 0 },		{ X86::VPERMQYri, X86::VPERMQYmi, 0 },
{ X86::VPMOVSXBDYrr, X86::VPMOVSXBDYrm, TB_NO_REVERSE },		{ X86::VPMOVSXBDYrr, X86::VPMOVSXBDYrm, TB_NO_REVERSE },
{ X86::VPMOVSXBQYrr, X86::VPMOVSXBQYrm, TB_NO_REVERSE },		{ X86::VPMOVSXBQYrr, X86::VPMOVSXBQYrm, TB_NO_REVERSE },
{ X86::VPMOVSXBWYrr, X86::VPMOVSXBWYrm, 0 },		{ X86::VPMOVSXBWYrr, X86::VPMOVSXBWYrm, 0 },
{ X86::VPMOVSXDQYrr, X86::VPMOVSXDQYrm, 0 },		{ X86::VPMOVSXDQYrr, X86::VPMOVSXDQYrm, 0 },
{ X86::VPMOVSXWDYrr, X86::VPMOVSXWDYrm, 0 },		{ X86::VPMOVSXWDYrr, X86::VPMOVSXWDYrm, 0 },
{ X86::VPMOVSXWQYrr, X86::VPMOVSXWQYrm, TB_NO_REVERSE },		{ X86::VPMOVSXWQYrr, X86::VPMOVSXWQYrm, TB_NO_REVERSE },
{ X86::VPMOVZXBDYrr, X86::VPMOVZXBDYrm, TB_NO_REVERSE },		{ X86::VPMOVZXBDYrr, X86::VPMOVZXBDYrm, TB_NO_REVERSE },
{ X86::VPMOVZXBQYrr, X86::VPMOVZXBQYrm, TB_NO_REVERSE },		{ X86::VPMOVZXBQYrr, X86::VPMOVZXBQYrm, TB_NO_REVERSE },
{ X86::VPMOVZXBWYrr, X86::VPMOVZXBWYrm, 0 },		{ X86::VPMOVZXBWYrr, X86::VPMOVZXBWYrm, 0 },
{ X86::VPMOVZXDQYrr, X86::VPMOVZXDQYrm, 0 },		{ X86::VPMOVZXDQYrr, X86::VPMOVZXDQYrm, 0 },
{ X86::VPMOVZXWDYrr, X86::VPMOVZXWDYrm, 0 },		{ X86::VPMOVZXWDYrr, X86::VPMOVZXWDYrm, 0 },
{ X86::VPMOVZXWQYrr, X86::VPMOVZXWQYrm, TB_NO_REVERSE },		{ X86::VPMOVZXWQYrr, X86::VPMOVZXWQYrm, TB_NO_REVERSE },
{ X86::VPSHUFDYri, X86::VPSHUFDYmi, 0 },		{ X86::VPSHUFDYri, X86::VPSHUFDYmi, 0 },
{ X86::VPSHUFHWYri, X86::VPSHUFHWYmi, 0 },		{ X86::VPSHUFHWYri, X86::VPSHUFHWYmi, 0 },
{ X86::VPSHUFLWYri, X86::VPSHUFLWYmi, 0 },		{ X86::VPSHUFLWYri, X86::VPSHUFLWYmi, 0 },

// XOP foldable instructions		// XOP foldable instructions
{ X86::VFRCZPDrr, X86::VFRCZPDrm, 0 },		{ X86::VFRCZPDrr, X86::VFRCZPDrm, 0 },
{ X86::VFRCZPDrrY, X86::VFRCZPDrmY, 0 },		{ X86::VFRCZPDrrY, X86::VFRCZPDrmY, 0 },
{ X86::VFRCZPSrr, X86::VFRCZPSrm, 0 },		{ X86::VFRCZPSrr, X86::VFRCZPSrm, 0 },
{ X86::VFRCZPSrrY, X86::VFRCZPSrmY, 0 },		{ X86::VFRCZPSrrY, X86::VFRCZPSrmY, 0 },
{ X86::VFRCZSDrr, X86::VFRCZSDrm, 0 },		{ X86::VFRCZSDrr, X86::VFRCZSDrm, 0 },
{ X86::VFRCZSSrr, X86::VFRCZSSrm, 0 },		{ X86::VFRCZSSrr, X86::VFRCZSSrm, 0 },
{ X86::VPHADDBDrr, X86::VPHADDBDrm, 0 },		{ X86::VPHADDBDrr, X86::VPHADDBDrm, 0 },
{ X86::VPHADDBQrr, X86::VPHADDBQrm, 0 },		{ X86::VPHADDBQrr, X86::VPHADDBQrm, 0 },
{ X86::VPHADDBWrr, X86::VPHADDBWrm, 0 },		{ X86::VPHADDBWrr, X86::VPHADDBWrm, 0 },
{ X86::VPHADDDQrr, X86::VPHADDDQrm, 0 },		{ X86::VPHADDDQrr, X86::VPHADDDQrm, 0 },
{ X86::VPHADDWDrr, X86::VPHADDWDrm, 0 },		{ X86::VPHADDWDrr, X86::VPHADDWDrm, 0 },
{ X86::VPHADDWQrr, X86::VPHADDWQrm, 0 },		{ X86::VPHADDWQrr, X86::VPHADDWQrm, 0 },
{ X86::VPHADDUBDrr, X86::VPHADDUBDrm, 0 },		{ X86::VPHADDUBDrr, X86::VPHADDUBDrm, 0 },
{ X86::VPHADDUBQrr, X86::VPHADDUBQrm, 0 },		{ X86::VPHADDUBQrr, X86::VPHADDUBQrm, 0 },
{ X86::VPHADDUBWrr, X86::VPHADDUBWrm, 0 },		{ X86::VPHADDUBWrr, X86::VPHADDUBWrm, 0 },
{ X86::VPHADDUDQrr, X86::VPHADDUDQrm, 0 },		{ X86::VPHADDUDQrr, X86::VPHADDUDQrm, 0 },
{ X86::VPHADDUWDrr, X86::VPHADDUWDrm, 0 },		{ X86::VPHADDUWDrr, X86::VPHADDUWDrm, 0 },
{ X86::VPHADDUWQrr, X86::VPHADDUWQrm, 0 },		{ X86::VPHADDUWQrr, X86::VPHADDUWQrm, 0 },
{ X86::VPHSUBBWrr, X86::VPHSUBBWrm, 0 },		{ X86::VPHSUBBWrr, X86::VPHSUBBWrm, 0 },
{ X86::VPHSUBDQrr, X86::VPHSUBDQrm, 0 },		{ X86::VPHSUBDQrr, X86::VPHSUBDQrm, 0 },
{ X86::VPHSUBWDrr, X86::VPHSUBWDrm, 0 },		{ X86::VPHSUBWDrr, X86::VPHSUBWDrm, 0 },
{ X86::VPROTBri, X86::VPROTBmi, 0 },		{ X86::VPROTBri, X86::VPROTBmi, 0 },
{ X86::VPROTBrr, X86::VPROTBmr, 0 },		{ X86::VPROTBrr, X86::VPROTBmr, 0 },
{ X86::VPROTDri, X86::VPROTDmi, 0 },		{ X86::VPROTDri, X86::VPROTDmi, 0 },
{ X86::VPROTDrr, X86::VPROTDmr, 0 },		{ X86::VPROTDrr, X86::VPROTDmr, 0 },
{ X86::VPROTQri, X86::VPROTQmi, 0 },		{ X86::VPROTQri, X86::VPROTQmi, 0 },
{ X86::VPROTQrr, X86::VPROTQmr, 0 },		{ X86::VPROTQrr, X86::VPROTQmr, 0 },
{ X86::VPROTWri, X86::VPROTWmi, 0 },		{ X86::VPROTWri, X86::VPROTWmi, 0 },
{ X86::VPROTWrr, X86::VPROTWmr, 0 },		{ X86::VPROTWrr, X86::VPROTWmr, 0 },
{ X86::VPSHABrr, X86::VPSHABmr, 0 },		{ X86::VPSHABrr, X86::VPSHABmr, 0 },
{ X86::VPSHADrr, X86::VPSHADmr, 0 },		{ X86::VPSHADrr, X86::VPSHADmr, 0 },
{ X86::VPSHAQrr, X86::VPSHAQmr, 0 },		{ X86::VPSHAQrr, X86::VPSHAQmr, 0 },
{ X86::VPSHAWrr, X86::VPSHAWmr, 0 },		{ X86::VPSHAWrr, X86::VPSHAWmr, 0 },
{ X86::VPSHLBrr, X86::VPSHLBmr, 0 },		{ X86::VPSHLBrr, X86::VPSHLBmr, 0 },
{ X86::VPSHLDrr, X86::VPSHLDmr, 0 },		{ X86::VPSHLDrr, X86::VPSHLDmr, 0 },
{ X86::VPSHLQrr, X86::VPSHLQmr, 0 },		{ X86::VPSHLQrr, X86::VPSHLQmr, 0 },
{ X86::VPSHLWrr, X86::VPSHLWmr, 0 },		{ X86::VPSHLWrr, X86::VPSHLWmr, 0 },

// LWP foldable instructions		// LWP foldable instructions
{ X86::LWPINS32rri, X86::LWPINS32rmi, 0 },		{ X86::LWPINS32rri, X86::LWPINS32rmi, 0 },
{ X86::LWPINS64rri, X86::LWPINS64rmi, 0 },		{ X86::LWPINS64rri, X86::LWPINS64rmi, 0 },
{ X86::LWPVAL32rri, X86::LWPVAL32rmi, 0 },		{ X86::LWPVAL32rri, X86::LWPVAL32rmi, 0 },
{ X86::LWPVAL64rri, X86::LWPVAL64rmi, 0 },		{ X86::LWPVAL64rri, X86::LWPVAL64rmi, 0 },

// BMI/BMI2/LZCNT/POPCNT/TBM foldable instructions		// BMI/BMI2/LZCNT/POPCNT/TBM foldable instructions
{ X86::BEXTR32rr, X86::BEXTR32rm, 0 },		{ X86::BEXTR32rr, X86::BEXTR32rm, 0 },
{ X86::BEXTR64rr, X86::BEXTR64rm, 0 },		{ X86::BEXTR64rr, X86::BEXTR64rm, 0 },
{ X86::BEXTRI32ri, X86::BEXTRI32mi, 0 },		{ X86::BEXTRI32ri, X86::BEXTRI32mi, 0 },
{ X86::BEXTRI64ri, X86::BEXTRI64mi, 0 },		{ X86::BEXTRI64ri, X86::BEXTRI64mi, 0 },
{ X86::BLCFILL32rr, X86::BLCFILL32rm, 0 },		{ X86::BLCFILL32rr, X86::BLCFILL32rm, 0 },
{ X86::BLCFILL64rr, X86::BLCFILL64rm, 0 },		{ X86::BLCFILL64rr, X86::BLCFILL64rm, 0 },
{ X86::BLCI32rr, X86::BLCI32rm, 0 },		{ X86::BLCI32rr, X86::BLCI32rm, 0 },
{ X86::BLCI64rr, X86::BLCI64rm, 0 },		{ X86::BLCI64rr, X86::BLCI64rm, 0 },
{ X86::BLCIC32rr, X86::BLCIC32rm, 0 },		{ X86::BLCIC32rr, X86::BLCIC32rm, 0 },
{ X86::BLCIC64rr, X86::BLCIC64rm, 0 },		{ X86::BLCIC64rr, X86::BLCIC64rm, 0 },
{ X86::BLCMSK32rr, X86::BLCMSK32rm, 0 },		{ X86::BLCMSK32rr, X86::BLCMSK32rm, 0 },
{ X86::BLCMSK64rr, X86::BLCMSK64rm, 0 },		{ X86::BLCMSK64rr, X86::BLCMSK64rm, 0 },
{ X86::BLCS32rr, X86::BLCS32rm, 0 },		{ X86::BLCS32rr, X86::BLCS32rm, 0 },
{ X86::BLCS64rr, X86::BLCS64rm, 0 },		{ X86::BLCS64rr, X86::BLCS64rm, 0 },
{ X86::BLSFILL32rr, X86::BLSFILL32rm, 0 },		{ X86::BLSFILL32rr, X86::BLSFILL32rm, 0 },
{ X86::BLSFILL64rr, X86::BLSFILL64rm, 0 },		{ X86::BLSFILL64rr, X86::BLSFILL64rm, 0 },
{ X86::BLSI32rr, X86::BLSI32rm, 0 },		{ X86::BLSI32rr, X86::BLSI32rm, 0 },
{ X86::BLSI64rr, X86::BLSI64rm, 0 },		{ X86::BLSI64rr, X86::BLSI64rm, 0 },
{ X86::BLSIC32rr, X86::BLSIC32rm, 0 },		{ X86::BLSIC32rr, X86::BLSIC32rm, 0 },
{ X86::BLSIC64rr, X86::BLSIC64rm, 0 },		{ X86::BLSIC64rr, X86::BLSIC64rm, 0 },
{ X86::BLSMSK32rr, X86::BLSMSK32rm, 0 },		{ X86::BLSMSK32rr, X86::BLSMSK32rm, 0 },
{ X86::BLSMSK64rr, X86::BLSMSK64rm, 0 },		{ X86::BLSMSK64rr, X86::BLSMSK64rm, 0 },
{ X86::BLSR32rr, X86::BLSR32rm, 0 },		{ X86::BLSR32rr, X86::BLSR32rm, 0 },
{ X86::BLSR64rr, X86::BLSR64rm, 0 },		{ X86::BLSR64rr, X86::BLSR64rm, 0 },
{ X86::BZHI32rr, X86::BZHI32rm, 0 },		{ X86::BZHI32rr, X86::BZHI32rm, 0 },
{ X86::BZHI64rr, X86::BZHI64rm, 0 },		{ X86::BZHI64rr, X86::BZHI64rm, 0 },
{ X86::LZCNT16rr, X86::LZCNT16rm, 0 },		{ X86::LZCNT16rr, X86::LZCNT16rm, 0 },
{ X86::LZCNT32rr, X86::LZCNT32rm, 0 },		{ X86::LZCNT32rr, X86::LZCNT32rm, 0 },
{ X86::LZCNT64rr, X86::LZCNT64rm, 0 },		{ X86::LZCNT64rr, X86::LZCNT64rm, 0 },
{ X86::POPCNT16rr, X86::POPCNT16rm, 0 },		{ X86::POPCNT16rr, X86::POPCNT16rm, 0 },
{ X86::POPCNT32rr, X86::POPCNT32rm, 0 },		{ X86::POPCNT32rr, X86::POPCNT32rm, 0 },
{ X86::POPCNT64rr, X86::POPCNT64rm, 0 },		{ X86::POPCNT64rr, X86::POPCNT64rm, 0 },
{ X86::RORX32ri, X86::RORX32mi, 0 },		{ X86::RORX32ri, X86::RORX32mi, 0 },
{ X86::RORX64ri, X86::RORX64mi, 0 },		{ X86::RORX64ri, X86::RORX64mi, 0 },
{ X86::SARX32rr, X86::SARX32rm, 0 },		{ X86::SARX32rr, X86::SARX32rm, 0 },
{ X86::SARX64rr, X86::SARX64rm, 0 },		{ X86::SARX64rr, X86::SARX64rm, 0 },
{ X86::SHRX32rr, X86::SHRX32rm, 0 },		{ X86::SHRX32rr, X86::SHRX32rm, 0 },
{ X86::SHRX64rr, X86::SHRX64rm, 0 },		{ X86::SHRX64rr, X86::SHRX64rm, 0 },
{ X86::SHLX32rr, X86::SHLX32rm, 0 },		{ X86::SHLX32rr, X86::SHLX32rm, 0 },
{ X86::SHLX64rr, X86::SHLX64rm, 0 },		{ X86::SHLX64rr, X86::SHLX64rm, 0 },
{ X86::T1MSKC32rr, X86::T1MSKC32rm, 0 },		{ X86::T1MSKC32rr, X86::T1MSKC32rm, 0 },
{ X86::T1MSKC64rr, X86::T1MSKC64rm, 0 },		{ X86::T1MSKC64rr, X86::T1MSKC64rm, 0 },
{ X86::TZCNT16rr, X86::TZCNT16rm, 0 },		{ X86::TZCNT16rr, X86::TZCNT16rm, 0 },
{ X86::TZCNT32rr, X86::TZCNT32rm, 0 },		{ X86::TZCNT32rr, X86::TZCNT32rm, 0 },
{ X86::TZCNT64rr, X86::TZCNT64rm, 0 },		{ X86::TZCNT64rr, X86::TZCNT64rm, 0 },
{ X86::TZMSK32rr, X86::TZMSK32rm, 0 },		{ X86::TZMSK32rr, X86::TZMSK32rm, 0 },
{ X86::TZMSK64rr, X86::TZMSK64rm, 0 },		{ X86::TZMSK64rr, X86::TZMSK64rm, 0 },

// AVX-512 foldable instructions		// AVX-512 foldable instructions
		{ X86::VPOPCNTDZrr, X86::VPOPCNTDZrm, TB_NO_REVERSE },
		{ X86::VPOPCNTQZrr, X86::VPOPCNTQZrm, TB_NO_REVERSE },
		RKSimon: Why TB_NO_REVERSE? This is typically only used for instructions where the mem size doesn't match the reg size to prevent out of bounds loads.
		craig.topper: These should also be alphabetized with the rest of the instructions in this section.
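		For readers unfamiliar with the flag, here is a minimal, self-contained sketch of what the two comments above are getting at. It is not the LLVM implementation: the `Opcode` enum, the flag values, `FoldEntry`, `foldToMem` and `unfoldToReg` are all hypothetical stand-ins. It only illustrates the idea that a table entry always permits folding (reg form to mem form), while TB_NO_REVERSE blocks the opposite direction. The VPOPCNT entries use 0 here on the assumption that their memory operand is as wide as the register, which is what the reviewer's question implies.

```cpp
#include <cstdint>

// Hypothetical stand-ins for a few X86:: opcodes; the numeric values are arbitrary.
enum Opcode : uint16_t {
  VPABSDZrr = 10, VPABSDZrm,
  VPMOVSXBQZrr,   VPMOVSXBQZrm,
  VPOPCNTDZrr,    VPOPCNTDZrm,
  VPOPCNTQZrr,    VPOPCNTQZrm,
};

// Illustrative flag values only; not copied from LLVM.
enum : uint16_t { TB_NONE = 0, TB_NO_REVERSE = 1u << 0 };

// One row of a simplified folding table: register form, memory form, flags.
struct FoldEntry {
  uint16_t RegOp;
  uint16_t MemOp;
  uint16_t Flags;
};

// Kept in alphabetical order, as requested in the review.
static const FoldEntry Table[] = {
  { VPABSDZrr,    VPABSDZrm,    TB_NONE },
  // Memory operand is narrower than the register, so unfolding is blocked.
  { VPMOVSXBQZrr, VPMOVSXBQZrm, TB_NO_REVERSE },
  // Assumption: full-width memory operand, so no TB_NO_REVERSE is needed.
  { VPOPCNTDZrr,  VPOPCNTDZrm,  TB_NONE },
  { VPOPCNTQZrr,  VPOPCNTQZrm,  TB_NONE },
};

// Fold: map a register-form opcode to its memory form. Returns 0 if absent.
uint16_t foldToMem(uint16_t RegOp) {
  for (const FoldEntry &E : Table)
    if (E.RegOp == RegOp)
      return E.MemOp;
  return 0;
}

// Unfold: map a memory-form opcode back to its register form.
// Entries marked TB_NO_REVERSE are skipped, so 0 is returned for them.
uint16_t unfoldToReg(uint16_t MemOp) {
  for (const FoldEntry &E : Table)
    if (E.MemOp == MemOp && !(E.Flags & TB_NO_REVERSE))
      return E.RegOp;
  return 0;
}
```

		In this sketch, `unfoldToReg(VPMOVSXBQZrm)` yields 0 because splitting that instruction into a full-width load plus the register form could read past the bytes the original instruction touched, whereas `unfoldToReg(VPOPCNTDZrm)` returns the register opcode.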
{ X86::VBROADCASTSSZr, X86::VBROADCASTSSZm, TB_NO_REVERSE },		{ X86::VBROADCASTSSZr, X86::VBROADCASTSSZm, TB_NO_REVERSE },
{ X86::VBROADCASTSDZr, X86::VBROADCASTSDZm, TB_NO_REVERSE },		{ X86::VBROADCASTSDZr, X86::VBROADCASTSDZm, TB_NO_REVERSE },
{ X86::VMOV64toPQIZrr, X86::VMOVQI2PQIZrm, 0 },		{ X86::VMOV64toPQIZrr, X86::VMOVQI2PQIZrm, 0 },
{ X86::VMOV64toSDZrr, X86::VMOV64toSDZrm, 0 },		{ X86::VMOV64toSDZrr, X86::VMOV64toSDZrm, 0 },
{ X86::VMOVDI2PDIZrr, X86::VMOVDI2PDIZrm, 0 },		{ X86::VMOVDI2PDIZrr, X86::VMOVDI2PDIZrm, 0 },
{ X86::VMOVDI2SSZrr, X86::VMOVDI2SSZrm, 0 },		{ X86::VMOVDI2SSZrr, X86::VMOVDI2SSZrm, 0 },
{ X86::VMOVAPDZrr, X86::VMOVAPDZrm, TB_ALIGN_64 },		{ X86::VMOVAPDZrr, X86::VMOVAPDZrm, TB_ALIGN_64 },
{ X86::VMOVAPSZrr, X86::VMOVAPSZrm, TB_ALIGN_64 },		{ X86::VMOVAPSZrr, X86::VMOVAPSZrm, TB_ALIGN_64 },
{ X86::VMOVDQA32Zrr, X86::VMOVDQA32Zrm, TB_ALIGN_64 },		{ X86::VMOVDQA32Zrr, X86::VMOVDQA32Zrm, TB_ALIGN_64 },
{ X86::VMOVDQA64Zrr, X86::VMOVDQA64Zrm, TB_ALIGN_64 },		{ X86::VMOVDQA64Zrr, X86::VMOVDQA64Zrm, TB_ALIGN_64 },
{ X86::VMOVDQU8Zrr, X86::VMOVDQU8Zrm, 0 },		{ X86::VMOVDQU8Zrr, X86::VMOVDQU8Zrm, 0 },
{ X86::VMOVDQU16Zrr, X86::VMOVDQU16Zrm, 0 },		{ X86::VMOVDQU16Zrr, X86::VMOVDQU16Zrm, 0 },
{ X86::VMOVDQU32Zrr, X86::VMOVDQU32Zrm, 0 },		{ X86::VMOVDQU32Zrr, X86::VMOVDQU32Zrm, 0 },
{ X86::VMOVDQU64Zrr, X86::VMOVDQU64Zrm, 0 },		{ X86::VMOVDQU64Zrr, X86::VMOVDQU64Zrm, 0 },
{ X86::VMOVUPDZrr, X86::VMOVUPDZrm, 0 },		{ X86::VMOVUPDZrr, X86::VMOVUPDZrm, 0 },
{ X86::VMOVUPSZrr, X86::VMOVUPSZrm, 0 },		{ X86::VMOVUPSZrr, X86::VMOVUPSZrm, 0 },
{ X86::VMOVZPQILo2PQIZrr,X86::VMOVQI2PQIZrm, TB_NO_REVERSE },		{ X86::VMOVZPQILo2PQIZrr,X86::VMOVQI2PQIZrm, TB_NO_REVERSE },
{ X86::VPABSBZrr, X86::VPABSBZrm, 0 },		{ X86::VPABSBZrr, X86::VPABSBZrm, 0 },
{ X86::VPABSDZrr, X86::VPABSDZrm, 0 },		{ X86::VPABSDZrr, X86::VPABSDZrm, 0 },
{ X86::VPABSQZrr, X86::VPABSQZrm, 0 },		{ X86::VPABSQZrr, X86::VPABSQZrm, 0 },
{ X86::VPABSWZrr, X86::VPABSWZrm, 0 },		{ X86::VPABSWZrr, X86::VPABSWZrm, 0 },
{ X86::VPERMILPDZri, X86::VPERMILPDZmi, 0 },		{ X86::VPERMILPDZri, X86::VPERMILPDZmi, 0 },
{ X86::VPERMILPSZri, X86::VPERMILPSZmi, 0 },		{ X86::VPERMILPSZri, X86::VPERMILPSZmi, 0 },
{ X86::VPERMPDZri, X86::VPERMPDZmi, 0 },		{ X86::VPERMPDZri, X86::VPERMPDZmi, 0 },
{ X86::VPERMQZri, X86::VPERMQZmi, 0 },		{ X86::VPERMQZri, X86::VPERMQZmi, 0 },
{ X86::VPMOVSXBDZrr, X86::VPMOVSXBDZrm, 0 },		{ X86::VPMOVSXBDZrr, X86::VPMOVSXBDZrm, 0 },
{ X86::VPMOVSXBQZrr, X86::VPMOVSXBQZrm, TB_NO_REVERSE },		{ X86::VPMOVSXBQZrr, X86::VPMOVSXBQZrm, TB_NO_REVERSE },
{ X86::VPMOVSXBWZrr, X86::VPMOVSXBWZrm, 0 },		{ X86::VPMOVSXBWZrr, X86::VPMOVSXBWZrm, 0 },
{ X86::VPMOVSXDQZrr, X86::VPMOVSXDQZrm, 0 },		{ X86::VPMOVSXDQZrr, X86::VPMOVSXDQZrm, 0 },
{ X86::VPMOVSXWDZrr, X86::VPMOVSXWDZrm, 0 },		{ X86::VPMOVSXWDZrr, X86::VPMOVSXWDZrm, 0 },
{ X86::VPMOVSXWQZrr, X86::VPMOVSXWQZrm, 0 },		{ X86::VPMOVSXWQZrr, X86::VPMOVSXWQZrm, 0 },
{ X86::VPMOVZXBDZrr, X86::VPMOVZXBDZrm, 0 },		{ X86::VPMOVZXBDZrr, X86::VPMOVZXBDZrm, 0 },
{ X86::VPMOVZXBQZrr, X86::VPMOVZXBQZrm, TB_NO_REVERSE },		{ X86::VPMOVZXBQZrr, X86::VPMOVZXBQZrm, TB_NO_REVERSE },
{ X86::VPMOVZXBWZrr, X86::VPMOVZXBWZrm, 0 },		{ X86::VPMOVZXBWZrr, X86::VPMOVZXBWZrm, 0 },
{ X86::VPMOVZXDQZrr, X86::VPMOVZXDQZrm, 0 },		{ X86::VPMOVZXDQZrr, X86::VPMOVZXDQZrm, 0 },
{ X86::VPMOVZXWDZrr, X86::VPMOVZXWDZrm, 0 },		{ X86::VPMOVZXWDZrr, X86::VPMOVZXWDZrm, 0 },
{ X86::VPMOVZXWQZrr, X86::VPMOVZXWQZrm, 0 },		{ X86::VPMOVZXWQZrr, X86::VPMOVZXWQZrm, 0 },
{ X86::VPSHUFDZri, X86::VPSHUFDZmi, 0 },		{ X86::VPSHUFDZri, X86::VPSHUFDZmi, 0 },
{ X86::VPSHUFHWZri, X86::VPSHUFHWZmi, 0 },		{ X86::VPSHUFHWZri, X86::VPSHUFHWZmi, 0 },
{ X86::VPSHUFLWZri, X86::VPSHUFLWZmi, 0 },		{ X86::VPSHUFLWZri, X86::VPSHUFLWZmi, 0 },
{ X86::VPSLLDQZ512rr, X86::VPSLLDQZ512rm, 0 },		{ X86::VPSLLDQZ512rr, X86::VPSLLDQZ512rm, 0 },
{ X86::VPSLLDZri, X86::VPSLLDZmi, 0 },		{ X86::VPSLLDZri, X86::VPSLLDZmi, 0 },
{ X86::VPSLLQZri, X86::VPSLLQZmi, 0 },		{ X86::VPSLLQZri, X86::VPSLLQZmi, 0 },
{ X86::VPSLLWZri, X86::VPSLLWZmi, 0 },		{ X86::VPSLLWZri, X86::VPSLLWZmi, 0 },
{ X86::VPSRADZri, X86::VPSRADZmi, 0 },		{ X86::VPSRADZri, X86::VPSRADZmi, 0 },
{ X86::VPSRAQZri, X86::VPSRAQZmi, 0 },		{ X86::VPSRAQZri, X86::VPSRAQZmi, 0 },
{ X86::VPSRAWZri, X86::VPSRAWZmi, 0 },		{ X86::VPSRAWZri, X86::VPSRAWZmi, 0 },
{ X86::VPSRLDQZ512rr, X86::VPSRLDQZ512rm, 0 },		{ X86::VPSRLDQZ512rr, X86::VPSRLDQZ512rm, 0 },
{ X86::VPSRLDZri, X86::VPSRLDZmi, 0 },		{ X86::VPSRLDZri, X86::VPSRLDZmi, 0 },
{ X86::VPSRLQZri, X86::VPSRLQZmi, 0 },		{ X86::VPSRLQZri, X86::VPSRLQZmi, 0 },
{ X86::VPSRLWZri, X86::VPSRLWZmi, 0 },		{ X86::VPSRLWZri, X86::VPSRLWZmi, 0 },

// AVX-512 foldable instructions (256-bit versions)		// AVX-512 foldable instructions (256-bit versions)
{ X86::VBROADCASTSSZ256r, X86::VBROADCASTSSZ256m, TB_NO_REVERSE },		{ X86::VBROADCASTSSZ256r, X86::VBROADCASTSSZ256m, TB_NO_REVERSE },
{ X86::VBROADCASTSDZ256r, X86::VBROADCASTSDZ256m, TB_NO_REVERSE },		{ X86::VBROADCASTSDZ256r, X86::VBROADCASTSDZ256m, TB_NO_REVERSE },
{ X86::VMOVAPDZ256rr, X86::VMOVAPDZ256rm, TB_ALIGN_32 },		{ X86::VMOVAPDZ256rr, X86::VMOVAPDZ256rm, TB_ALIGN_32 },
{ X86::VMOVAPSZ256rr, X86::VMOVAPSZ256rm, TB_ALIGN_32 },		{ X86::VMOVAPSZ256rr, X86::VMOVAPSZ256rm, TB_ALIGN_32 },
{ X86::VMOVDQA32Z256rr, X86::VMOVDQA32Z256rm, TB_ALIGN_32 },		{ X86::VMOVDQA32Z256rr, X86::VMOVDQA32Z256rm, TB_ALIGN_32 },
{ X86::VMOVDQA64Z256rr, X86::VMOVDQA64Z256rm, TB_ALIGN_32 },		{ X86::VMOVDQA64Z256rr, X86::VMOVDQA64Z256rm, TB_ALIGN_32 },
{ X86::VMOVDQU8Z256rr, X86::VMOVDQU8Z256rm, 0 },		{ X86::VMOVDQU8Z256rr, X86::VMOVDQU8Z256rm, 0 },
{ X86::VMOVDQU16Z256rr, X86::VMOVDQU16Z256rm, 0 },		{ X86::VMOVDQU16Z256rr, X86::VMOVDQU16Z256rm, 0 },
{ X86::VMOVDQU32Z256rr, X86::VMOVDQU32Z256rm, 0 },		{ X86::VMOVDQU32Z256rr, X86::VMOVDQU32Z256rm, 0 },
{ X86::VMOVDQU64Z256rr, X86::VMOVDQU64Z256rm, 0 },		{ X86::VMOVDQU64Z256rr, X86::VMOVDQU64Z256rm, 0 },
{ X86::VMOVUPDZ256rr, X86::VMOVUPDZ256rm, 0 },		{ X86::VMOVUPDZ256rr, X86::VMOVUPDZ256rm, 0 },
{ X86::VMOVUPSZ256rr, X86::VMOVUPSZ256rm, 0 },		{ X86::VMOVUPSZ256rr, X86::VMOVUPSZ256rm, 0 },
{ X86::VPABSBZ256rr, X86::VPABSBZ256rm, 0 },		{ X86::VPABSBZ256rr, X86::VPABSBZ256rm, 0 },
{ X86::VPABSDZ256rr, X86::VPABSDZ256rm, 0 },		{ X86::VPABSDZ256rr, X86::VPABSDZ256rm, 0 },
{ X86::VPABSQZ256rr, X86::VPABSQZ256rm, 0 },		{ X86::VPABSQZ256rr, X86::VPABSQZ256rm, 0 },
{ X86::VPABSWZ256rr, X86::VPABSWZ256rm, 0 },		{ X86::VPABSWZ256rr, X86::VPABSWZ256rm, 0 },
{ X86::VPERMILPDZ256ri, X86::VPERMILPDZ256mi, 0 },		{ X86::VPERMILPDZ256ri, X86::VPERMILPDZ256mi, 0 },
{ X86::VPERMILPSZ256ri, X86::VPERMILPSZ256mi, 0 },		{ X86::VPERMILPSZ256ri, X86::VPERMILPSZ256mi, 0 },
{ X86::VPERMPDZ256ri, X86::VPERMPDZ256mi, 0 },		{ X86::VPERMPDZ256ri, X86::VPERMPDZ256mi, 0 },
{ X86::VPERMQZ256ri, X86::VPERMQZ256mi, 0 },		{ X86::VPERMQZ256ri, X86::VPERMQZ256mi, 0 },
{ X86::VPMOVSXBDZ256rr, X86::VPMOVSXBDZ256rm, TB_NO_REVERSE },		{ X86::VPMOVSXBDZ256rr, X86::VPMOVSXBDZ256rm, TB_NO_REVERSE },
{ X86::VPMOVSXBQZ256rr, X86::VPMOVSXBQZ256rm, TB_NO_REVERSE },		{ X86::VPMOVSXBQZ256rr, X86::VPMOVSXBQZ256rm, TB_NO_REVERSE },
{ X86::VPMOVSXBWZ256rr, X86::VPMOVSXBWZ256rm, 0 },		{ X86::VPMOVSXBWZ256rr, X86::VPMOVSXBWZ256rm, 0 },
{ X86::VPMOVSXDQZ256rr, X86::VPMOVSXDQZ256rm, 0 },		{ X86::VPMOVSXDQZ256rr, X86::VPMOVSXDQZ256rm, 0 },
{ X86::VPMOVSXWDZ256rr, X86::VPMOVSXWDZ256rm, 0 },		{ X86::VPMOVSXWDZ256rr, X86::VPMOVSXWDZ256rm, 0 },
{ X86::VPMOVSXWQZ256rr, X86::VPMOVSXWQZ256rm, TB_NO_REVERSE },		{ X86::VPMOVSXWQZ256rr, X86::VPMOVSXWQZ256rm, TB_NO_REVERSE },
{ X86::VPMOVZXBDZ256rr, X86::VPMOVZXBDZ256rm, TB_NO_REVERSE },		{ X86::VPMOVZXBDZ256rr, X86::VPMOVZXBDZ256rm, TB_NO_REVERSE },
{ X86::VPMOVZXBQZ256rr, X86::VPMOVZXBQZ256rm, TB_NO_REVERSE },		{ X86::VPMOVZXBQZ256rr, X86::VPMOVZXBQZ256rm, TB_NO_REVERSE },
{ X86::VPMOVZXBWZ256rr, X86::VPMOVZXBWZ256rm, 0 },		{ X86::VPMOVZXBWZ256rr, X86::VPMOVZXBWZ256rm, 0 },
{ X86::VPMOVZXDQZ256rr, X86::VPMOVZXDQZ256rm, 0 },		{ X86::VPMOVZXDQZ256rr, X86::VPMOVZXDQZ256rm, 0 },
{ X86::VPMOVZXWDZ256rr, X86::VPMOVZXWDZ256rm, 0 },		{ X86::VPMOVZXWDZ256rr, X86::VPMOVZXWDZ256rm, 0 },
{ X86::VPMOVZXWQZ256rr, X86::VPMOVZXWQZ256rm, TB_NO_REVERSE },		{ X86::VPMOVZXWQZ256rr, X86::VPMOVZXWQZ256rm, TB_NO_REVERSE },
{ X86::VPSHUFDZ256ri, X86::VPSHUFDZ256mi, 0 },		{ X86::VPSHUFDZ256ri, X86::VPSHUFDZ256mi, 0 },
{ X86::VPSHUFHWZ256ri, X86::VPSHUFHWZ256mi, 0 },		{ X86::VPSHUFHWZ256ri, X86::VPSHUFHWZ256mi, 0 },
{ X86::VPSHUFLWZ256ri, X86::VPSHUFLWZ256mi, 0 },		{ X86::VPSHUFLWZ256ri, X86::VPSHUFLWZ256mi, 0 },
{ X86::VPSLLDQZ256rr, X86::VPSLLDQZ256rm, 0 },		{ X86::VPSLLDQZ256rr, X86::VPSLLDQZ256rm, 0 },
{ X86::VPSLLDZ256ri, X86::VPSLLDZ256mi, 0 },		{ X86::VPSLLDZ256ri, X86::VPSLLDZ256mi, 0 },
{ X86::VPSLLQZ256ri, X86::VPSLLQZ256mi, 0 },		{ X86::VPSLLQZ256ri, X86::VPSLLQZ256mi, 0 },
{ X86::VPSLLWZ256ri, X86::VPSLLWZ256mi, 0 },		{ X86::VPSLLWZ256ri, X86::VPSLLWZ256mi, 0 },
{ X86::VPSRADZ256ri, X86::VPSRADZ256mi, 0 },		{ X86::VPSRADZ256ri, X86::VPSRADZ256mi, 0 },
{ X86::VPSRAQZ256ri, X86::VPSRAQZ256mi, 0 },		{ X86::VPSRAQZ256ri, X86::VPSRAQZ256mi, 0 },
{ X86::VPSRAWZ256ri, X86::VPSRAWZ256mi, 0 },		{ X86::VPSRAWZ256ri, X86::VPSRAWZ256mi, 0 },
{ X86::VPSRLDQZ256rr, X86::VPSRLDQZ256rm, 0 },		{ X86::VPSRLDQZ256rr, X86::VPSRLDQZ256rm, 0 },
{ X86::VPSRLDZ256ri, X86::VPSRLDZ256mi, 0 },		{ X86::VPSRLDZ256ri, X86::VPSRLDZ256mi, 0 },
{ X86::VPSRLQZ256ri, X86::VPSRLQZ256mi, 0 },		{ X86::VPSRLQZ256ri, X86::VPSRLQZ256mi, 0 },
{ X86::VPSRLWZ256ri, X86::VPSRLWZ256mi, 0 },		{ X86::VPSRLWZ256ri, X86::VPSRLWZ256mi, 0 },

// AVX-512 foldable instructions (128-bit versions)		// AVX-512 foldable instructions (128-bit versions)
{ X86::VBROADCASTSSZ128r, X86::VBROADCASTSSZ128m, TB_NO_REVERSE },		{ X86::VBROADCASTSSZ128r, X86::VBROADCASTSSZ128m, TB_NO_REVERSE },
{ X86::VMOVAPDZ128rr, X86::VMOVAPDZ128rm, TB_ALIGN_16 },		{ X86::VMOVAPDZ128rr, X86::VMOVAPDZ128rm, TB_ALIGN_16 },
{ X86::VMOVAPSZ128rr, X86::VMOVAPSZ128rm, TB_ALIGN_16 },		{ X86::VMOVAPSZ128rr, X86::VMOVAPSZ128rm, TB_ALIGN_16 },
{ X86::VMOVDQA32Z128rr, X86::VMOVDQA32Z128rm, TB_ALIGN_16 },		{ X86::VMOVDQA32Z128rr, X86::VMOVDQA32Z128rm, TB_ALIGN_16 },
{ X86::VMOVDQA64Z128rr, X86::VMOVDQA64Z128rm, TB_ALIGN_16 },		{ X86::VMOVDQA64Z128rr, X86::VMOVDQA64Z128rm, TB_ALIGN_16 },
{ X86::VMOVDQU8Z128rr, X86::VMOVDQU8Z128rm, 0 },		{ X86::VMOVDQU8Z128rr, X86::VMOVDQU8Z128rm, 0 },
{ X86::VMOVDQU16Z128rr, X86::VMOVDQU16Z128rm, 0 },		{ X86::VMOVDQU16Z128rr, X86::VMOVDQU16Z128rm, 0 },
{ X86::VMOVDQU32Z128rr, X86::VMOVDQU32Z128rm, 0 },		{ X86::VMOVDQU32Z128rr, X86::VMOVDQU32Z128rm, 0 },
{ X86::VMOVDQU64Z128rr, X86::VMOVDQU64Z128rm, 0 },		{ X86::VMOVDQU64Z128rr, X86::VMOVDQU64Z128rm, 0 },
{ X86::VMOVUPDZ128rr, X86::VMOVUPDZ128rm, 0 },		{ X86::VMOVUPDZ128rr, X86::VMOVUPDZ128rm, 0 },
{ X86::VMOVUPSZ128rr, X86::VMOVUPSZ128rm, 0 },		{ X86::VMOVUPSZ128rr, X86::VMOVUPSZ128rm, 0 },
{ X86::VPABSBZ128rr, X86::VPABSBZ128rm, 0 },		{ X86::VPABSBZ128rr, X86::VPABSBZ128rm, 0 },
{ X86::VPABSDZ128rr, X86::VPABSDZ128rm, 0 },		{ X86::VPABSDZ128rr, X86::VPABSDZ128rm, 0 },
{ X86::VPABSQZ128rr, X86::VPABSQZ128rm, 0 },		{ X86::VPABSQZ128rr, X86::VPABSQZ128rm, 0 },
{ X86::VPABSWZ128rr, X86::VPABSWZ128rm, 0 },		{ X86::VPABSWZ128rr, X86::VPABSWZ128rm, 0 },
{ X86::VPERMILPDZ128ri, X86::VPERMILPDZ128mi, 0 },		{ X86::VPERMILPDZ128ri, X86::VPERMILPDZ128mi, 0 },
{ X86::VPERMILPSZ128ri, X86::VPERMILPSZ128mi, 0 },		{ X86::VPERMILPSZ128ri, X86::VPERMILPSZ128mi, 0 },
{ X86::VPMOVSXBDZ128rr, X86::VPMOVSXBDZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXBDZ128rr, X86::VPMOVSXBDZ128rm, TB_NO_REVERSE },
{ X86::VPMOVSXBQZ128rr, X86::VPMOVSXBQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXBQZ128rr, X86::VPMOVSXBQZ128rm, TB_NO_REVERSE },
{ X86::VPMOVSXBWZ128rr, X86::VPMOVSXBWZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXBWZ128rr, X86::VPMOVSXBWZ128rm, TB_NO_REVERSE },
{ X86::VPMOVSXDQZ128rr, X86::VPMOVSXDQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXDQZ128rr, X86::VPMOVSXDQZ128rm, TB_NO_REVERSE },
{ X86::VPMOVSXWDZ128rr, X86::VPMOVSXWDZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXWDZ128rr, X86::VPMOVSXWDZ128rm, TB_NO_REVERSE },
{ X86::VPMOVSXWQZ128rr, X86::VPMOVSXWQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVSXWQZ128rr, X86::VPMOVSXWQZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXBDZ128rr, X86::VPMOVZXBDZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXBDZ128rr, X86::VPMOVZXBDZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXBQZ128rr, X86::VPMOVZXBQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXBQZ128rr, X86::VPMOVZXBQZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXBWZ128rr, X86::VPMOVZXBWZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXBWZ128rr, X86::VPMOVZXBWZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXDQZ128rr, X86::VPMOVZXDQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXDQZ128rr, X86::VPMOVZXDQZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXWDZ128rr, X86::VPMOVZXWDZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXWDZ128rr, X86::VPMOVZXWDZ128rm, TB_NO_REVERSE },
{ X86::VPMOVZXWQZ128rr, X86::VPMOVZXWQZ128rm, TB_NO_REVERSE },		{ X86::VPMOVZXWQZ128rr, X86::VPMOVZXWQZ128rm, TB_NO_REVERSE },
{ X86::VPSHUFDZ128ri, X86::VPSHUFDZ128mi, 0 },		{ X86::VPSHUFDZ128ri, X86::VPSHUFDZ128mi, 0 },
{ X86::VPSHUFHWZ128ri, X86::VPSHUFHWZ128mi, 0 },		{ X86::VPSHUFHWZ128ri, X86::VPSHUFHWZ128mi, 0 },
{ X86::VPSHUFLWZ128ri, X86::VPSHUFLWZ128mi, 0 },		{ X86::VPSHUFLWZ128ri, X86::VPSHUFLWZ128mi, 0 },
{ X86::VPSLLDQZ128rr, X86::VPSLLDQZ128rm, 0 },		{ X86::VPSLLDQZ128rr, X86::VPSLLDQZ128rm, 0 },
{ X86::VPSLLDZ128ri, X86::VPSLLDZ128mi, 0 },		{ X86::VPSLLDZ128ri, X86::VPSLLDZ128mi, 0 },
{ X86::VPSLLQZ128ri, X86::VPSLLQZ128mi, 0 },		{ X86::VPSLLQZ128ri, X86::VPSLLQZ128mi, 0 },
{ X86::VPSLLWZ128ri, X86::VPSLLWZ128mi, 0 },		{ X86::VPSLLWZ128ri, X86::VPSLLWZ128mi, 0 },
{ X86::VPSRADZ128ri, X86::VPSRADZ128mi, 0 },		{ X86::VPSRADZ128ri, X86::VPSRADZ128mi, 0 },
{ X86::VPSRAQZ128ri, X86::VPSRAQZ128mi, 0 },		{ X86::VPSRAQZ128ri, X86::VPSRAQZ128mi, 0 },
{ X86::VPSRAWZ128ri, X86::VPSRAWZ128mi, 0 },		{ X86::VPSRAWZ128ri, X86::VPSRAWZ128mi, 0 },
{ X86::VPSRLDQZ128rr, X86::VPSRLDQZ128rm, 0 },		{ X86::VPSRLDQZ128rr, X86::VPSRLDQZ128rm, 0 },
{ X86::VPSRLDZ128ri, X86::VPSRLDZ128mi, 0 },		{ X86::VPSRLDZ128ri, X86::VPSRLDZ128mi, 0 },
{ X86::VPSRLQZ128ri, X86::VPSRLQZ128mi, 0 },		{ X86::VPSRLQZ128ri, X86::VPSRLQZ128mi, 0 },
{ X86::VPSRLWZ128ri, X86::VPSRLWZ128mi, 0 },		{ X86::VPSRLWZ128ri, X86::VPSRLWZ128mi, 0 },

// F16C foldable instructions		// F16C foldable instructions
{ X86::VCVTPH2PSrr, X86::VCVTPH2PSrm, 0 },		{ X86::VCVTPH2PSrr, X86::VCVTPH2PSrm, 0 },
{ X86::VCVTPH2PSYrr, X86::VCVTPH2PSYrm, 0 },		{ X86::VCVTPH2PSYrr, X86::VCVTPH2PSYrm, 0 },

// AES foldable instructions		// AES foldable instructions
{ X86::AESIMCrr, X86::AESIMCrm, TB_ALIGN_16 },		{ X86::AESIMCrr, X86::AESIMCrm, TB_ALIGN_16 },
{ X86::AESKEYGENASSIST128rr, X86::AESKEYGENASSIST128rm, TB_ALIGN_16 },		{ X86::AESKEYGENASSIST128rr, X86::AESKEYGENASSIST128rm, TB_ALIGN_16 },
{ X86::VAESIMCrr, X86::VAESIMCrm, 0 },		{ X86::VAESIMCrr, X86::VAESIMCrm, 0 },
{ X86::VAESKEYGENASSIST128rr, X86::VAESKEYGENASSIST128rm, 0 }		{ X86::VAESKEYGENASSIST128rr, X86::VAESKEYGENASSIST128rm, 0 }
		RKSimon: Please revert these whitespace changes.
		oren_ben_simhon: Shouldn't we follow clang-format formatting?
};		};

for (X86MemoryFoldTableEntry Entry : MemoryFoldTable1) {		for (X86MemoryFoldTableEntry Entry : MemoryFoldTable1) {
AddTableEntry(RegOp2MemOpTable1, MemOp2RegOpTable,		AddTableEntry(RegOp2MemOpTable1, MemOp2RegOpTable,
Entry.RegOp, Entry.MemOp,		Entry.RegOp, Entry.MemOp,
// Index 1, folded load		// Index 1, folded load
Entry.Flags | TB_INDEX_1 | TB_FOLDED_LOAD);		Entry.Flags | TB_INDEX_1 | TB_FOLDED_LOAD);
}		}
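
The loop above registers each fold-table entry, and the `TB_NO_REVERSE` flag on the new VPOPCNT entries controls whether the memory form may later be unfolded back to the register form. A minimal sketch of that behaviour follows, assuming simplified stand-in types: `FoldEntry`, `RegToMem`, `MemToReg`, `addTableEntry`, and the numeric flag values are hypothetical and are not the actual LLVM definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical flag encoding; the real values live in the X86 backend.
enum : uint16_t {
  TB_INDEX_1 = 1,          // operand index that receives the folded load
  TB_FOLDED_LOAD = 1 << 4, // the memory operand is a load
  TB_NO_REVERSE = 1 << 5   // do not register the memory -> register mapping
};

// Simplified stand-in for one row of a memory fold table.
struct FoldEntry {
  unsigned RegOp;
  unsigned MemOp;
  uint16_t Flags;
};

using RegToMem = std::map<unsigned, std::pair<unsigned, uint16_t>>;
using MemToReg = std::map<unsigned, std::pair<unsigned, uint16_t>>;

// Always record the register -> memory direction; record the reverse
// (unfolding) direction only when TB_NO_REVERSE is not set on the entry.
void addTableEntry(RegToMem &R2M, MemToReg &M2R, const FoldEntry &E,
                   uint16_t Extra) {
  uint16_t Flags = E.Flags | Extra;
  bool Inserted = R2M.emplace(E.RegOp, std::make_pair(E.MemOp, Flags)).second;
  assert(Inserted && "duplicate register opcode in fold table");
  (void)Inserted;
  if ((E.Flags & TB_NO_REVERSE) == 0)
    M2R.emplace(E.MemOp, std::make_pair(E.RegOp, Flags));
}
```

In this model an entry flagged `TB_NO_REVERSE` can be folded (register form to memory form) but never appears in the unfolding map.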
[...]
{ X86::VUNPCKLPSZ128rr, X86::VUNPCKLPSZ128rm, 0 },		{ X86::VUNPCKLPSZ128rr, X86::VUNPCKLPSZ128rm, 0 },
{ X86::VUNPCKLPSZ256rr, X86::VUNPCKLPSZ256rm, 0 },		{ X86::VUNPCKLPSZ256rr, X86::VUNPCKLPSZ256rm, 0 },
{ X86::VXORPDZ128rr, X86::VXORPDZ128rm, 0 },		{ X86::VXORPDZ128rr, X86::VXORPDZ128rm, 0 },
{ X86::VXORPDZ256rr, X86::VXORPDZ256rm, 0 },		{ X86::VXORPDZ256rr, X86::VXORPDZ256rm, 0 },
{ X86::VXORPSZ128rr, X86::VXORPSZ128rm, 0 },		{ X86::VXORPSZ128rr, X86::VXORPSZ128rm, 0 },
{ X86::VXORPSZ256rr, X86::VXORPSZ256rm, 0 },		{ X86::VXORPSZ256rr, X86::VXORPSZ256rm, 0 },

// AVX-512 masked foldable instructions		// AVX-512 masked foldable instructions
		{ X86::VPOPCNTDZrrkz, X86::VPOPCNTDZrmkz, TB_NO_REVERSE },
		{ X86::VPOPCNTQZrrkz, X86::VPOPCNTQZrmkz, TB_NO_REVERSE },
		craig.topper: Alphabetize
{ X86::VBROADCASTSSZrkz, X86::VBROADCASTSSZmkz, TB_NO_REVERSE },		{ X86::VBROADCASTSSZrkz, X86::VBROADCASTSSZmkz, TB_NO_REVERSE },
{ X86::VBROADCASTSDZrkz, X86::VBROADCASTSDZmkz, TB_NO_REVERSE },		{ X86::VBROADCASTSDZrkz, X86::VBROADCASTSDZmkz, TB_NO_REVERSE },
{ X86::VPABSBZrrkz, X86::VPABSBZrmkz, 0 },		{ X86::VPABSBZrrkz, X86::VPABSBZrmkz, 0 },
{ X86::VPABSDZrrkz, X86::VPABSDZrmkz, 0 },		{ X86::VPABSDZrrkz, X86::VPABSDZrmkz, 0 },
{ X86::VPABSQZrrkz, X86::VPABSQZrmkz, 0 },		{ X86::VPABSQZrrkz, X86::VPABSQZrmkz, 0 },
{ X86::VPABSWZrrkz, X86::VPABSWZrmkz, 0 },		{ X86::VPABSWZrrkz, X86::VPABSWZrmkz, 0 },
{ X86::VPERMILPDZrikz, X86::VPERMILPDZmikz, 0 },		{ X86::VPERMILPDZrikz, X86::VPERMILPDZmikz, 0 },
{ X86::VPERMILPSZrikz, X86::VPERMILPSZmikz, 0 },		{ X86::VPERMILPSZrikz, X86::VPERMILPSZmikz, 0 },
[...]	static const X86MemoryFoldTableEntry MemoryFoldTable3[] = {
{ X86::VUNPCKHPDZ128rrkz, X86::VUNPCKHPDZ128rmkz, 0 },		{ X86::VUNPCKHPDZ128rrkz, X86::VUNPCKHPDZ128rmkz, 0 },
{ X86::VUNPCKHPSZ128rrkz, X86::VUNPCKHPSZ128rmkz, 0 },		{ X86::VUNPCKHPSZ128rrkz, X86::VUNPCKHPSZ128rmkz, 0 },
{ X86::VUNPCKLPDZ128rrkz, X86::VUNPCKLPDZ128rmkz, 0 },		{ X86::VUNPCKLPDZ128rrkz, X86::VUNPCKLPDZ128rmkz, 0 },
{ X86::VUNPCKLPSZ128rrkz, X86::VUNPCKLPSZ128rmkz, 0 },		{ X86::VUNPCKLPSZ128rrkz, X86::VUNPCKLPSZ128rmkz, 0 },
{ X86::VXORPDZ128rrkz, X86::VXORPDZ128rmkz, 0 },		{ X86::VXORPDZ128rrkz, X86::VXORPDZ128rmkz, 0 },
{ X86::VXORPSZ128rrkz, X86::VXORPSZ128rmkz, 0 },		{ X86::VXORPSZ128rrkz, X86::VXORPSZ128rmkz, 0 },

// AVX-512 masked foldable instructions		// AVX-512 masked foldable instructions
		{ X86::VPOPCNTDZrrk, X86::VPOPCNTDZrmk, TB_NO_REVERSE },
		{ X86::VPOPCNTQZrrk, X86::VPOPCNTQZrmk, TB_NO_REVERSE },
		craig.topper: Alphabetize
{ X86::VBROADCASTSSZrk, X86::VBROADCASTSSZmk, TB_NO_REVERSE },		{ X86::VBROADCASTSSZrk, X86::VBROADCASTSSZmk, TB_NO_REVERSE },
{ X86::VBROADCASTSDZrk, X86::VBROADCASTSDZmk, TB_NO_REVERSE },		{ X86::VBROADCASTSDZrk, X86::VBROADCASTSDZmk, TB_NO_REVERSE },
{ X86::VPABSBZrrk, X86::VPABSBZrmk, 0 },		{ X86::VPABSBZrrk, X86::VPABSBZrmk, 0 },
{ X86::VPABSDZrrk, X86::VPABSDZrmk, 0 },		{ X86::VPABSDZrrk, X86::VPABSDZrmk, 0 },
{ X86::VPABSQZrrk, X86::VPABSQZrmk, 0 },		{ X86::VPABSQZrrk, X86::VPABSQZrmk, 0 },
{ X86::VPABSWZrrk, X86::VPABSWZrmk, 0 },		{ X86::VPABSWZrrk, X86::VPABSWZrmk, 0 },
{ X86::VPERMILPDZrik, X86::VPERMILPDZmik, 0 },		{ X86::VPERMILPDZrik, X86::VPERMILPDZmik, 0 },
{ X86::VPERMILPSZrik, X86::VPERMILPSZmik, 0 },		{ X86::VPERMILPSZrik, X86::VPERMILPSZmik, 0 },
[...]	inline static bool isDefConvertible(MachineInstr &MI) {
case X86::BZHI32rr: case X86::BZHI32rm:		case X86::BZHI32rr: case X86::BZHI32rm:
case X86::BZHI64rr: case X86::BZHI64rm:		case X86::BZHI64rr: case X86::BZHI64rm:
case X86::LZCNT16rr: case X86::LZCNT16rm:		case X86::LZCNT16rr: case X86::LZCNT16rm:
case X86::LZCNT32rr: case X86::LZCNT32rm:		case X86::LZCNT32rr: case X86::LZCNT32rm:
case X86::LZCNT64rr: case X86::LZCNT64rm:		case X86::LZCNT64rr: case X86::LZCNT64rm:
case X86::POPCNT16rr:case X86::POPCNT16rm:		case X86::POPCNT16rr:case X86::POPCNT16rm:
case X86::POPCNT32rr:case X86::POPCNT32rm:		case X86::POPCNT32rr:case X86::POPCNT32rm:
case X86::POPCNT64rr:case X86::POPCNT64rm:		case X86::POPCNT64rr:case X86::POPCNT64rm:
		case X86::VPOPCNTDZrrk: case X86::VPOPCNTDZrmk:
		case X86::VPOPCNTQZrrk: case X86::VPOPCNTQZrmk:
		case X86::VPOPCNTDZrrkz: case X86::VPOPCNTDZrmkz:
		case X86::VPOPCNTQZrrkz: case X86::VPOPCNTQZrmkz:
		case X86::VPOPCNTDZrr: case X86::VPOPCNTDZrm:
		case X86::VPOPCNTQZrr: case X86::VPOPCNTQZrm:
		craig.topper: I think this is also an EFLAGS-related piece of code, so the vector popcnt instructions shouldn't be here.
		RKSimon: +1 - I don't think this is going to work for vectors, but you can try later if you want.
case X86::TZCNT16rr: case X86::TZCNT16rm:		case X86::TZCNT16rr: case X86::TZCNT16rm:
case X86::TZCNT32rr: case X86::TZCNT32rm:		case X86::TZCNT32rr: case X86::TZCNT32rm:
case X86::TZCNT64rr: case X86::TZCNT64rm:		case X86::TZCNT64rr: case X86::TZCNT64rm:
return true;		return true;
}		}
}		}

/// Check whether the use can be converted to remove a comparison against zero.		/// Check whether the use can be converted to remove a comparison against zero.
static X86::CondCode isUseDefConvertible(MachineInstr &MI) {		static X86::CondCode isUseDefConvertible(MachineInstr &MI) {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return X86::COND_INVALID;		default: return X86::COND_INVALID;
case X86::LZCNT16rr: case X86::LZCNT16rm:		case X86::LZCNT16rr: case X86::LZCNT16rm:
case X86::LZCNT32rr: case X86::LZCNT32rm:		case X86::LZCNT32rr: case X86::LZCNT32rm:
case X86::LZCNT64rr: case X86::LZCNT64rm:		case X86::LZCNT64rr: case X86::LZCNT64rm:
return X86::COND_B;		return X86::COND_B;
case X86::POPCNT16rr:case X86::POPCNT16rm:		case X86::POPCNT16rr:case X86::POPCNT16rm:
case X86::POPCNT32rr:case X86::POPCNT32rm:		case X86::POPCNT32rr:case X86::POPCNT32rm:
case X86::POPCNT64rr:case X86::POPCNT64rm:		case X86::POPCNT64rr:case X86::POPCNT64rm:
return X86::COND_E;		return X86::COND_E;
		craig.topper: I don't think this belongs here. These instructions are ones that update the EFLAGS Z flag based on their output being 0. That's not true of VPOPCNTD/Q.
case X86::TZCNT16rr: case X86::TZCNT16rm:		case X86::TZCNT16rr: case X86::TZCNT16rm:
case X86::TZCNT32rr: case X86::TZCNT32rm:		case X86::TZCNT32rr: case X86::TZCNT32rm:
case X86::TZCNT64rr: case X86::TZCNT64rm:		case X86::TZCNT64rr: case X86::TZCNT64rm:
return X86::COND_B;		return X86::COND_B;
}		}
}		}
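
The reviewers' objection rests on the scalar instructions in these switch statements producing a flag as a side effect: scalar POPCNT sets ZF when its source is zero, which is what allows a later compare against zero to be dropped (hence `COND_E`). The vector VPOPCNTD/Q forms write a vector register and have no such flag result. The following is a rough software model of the scalar case only; `ScalarPopcnt`, `popcnt32`, and `isZeroViaFlag` are illustrative names, not LLVM or hardware APIs.

```cpp
#include <cstdint>

// Model of a scalar POPCNT: the count plus the zero flag it produces.
struct ScalarPopcnt {
  uint32_t Result;
  bool ZF; // set when the source operand (and therefore the count) is zero
};

ScalarPopcnt popcnt32(uint32_t Src) {
  uint32_t Count = 0;
  for (uint32_t V = Src; V != 0; V &= V - 1) // clear the lowest set bit
    ++Count;
  return {Count, Src == 0};
}

// "popcnt(x) == 0" can be answered from ZF alone, so no separate
// TEST/CMP against zero is needed after the instruction.
bool isZeroViaFlag(uint32_t Src) { return popcnt32(Src).ZF; }
```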

/// Check if there exists an earlier instruction that		/// Check if there exists an earlier instruction that
[...]

lib/Target/X86/X86InstrInfo.td

	[...]
	def HasAVX1Only : Predicate<"Subtarget->hasAVX() && !Subtarget->hasAVX2()">;			def HasAVX1Only : Predicate<"Subtarget->hasAVX() && !Subtarget->hasAVX2()">;
	def HasAVX512 : Predicate<"Subtarget->hasAVX512()">,			def HasAVX512 : Predicate<"Subtarget->hasAVX512()">,
	AssemblerPredicate<"FeatureAVX512", "AVX-512 ISA">;			AssemblerPredicate<"FeatureAVX512", "AVX-512 ISA">;
	def UseAVX : Predicate<"Subtarget->hasAVX() && !Subtarget->hasAVX512()">;			def UseAVX : Predicate<"Subtarget->hasAVX() && !Subtarget->hasAVX512()">;
	def UseAVX2 : Predicate<"Subtarget->hasAVX2() && !Subtarget->hasAVX512()">;			def UseAVX2 : Predicate<"Subtarget->hasAVX2() && !Subtarget->hasAVX512()">;
	def NoAVX512 : Predicate<"!Subtarget->hasAVX512()">;			def NoAVX512 : Predicate<"!Subtarget->hasAVX512()">;
	def HasCDI : Predicate<"Subtarget->hasCDI()">,			def HasCDI : Predicate<"Subtarget->hasCDI()">,
	AssemblerPredicate<"FeatureCDI", "AVX-512 CD ISA">;			AssemblerPredicate<"FeatureCDI", "AVX-512 CD ISA">;
				def HasVPOPCNTDQ : Predicate<"Subtarget->hasVPOPCNTDQ()">,
				AssemblerPredicate<"FeatureVPOPCNTDQ", "AVX-512 VPOPCNTDQ ISA">;
	def HasPFI : Predicate<"Subtarget->hasPFI()">,			def HasPFI : Predicate<"Subtarget->hasPFI()">,
				craig.topper: I don't see NoVPOPCNTDQ being used anywhere so we probably shouldn't add it.
	AssemblerPredicate<"FeaturePFI", "AVX-512 PF ISA">;			AssemblerPredicate<"FeaturePFI", "AVX-512 PF ISA">;
	def HasERI : Predicate<"Subtarget->hasERI()">,			def HasERI : Predicate<"Subtarget->hasERI()">,
	AssemblerPredicate<"FeatureERI", "AVX-512 ER ISA">;			AssemblerPredicate<"FeatureERI", "AVX-512 ER ISA">;
	def HasDQI : Predicate<"Subtarget->hasDQI()">,			def HasDQI : Predicate<"Subtarget->hasDQI()">,
	AssemblerPredicate<"FeatureDQI", "AVX-512 DQ ISA">;			AssemblerPredicate<"FeatureDQI", "AVX-512 DQ ISA">;
	def NoDQI : Predicate<"!Subtarget->hasDQI()">;			def NoDQI : Predicate<"!Subtarget->hasDQI()">;
	def HasBWI : Predicate<"Subtarget->hasBWI()">,			def HasBWI : Predicate<"Subtarget->hasBWI()">,
	AssemblerPredicate<"FeatureBWI", "AVX-512 BW ISA">;			AssemblerPredicate<"FeatureBWI", "AVX-512 BW ISA">;
	[...]

lib/Target/X86/X86Subtarget.h

[...]	protected:
bool HasPFI;		bool HasPFI;

/// Processor has AVX-512 Exponential and Reciprocal Instructions		/// Processor has AVX-512 Exponential and Reciprocal Instructions
bool HasERI;		bool HasERI;

/// Processor has AVX-512 Conflict Detection Instructions		/// Processor has AVX-512 Conflict Detection Instructions
bool HasCDI;		bool HasCDI;

		/// Processor has AVX-512 population count Instructions
		bool HasVPOPCNTDQ;

/// Processor has AVX-512 Doubleword and Quadword instructions		/// Processor has AVX-512 Doubleword and Quadword instructions
bool HasDQI;		bool HasDQI;

/// Processor has AVX-512 Byte and Word instructions		/// Processor has AVX-512 Byte and Word instructions
bool HasBWI;		bool HasBWI;

/// Processor has AVX-512 Vector Length eXtenstions		/// Processor has AVX-512 Vector Length eXtenstions
bool HasVLX;		bool HasVLX;
[...]	public:
bool hasSlowDivide32() const { return HasSlowDivide32; }		bool hasSlowDivide32() const { return HasSlowDivide32; }
bool hasSlowDivide64() const { return HasSlowDivide64; }		bool hasSlowDivide64() const { return HasSlowDivide64; }
bool padShortFunctions() const { return PadShortFunctions; }		bool padShortFunctions() const { return PadShortFunctions; }
bool callRegIndirect() const { return CallRegIndirect; }		bool callRegIndirect() const { return CallRegIndirect; }
bool LEAusesAG() const { return LEAUsesAG; }		bool LEAusesAG() const { return LEAUsesAG; }
bool slowLEA() const { return SlowLEA; }		bool slowLEA() const { return SlowLEA; }
bool slowIncDec() const { return SlowIncDec; }		bool slowIncDec() const { return SlowIncDec; }
bool hasCDI() const { return HasCDI; }		bool hasCDI() const { return HasCDI; }
		bool hasVPOPCNTDQ() const { return HasVPOPCNTDQ; }
bool hasPFI() const { return HasPFI; }		bool hasPFI() const { return HasPFI; }
bool hasERI() const { return HasERI; }		bool hasERI() const { return HasERI; }
bool hasDQI() const { return HasDQI; }		bool hasDQI() const { return HasDQI; }
bool hasBWI() const { return HasBWI; }		bool hasBWI() const { return HasBWI; }
bool hasVLX() const { return HasVLX; }		bool hasVLX() const { return HasVLX; }
bool hasPKU() const { return HasPKU; }		bool hasPKU() const { return HasPKU; }
bool hasMPX() const { return HasMPX; }		bool hasMPX() const { return HasMPX; }
bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }		bool hasCLFLUSHOPT() const { return HasCLFLUSHOPT; }
[...]

lib/Target/X86/X86Subtarget.cpp

[...]	void X86Subtarget::initializeEnvironment() {
HasBMI2 = false;		HasBMI2 = false;
HasVBMI = false;		HasVBMI = false;
HasIFMA = false;		HasIFMA = false;
HasRTM = false;		HasRTM = false;
HasERI = false;		HasERI = false;
HasCDI = false;		HasCDI = false;
HasPFI = false;		HasPFI = false;
HasDQI = false;		HasDQI = false;
		HasVPOPCNTDQ = false;
HasBWI = false;		HasBWI = false;
HasVLX = false;		HasVLX = false;
HasADX = false;		HasADX = false;
HasPKU = false;		HasPKU = false;
HasSHA = false;		HasSHA = false;
HasPRFCHW = false;		HasPRFCHW = false;
HasRDSEED = false;		HasRDSEED = false;
HasLAHFSAHF = false;		HasLAHFSAHF = false;
[...]

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512vpopcntdq | FileCheck %s
				RKSimon: Add --show-mc-encoding ?
				craig.topper: But should we even have this test file or just use the popcount tests since these aren't x86 specific intrinsics anymore?
				craig.topper: Never mind, I guess we need this for mask testing? I assume the generic test doesn't cover it.
				oren_ben_simhon: I moved the tests to the corresponding vector_popcnt_*.ll files. Also, the tests check for X86 instructions, so they should reside in the X86 directory.
				oren_ben_simhon: > Add --show-mc-encoding ? Encoding is not tested in this file. See file test/MC/X86/x86-64-avx512vpopcntdq.s.

				RKSimon: Probably worth testing on the i686-unknown-unknown triple as well. I know it overlaps the MC test coverage, but adding --show-mc-encoding would be trivial since the FileCheck lines are auto-generated; we do this on many other intrinsics test files.
				declare <16 x i32> @llvm.ctpop.v16i32(<16 x i32>)

				define <16 x i32> @test_vpopcnt_d(<16 x i32> %a) {
				; CHECK-LABEL: test_vpopcnt_d:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpopcntd %zmm0, %zmm0
				; CHECK-NEXT: retq
				%res = tail call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %a)
				ret <16 x i32> %res
				}

				define <16 x i32> @test_mask_vpopcnt_d(<16 x i32> %a, i16 %mask, <16 x i32> %b) {
				; CHECK-LABEL: test_mask_vpopcnt_d:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpopcntd %zmm1, %zmm0 {%k1}
				; CHECK-NEXT: retq
				%1 = tail call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %b)
				%2 = bitcast i16 %mask to <16 x i1>
				%3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> %a
				ret <16 x i32> %3
				}

				define <16 x i32> @test_maskz_vpopcnt_d(i16 %mask, <16 x i32> %a) {
				; CHECK-LABEL: test_maskz_vpopcnt_d:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpopcntd %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%1 = tail call <16 x i32> @llvm.ctpop.v16i32(<16 x i32> %a)
				%2 = bitcast i16 %mask to <16 x i1>
				%3 = select <16 x i1> %2, <16 x i32> %1, <16 x i32> zeroinitializer
				ret <16 x i32> %3
				}
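
				The masked tests above express masking as a `select` on the `ctpop` result: merge-masking keeps the pass-through lane where the mask bit is clear, and zero-masking writes 0 there. A small software model of those per-lane semantics follows, assuming mask bit I controls lane I; `V16`, `popcount32`, `maskPopcnt`, and `maskzPopcnt` are hypothetical names used only for this sketch.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

using V16 = std::array<uint32_t, 16>;

static uint32_t popcount32(uint32_t X) {
  return static_cast<uint32_t>(std::bitset<32>(X).count());
}

// Merge-masking: lane I gets popcount(Src[I]) when mask bit I is set,
// otherwise it keeps PassThru[I] (the {%k1} form in the CHECK lines).
V16 maskPopcnt(const V16 &PassThru, uint16_t Mask, const V16 &Src) {
  V16 Out{};
  for (std::size_t I = 0; I < Out.size(); ++I)
    Out[I] = ((Mask >> I) & 1) ? popcount32(Src[I]) : PassThru[I];
  return Out;
}

// Zero-masking: same as merge-masking with an all-zero pass-through
// (the {%k1} {z} form in the CHECK lines).
V16 maskzPopcnt(uint16_t Mask, const V16 &Src) {
  return maskPopcnt(V16{}, Mask, Src);
}
```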

				declare <8 x i64> @llvm.ctpop.v8i64(<8 x i64>)

				define <8 x i64> @test_vpopcnt_q(<8 x i64> %a) {
				; CHECK-LABEL: test_vpopcnt_q:
				; CHECK: ## BB#0:
				; CHECK-NEXT: vpopcntq %zmm0, %zmm0
				; CHECK-NEXT: retq
				%1 = tail call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %a)
				ret <8 x i64> %1
				}

				define <8 x i64> @test_mask_vpopcnt_q(<8 x i64> %a, <8 x i64> %b, i8 %mask) {
				; CHECK-LABEL: test_mask_vpopcnt_q:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpopcntq %zmm0, %zmm1 {%k1}
				; CHECK-NEXT: vmovdqa64 %zmm1, %zmm0
				; CHECK-NEXT: retq
				%1 = tail call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %a)
				%2 = bitcast i8 %mask to <8 x i1>
				%3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> %b
				ret <8 x i64> %3
				}

				define <8 x i64> @test_maskz_vpopcnt_q(<8 x i64> %a, i8 %mask) {
				; CHECK-LABEL: test_maskz_vpopcnt_q:
				; CHECK: ## BB#0:
				; CHECK-NEXT: kmovw %edi, %k1
				; CHECK-NEXT: vpopcntq %zmm0, %zmm0 {%k1} {z}
				; CHECK-NEXT: retq
				%1 = tail call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %a)
				%2 = bitcast i8 %mask to <8 x i1>
				%3 = select <8 x i1> %2, <8 x i64> %1, <8 x i64> zeroinitializer
				ret <8 x i64> %3
				}

				declare <2 x i64> @llvm.ctpop.v2i64(<2 x i64>)
				declare <4 x i32> @llvm.ctpop.v4i32(<4 x i32>)

				define <2 x i64> @testv2i64(<2 x i64> %in) nounwind {
				; CHECK-LABEL: testv2i64:
				; CHECK: # BB#0:
				; CHECK-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<def>
				; CHECK-NEXT: vpopcntq %zmm0, %zmm0
				; CHECK-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
				; CHECK: retq
				%out = call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> %in)
				ret <2 x i64> %out
				}

				define <4 x i32> @testv4i32(<4 x i32> %in) nounwind {
				; CHECK-LABEL: testv4i32:
				; CHECK: # BB#0:
				; CHECK-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<def>
				; CHECK-NEXT: vpopcntd %zmm0, %zmm0
				; CHECK-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
				; CHECK: retq
				%out = call <4 x i32> @llvm.ctpop.v4i32(<4 x i32> %in)
				ret <4 x i32> %out
				}

				declare <4 x i64> @llvm.ctpop.v4i64(<4 x i64>)
				declare <8 x i32> @llvm.ctpop.v8i32(<8 x i32>)

				define <4 x i64> @testv4i64(<4 x i64> %in) nounwind {
				; CHECK-LABEL: testv4i64:
				; CHECK: # BB#0:
				; CHECK-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<def>
				; CHECK-NEXT: vpopcntq %zmm0, %zmm0
				; CHECK-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<kill>
				; CHECK-NEXT: retq
				%out = call <4 x i64> @llvm.ctpop.v4i64(<4 x i64> %in)
				ret <4 x i64> %out
				}

				define <8 x i32> @testv8i32(<8 x i32> %in) nounwind {
				; CHECK-LABEL: testv8i32:
				; CHECK: # BB#0:
				; CHECK-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<def>
				; CHECK-NEXT: vpopcntd %zmm0, %zmm0
				; CHECK-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<kill>
				; CHECK-NEXT: retq
				%out = call <8 x i32> @llvm.ctpop.v8i32(<8 x i32> %in)
				ret <8 x i32> %out
				}
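
				The 128-bit and 256-bit tests above show the narrow input being treated as the low part of a zmm register, a 512-bit `vpopcntd`/`vpopcntq` being run, and the low part of the result being used. A sketch of that widening idea for the `<2 x i64>` case follows. The names `V2i64`, `V8i64`, `popcntZmm`, and `popcntXmmViaZmm` are illustrative only, and the upper lanes are zeroed here for determinism whereas in the generated code they are simply unspecified.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

using V2i64 = std::array<uint64_t, 2>;
using V8i64 = std::array<uint64_t, 8>;

// Stand-in for a 512-bit per-lane population count (vpopcntq on zmm).
V8i64 popcntZmm(const V8i64 &Src) {
  V8i64 Out{};
  for (std::size_t I = 0; I < Src.size(); ++I)
    Out[I] = std::bitset<64>(Src[I]).count();
  return Out;
}

// Widen the 128-bit input into the low lanes of a 512-bit vector,
// run the wide operation, then keep only the low two lanes.
V2i64 popcntXmmViaZmm(const V2i64 &In) {
  V8i64 Wide{}; // upper lanes do not affect the lanes we read back
  Wide[0] = In[0];
  Wide[1] = In[1];
  V8i64 Res = popcntZmm(Wide);
  return {Res[0], Res[1]};
}
```

				Because population count is computed independently per lane, the contents of the unused upper lanes cannot change the low-lane results.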

test/CodeGen/X86/vector-tzcnt-128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse3 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE3			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse3 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE3
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+ssse3 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSSE3			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+ssse3 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSSE3
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE41			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=ALL --check-prefix=SSE --check-prefix=SSE41
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,+avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CDVL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,+avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CDVL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,-avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CD			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,-avx512vl | FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CD
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vpopcntdq | FileCheck %s --check-prefix=ALL --check-prefix=AVX512VPOPCNTDQ
				RKSimon: Re-generate these files, don't manually edit them.
	;			;
	; Just one 32-bit run to make sure we do reasonable things for i64 tzcnt.			; Just one 32-bit run to make sure we do reasonable things for i64 tzcnt.
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=ALL --check-prefix=X32-SSE --check-prefix=X32-SSE41			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse4.1 | FileCheck %s --check-prefix=ALL --check-prefix=X32-SSE --check-prefix=X32-SSE41

	define <2 x i64> @testv2i64(<2 x i64> %in) nounwind {			define <2 x i64> @testv2i64(<2 x i64> %in) nounwind {
	; SSE2-LABEL: testv2i64:			; SSE2-LABEL: testv2i64:
	; SSE2: # BB#0:			; SSE2: # BB#0:
	; SSE2-NEXT: pxor %xmm1, %xmm1			; SSE2-NEXT: pxor %xmm1, %xmm1
	[...]
	; AVX512CD-NEXT: vpsubq %xmm0, %xmm1, %xmm1			; AVX512CD-NEXT: vpsubq %xmm0, %xmm1, %xmm1
	; AVX512CD-NEXT: vpand %xmm1, %xmm0, %xmm0			; AVX512CD-NEXT: vpand %xmm1, %xmm0, %xmm0
	; AVX512CD-NEXT: vplzcntq %zmm0, %zmm0			; AVX512CD-NEXT: vplzcntq %zmm0, %zmm0
	; AVX512CD-NEXT: vmovdqa {{.*#+}} xmm1 = [63,63]			; AVX512CD-NEXT: vmovdqa {{.*#+}} xmm1 = [63,63]
	; AVX512CD-NEXT: vpsubq %xmm0, %xmm1, %xmm0			; AVX512CD-NEXT: vpsubq %xmm0, %xmm1, %xmm0
	; AVX512CD-NEXT: vzeroupper			; AVX512CD-NEXT: vzeroupper
	; AVX512CD-NEXT: retq			; AVX512CD-NEXT: retq
	;			;
				; AVX512VPOPCNTDQ-LABEL: testv2i64u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512VPOPCNTDQ-NEXT: vpsubq %xmm0, %xmm1, %xmm1
				; AVX512VPOPCNTDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
				; AVX512VPOPCNTDQ-NEXT: vpsubq {{.*}}(%rip), %xmm0, %xmm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntq %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
				; AVX512VPOPCNTDQ-NEXT: vzeroupper
				; AVX512VPOPCNTDQ-NEXT: retq
				;
	; X32-SSE-LABEL: testv2i64u:			; X32-SSE-LABEL: testv2i64u:
	; X32-SSE: # BB#0:			; X32-SSE: # BB#0:
	; X32-SSE-NEXT: pxor %xmm1, %xmm1			; X32-SSE-NEXT: pxor %xmm1, %xmm1
	; X32-SSE-NEXT: pxor %xmm2, %xmm2			; X32-SSE-NEXT: pxor %xmm2, %xmm2
	; X32-SSE-NEXT: psubq %xmm0, %xmm2			; X32-SSE-NEXT: psubq %xmm0, %xmm2
	; X32-SSE-NEXT: pand %xmm0, %xmm2			; X32-SSE-NEXT: pand %xmm0, %xmm2
	; X32-SSE-NEXT: psubq {{\.LCPI.*}}, %xmm2			; X32-SSE-NEXT: psubq {{\.LCPI.*}}, %xmm2
	; X32-SSE-NEXT: movdqa {{.*#+}} xmm3 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; X32-SSE-NEXT: movdqa {{.*#+}} xmm3 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	[...]
	; AVX512CD-NEXT: vpsubd %xmm0, %xmm1, %xmm1			; AVX512CD-NEXT: vpsubd %xmm0, %xmm1, %xmm1
	; AVX512CD-NEXT: vpand %xmm1, %xmm0, %xmm0			; AVX512CD-NEXT: vpand %xmm1, %xmm0, %xmm0
	; AVX512CD-NEXT: vplzcntd %zmm0, %zmm0			; AVX512CD-NEXT: vplzcntd %zmm0, %zmm0
	; AVX512CD-NEXT: vpbroadcastd {{.*}}(%rip), %xmm1			; AVX512CD-NEXT: vpbroadcastd {{.*}}(%rip), %xmm1
	; AVX512CD-NEXT: vpsubd %xmm0, %xmm1, %xmm0			; AVX512CD-NEXT: vpsubd %xmm0, %xmm1, %xmm0
	; AVX512CD-NEXT: vzeroupper			; AVX512CD-NEXT: vzeroupper
	; AVX512CD-NEXT: retq			; AVX512CD-NEXT: retq
	;			;
				; AVX512VPOPCNTDQ-LABEL: testv4i32u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxor %xmm1, %xmm1, %xmm1
				; AVX512VPOPCNTDQ-NEXT: vpsubd %xmm0, %xmm1, %xmm1
				; AVX512VPOPCNTDQ-NEXT: vpand %xmm1, %xmm0, %xmm0
				; AVX512VPOPCNTDQ-NEXT: vpbroadcastd {{.*}}(%rip), %xmm1
				; AVX512VPOPCNTDQ-NEXT: vpsubd %xmm1, %xmm0, %xmm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntd %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: # kill: %XMM0<def> %XMM0<kill> %ZMM0<kill>
				; AVX512VPOPCNTDQ-NEXT: vzeroupper
				; AVX512VPOPCNTDQ-NEXT: retq
				;
	; X32-SSE-LABEL: testv4i32u:			; X32-SSE-LABEL: testv4i32u:
	; X32-SSE: # BB#0:			; X32-SSE: # BB#0:
	; X32-SSE-NEXT: pxor %xmm1, %xmm1			; X32-SSE-NEXT: pxor %xmm1, %xmm1
	; X32-SSE-NEXT: pxor %xmm2, %xmm2			; X32-SSE-NEXT: pxor %xmm2, %xmm2
	; X32-SSE-NEXT: psubd %xmm0, %xmm2			; X32-SSE-NEXT: psubd %xmm0, %xmm2
	; X32-SSE-NEXT: pand %xmm0, %xmm2			; X32-SSE-NEXT: pand %xmm0, %xmm2
	; X32-SSE-NEXT: psubd {{\.LCPI.*}}, %xmm2			; X32-SSE-NEXT: psubd {{\.LCPI.*}}, %xmm2
	; X32-SSE-NEXT: movdqa {{.*#+}} xmm0 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; X32-SSE-NEXT: movdqa {{.*#+}} xmm0 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	[699 lines not shown]

test/CodeGen/X86/vector-tzcnt-256.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX1
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX2
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,+avx512vl \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CDVL			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,+avx512vl \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CDVL
	; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,-avx512vl \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CD			; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512cd,-avx512vl \| FileCheck %s --check-prefix=ALL --check-prefix=AVX --check-prefix=AVX512CD
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512vpopcntdq \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512VPOPCNTDQ
				RKSimon: Re-generate these files, don't manually edit them.
	;			;
	; Just one 32-bit run to make sure we do reasonable things for i64 tzcnt.			; Just one 32-bit run to make sure we do reasonable things for i64 tzcnt.
	; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefix=ALL --check-prefix=X32-AVX --check-prefix=X32-AVX2			; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefix=ALL --check-prefix=X32-AVX --check-prefix=X32-AVX2

	define <4 x i64> @testv4i64(<4 x i64> %in) nounwind {			define <4 x i64> @testv4i64(<4 x i64> %in) nounwind {
	; AVX1-LABEL: testv4i64:			; AVX1-LABEL: testv4i64:
	; AVX1: # BB#0:			; AVX1: # BB#0:
	; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	[163 lines not shown]
	; AVX512CD-NEXT: vpxor %ymm1, %ymm1, %ymm1			; AVX512CD-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX512CD-NEXT: vpsubq %ymm0, %ymm1, %ymm1			; AVX512CD-NEXT: vpsubq %ymm0, %ymm1, %ymm1
	; AVX512CD-NEXT: vpand %ymm1, %ymm0, %ymm0			; AVX512CD-NEXT: vpand %ymm1, %ymm0, %ymm0
	; AVX512CD-NEXT: vplzcntq %zmm0, %zmm0			; AVX512CD-NEXT: vplzcntq %zmm0, %zmm0
	; AVX512CD-NEXT: vpbroadcastq {{.*}}(%rip), %ymm1			; AVX512CD-NEXT: vpbroadcastq {{.*}}(%rip), %ymm1
	; AVX512CD-NEXT: vpsubq %ymm0, %ymm1, %ymm0			; AVX512CD-NEXT: vpsubq %ymm0, %ymm1, %ymm0
	; AVX512CD-NEXT: retq			; AVX512CD-NEXT: retq
	;			;
				; AVX512VPOPCNTDQ-LABEL: testv4i64u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxor %ymm1, %ymm1, %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpsubq %ymm0, %ymm1, %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpand %ymm1, %ymm0, %ymm0
				; AVX512VPOPCNTDQ-NEXT: vpbroadcastq {{.*}}(%rip), %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpsubq %ymm1, %ymm0, %ymm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntq %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<kill>
				; AVX512VPOPCNTDQ-NEXT: retq
				;
	; X32-AVX-LABEL: testv4i64u:			; X32-AVX-LABEL: testv4i64u:
	; X32-AVX: # BB#0:			; X32-AVX: # BB#0:
	; X32-AVX-NEXT: vpxor %ymm1, %ymm1, %ymm1			; X32-AVX-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; X32-AVX-NEXT: vpsubq %ymm0, %ymm1, %ymm2			; X32-AVX-NEXT: vpsubq %ymm0, %ymm1, %ymm2
	; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm0			; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm0
	; X32-AVX-NEXT: vpsubq {{\.LCPI.*}}, %ymm0, %ymm0			; X32-AVX-NEXT: vpsubq {{\.LCPI.*}}, %ymm0, %ymm0
	; X32-AVX-NEXT: vmovdqa {{.*#+}} ymm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; X32-AVX-NEXT: vmovdqa {{.*#+}} ymm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm3			; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm3
	[216 lines not shown]
	; AVX512CD-NEXT: vpxor %ymm1, %ymm1, %ymm1			; AVX512CD-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; AVX512CD-NEXT: vpsubd %ymm0, %ymm1, %ymm1			; AVX512CD-NEXT: vpsubd %ymm0, %ymm1, %ymm1
	; AVX512CD-NEXT: vpand %ymm1, %ymm0, %ymm0			; AVX512CD-NEXT: vpand %ymm1, %ymm0, %ymm0
	; AVX512CD-NEXT: vplzcntd %zmm0, %zmm0			; AVX512CD-NEXT: vplzcntd %zmm0, %zmm0
	; AVX512CD-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1			; AVX512CD-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
	; AVX512CD-NEXT: vpsubd %ymm0, %ymm1, %ymm0			; AVX512CD-NEXT: vpsubd %ymm0, %ymm1, %ymm0
	; AVX512CD-NEXT: retq			; AVX512CD-NEXT: retq
	;			;
				; AVX512VPOPCNTDQ-LABEL: testv8i32u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxor %ymm1, %ymm1, %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpsubd %ymm0, %ymm1, %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpand %ymm1, %ymm0, %ymm0
				; AVX512VPOPCNTDQ-NEXT: vpbroadcastd {{.*}}(%rip), %ymm1
				; AVX512VPOPCNTDQ-NEXT: vpsubd %ymm1, %ymm0, %ymm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntd %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: # kill: %YMM0<def> %YMM0<kill> %ZMM0<kill>
				; AVX512VPOPCNTDQ-NEXT: retq
				;
	; X32-AVX-LABEL: testv8i32u:			; X32-AVX-LABEL: testv8i32u:
	; X32-AVX: # BB#0:			; X32-AVX: # BB#0:
	; X32-AVX-NEXT: vpxor %ymm1, %ymm1, %ymm1			; X32-AVX-NEXT: vpxor %ymm1, %ymm1, %ymm1
	; X32-AVX-NEXT: vpsubd %ymm0, %ymm1, %ymm2			; X32-AVX-NEXT: vpsubd %ymm0, %ymm1, %ymm2
	; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm0			; X32-AVX-NEXT: vpand %ymm2, %ymm0, %ymm0
	; X32-AVX-NEXT: vpbroadcastd {{\.LCPI.*}}, %ymm2			; X32-AVX-NEXT: vpbroadcastd {{\.LCPI.*}}, %ymm2
	; X32-AVX-NEXT: vpsubd %ymm2, %ymm0, %ymm0			; X32-AVX-NEXT: vpsubd %ymm2, %ymm0, %ymm0
	; X32-AVX-NEXT: vmovdqa {{.*#+}} ymm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]			; X32-AVX-NEXT: vmovdqa {{.*#+}} ymm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
	[557 lines not shown]

test/CodeGen/X86/vector-tzcnt-512.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512cd,-avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512CD			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512cd,-avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512CD
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512cd,+avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512CDBW			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=+avx512cd,+avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512CDBW
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=-avx512cd,+avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512BW			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=knl -mattr=-avx512cd,+avx512bw \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512BW
				; RUN: llc < %s -mtriple=x86_64-apple-darwin -mattr=+avx512vpopcntdq \| FileCheck %s --check-prefix=ALL --check-prefix=AVX512VPOPCNTDQ
				RKSimon: Re-generate these files, don't manually edit them.

	define <8 x i64> @testv8i64(<8 x i64> %in) nounwind {			define <8 x i64> @testv8i64(<8 x i64> %in) nounwind {
	; AVX512CD-LABEL: testv8i64:			; AVX512CD-LABEL: testv8i64:
	; AVX512CD: ## BB#0:			; AVX512CD: ## BB#0:
	; AVX512CD-NEXT: vpxord %zmm1, %zmm1, %zmm1			; AVX512CD-NEXT: vpxord %zmm1, %zmm1, %zmm1
	; AVX512CD-NEXT: vpsubq %zmm0, %zmm1, %zmm1			; AVX512CD-NEXT: vpsubq %zmm0, %zmm1, %zmm1
	; AVX512CD-NEXT: vpandq %zmm1, %zmm0, %zmm0			; AVX512CD-NEXT: vpandq %zmm1, %zmm0, %zmm0
	; AVX512CD-NEXT: vpsubq {{.*}}(%rip){1to8}, %zmm0, %zmm0			; AVX512CD-NEXT: vpsubq {{.*}}(%rip){1to8}, %zmm0, %zmm0
	[87 lines not shown]
	; AVX512BW-NEXT: vmovdqu8 {{.*#+}} zmm4 = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]			; AVX512BW-NEXT: vmovdqu8 {{.*#+}} zmm4 = [0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4]
	; AVX512BW-NEXT: vpshufb %zmm3, %zmm4, %zmm3			; AVX512BW-NEXT: vpshufb %zmm3, %zmm4, %zmm3
	; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm0			; AVX512BW-NEXT: vpsrlw $4, %zmm0, %zmm0
	; AVX512BW-NEXT: vpandq %zmm2, %zmm0, %zmm0			; AVX512BW-NEXT: vpandq %zmm2, %zmm0, %zmm0
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm4, %zmm0			; AVX512BW-NEXT: vpshufb %zmm0, %zmm4, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: vpsadbw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsadbw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VPOPCNTDQ-LABEL: testv8i64u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxord %zmm1, %zmm1, %zmm1
				; AVX512VPOPCNTDQ-NEXT: vpsubq %zmm0, %zmm1, %zmm1
				; AVX512VPOPCNTDQ-NEXT: vpandq %zmm1, %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: vpsubq {{.*}}(%rip){1to8}, %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntq %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: retq
	%out = call <8 x i64> @llvm.cttz.v8i64(<8 x i64> %in, i1 -1)			%out = call <8 x i64> @llvm.cttz.v8i64(<8 x i64> %in, i1 -1)
	ret <8 x i64> %out			ret <8 x i64> %out
	}			}

	define <16 x i32> @testv16i32(<16 x i32> %in) nounwind {			define <16 x i32> @testv16i32(<16 x i32> %in) nounwind {
	; AVX512CD-LABEL: testv16i32:			; AVX512CD-LABEL: testv16i32:
	; AVX512CD: ## BB#0:			; AVX512CD: ## BB#0:
	; AVX512CD-NEXT: vpxord %zmm1, %zmm1, %zmm1			; AVX512CD-NEXT: vpxord %zmm1, %zmm1, %zmm1
	[110 lines not shown]
	; AVX512BW-NEXT: vpshufb %zmm0, %zmm4, %zmm0			; AVX512BW-NEXT: vpshufb %zmm0, %zmm4, %zmm0
	; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0			; AVX512BW-NEXT: vpaddb %zmm3, %zmm0, %zmm0
	; AVX512BW-NEXT: vpunpckhdq {{.*#+}} zmm2 = zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[10],zmm1[10],zmm0[11],zmm1[11],zmm0[14],zmm1[14],zmm0[15],zmm1[15]			; AVX512BW-NEXT: vpunpckhdq {{.*#+}} zmm2 = zmm0[2],zmm1[2],zmm0[3],zmm1[3],zmm0[6],zmm1[6],zmm0[7],zmm1[7],zmm0[10],zmm1[10],zmm0[11],zmm1[11],zmm0[14],zmm1[14],zmm0[15],zmm1[15]
	; AVX512BW-NEXT: vpsadbw %zmm1, %zmm2, %zmm2			; AVX512BW-NEXT: vpsadbw %zmm1, %zmm2, %zmm2
	; AVX512BW-NEXT: vpunpckldq {{.*#+}} zmm0 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[8],zmm1[8],zmm0[9],zmm1[9],zmm0[12],zmm1[12],zmm0[13],zmm1[13]			; AVX512BW-NEXT: vpunpckldq {{.*#+}} zmm0 = zmm0[0],zmm1[0],zmm0[1],zmm1[1],zmm0[4],zmm1[4],zmm0[5],zmm1[5],zmm0[8],zmm1[8],zmm0[9],zmm1[9],zmm0[12],zmm1[12],zmm0[13],zmm1[13]
	; AVX512BW-NEXT: vpsadbw %zmm1, %zmm0, %zmm0			; AVX512BW-NEXT: vpsadbw %zmm1, %zmm0, %zmm0
	; AVX512BW-NEXT: vpackuswb %zmm2, %zmm0, %zmm0			; AVX512BW-NEXT: vpackuswb %zmm2, %zmm0, %zmm0
	; AVX512BW-NEXT: retq			; AVX512BW-NEXT: retq
				;
				; AVX512VPOPCNTDQ-LABEL: testv16i32u:
				; AVX512VPOPCNTDQ: # BB#0:
				; AVX512VPOPCNTDQ-NEXT: vpxord %zmm1, %zmm1, %zmm1
				; AVX512VPOPCNTDQ-NEXT: vpsubd %zmm0, %zmm1, %zmm1
				; AVX512VPOPCNTDQ-NEXT: vpandd %zmm1, %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: vpsubd {{.*}}(%rip){1to16}, %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: vpopcntd %zmm0, %zmm0
				; AVX512VPOPCNTDQ-NEXT: retq
	%out = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %in, i1 -1)			%out = call <16 x i32> @llvm.cttz.v16i32(<16 x i32> %in, i1 -1)
	ret <16 x i32> %out			ret <16 x i32> %out
	}			}

	define <32 x i16> @testv32i16(<32 x i16> %in) nounwind {			define <32 x i16> @testv32i16(<32 x i16> %in) nounwind {
	; AVX512CD-LABEL: testv32i16:			; AVX512CD-LABEL: testv32i16:
	; AVX512CD: ## BB#0:			; AVX512CD: ## BB#0:
	; AVX512CD-NEXT: vpxor %ymm2, %ymm2, %ymm2			; AVX512CD-NEXT: vpxor %ymm2, %ymm2, %ymm2
	[272 lines not shown]
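
Editorial note on the AVX512VPOPCNTDQ check lines in the vector-tzcnt tests above: the sequence (negate, AND, subtract a broadcast constant, population count) matches the usual way of expressing cttz through ctpop. Below is a minimal scalar sketch of that identity, assuming the broadcast constant hidden behind `{{.*}}(%rip)` is 1 (the checks do not show its value). The function names are hypothetical and this is not code from the patch; it only mirrors what each vector lane appears to compute.

```cpp
#include <cstdint>

// Scalar stand-in for one lane of vpopcntd: count set bits by clearing the
// lowest set bit until nothing is left (written recursively so it stays
// constexpr under C++11).
constexpr unsigned popcount32(std::uint32_t v) {
    return v == 0u ? 0u : 1u + popcount32(v & (v - 1u));
}

// cttz through popcount, following the instruction order in the checks:
//   vpxor + vpsubd : neg  = 0 - x
//   vpand          : low  = x & neg      (isolates the lowest set bit)
//   vpsubd         : mask = low - 1      (ASSUMPTION: the constant is 1)
//   vpopcntd       : result = popcount(mask)
constexpr unsigned cttz_via_popcount(std::uint32_t x) {
    return popcount32((x & (0u - x)) - 1u);
}

// A few compile-time spot checks.
static_assert(cttz_via_popcount(1u) == 0u, "no trailing zeros");
static_assert(cttz_via_popcount(8u) == 3u, "0b1000 has three trailing zeros");
static_assert(cttz_via_popcount(0u) == 32u, "zero input yields the bit width");
```

For a zero input the mask wraps to all ones, so the result is the element width, which is also the defined result of cttz when zero is not treated as undefined.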

test/MC/X86/x86-64-avx512vpopcntdq.s

				// RUN: llvm-mc -triple x86_64-unknown-unknown -mattr=+avx512vpopcntdq --show-encoding %s \| FileCheck %s

				// CHECK: vpopcntq %zmm25, %zmm20
				// CHECK: encoding: [0x62,0x82,0xfd,0x48,0x55,0xe1]
				vpopcntq %zmm25, %zmm20

				// CHECK: vpopcntq %zmm25, %zmm20 {%k6}
				// CHECK: encoding: [0x62,0x82,0xfd,0x4e,0x55,0xe1]
				vpopcntq %zmm25, %zmm20 {%k6}

				// CHECK: vpopcntq %zmm25, %zmm20 {%k6} {z}
				// CHECK: encoding: [0x62,0x82,0xfd,0xce,0x55,0xe1]
				vpopcntq %zmm25, %zmm20 {%k6} {z}

				// CHECK: vpopcntq (%rcx), %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x21]
				vpopcntq (%rcx), %zmm20

				// CHECK: vpopcntq 291(%rax,%r14,8), %zmm20
				// CHECK: encoding: [0x62,0xa2,0xfd,0x48,0x55,0xa4,0xf0,0x23,0x01,0x00,0x00]
				vpopcntq 291(%rax,%r14,8), %zmm20

				// CHECK: vpopcntq (%rcx){1to8}, %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x21]
				vpopcntq (%rcx){1to8}, %zmm20

				// CHECK: vpopcntq 4064(%rdx), %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0xa2,0xe0,0x0f,0x00,0x00]
				vpopcntq 4064(%rdx), %zmm20

				// CHECK: vpopcntq 4096(%rdx), %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x62,0x40]
				vpopcntq 4096(%rdx), %zmm20

				// CHECK: vpopcntq -4096(%rdx), %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x62,0xc0]
				vpopcntq -4096(%rdx), %zmm20

				// CHECK: vpopcntq -4128(%rdx), %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0xa2,0xe0,0xef,0xff,0xff]
				vpopcntq -4128(%rdx), %zmm20

				// CHECK: vpopcntq 1016(%rdx){1to8}, %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x62,0x7f]
				vpopcntq 1016(%rdx){1to8}, %zmm20

				// CHECK: vpopcntq 1024(%rdx){1to8}, %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0xa2,0x00,0x04,0x00,0x00]
				vpopcntq 1024(%rdx){1to8}, %zmm20

				// CHECK: vpopcntq -1024(%rdx){1to8}, %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x62,0x80]
				vpopcntq -1024(%rdx){1to8}, %zmm20

				// CHECK: vpopcntq -1032(%rdx){1to8}, %zmm20
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0xa2,0xf8,0xfb,0xff,0xff]
				vpopcntq -1032(%rdx){1to8}, %zmm20

				// CHECK: vpopcntq %zmm21, %zmm17
				// CHECK: encoding: [0x62,0xa2,0xfd,0x48,0x55,0xcd]
				vpopcntq %zmm21, %zmm17

				// CHECK: vpopcntq %zmm21, %zmm17 {%k6}
				// CHECK: encoding: [0x62,0xa2,0xfd,0x4e,0x55,0xcd]
				vpopcntq %zmm21, %zmm17 {%k6}

				// CHECK: vpopcntq %zmm21, %zmm17 {%k6} {z}
				// CHECK: encoding: [0x62,0xa2,0xfd,0xce,0x55,0xcd]
				vpopcntq %zmm21, %zmm17 {%k6} {z}

				// CHECK: vpopcntq (%rcx), %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x09]
				vpopcntq (%rcx), %zmm17

				// CHECK: vpopcntq 4660(%rax,%r14,8), %zmm17
				// CHECK: encoding: [0x62,0xa2,0xfd,0x48,0x55,0x8c,0xf0,0x34,0x12,0x00,0x00]
				vpopcntq 4660(%rax,%r14,8), %zmm17

				// CHECK: vpopcntq (%rcx){1to8}, %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x09]
				vpopcntq (%rcx){1to8}, %zmm17

				// CHECK: vpopcntq 4064(%rdx), %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x8a,0xe0,0x0f,0x00,0x00]
				vpopcntq 4064(%rdx), %zmm17

				// CHECK: vpopcntq 4096(%rdx), %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x4a,0x40]
				vpopcntq 4096(%rdx), %zmm17

				// CHECK: vpopcntq -4096(%rdx), %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x4a,0xc0]
				vpopcntq -4096(%rdx), %zmm17

				// CHECK: vpopcntq -4128(%rdx), %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x48,0x55,0x8a,0xe0,0xef,0xff,0xff]
				vpopcntq -4128(%rdx), %zmm17

				// CHECK: vpopcntq 1016(%rdx){1to8}, %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x4a,0x7f]
				vpopcntq 1016(%rdx){1to8}, %zmm17

				// CHECK: vpopcntq 1024(%rdx){1to8}, %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x8a,0x00,0x04,0x00,0x00]
				vpopcntq 1024(%rdx){1to8}, %zmm17

				// CHECK: vpopcntq -1024(%rdx){1to8}, %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x4a,0x80]
				vpopcntq -1024(%rdx){1to8}, %zmm17

				// CHECK: vpopcntq -1032(%rdx){1to8}, %zmm17
				// CHECK: encoding: [0x62,0xe2,0xfd,0x58,0x55,0x8a,0xf8,0xfb,0xff,0xff]
				vpopcntq -1032(%rdx){1to8}, %zmm17

				// CHECK: vpopcntd %zmm19, %zmm25
				// CHECK: encoding: [0x62,0x22,0x7d,0x48,0x55,0xcb]
				vpopcntd %zmm19, %zmm25

				// CHECK: vpopcntd %zmm19, %zmm25 {%k4}
				// CHECK: encoding: [0x62,0x22,0x7d,0x4c,0x55,0xcb]
				vpopcntd %zmm19, %zmm25 {%k4}

				// CHECK: vpopcntd %zmm19, %zmm25 {%k4} {z}
				// CHECK: encoding: [0x62,0x22,0x7d,0xcc,0x55,0xcb]
				vpopcntd %zmm19, %zmm25 {%k4} {z}

				// CHECK: vpopcntd (%rcx), %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x09]
				vpopcntd (%rcx), %zmm25

				// CHECK: vpopcntd 291(%rax,%r14,8), %zmm25
				// CHECK: encoding: [0x62,0x22,0x7d,0x48,0x55,0x8c,0xf0,0x23,0x01,0x00,0x00]
				vpopcntd 291(%rax,%r14,8), %zmm25

				// CHECK: vpopcntd (%rcx){1to16}, %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x09]
				vpopcntd (%rcx){1to16}, %zmm25

				// CHECK: vpopcntd 4064(%rdx), %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x8a,0xe0,0x0f,0x00,0x00]
				vpopcntd 4064(%rdx), %zmm25

				// CHECK: vpopcntd 4096(%rdx), %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x4a,0x40]
				vpopcntd 4096(%rdx), %zmm25

				// CHECK: vpopcntd -4096(%rdx), %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x4a,0xc0]
				vpopcntd -4096(%rdx), %zmm25

				// CHECK: vpopcntd -4128(%rdx), %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x8a,0xe0,0xef,0xff,0xff]
				vpopcntd -4128(%rdx), %zmm25

				// CHECK: vpopcntd 508(%rdx){1to16}, %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x4a,0x7f]
				vpopcntd 508(%rdx){1to16}, %zmm25

				// CHECK: vpopcntd 512(%rdx){1to16}, %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x8a,0x00,0x02,0x00,0x00]
				vpopcntd 512(%rdx){1to16}, %zmm25

				// CHECK: vpopcntd -512(%rdx){1to16}, %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x4a,0x80]
				vpopcntd -512(%rdx){1to16}, %zmm25

				// CHECK: vpopcntd -516(%rdx){1to16}, %zmm25
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x8a,0xfc,0xfd,0xff,0xff]
				vpopcntd -516(%rdx){1to16}, %zmm25

				// CHECK: vpopcntd %zmm21, %zmm26
				// CHECK: encoding: [0x62,0x22,0x7d,0x48,0x55,0xd5]
				vpopcntd %zmm21, %zmm26

				// CHECK: vpopcntd %zmm21, %zmm26 {%k4}
				// CHECK: encoding: [0x62,0x22,0x7d,0x4c,0x55,0xd5]
				vpopcntd %zmm21, %zmm26 {%k4}

				// CHECK: vpopcntd %zmm21, %zmm26 {%k4} {z}
				// CHECK: encoding: [0x62,0x22,0x7d,0xcc,0x55,0xd5]
				vpopcntd %zmm21, %zmm26 {%k4} {z}

				// CHECK: vpopcntd (%rcx), %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x11]
				vpopcntd (%rcx), %zmm26

				// CHECK: vpopcntd 4660(%rax,%r14,8), %zmm26
				// CHECK: encoding: [0x62,0x22,0x7d,0x48,0x55,0x94,0xf0,0x34,0x12,0x00,0x00]
				vpopcntd 4660(%rax,%r14,8), %zmm26

				// CHECK: vpopcntd (%rcx){1to16}, %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x11]
				vpopcntd (%rcx){1to16}, %zmm26

				// CHECK: vpopcntd 4064(%rdx), %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x92,0xe0,0x0f,0x00,0x00]
				vpopcntd 4064(%rdx), %zmm26

				// CHECK: vpopcntd 4096(%rdx), %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x52,0x40]
				vpopcntd 4096(%rdx), %zmm26

				// CHECK: vpopcntd -4096(%rdx), %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x52,0xc0]
				vpopcntd -4096(%rdx), %zmm26

				// CHECK: vpopcntd -4128(%rdx), %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x48,0x55,0x92,0xe0,0xef,0xff,0xff]
				vpopcntd -4128(%rdx), %zmm26

				// CHECK: vpopcntd 508(%rdx){1to16}, %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x52,0x7f]
				vpopcntd 508(%rdx){1to16}, %zmm26

				// CHECK: vpopcntd 512(%rdx){1to16}, %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x92,0x00,0x02,0x00,0x00]
				vpopcntd 512(%rdx){1to16}, %zmm26

				// CHECK: vpopcntd -512(%rdx){1to16}, %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x52,0x80]
				vpopcntd -512(%rdx){1to16}, %zmm26

				// CHECK: vpopcntd -516(%rdx){1to16}, %zmm26
				// CHECK: encoding: [0x62,0x62,0x7d,0x58,0x55,0x92,0xfc,0xfd,0xff,0xff]
				vpopcntd -516(%rdx){1to16}, %zmm26
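
Editorial note on the displacement cases in the MC test above: the pairs such as 4096 versus 4064, or 1016 versus 1024 with `{1to8}`, appear chosen to exercise EVEX compressed displacement (disp8*N), where a one-byte displacement is implicitly scaled by the memory operand size N. The sketch below is a minimal illustration under that reading, assuming N is 64 for a full 512-bit load, 8 for a broadcast qword and 4 for a broadcast dword. The function names are hypothetical and this is not LLVM's encoder.

```cpp
#include <cstdint>

// True when `disp` can use the compressed one-byte form: it must be an exact
// multiple of n, and the scaled value must fit in a signed 8-bit field.
constexpr bool fitsCompressedDisp8(std::int32_t disp, std::int32_t n) {
    return disp % n == 0 && disp / n >= -128 && disp / n <= 127;
}

// The byte emitted for the compressed form. Only meaningful when
// fitsCompressedDisp8(disp, n) is true.
constexpr std::uint8_t compressedDisp8(std::int32_t disp, std::int32_t n) {
    return static_cast<std::uint8_t>(disp / n);
}

// Full 512-bit operand (N = 64): 4096 compresses to 0x40, 4064 does not.
static_assert(fitsCompressedDisp8(4096, 64), "4096 is 64 * 64");
static_assert(compressedDisp8(4096, 64) == 0x40, "matches the 0x40 byte");
static_assert(!fitsCompressedDisp8(4064, 64), "4064 is not a multiple of 64");

// Broadcast qword (N = 8): 1016 compresses to 0x7f, 1024 is out of range.
static_assert(compressedDisp8(1016, 8) == 0x7f, "1016 is 127 * 8");
static_assert(!fitsCompressedDisp8(1024, 8), "128 does not fit in int8");
```

The cases that fail the check are the ones whose expected encodings carry a four-byte displacement in the test.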

Revision Contents

Diff 99054

lib/Support/Host.cpp

lib/Target/X86/X86.td

lib/Target/X86/X86ISelLowering.cpp

lib/Target/X86/X86InstrAVX512.td

lib/Target/X86/X86InstrInfo.cpp

lib/Target/X86/X86InstrInfo.td

lib/Target/X86/X86Subtarget.h

lib/Target/X86/X86Subtarget.cpp

test/CodeGen/X86/avx512vpopcntdq-intrinsics.ll

test/CodeGen/X86/vector-tzcnt-128.ll

test/CodeGen/X86/vector-tzcnt-256.ll

test/CodeGen/X86/vector-tzcnt-512.ll

test/MC/X86/x86-64-avx512vpopcntdq.s
