This is an archive of the discontinued LLVM Phabricator instance.

clang/lib/Basic/Targets/X86.cpp
1074	alphabetical order.
llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll
3	`X64,CHECK`
4	`X86,CHECK`
llvm/test/CodeGen/X86/stack-folding-int-avxvnniint16.ll
3	Is this required?
3	Don't need `avx2`
llvm/test/MC/Disassembler/X86/avx-vnni-int16.txt
1 ↗	(On Diff #539822)	Remove blank line.
llvm/test/MC/Disassembler/X86/x86-64-avx-vnni-int16.txt
1–2 ↗	(On Diff #539822)	Remove this.

Address comments, fix lit fails, align naming convention in IntrinsicsX86.td

llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll
3	This couldn't help merging the CHECKs here. Do we need it?

craig.topper added inline comments.Jul 13 2023, 8:25 PM

llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll
4	I thought the common prefix had to be first? But I might be wrong

craig.topper added inline comments.Jul 13 2023, 8:30 PM

llvm/lib/Target/X86/X86InstrSSE.td
8303	This needs to be indented 1 character more
8306	This needs to be indented 1 character more so that it looks nested under the `set`

pengfei added inline comments.Jul 13 2023, 8:49 PM

llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll
4	You are right 👍

Harbormaster completed remote builds in B245264: Diff 540244.Jul 13 2023, 11:51 PM

Address comments.

Harbormaster completed remote builds in B245716: Diff 540849.Jul 16 2023, 8:59 PM

Rename disassembler tests and remove -x86-asm-syntax=intel

Remove -check-prefix=CHECK

Harbormaster completed remote builds in B245754: Diff 540908.Jul 17 2023, 4:02 AM

Remove #include <stddef.h>

FreddyYe retitled this revision from Add AVX-VNNI-INT16 instructions. to [X86] Add AVX-VNNI-INT16 instructions..Jul 17 2023, 4:41 AM

RKSimon added inline comments.Jul 17 2023, 5:33 AM

clang/lib/Headers/avxvnniint16intrin.h
27	doxygen descriptions?
llvm/test/MC/Disassembler/X86/avx-vnni-int16-64.txt
4	try to use some x86_64-- specific registers to improve test coverage

Harbormaster completed remote builds in B245791: Diff 540953.Jul 17 2023, 6:55 AM

Address comments.

Harbormaster completed remote builds in B246193: Diff 541494.Jul 18 2023, 2:28 PM

Add missing in doxygen

Harbormaster completed remote builds in B246400: Diff 541796.Jul 19 2023, 1:58 AM

ping... Anyone help accept?

LGTM.

This revision is now accepted and ready to land.Jul 19 2023, 10:38 PM

skan accepted this revision.Jul 19 2023, 11:19 PM

rebase

This revision was landed with ongoing or failed builds.Jul 19 2023, 11:31 PM

Closed by commit rG1c154bd75515: [X86] Add AVX-VNNI-INT16 instructions. (authored by FreddyYe).

This revision was automatically updated to reflect the committed changes.

FreddyYe added a commit: rG1c154bd75515: [X86] Add AVX-VNNI-INT16 instructions..

Harbormaster completed remote builds in B246773: Diff 542317.Jul 20 2023, 1:04 AM

We see an illegal-instruction crash that bisects to this patch.
Here's the CPUInfo for the machine:

CPU info:
current cpu id: 22
total 32(physical cores 16) (assigned logical cores 32) (assigned physical cores 16) (assigned_sockets:2 of 2) (8 cores per cpu, 2 threads per core) family 6 model 45 stepping 7 microcode 0x71a, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, aes, clmul, ht, tsc, tscinvbit, tscinv, clflush
AvgLoads: 0.30, 0.10, 0.18
CPU Model and flags from /proc/cpuinfo:
model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
Online cpus: 0-31
Offline cpus:
BIOS frequency limitation: <Not Available>
Frequency switch latency (ns): 20000
Available cpu frequencies: <Not Available>
Current governor: schedutil
Core performance/turbo boost: <Not Available>

I see neither avxvnniint16 nor avx2 in the flags list. So, this (relatively new) instruction shouldn't be generated for this machine. Any ideas on why this might be happening?

As far as I can see from the patch, the only way to generate avxvnniint16 instructions is to call the specific intrinsics explicitly, and the front end checks the compile options before allowing those calls. We do have an optimization that generates VNNI instructions without intrinsics, but we haven't extended it to avxvnniint16 so far.
So I don't know what's wrong in your case. Could you provide a reproducer for your problem?

I've investigated what is going on. With this patch, we are now passing +avxvnniint16 in the machine attributes. With that attribute, we generate an instruction that is illegal on a Sandy Bridge machine:

   0x3013f2af:	jmpq   0x3013f09b
   0x3013f2b4:	mov    %rax,%rdi
   0x3013f2b7:	and    $0xfffffffffffffff0,%rdi
=> 0x3013f2bb:	vpbroadcastd %xmm0,%ymm2
   0x3013f2c0:	vpbroadcastd %xmm1,%ymm3

The instruction vpbroadcastd %xmm0,%ymm2 requires the AVX2 CPU flag: https://www.felixcloutier.com/x86/vpbroadcast. However, the machine has only the AVX flag.

This is the complete mattr generated:

!3 = !{!"-mattr=-prfchw,-cldemote,+avx,+aes,+sahf,+pclmul,-xop,+crc32,-xsaves,-avx512fp16,-sm4,+sse4.1,-avx512ifma,+xsave,-avx512pf,+sse4.2,-tsxldtrk,-ptwrite,-widekl,-sm3,-invpcid,+64bit,-xsavec,-avx512vpopcntdq,+cmov,-avx512vp2intersect,-avx512cd,-movbe,-avxvnniint8,-avx512er,-amx-int8,-kl,-sha512,-avxvnni,-rtm,-adx,-avx2,-hreset,-movdiri,-serialize,-vpclmulqdq,-avx512vl,-uintr,-clflushopt,-raoint,-cmpccxadd,-bmi,-amx-tile,+sse,-gfni,+avxvnniint16,-amx-fp16,+xsaveopt,-rdrnd,-avx512f,-amx-bf16,-avx512bf16,-avx512vnni,+cx8,-avx512bw,+sse3,-pku,-fsgsbase,-clzero,-mwaitx,-lwp,-lzcnt,-sha,-movdir64b,-wbnoinvd,-enqcmd,-prefetchwt1,-avxneconvert,-tbm,-pconfig,-amx-complex,+ssse3,+cx16,-bmi2,-fma,+popcnt,-avxifma,-f16c,-avx512bitalg,-rdpru,-clwb,+mmx,+sse2,-rdseed,-avx512vbmi2,-prefetchi,-rdpid,-fma4,-avx512vbmi,-shstk,-vaes,-waitpkg,-sgx,+fxsr,-avx512dq,-sse4a"}

I've confirmed that if we change this to -avxvnniint16, we do not generate vpbroadcastd.

W.r.t. how we get the machine attributes generated through our front-end:

if (!sys::getHostCPUFeatures(Features))
  return std::move(mattr);

// Fill mattr with default values.
mattr.reserve(Features.getNumItems());
for (auto &I : Features) {
  std::string attr(I.first());
  mattr.emplace_back(std::string(I.second ? "+" : "-") + attr);
}

So, the problem is in getHostCPUFeatures, possibly in this line from the patch:
Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && HasAVXSave;

Does this patch help?

diff --git a/llvm/lib/TargetParser/Host.cpp b/llvm/lib/TargetParser/Host.cpp
index 1141df09307c..11a6879fb76a 100644
--- a/llvm/lib/TargetParser/Host.cpp
+++ b/llvm/lib/TargetParser/Host.cpp
@@ -1769,7 +1769,7 @@ bool sys::getHostCPUFeatures(StringMap<bool> &Features) {
   Features["amx-tile"]   = HasLeaf7 && ((EDX >> 24) & 1) && HasAMXSave;
   Features["amx-int8"]   = HasLeaf7 && ((EDX >> 25) & 1) && HasAMXSave;
   bool HasLeaf7Subleaf1 =
-      MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
+      HasLeaf7 && EAX >= 1 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
   Features["sha512"]     = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
   Features["sm3"]        = HasLeaf7Subleaf1 && ((EAX >> 1) & 1);
   Features["sm4"]        = HasLeaf7Subleaf1 && ((EAX >> 2) & 1);

Yes, @craig.topper that works! Thanks! Could you pls land the patch, if possible?

Can you try to help us understand what is happening?

Sandy Bridge doesn't use leaf 7 at all, but has leaves after it. I thought it should always return 0 for all EAX, EBX, ECX, EDX. The EAX value for Leaf 7 contains how many subleaves of leaf 7 exist.

The documentation says that invalid subleaves of leaf 7 should return all 0s. So we thought it was safe to check the bits of sub leaf 1 even if eax from subleaf 0 doesn't say subleaf 1 is supported.

Can you capture the values of EAX, EBX, ECX, and EDX after the two calls to getX86CpuIDAndInfoEx that have 0x7 as the first argument? Maybe there's a bug in CPUID on Sandy Bridge.

Sure, on the original code, before the patch you suggested, right?
The two calls are:

  bool HasLeaf7 =
      MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX);
+ llvm::errs() << "Before setting fsgsbase the value for EAX: " << EAX
+              << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << EDX
+              << "\n";

  Features["fsgsbase"]   = HasLeaf7 && ((EBX >>  0) & 1);
  ....
  bool HasLeaf7Subleaf1 =
      MaxLevel >= 7 && !getX86CpuIDAndInfoEx(0x7, 0x1, &EAX, &EBX, &ECX, &EDX);
+ llvm::errs() << "Before setting sha512 the value for EAX: " << EAX
+              << " EBX: " << EBX << " ECX: " << ECX << "  EDX: " << EDX
+              << "\n";
  Features["sha512"]     = HasLeaf7Subleaf1 && ((EAX >> 0) & 1);
  ...
we set avxvnniint16 after this

Takes a while to get a build on this machine, should have the output soon.

Thanks @anna and @craig.topper
I think we can dump the values with this simple code:

$ cat cpuid.c
#include <stdio.h>
#include <cpuid.h>

int main() {
  unsigned int info[4];
  for (int i = 0; i < 2; ++i) {
    __get_cpuid_count(7, i, info, info + 1, info + 2, info + 3);
    printf("%08x\n", info[0]);
    printf("%08x\n", info[1]);
    printf("%08x\n", info[2]);
    printf("%08x\n", info[3]);
  }
}

$ clang cpuid.c && ./a.out

@craig.topper here is the output:

Before setting fsgsbase the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 // this is after the HasLeaf7 calculation
Before setting sha512 the value for EAX: 0 EBX: 0 ECX: 0  EDX: 2617246720 // this is after the HasLeaf7Subleaf1 calculation

So, with your patch HasLeaf7Subleaf1 is 0 because EAX is 0. Pls let me know if you need any additional diagnostic output (we actually lose access to the machine on Friday, since it is being retired!).

The documentation says that invalid subleaves of leaf 7 should return all 0s. So we thought it was safe to check the bits of sub leaf 1 even if eax from subleaf 0 doesn't say subleaf 1 is supported.

Does this mean the CPUID doesn't match the documentation, since EDX != 0 for subleaf 1?

The identical EDX value looks dubious to me. Could you compile and run the code above and paste the result here? Thanks!

Interestingly, all of the bits set in EDX are features that were added in microcode patches in the wake of vulnerabilities like Spectre and Meltdown. Maybe the microcode patch forgot to check the subleaf, since there was no subleaf implemented when Sandy Bridge was originally made.
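
As an illustration of that observation, the reported EDX value 2617246720 (0x9C000400) can be decoded bit by bit. This is only a sketch and not part of the patch: `decodeEdx` is a hypothetical helper, and the bit names are an assumption taken from the leaf 7 subleaf 0 EDX layout in the Intel SDM (bit 10 MD_CLEAR, bit 26 IBRS/IBPB, bit 27 STIBP, bit 28 L1D_FLUSH, bit 29 IA32_ARCH_CAPABILITIES, bit 31 SSBD).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: names the set bits of CPUID.(EAX=7,ECX=0):EDX.
// The bit names are assumed from the Intel SDM; only a few bits are listed,
// and any other set bit is reported as "bitN".
std::vector<std::string> decodeEdx(uint32_t EDX) {
  struct BitName {
    unsigned Bit;
    const char *Name;
  };
  static const BitName Known[] = {
      {10, "MD_CLEAR"},  {26, "IBRS_IBPB"},         {27, "STIBP"},
      {28, "L1D_FLUSH"}, {29, "ARCH_CAPABILITIES"}, {31, "SSBD"}};

  std::vector<std::string> Names;
  for (unsigned Bit = 0; Bit < 32; ++Bit) {
    if (!((EDX >> Bit) & 1))
      continue; // bit not set, nothing to report
    std::string Name = "bit" + std::to_string(Bit);
    for (const BitName &K : Known)
      if (K.Bit == Bit)
        Name = K.Name;
    Names.push_back(Name);
  }
  return Names;
}
```

Calling `decodeEdx(2617246720u)` yields five names under these assumptions: MD_CLEAR, IBRS_IBPB, STIBP, L1D_FLUSH and SSBD, which is consistent with the mitigation-related features mentioned above.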

I think my patch is the correct fix given that information. I'll post a patch for review shortly.
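
A minimal sketch of the kind of guard being discussed, assuming the fix is to trust subleaf 1 only when EAX from subleaf 0 reports at least one subleaf. It is not the actual patch (see D156963 for that). `fakeSandyBridgeCpuid` is a hypothetical stand-in that replays the values from the output above, and the use of EDX bit 10 for avxvnniint16 is an assumption taken from Intel's documentation, not from this thread.

```cpp
#include <cstdint>

// Register values returned by one CPUID query.
struct CpuidRegs {
  uint32_t EAX, EBX, ECX, EDX;
};

// Hypothetical stand-in for the CPUID instruction, modelling the machine in
// this thread: leaf 7 ignores the subleaf and always returns the same
// registers (EAX = 0, EDX = 2617246720).
CpuidRegs fakeSandyBridgeCpuid(uint32_t Leaf, uint32_t /*Subleaf*/) {
  if (Leaf == 7)
    return {0, 0, 0, 2617246720u};
  return {0, 0, 0, 0};
}

// Unguarded check: trusts subleaf 1 whenever leaf 7 exists.
bool hasAvxVnniInt16Unguarded(uint32_t MaxLevel) {
  if (MaxLevel < 7)
    return false;
  CpuidRegs Sub1 = fakeSandyBridgeCpuid(7, 1);
  return (Sub1.EDX >> 10) & 1; // assumed bit position for avxvnniint16
}

// Guarded check: subleaf 0 reports the highest valid subleaf in EAX, so
// subleaf 1 is only consulted when that value is at least 1.
bool hasAvxVnniInt16Guarded(uint32_t MaxLevel) {
  if (MaxLevel < 7)
    return false;
  CpuidRegs Sub0 = fakeSandyBridgeCpuid(7, 0);
  if (Sub0.EAX < 1)
    return false;
  CpuidRegs Sub1 = fakeSandyBridgeCpuid(7, 1);
  return (Sub1.EDX >> 10) & 1;
}
```

With the replayed values, the unguarded check reports the feature (bit 10 of EDX is set) while the guarded check does not, because EAX from subleaf 0 is 0.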

Thanks Craig! That makes sense to me.

Thank you, @craig.topper and @pengfei.

anna mentioned this in D156963: [X86] Workaround possible CPUID bug in Sandy Bridge..Aug 3 2023, 7:12 AM

Revision Contents

Path                                            Size
clang/
  docs/
    ReleaseNotes.rst                            4 lines
  include/
    clang/
      Basic/
        BuiltinsX86.def                         14 lines
      Driver/
        Options.td                              2 lines
  lib/
    Basic/
      Targets/
        X86.h                                   1 line
        X86.cpp                                 6 lines
    Headers/
      CMakeLists.txt                            1 line
      avxvnniint16intrin.h                      473 lines
      immintrin.h                               5 lines
  test/
    CodeGen/
      X86/
        avxvnniint16-builtins.c                 76 lines
      attr-target-x86.c                         4 lines
    Driver/
      x86-target-features.c                     5 lines
    Preprocessor/
      x86_target_features.c                     14 lines
llvm/
  docs/
    ReleaseNotes.rst                            1 line
  include/
    llvm/
      IR/
        IntrinsicsX86.td                        61 lines
      TargetParser/
        X86TargetParser.def                     1 line
  lib/
    Target/
      X86/
        (file name missing)                     4 lines
        (file name missing)                     4 lines
        (file name missing)                     1 line
        (file name missing)                     44 lines
    TargetParser/
      Host.cpp                                  1 line
      X86TargetParser.cpp                       1 line
  test/
    CodeGen/
      X86/
        avxvnniint16-intrinsics.ll              123 lines
        stack-folding-int-avxvnniint16.ll       271 lines
    MC/
      Disassembler/
        X86/
          avx-vnni-int16-32.txt                 339 lines
          avx-vnni-int16-64.txt                 339 lines
      X86/
        avx-vnni-int16-32-att.s                 338 lines
        avx-vnni-int16-32-intel.s               338 lines
        avx-vnni-int16-64-att.s                 338 lines
        avx-vnni-int16-64-intel.s               338 lines
    TableGen/
      x86-fold-tables.inc                       12 lines

Diff 542318

clang/docs/ReleaseNotes.rst

...
  - Support ISA of ``SHA512``.
  * Support intrinsic of ``_mm256_sha512rnds2_epi64``.
  - Support ISA of ``SM3``.
  * Support intrinsic of ``_mm_sm3msg1_epi32``.
  * Support intrinsic of ``_mm_sm3msg2_epi32``.
  * Support intrinsic of ``_mm_sm3rnds2_epi32``.
  - Support ISA of ``SM4``.
  * Support intrinsic of ``_mm(256)_sm4key4_epi32``.
  * Support intrinsic of ``_mm(256)_sm4rnds4_epi32``.
+ - Support ISA of ``AVX-VNNI-INT16``.
+ * Support intrinsic of ``_mm(256)_dpwsud(s)_epi32``.
+ * Support intrinsic of ``_mm(256)_dpwusd(s)_epi32``.
+ * Support intrinsic of ``_mm(256)_dpwuud(s)_epi32``.

  Arm and AArch64 Support
  ^^^^^^^^^^^^^^^^^^^^^^^

  - The hard-float ABI is now available in Armv8.1-M configurations that
  have integer MVE instructions (and therefore have FP registers) but
  no scalar or vector floating point computation. Previously, trying
  to select the hard-float ABI on such a target (via
...
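
To make the `_mm(256)_dpwsud(s)_epi32` entries above more concrete, here is a scalar sketch of the 128-bit signed-by-unsigned form, following the operation pseudo-code in the doxygen comments of avxvnniint16intrin.h further down. It is a reference model only, not the compiler's implementation: `dpwsudRef`, `dpwsudsRef` and `saturateToInt32` are hypothetical names, and the wrap-around behaviour of the non-saturating form is an assumption based on the plain 32-bit add in that pseudo-code.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Clamp a 64-bit value into the signed 32-bit range.
int32_t saturateToInt32(int64_t V) {
  if (V > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (V < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return static_cast<int32_t>(V);
}

// Sum of two adjacent signed-by-unsigned word products plus the accumulator,
// computed in 64 bits so nothing overflows before the final conversion.
int64_t dotPair(int32_t W, int16_t A0, uint16_t B0, int16_t A1, uint16_t B1) {
  int64_t Tmp1 = int64_t(A0) * int64_t(B0); // SignExtend(A) * ZeroExtend(B)
  int64_t Tmp2 = int64_t(A1) * int64_t(B1);
  return int64_t(W) + Tmp1 + Tmp2;
}

// Non-saturating model: the result is assumed to wrap modulo 2^32.
std::array<int32_t, 4> dpwsudRef(const std::array<int32_t, 4> &W,
                                 const std::array<int16_t, 8> &A,
                                 const std::array<uint16_t, 8> &B) {
  std::array<int32_t, 4> Dst{};
  for (int J = 0; J < 4; ++J) {
    int64_t Sum = dotPair(W[J], A[2 * J], B[2 * J], A[2 * J + 1], B[2 * J + 1]);
    Dst[J] = static_cast<int32_t>(static_cast<uint32_t>(Sum));
  }
  return Dst;
}

// Saturating model: the result is clamped to the signed 32-bit range.
std::array<int32_t, 4> dpwsudsRef(const std::array<int32_t, 4> &W,
                                  const std::array<int16_t, 8> &A,
                                  const std::array<uint16_t, 8> &B) {
  std::array<int32_t, 4> Dst{};
  for (int J = 0; J < 4; ++J)
    Dst[J] = saturateToInt32(
        dotPair(W[J], A[2 * J], B[2 * J], A[2 * J + 1], B[2 * J + 1]));
  return Dst;
}
```

The two models differ only when the 32-bit accumulator would overflow, which is the case the `(s)` variants exist for.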

clang/include/clang/Basic/BuiltinsX86.def

...
  TARGET_HEADER_BUILTIN(__readfsdword, "UNiUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
  TARGET_HEADER_BUILTIN(__readfsqword, "ULLiUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")

  TARGET_HEADER_BUILTIN(__readgsbyte, "UcUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
  TARGET_HEADER_BUILTIN(__readgsword, "UsUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
  TARGET_HEADER_BUILTIN(__readgsdword, "UNiUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")
  TARGET_HEADER_BUILTIN(__readgsqword, "ULLiUNi", "nh", INTRIN_H, ALL_MS_LANGUAGES, "")

+ // AVX-VNNI-INT16
+ TARGET_BUILTIN(__builtin_ia32_vpdpwsud128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwsud256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwsuds128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwsuds256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwusd128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwusd256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwusds128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwusds256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwuud128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwuud256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwuuds128, "V4iV4iV4iV4i", "nV:128:", "avxvnniint16")
+ TARGET_BUILTIN(__builtin_ia32_vpdpwuuds256, "V8iV8iV8iV8i", "nV:256:", "avxvnniint16")

  // AVX-NE-CONVERT
  TARGET_BUILTIN(__builtin_ia32_vbcstnebf162ps128, "V4fyC*", "nV:128:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vbcstnebf162ps256, "V8fyC*", "nV:256:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vbcstnesh2ps128, "V4fxC*", "nV:128:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vbcstnesh2ps256, "V8fxC*", "nV:256:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vcvtneebf162ps128, "V4fV8yC*", "nV:128:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vcvtneebf162ps256, "V8fV16yC*", "nV:256:", "avxneconvert")
  TARGET_BUILTIN(__builtin_ia32_vcvtneeph2ps128, "V4fV8xC*", "nV:128:", "avxneconvert")
...

clang/include/clang/Driver/Options.td

...
  def mavx512vpopcntdq : Flag<["-"], "mavx512vpopcntdq">, Group<m_x86_Features_Group>;
  def mno_avx512vpopcntdq : Flag<["-"], "mno-avx512vpopcntdq">, Group<m_x86_Features_Group>;
  def mavx512vp2intersect : Flag<["-"], "mavx512vp2intersect">, Group<m_x86_Features_Group>;
  def mno_avx512vp2intersect : Flag<["-"], "mno-avx512vp2intersect">, Group<m_x86_Features_Group>;
  def mavxifma : Flag<["-"], "mavxifma">, Group<m_x86_Features_Group>;
  def mno_avxifma : Flag<["-"], "mno-avxifma">, Group<m_x86_Features_Group>;
  def mavxneconvert : Flag<["-"], "mavxneconvert">, Group<m_x86_Features_Group>;
  def mno_avxneconvert : Flag<["-"], "mno-avxneconvert">, Group<m_x86_Features_Group>;
+ def mavxvnniint16 : Flag<["-"], "mavxvnniint16">, Group<m_x86_Features_Group>;
+ def mno_avxvnniint16 : Flag<["-"], "mno-avxvnniint16">, Group<m_x86_Features_Group>;
  def mavxvnniint8 : Flag<["-"], "mavxvnniint8">, Group<m_x86_Features_Group>;
  def mno_avxvnniint8 : Flag<["-"], "mno-avxvnniint8">, Group<m_x86_Features_Group>;
  def mavxvnni : Flag<["-"], "mavxvnni">, Group<m_x86_Features_Group>;
  def mno_avxvnni : Flag<["-"], "mno-avxvnni">, Group<m_x86_Features_Group>;
  def madx : Flag<["-"], "madx">, Group<m_x86_Features_Group>;
  def mno_adx : Flag<["-"], "mno-adx">, Group<m_x86_Features_Group>;
  def maes : Flag<["-"], "maes">, Group<m_x86_Features_Group>;
  def mno_aes : Flag<["-"], "mno-aes">, Group<m_x86_Features_Group>;
...

clang/lib/Basic/Targets/X86.h

...
  class LLVM_LIBRARY_VISIBILITY X86TargetInfo : public TargetInfo {
    bool HasLAHFSAHF = false;
    bool HasWBNOINVD = false;
    bool HasWAITPKG = false;
    bool HasMOVDIRI = false;
    bool HasMOVDIR64B = false;
    bool HasPTWRITE = false;
    bool HasINVPCID = false;
    bool HasENQCMD = false;
+   bool HasAVXVNNIINT16 = false;
    bool HasAMXFP16 = false;
    bool HasCMPCCXADD = false;
    bool HasRAOINT = false;
    bool HasAVXVNNIINT8 = false;
    bool HasAVXNECONVERT = false;
    bool HasKL = false; // For key locker
    bool HasWIDEKL = false; // For wide key locker
    bool HasHRESET = false;
...

clang/lib/Basic/Targets/X86.cpp

...
  for (const auto &Feature : Features) {
  } else if (Feature == "+raoint") {
    HasRAOINT = true;
  } else if (Feature == "+avxifma") {
    HasAVXIFMA = true;
  } else if (Feature == "+avxneconvert") {
    HasAVXNECONVERT= true;
  } else if (Feature == "+avxvnni") {
    HasAVXVNNI = true;
+ } else if (Feature == "+avxvnniint16") {
+   HasAVXVNNIINT16 = true;
  } else if (Feature == "+avxvnniint8") {
    HasAVXVNNIINT8 = true;
  } else if (Feature == "+serialize") {
    HasSERIALIZE = true;
  } else if (Feature == "+tsxldtrk") {
    HasTSXLDTRK = true;
  } else if (Feature == "+uintr") {
    HasUINTR = true;
...
  void X86TargetInfo::getTargetDefines(const LangOptions &Opts,
  if (HasRAOINT)
    Builder.defineMacro("__RAOINT__");
  if (HasAVXIFMA)
    Builder.defineMacro("__AVXIFMA__");
  if (HasAVXNECONVERT)
    Builder.defineMacro("__AVXNECONVERT__");
  if (HasAVXVNNI)
    Builder.defineMacro("__AVXVNNI__");
+ if (HasAVXVNNIINT16)
+   Builder.defineMacro("__AVXVNNIINT16__");
  if (HasAVXVNNIINT8)
    Builder.defineMacro("__AVXVNNIINT8__");
  if (HasSERIALIZE)
    Builder.defineMacro("__SERIALIZE__");
  if (HasTSXLDTRK)
    Builder.defineMacro("__TSXLDTRK__");
  if (HasUINTR)
    Builder.defineMacro("__UINTR__");
...
  return llvm::StringSwitch<bool>(Name)
    .Case("avx512vl", true)
    .Case("avx512vbmi", true)
    .Case("avx512vbmi2", true)
    .Case("avx512ifma", true)
    .Case("avx512vp2intersect", true)
    .Case("avxifma", true)
    .Case("avxneconvert", true)
    .Case("avxvnni", true)
+   .Case("avxvnniint16", true)
    .Case("avxvnniint8", true)
    .Case("bmi", true)
    .Case("bmi2", true)
    .Case("cldemote", true)
    .Case("clflushopt", true)
    .Case("clwb", true)
    .Case("clzero", true)
    .Case("cmpccxadd", true)
...
  return llvm::StringSwitch<bool>(Feature)
    .Case("avx512dq", HasAVX512DQ)
    .Case("avx512bitalg", HasAVX512BITALG)
    .Case("avx512bw", HasAVX512BW)
    .Case("avx512vl", HasAVX512VL)
    .Case("avx512vbmi", HasAVX512VBMI)
    .Case("avx512vbmi2", HasAVX512VBMI2)
    .Case("avx512ifma", HasAVX512IFMA)
    .Case("avx512vp2intersect", HasAVX512VP2INTERSECT)
    .Case("avxifma", HasAVXIFMA)
pengfei: alphabetical order.
    .Case("avxneconvert", HasAVXNECONVERT)
    .Case("avxvnni", HasAVXVNNI)
+   .Case("avxvnniint16", HasAVXVNNIINT16)
    .Case("avxvnniint8", HasAVXVNNIINT8)
    .Case("bmi", HasBMI)
    .Case("bmi2", HasBMI2)
    .Case("cldemote", HasCLDEMOTE)
    .Case("clflushopt", HasCLFLUSHOPT)
    .Case("clwb", HasCLWB)
    .Case("clzero", HasCLZERO)
    .Case("cmpccxadd", HasCMPCCXADD)
...
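
Since the X86.cpp change defines `__AVXVNNIINT16__` whenever the feature is enabled, user code can test for it at compile time. The snippet below is a small sketch of that pattern, not something taken from the patch; `builtWithAvxVnniInt16` is a hypothetical helper name.

```cpp
// Reports whether this translation unit was compiled with the feature
// enabled (for example with -mavxvnniint16), based only on the macro.
constexpr bool builtWithAvxVnniInt16() {
#ifdef __AVXVNNIINT16__
  return true;
#else
  return false;
#endif
}
```

This only reflects compile-time flags. Whether the host CPU actually supports the instructions is a separate, run-time question, which is what the CPUID discussion earlier in this review is about.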

clang/lib/Headers/CMakeLists.txt

...
  # Intrinsics
  avx512vlvp2intersectintrin.h
  avx512vnniintrin.h
  avx512vp2intersectintrin.h
  avx512vpopcntdqintrin.h
  avx512vpopcntdqvlintrin.h
  avxifmaintrin.h
  avxintrin.h
  avxneconvertintrin.h
+ avxvnniint16intrin.h
  avxvnniint8intrin.h
  avxvnniintrin.h
  bmi2intrin.h
  bmiintrin.h
  cetintrin.h
  cldemoteintrin.h
  clflushoptintrin.h
  clwbintrin.h
...

clang/lib/Headers/avxvnniint16intrin.h

This file was added.

				/*===----------- avxvnniint16intrin.h - AVXVNNIINT16 intrinsics-------------===
				*
				* Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				* See https://llvm.org/LICENSE.txt for license information.
				* SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				*
				*===-----------------------------------------------------------------------===
				*/

				#ifndef __IMMINTRIN_H
				#error \
				"Never use <avxvnniint16intrin.h> directly; include <immintrin.h> instead."
				#endif // __IMMINTRIN_H

				#ifndef __AVXVNNIINT16INTRIN_H
				#define __AVXVNNIINT16INTRIN_H

				/* Define the default attributes for the functions in this file. */
				#define __DEFAULT_FN_ATTRS128 \
				__attribute__((__always_inline__, __nodebug__, __target__("avxvnniint16"), \
				__min_vector_width__(128)))
				#define __DEFAULT_FN_ATTRS256 \
				__attribute__((__always_inline__, __nodebug__, __target__("avxvnniint16"), \
				__min_vector_width__(256)))

				/// Multiply groups of 2 adjacent pairs of signed 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				RKSimon: doxygen descriptions?
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwsud_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWSUD instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x int].
				/// \param __A
				/// A 128-bit vector of [8 x short].
				/// \param __B
				/// A 128-bit vector of [8 x unsigned short].
				/// \returns
				/// A 128-bit vector of [4 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwsud_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwsud128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of signed 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwsud_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWSUD instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x int].
				/// \param __A
				/// A 256-bit vector of [16 x short].
				/// \param __B
				/// A 256-bit vector of [16 x unsigned short].
				/// \returns
				/// A 256-bit vector of [8 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwsud_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwsud256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of signed 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with signed saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwsuds_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWSUDS instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x int].
				/// \param __A
				/// A 128-bit vector of [8 x short].
				/// \param __B
				/// A 128-bit vector of [8 x unsigned short].
				/// \returns
				/// A 128-bit vector of [4 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwsuds_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwsuds128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of signed 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with signed saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwsuds_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWSUDS instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x int].
				/// \param __A
				/// A 256-bit vector of [16 x short].
				/// \param __B
				/// A 256-bit vector of [16 x unsigned short].
				/// \returns
				/// A 256-bit vector of [8 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := SignExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := SignExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwsuds_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwsuds256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding signed 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwusd_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUSD instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x int].
				/// \param __A
				/// A 128-bit vector of [8 x unsigned short].
				/// \param __B
				/// A 128-bit vector of [8 x short].
				/// \returns
				/// A 128-bit vector of [4 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwusd_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwusd128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding signed 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwusd_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUSD instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x int].
				/// \param __A
				/// A 256-bit vector of [16 x unsigned short].
				/// \param __B
				/// A 256-bit vector of [16 x short].
				/// \returns
				/// A 256-bit vector of [8 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwusd_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwusd256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding signed 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with signed saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwusds_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUSDS instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x int].
				/// \param __A
				/// A 128-bit vector of [8 x unsigned short].
				/// \param __B
				/// A 128-bit vector of [8 x short].
				/// \returns
				/// A 128-bit vector of [4 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
				/// dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwusds_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwusds128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding signed 16-bit integers in \a __B, producing 2 intermediate
				/// signed 16-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with signed saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwusds_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUSDS instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x int].
				/// \param __A
				/// A 256-bit vector of [16 x unsigned short].
				/// \param __B
				/// A 256-bit vector of [16 x short].
				/// \returns
				/// A 256-bit vector of [8 x int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * SignExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * SignExtend32(__B.word[2*j+1])
				/// dst.dword[j] := SIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwusds_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwusds256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// unsigned 32-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwuud_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUUD instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x unsigned int].
				/// \param __A
				/// A 128-bit vector of [8 x unsigned short].
				/// \param __B
				/// A 128-bit vector of [8 x unsigned short].
				/// \returns
				/// A 128-bit vector of [4 x unsigned int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwuud_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwuud128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// unsigned 32-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W, and store the packed 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwuud_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUUD instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x unsigned int].
				/// \param __A
				/// A 256-bit vector of [16 x unsigned short].
				/// \param __B
				/// A 256-bit vector of [16 x unsigned short].
				/// \returns
				/// A 256-bit vector of [8 x unsigned int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := __W.dword[j] + tmp1 + tmp2
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwuud_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwuud256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// unsigned 32-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with unsigned saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m128i _mm_dpwuuds_epi32(__m128i __W, __m128i __A, __m128i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUUDS instruction.
				///
				/// \param __W
				/// A 128-bit vector of [4 x unsigned int].
				/// \param __A
				/// A 128-bit vector of [8 x unsigned short].
				/// \param __B
				/// A 128-bit vector of [8 x unsigned short].
				/// \returns
				/// A 128-bit vector of [4 x unsigned int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 3
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:128] := 0
				/// \endcode
				static __inline__ __m128i __DEFAULT_FN_ATTRS128 _mm_dpwuuds_epi32(__m128i __W,
				__m128i __A,
				__m128i __B) {
				return (__m128i)__builtin_ia32_vpdpwuuds128((__v4si)__W, (__v4si)__A,
				(__v4si)__B);
				}

				/// Multiply groups of 2 adjacent pairs of unsigned 16-bit integers in \a __A with
				/// corresponding unsigned 16-bit integers in \a __B, producing 2 intermediate
				/// unsigned 32-bit results. Sum these 2 results with the corresponding
				/// 32-bit integer in \a __W with unsigned saturation, and store the packed
				/// 32-bit results in \a dst.
				///
				/// \headerfile <immintrin.h>
				///
				/// \code
				/// __m256i _mm256_dpwuuds_epi32(__m256i __W, __m256i __A, __m256i __B)
				/// \endcode
				///
				/// This intrinsic corresponds to the \c VPDPWUUDS instruction.
				///
				/// \param __W
				/// A 256-bit vector of [8 x unsigned int].
				/// \param __A
				/// A 256-bit vector of [16 x unsigned short].
				/// \param __B
				/// A 256-bit vector of [16 x unsigned short].
				/// \returns
				/// A 256-bit vector of [8 x unsigned int].
				///
				/// \code{.operation}
				/// FOR j := 0 to 7
				/// tmp1.dword := ZeroExtend32(__A.word[2*j]) * ZeroExtend32(__B.word[2*j])
				/// tmp2.dword := ZeroExtend32(__A.word[2*j+1]) * ZeroExtend32(__B.word[2*j+1])
				/// dst.dword[j] := UNSIGNED_DWORD_SATURATE(__W.dword[j] + tmp1 + tmp2)
				/// ENDFOR
				/// dst[MAX:256] := 0
				/// \endcode
				static __inline__ __m256i __DEFAULT_FN_ATTRS256
				_mm256_dpwuuds_epi32(__m256i __W, __m256i __A, __m256i __B) {
				return (__m256i)__builtin_ia32_vpdpwuuds256((__v8si)__W, (__v8si)__A,
				(__v8si)__B);
				}

				#undef __DEFAULT_FN_ATTRS128
				#undef __DEFAULT_FN_ATTRS256

				#endif // __AVXVNNIINT16INTRIN_H
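
For readers checking the `\code{.operation}` blocks above, the per-lane arithmetic can be modeled in plain scalar C. This is only a sketch of the documented pseudocode, not code from the patch: the helper names (`saturate_i32`, `saturate_u32`, `dpwusds_lane`, `dpwuuds_lane`, `dpwuud_lane`) are hypothetical, and it assumes the sum is formed in wide precision before saturation, as the pseudocode suggests.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a wide intermediate to the signed 32-bit range. */
static int32_t saturate_i32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical helper: clamp a wide intermediate to the unsigned 32-bit range. */
static uint32_t saturate_u32(uint64_t v) {
    return v > UINT32_MAX ? UINT32_MAX : (uint32_t)v;
}

/* One dword lane of the "usds" form: zero-extended words from __A,
 * sign-extended words from __B, signed saturation on the final sum. */
static int32_t dpwusds_lane(int32_t w, uint16_t a0, uint16_t a1,
                            int16_t b0, int16_t b1) {
    int64_t tmp1 = (int64_t)a0 * b0;
    int64_t tmp2 = (int64_t)a1 * b1;
    return saturate_i32((int64_t)w + tmp1 + tmp2);
}

/* One dword lane of the "uuds" form: both inputs zero-extended,
 * unsigned saturation on the final sum. */
static uint32_t dpwuuds_lane(uint32_t w, uint16_t a0, uint16_t a1,
                             uint16_t b0, uint16_t b1) {
    uint64_t tmp1 = (uint64_t)a0 * b0;
    uint64_t tmp2 = (uint64_t)a1 * b1;
    return saturate_u32((uint64_t)w + tmp1 + tmp2);
}

/* One dword lane of the non-saturating "uud" form: the sum wraps modulo 2^32. */
static uint32_t dpwuud_lane(uint32_t w, uint16_t a0, uint16_t a1,
                            uint16_t b0, uint16_t b1) {
    return w + (uint32_t)a0 * b0 + (uint32_t)a1 * b1;
}
```

The only difference between the `uud` and `uuds` lanes is what happens on overflow: the first wraps, the second clamps to the maximum unsigned value.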

clang/lib/Headers/immintrin.h

	#endif			#endif

	#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \			#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
	defined(__SM4__)			defined(__SM4__)
	#include <sm4intrin.h>			#include <sm4intrin.h>
	#endif			#endif

	#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \			#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
				defined(__AVXVNNIINT16__)
				#include <avxvnniint16intrin.h>
				#endif

				#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) || \
	defined(__RDPID__)			defined(__RDPID__)
	/// Returns the value of the IA32_TSC_AUX MSR (0xc0000103).			/// Returns the value of the IA32_TSC_AUX MSR (0xc0000103).
	///			///
	/// \headerfile <immintrin.h>			/// \headerfile <immintrin.h>
	///			///
	/// This intrinsic corresponds to the <c> RDPID </c> instruction.			/// This intrinsic corresponds to the <c> RDPID </c> instruction.
	static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__, __target__("rdpid")))			static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__, __target__("rdpid")))
	_rdpid_u32(void) {			_rdpid_u32(void) {

clang/test/CodeGen/X86/avxvnniint16-builtins.c

This file was added.

				// RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +avxvnniint16 -emit-llvm -o - -Wall -Werror | FileCheck %s
				// RUN: %clang_cc1 %s -ffreestanding -triple=i386-unknown-unknown -target-feature +avxvnniint16 -emit-llvm -o - -Wall -Werror | FileCheck %s

				#include <immintrin.h>

				__m128i test_mm_dpwsud_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwsud_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwsud.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwsud_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwsud_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwsud_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwsud.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwsud_epi32(__A, __B, __C);
				}

				__m128i test_mm_dpwsuds_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwsuds_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwsuds.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwsuds_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwsuds_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwsuds_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwsuds.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwsuds_epi32(__A, __B, __C);
				}

				__m128i test_mm_dpwusd_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwusd_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwusd.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwusd_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwusd_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwusd_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwusd.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwusd_epi32(__A, __B, __C);
				}

				__m128i test_mm_dpwusds_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwusds_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwusds.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwusds_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwusds_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwusds_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwusds.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwusds_epi32(__A, __B, __C);
				}

				__m128i test_mm_dpwuud_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwuud_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwuud_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwuud_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwuud_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwuud_epi32(__A, __B, __C);
				}

				__m128i test_mm_dpwuuds_epi32(__m128i __A, __m128i __B, __m128i __C) {
				// CHECK-LABEL: @test_mm_dpwuuds_epi32(
				// CHECK: call <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <4 x i32> %{{.*}})
				return _mm_dpwuuds_epi32(__A, __B, __C);
				}

				__m256i test_mm256_dpwuuds_epi32(__m256i __A, __m256i __B, __m256i __C) {
				// CHECK-LABEL: @test_mm256_dpwuuds_epi32(
				// CHECK: call <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}})
				return _mm256_dpwuuds_epi32(__A, __B, __C);
				}

clang/test/CodeGen/attr-target-x86.c

	// CHECK: qax{{.*}} #5			// CHECK: qax{{.*}} #5
	// CHECK: qq{{.*}} #6			// CHECK: qq{{.*}} #6
	// CHECK: lake{{.*}} #7			// CHECK: lake{{.*}} #7
	// CHECK: use_before_def{{.*}} #7			// CHECK: use_before_def{{.*}} #7
	// CHECK: walrus{{.*}} #8			// CHECK: walrus{{.*}} #8
	// CHECK: #0 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87" "tune-cpu"="i686"			// CHECK: #0 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87" "tune-cpu"="i686"
	// CHECK: #1 = {{.*}}"target-cpu"="ivybridge" "target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt"			// CHECK: #1 = {{.*}}"target-cpu"="ivybridge" "target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt"
	// CHECK-NOT: tune-cpu			// CHECK-NOT: tune-cpu
	// CHECK: #2 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-aes,-avx,-avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxifma,-avxneconvert,-avxvnni,-avxvnniint8,-f16c,-fma,-fma4,-gfni,-kl,-pclmul,-sha,-sha512,-sm3,-sm4,-sse2,-sse3,-sse4.1,-sse4.2,-sse4a,-ssse3,-vaes,-vpclmulqdq,-widekl,-xop" "tune-cpu"="i686"			// CHECK: #2 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-aes,-avx,-avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-f16c,-fma,-fma4,-gfni,-kl,-pclmul,-sha,-sha512,-sm3,-sm4,-sse2,-sse3,-sse4.1,-sse4.2,-sse4a,-ssse3,-vaes,-vpclmulqdq,-widekl,-xop" "tune-cpu"="i686"
	// CHECK: #3 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+crc32,+cx8,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="i686"			// CHECK: #3 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+crc32,+cx8,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87" "tune-cpu"="i686"
	// CHECK: #4 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-avx,-avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxifma,-avxneconvert,-avxvnni,-avxvnniint8,-f16c,-fma,-fma4,-sha512,-sm3,-sm4,-sse4.1,-sse4.2,-vaes,-vpclmulqdq,-xop" "tune-cpu"="i686"			// CHECK: #4 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-avx,-avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512fp16,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vp2intersect,-avx512vpopcntdq,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-f16c,-fma,-fma4,-sha512,-sm3,-sm4,-sse4.1,-sse4.2,-vaes,-vpclmulqdq,-xop" "tune-cpu"="i686"
	// CHECK: #5 = {{.*}}"target-cpu"="ivybridge" "target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-vaes"			// CHECK: #5 = {{.*}}"target-cpu"="ivybridge" "target-features"="+avx,+cmov,+crc32,+cx16,+cx8,+f16c,+fsgsbase,+fxsr,+mmx,+pclmul,+popcnt,+rdrnd,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave,+xsaveopt,-aes,-vaes"
	// CHECK-NOT: tune-cpu			// CHECK-NOT: tune-cpu
	// CHECK: #6 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-3dnow,-3dnowa,-mmx"			// CHECK: #6 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87,-3dnow,-3dnowa,-mmx"
	// CHECK: #7 = {{.*}}"target-cpu"="lakemont" "target-features"="+cx8,+mmx"			// CHECK: #7 = {{.*}}"target-cpu"="lakemont" "target-features"="+cx8,+mmx"
	// CHECK-NOT: tune-cpu			// CHECK-NOT: tune-cpu
	// CHECK: #8 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87" "tune-cpu"="sandybridge"			// CHECK: #8 = {{.*}}"target-cpu"="i686" "target-features"="+cmov,+cx8,+x87" "tune-cpu"="sandybridge"

	// CHECK: "target-cpu"="x86-64-v2"			// CHECK: "target-cpu"="x86-64-v2"
	// CHECK-SAME: "target-features"="+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87"			// CHECK-SAME: "target-features"="+cmov,+crc32,+cx16,+cx8,+fxsr,+mmx,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87"
	// CHECK: "target-cpu"="x86-64-v3"			// CHECK: "target-cpu"="x86-64-v3"
	// CHECK-SAME: "target-features"="+avx,+avx2,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fxsr,+lzcnt,+mmx,+movbe,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"			// CHECK-SAME: "target-features"="+avx,+avx2,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fxsr,+lzcnt,+mmx,+movbe,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"
	// CHECK: "target-cpu"="x86-64-v4"			// CHECK: "target-cpu"="x86-64-v4"
	// CHECK-SAME: "target-features"="+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fxsr,+lzcnt,+mmx,+movbe,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"			// CHECK-SAME: "target-features"="+avx,+avx2,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512vl,+bmi,+bmi2,+cmov,+crc32,+cx16,+cx8,+f16c,+fma,+fxsr,+lzcnt,+mmx,+movbe,+popcnt,+sahf,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave"

clang/test/Driver/x86-target-features.c

	// SM3: "-target-feature" "+sm3"			// SM3: "-target-feature" "+sm3"
	// NO-SM3: "-target-feature" "-sm3"			// NO-SM3: "-target-feature" "-sm3"

	// RUN: %clang --target=i386 -msm4 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SM4 %s			// RUN: %clang --target=i386 -msm4 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=SM4 %s
	// RUN: %clang --target=i386 -mno-sm4 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-SM4 %s			// RUN: %clang --target=i386 -mno-sm4 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-SM4 %s
	// SM4: "-target-feature" "+sm4"			// SM4: "-target-feature" "+sm4"
	// NO-SM4: "-target-feature" "-sm4"			// NO-SM4: "-target-feature" "-sm4"

				// RUN: %clang --target=i386 -mavxvnniint16 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=AVXVNNIINT16 %s
				// RUN: %clang --target=i386 -mno-avxvnniint16 %s -### -o %t.o 2>&1 | FileCheck -check-prefix=NO-AVXVNNIINT16 %s
				// AVXVNNIINT16: "-target-feature" "+avxvnniint16"
				// NO-AVXVNNIINT16: "-target-feature" "-avxvnniint16"

	// RUN: %clang --target=i386 -march=i386 -mcrc32 %s -### 2>&1 | FileCheck -check-prefix=CRC32 %s			// RUN: %clang --target=i386 -march=i386 -mcrc32 %s -### 2>&1 | FileCheck -check-prefix=CRC32 %s
	// RUN: %clang --target=i386 -march=i386 -mno-crc32 %s -### 2>&1 | FileCheck -check-prefix=NO-CRC32 %s			// RUN: %clang --target=i386 -march=i386 -mno-crc32 %s -### 2>&1 | FileCheck -check-prefix=NO-CRC32 %s
	// CRC32: "-target-feature" "+crc32"			// CRC32: "-target-feature" "+crc32"
	// NO-CRC32: "-target-feature" "-crc32"			// NO-CRC32: "-target-feature" "-crc32"

	// RUN: %clang --target=i386 -march=i386 -mharden-sls=return %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-RET,NO-SLS %s			// RUN: %clang --target=i386 -march=i386 -mharden-sls=return %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-RET,NO-SLS %s
	// RUN: %clang --target=i386 -march=i386 -mharden-sls=indirect-jmp %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-IJMP,NO-SLS %s			// RUN: %clang --target=i386 -march=i386 -mharden-sls=indirect-jmp %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-IJMP,NO-SLS %s
	// RUN: %clang --target=i386 -march=i386 -mharden-sls=none -mharden-sls=all %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-IJMP,SLS-RET %s			// RUN: %clang --target=i386 -march=i386 -mharden-sls=none -mharden-sls=all %s -### -o %t.o 2>&1 | FileCheck -check-prefixes=SLS-IJMP,SLS-RET %s

clang/test/Preprocessor/x86_target_features.c

	// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mno-sm4 -x c -E -dM -o - %s | FileCheck -check-prefix=NOSM4 %s			// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mno-sm4 -x c -E -dM -o - %s | FileCheck -check-prefix=NOSM4 %s
	// NOSM4-NOT: #define __SM4__ 1			// NOSM4-NOT: #define __SM4__ 1

	// RUN: %clang -target i686-unknown-linux-gnu -march=atom -msm4 -mno-avx -x c -E -dM -o - %s | FileCheck -check-prefix=SM4NOAVX %s			// RUN: %clang -target i686-unknown-linux-gnu -march=atom -msm4 -mno-avx -x c -E -dM -o - %s | FileCheck -check-prefix=SM4NOAVX %s

	// SM4NOAVX-NOT: #define __AVX__ 1			// SM4NOAVX-NOT: #define __AVX__ 1
	// SM4NOAVX-NOT: #define __SM4__ 1			// SM4NOAVX-NOT: #define __SM4__ 1

				// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxvnniint16 -x c -E -dM -o - %s | FileCheck -check-prefix=AVXVNNIINT16 %s

				// AVXVNNIINT16: #define __AVX2__ 1
				// AVXVNNIINT16: #define __AVXVNNIINT16__ 1

				// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mno-avxvnniint16 -x c -E -dM -o - %s | FileCheck -check-prefix=NOAVXVNNIINT16 %s

				// NOAVXVNNIINT16-NOT: #define __AVXVNNIINT16__ 1

				// RUN: %clang -target i686-unknown-linux-gnu -march=atom -mavxvnniint16 -mno-avx2 -x c -E -dM -o - %s | FileCheck -check-prefix=AVXVNNIINT16NOAVX2 %s

				// AVXVNNIINT16NOAVX2-NOT: #define __AVX2__ 1
				// AVXVNNIINT16NOAVX2-NOT: #define __AVXVNNIINT16__ 1

	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mcrc32 -x c -E -dM -o - %s | FileCheck -check-prefix=CRC32 %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mcrc32 -x c -E -dM -o - %s | FileCheck -check-prefix=CRC32 %s

	// CRC32: #define __CRC32__ 1			// CRC32: #define __CRC32__ 1

	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-crc32 -x c -E -dM -o - %s | FileCheck -check-prefix=NOCRC32 %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-crc32 -x c -E -dM -o - %s | FileCheck -check-prefix=NOCRC32 %s

	// NOCRC32-NOT: #define __CRC32__ 1			// NOCRC32-NOT: #define __CRC32__ 1

	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mrdpru -x c -E -dM -o - %s | FileCheck -check-prefix=RDPRU %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mrdpru -x c -E -dM -o - %s | FileCheck -check-prefix=RDPRU %s

	// RDPRU: #define __RDPRU__ 1			// RDPRU: #define __RDPRU__ 1

	// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-rdpru -x c -E -dM -o - %s | FileCheck -check-prefix=NORDPRU %s			// RUN: %clang -target i386-unknown-linux-gnu -march=i386 -mno-rdpru -x c -E -dM -o - %s | FileCheck -check-prefix=NORDPRU %s

	// NORDPRU-NOT: #define __RDPRU__ 1			// NORDPRU-NOT: #define __RDPRU__ 1
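
The preprocessor checks above show that `-mavxvnniint16` defines `__AVXVNNIINT16__` (and that `-mno-avx2` removes it again). As a rough illustration of how user code might key off that macro, the sketch below guards a call to `_mm_dpwuud_epi32` and falls back to a scalar loop otherwise. The function name `dot_u16_accumulate` is hypothetical and not part of the patch; the scalar branch follows the documented `uud` pseudocode.

```c
#include <stdint.h>
#if defined(__AVXVNNIINT16__)
#include <immintrin.h>
#endif

/* Hypothetical helper: for each of 4 dword lanes, add the products of two
 * adjacent unsigned 16-bit pairs from a and b into acc (wrapping, no saturation). */
static void dot_u16_accumulate(uint32_t acc[4], const uint16_t a[8],
                               const uint16_t b[8]) {
#if defined(__AVXVNNIINT16__)
    /* Feature available at compile time: use the intrinsic added by this patch. */
    __m128i w = _mm_loadu_si128((const __m128i *)acc);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)acc, _mm_dpwuud_epi32(w, va, vb));
#else
    /* Scalar fallback mirroring the documented per-lane operation. */
    for (int j = 0; j < 4; ++j)
        acc[j] += (uint32_t)a[2 * j] * b[2 * j] +
                  (uint32_t)a[2 * j + 1] * b[2 * j + 1];
#endif
}
```

A compile-time guard like this only reflects the flags the translation unit was built with; it does not detect the feature on the machine the binary eventually runs on.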

llvm/docs/ReleaseNotes.rst

	--------------------------			--------------------------

	* ``__builtin_unpredictable`` (unpredictable metadata in LLVM IR), is handled by X86 Backend.			* ``__builtin_unpredictable`` (unpredictable metadata in LLVM IR), is handled by X86 Backend.
	``X86CmovConversion`` pass now respects this builtin and does not convert CMOVs to branches.			``X86CmovConversion`` pass now respects this builtin and does not convert CMOVs to branches.
	* Add support for the ``PBNDKB`` instruction.			* Add support for the ``PBNDKB`` instruction.
	* Support ISA of ``SHA512``.			* Support ISA of ``SHA512``.
	* Support ISA of ``SM3``.			* Support ISA of ``SM3``.
	* Support ISA of ``SM4``.			* Support ISA of ``SM4``.
				* Support ISA of ``AVX-VNNI-INT16``.

	Changes to the OCaml bindings			Changes to the OCaml bindings
	-----------------------------			-----------------------------

	Changes to the Python bindings			Changes to the Python bindings
	------------------------------			------------------------------

	* The python bindings have been removed.			* The python bindings have been removed.

llvm/include/llvm/IR/IntrinsicsX86.td


Show First 20 Lines • Show All 2,047 Lines • ▼ Show 20 Lines	def int_x86_avx2_vpdpbuuds_128
DefaultAttrsIntrinsic<[llvm_v4i32_ty],		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;
def int_x86_avx2_vpdpbuuds_256		def int_x86_avx2_vpdpbuuds_256
: ClangBuiltin<"__builtin_ia32_vpdpbuuds256">,		: ClangBuiltin<"__builtin_ia32_vpdpbuuds256">,
DefaultAttrsIntrinsic<[llvm_v8i32_ty],		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
[IntrNoMem]>;		[IntrNoMem]>;

		def int_x86_avx2_vpdpwsud_128
		: ClangBuiltin<"__builtin_ia32_vpdpwsud128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwsud_256
		: ClangBuiltin<"__builtin_ia32_vpdpwsud256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwsuds_128
		: ClangBuiltin<"__builtin_ia32_vpdpwsuds128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwsuds_256
		: ClangBuiltin<"__builtin_ia32_vpdpwsuds256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwusd_128
		: ClangBuiltin<"__builtin_ia32_vpdpwusd128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwusd_256
		: ClangBuiltin<"__builtin_ia32_vpdpwusd256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwusds_128
		: ClangBuiltin<"__builtin_ia32_vpdpwusds128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwusds_256
		: ClangBuiltin<"__builtin_ia32_vpdpwusds256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwuud_128
		: ClangBuiltin<"__builtin_ia32_vpdpwuud128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwuud_256
		: ClangBuiltin<"__builtin_ia32_vpdpwuud256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwuuds_128
		: ClangBuiltin<"__builtin_ia32_vpdpwuuds128">,
		DefaultAttrsIntrinsic<[llvm_v4i32_ty],
		[llvm_v4i32_ty, llvm_v4i32_ty, llvm_v4i32_ty],
		[IntrNoMem]>;
		def int_x86_avx2_vpdpwuuds_256
		: ClangBuiltin<"__builtin_ia32_vpdpwuuds256">,
		DefaultAttrsIntrinsic<[llvm_v8i32_ty],
		[llvm_v8i32_ty, llvm_v8i32_ty, llvm_v8i32_ty],
		[IntrNoMem]>;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// XOP		// XOP

let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".		let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
def int_x86_xop_vpermil2pd : ClangBuiltin<"__builtin_ia32_vpermil2pd">,		def int_x86_xop_vpermil2pd : ClangBuiltin<"__builtin_ia32_vpermil2pd">,
DefaultAttrsIntrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty, llvm_v2f64_ty,		DefaultAttrsIntrinsic<[llvm_v2f64_ty], [llvm_v2f64_ty, llvm_v2f64_ty,

llvm/include/llvm/TargetParser/X86TargetParser.def

	X86_FEATURE (CMPCCXADD, "cmpccxadd")			X86_FEATURE (CMPCCXADD, "cmpccxadd")
	X86_FEATURE (AVXNECONVERT, "avxneconvert")			X86_FEATURE (AVXNECONVERT, "avxneconvert")
	X86_FEATURE (AVXVNNI, "avxvnni")			X86_FEATURE (AVXVNNI, "avxvnni")
	X86_FEATURE (AVXIFMA, "avxifma")			X86_FEATURE (AVXIFMA, "avxifma")
	X86_FEATURE (AVXVNNIINT8, "avxvnniint8")			X86_FEATURE (AVXVNNIINT8, "avxvnniint8")
	X86_FEATURE (SHA512, "sha512")			X86_FEATURE (SHA512, "sha512")
	X86_FEATURE (SM3, "sm3")			X86_FEATURE (SM3, "sm3")
	X86_FEATURE (SM4, "sm4")			X86_FEATURE (SM4, "sm4")
				X86_FEATURE (AVXVNNIINT16, "avxvnniint16")
	// These features aren't really CPU features, but the frontend can set them.			// These features aren't really CPU features, but the frontend can set them.
	X86_FEATURE (RETPOLINE_EXTERNAL_THUNK, "retpoline-external-thunk")			X86_FEATURE (RETPOLINE_EXTERNAL_THUNK, "retpoline-external-thunk")
	X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches")			X86_FEATURE (RETPOLINE_INDIRECT_BRANCHES, "retpoline-indirect-branches")
	X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls")			X86_FEATURE (RETPOLINE_INDIRECT_CALLS, "retpoline-indirect-calls")
	X86_FEATURE (LVI_CFI, "lvi-cfi")			X86_FEATURE (LVI_CFI, "lvi-cfi")
	X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening")			X86_FEATURE (LVI_LOAD_HARDENING, "lvi-load-hardening")
	#undef X86_FEATURE_COMPAT			#undef X86_FEATURE_COMPAT
	#undef X86_FEATURE			#undef X86_FEATURE

llvm/lib/Target/X86/X86.td

	// currently.			// currently.
	def FeatureFP16 : SubtargetFeature<"avx512fp16", "HasFP16", "true",			def FeatureFP16 : SubtargetFeature<"avx512fp16", "HasFP16", "true",
	"Support 16-bit floating point",			"Support 16-bit floating point",
	[FeatureBWI, FeatureVLX, FeatureDQI]>;			[FeatureBWI, FeatureVLX, FeatureDQI]>;
	def FeatureAVXVNNIINT8 : SubtargetFeature<"avxvnniint8",			def FeatureAVXVNNIINT8 : SubtargetFeature<"avxvnniint8",
	"HasAVXVNNIINT8", "true",			"HasAVXVNNIINT8", "true",
	"Enable AVX-VNNI-INT8",			"Enable AVX-VNNI-INT8",
	[FeatureAVX2]>;			[FeatureAVX2]>;
				def FeatureAVXVNNIINT16 : SubtargetFeature<"avxvnniint16",
				"HasAVXVNNIINT16", "true",
				"Enable AVX-VNNI-INT16",
				[FeatureAVX2]>;
	def FeaturePCLMUL : SubtargetFeature<"pclmul", "HasPCLMUL", "true",			def FeaturePCLMUL : SubtargetFeature<"pclmul", "HasPCLMUL", "true",
	"Enable packed carry-less multiplication instructions",			"Enable packed carry-less multiplication instructions",
	[FeatureSSE2]>;			[FeatureSSE2]>;
	def FeatureGFNI : SubtargetFeature<"gfni", "HasGFNI", "true",			def FeatureGFNI : SubtargetFeature<"gfni", "HasGFNI", "true",
	"Enable Galois Field Arithmetic Instructions",			"Enable Galois Field Arithmetic Instructions",
	[FeatureSSE2]>;			[FeatureSSE2]>;
	def FeatureVPCLMULQDQ : SubtargetFeature<"vpclmulqdq", "HasVPCLMULQDQ", "true",			def FeatureVPCLMULQDQ : SubtargetFeature<"vpclmulqdq", "HasVPCLMULQDQ", "true",
	"Enable vpclmulqdq instructions",			"Enable vpclmulqdq instructions",

llvm/lib/Target/X86/X86InstrInfo.cpp


Show First 20 Lines • Show All 2,559 Lines • ▼ Show 20 Lines	bool X86InstrInfo::findCommutedOpIndices(const MachineInstr &MI,
case X86::VPTERNLOGQZ128rmbikz:		case X86::VPTERNLOGQZ128rmbikz:
case X86::VPTERNLOGQZ256rmbikz:		case X86::VPTERNLOGQZ256rmbikz:
case X86::VPTERNLOGQZrmbikz:		case X86::VPTERNLOGQZrmbikz:
return findThreeSrcCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);		return findThreeSrcCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);
case X86::VPDPWSSDYrr:		case X86::VPDPWSSDYrr:
case X86::VPDPWSSDrr:		case X86::VPDPWSSDrr:
case X86::VPDPWSSDSYrr:		case X86::VPDPWSSDSYrr:
case X86::VPDPWSSDSrr:		case X86::VPDPWSSDSrr:
		case X86::VPDPWUUDrr:
		case X86::VPDPWUUDYrr:
		case X86::VPDPWUUDSrr:
		case X86::VPDPWUUDSYrr:
case X86::VPDPBSSDSrr:		case X86::VPDPBSSDSrr:
case X86::VPDPBSSDSYrr:		case X86::VPDPBSSDSYrr:
case X86::VPDPBSSDrr:		case X86::VPDPBSSDrr:
case X86::VPDPBSSDYrr:		case X86::VPDPBSSDYrr:
case X86::VPDPBUUDSrr:		case X86::VPDPBUUDSrr:
case X86::VPDPBUUDSYrr:		case X86::VPDPBUUDSYrr:
case X86::VPDPBUUDrr:		case X86::VPDPBUUDrr:
case X86::VPDPBUUDYrr:		case X86::VPDPBUUDYrr:
▲ Show 20 Lines • Show All 7,344 Lines • Show Last 20 Lines
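
Only the `VPDPWUUD`/`VPDPWUUDS` register forms are added to this case list, next to the existing signed-signed `VPDPWSSD` entries. A plausible reading (not stated in the review itself) is that swapping the two multiplicand operands is only safe when both are extended the same way. The scalar sketch below illustrates that with hypothetical helpers `uu_lane` and `su_lane`, which take the raw 16-bit patterns and return the resulting dword bits.

```c
#include <stdint.h>

/* Hypothetical helper: reinterpret a raw 16-bit pattern as a signed value. */
static int32_t sign_extend16(uint16_t bits) {
    return bits >= 0x8000u ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* "uu" lane: both word operands are zero-extended, so a and b play the same role. */
static uint32_t uu_lane(uint32_t w, uint16_t a0, uint16_t a1,
                        uint16_t b0, uint16_t b1) {
    return w + (uint32_t)a0 * b0 + (uint32_t)a1 * b1;
}

/* "su" lane: a is sign-extended and b is zero-extended, so the roles differ.
 * The sum is formed in unsigned arithmetic to model wrapping without UB. */
static uint32_t su_lane(uint32_t w, uint16_t a0, uint16_t a1,
                        uint16_t b0, uint16_t b1) {
    int32_t tmp1 = sign_extend16(a0) * (int32_t)b0;
    int32_t tmp2 = sign_extend16(a1) * (int32_t)b1;
    return w + (uint32_t)tmp1 + (uint32_t)tmp2;
}
```

With the bit patterns `0xFFFF` and `0x0001`, `uu_lane` gives the same result in either operand order, while `su_lane` gives -1 in one order and 65535 in the other, so the mixed-sign forms cannot simply have their source operands exchanged.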

llvm/lib/Target/X86/X86InstrInfo.td

	def NoVLX : Predicate<"!Subtarget->hasVLX()">;			def NoVLX : Predicate<"!Subtarget->hasVLX()">;
	def NoVLX_Or_NoBWI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasBWI()">;			def NoVLX_Or_NoBWI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasBWI()">;
	def NoVLX_Or_NoDQI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasDQI()">;			def NoVLX_Or_NoDQI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasDQI()">;
	def HasPKU : Predicate<"Subtarget->hasPKU()">;			def HasPKU : Predicate<"Subtarget->hasPKU()">;
	def HasVNNI : Predicate<"Subtarget->hasVNNI()">;			def HasVNNI : Predicate<"Subtarget->hasVNNI()">;
	def HasVP2INTERSECT : Predicate<"Subtarget->hasVP2INTERSECT()">;			def HasVP2INTERSECT : Predicate<"Subtarget->hasVP2INTERSECT()">;
	def HasBF16 : Predicate<"Subtarget->hasBF16()">;			def HasBF16 : Predicate<"Subtarget->hasBF16()">;
	def HasFP16 : Predicate<"Subtarget->hasFP16()">;			def HasFP16 : Predicate<"Subtarget->hasFP16()">;
				def HasAVXVNNIINT16 : Predicate<"Subtarget->hasAVXVNNIINT16()">;
	def HasAVXVNNIINT8 : Predicate<"Subtarget->hasAVXVNNIINT8()">;			def HasAVXVNNIINT8 : Predicate<"Subtarget->hasAVXVNNIINT8()">;
	def HasAVXVNNI : Predicate <"Subtarget->hasAVXVNNI()">;			def HasAVXVNNI : Predicate <"Subtarget->hasAVXVNNI()">;
	def NoVLX_Or_NoVNNI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasVNNI()">;			def NoVLX_Or_NoVNNI : Predicate<"!Subtarget->hasVLX() || !Subtarget->hasVNNI()">;

	def HasBITALG : Predicate<"Subtarget->hasBITALG()">;			def HasBITALG : Predicate<"Subtarget->hasBITALG()">;
	def HasPOPCNT : Predicate<"Subtarget->hasPOPCNT()">;			def HasPOPCNT : Predicate<"Subtarget->hasPOPCNT()">;
	def HasAES : Predicate<"Subtarget->hasAES()">;			def HasAES : Predicate<"Subtarget->hasAES()">;
	def HasVAES : Predicate<"Subtarget->hasVAES()">;			def HasVAES : Predicate<"Subtarget->hasVAES()">;

llvm/lib/Target/X86/X86InstrSSE.td


def : InstAlias<"vcvtneps2bf16x\t{$src, $dst|$dst, $src}",
(VCVTNEPS2BF16rr VR128:$dst, VR128:$src), 0, "att">;		(VCVTNEPS2BF16rr VR128:$dst, VR128:$src), 0, "att">;
def : InstAlias<"vcvtneps2bf16y\t{$src, $dst|$dst, $src}",		def : InstAlias<"vcvtneps2bf16y\t{$src, $dst|$dst, $src}",
(VCVTNEPS2BF16Yrr VR128:$dst, VR256:$src), 0, "att">;		(VCVTNEPS2BF16Yrr VR128:$dst, VR256:$src), 0, "att">;

// FIXME: Is there a better scheduler class for SHA512 than WriteVecIMul?		// FIXME: Is there a better scheduler class for SHA512 than WriteVecIMul?
let Predicates = [HasSHA512], Constraints = "$src1 = $dst" in {		let Predicates = [HasSHA512], Constraints = "$src1 = $dst" in {
def VSHA512MSG1rr : I<0xcc, MRMSrcReg, (outs VR256:$dst),		def VSHA512MSG1rr : I<0xcc, MRMSrcReg, (outs VR256:$dst),
(ins VR256:$src1, VR128:$src2),		(ins VR256:$src1, VR128:$src2),
"vsha512msg1\t{$src2, $dst|$dst, $src2}",		"vsha512msg1\t{$src2, $dst|$dst, $src2}",
		craig.topper: This needs to be indented 1 character more
[(set VR256:$dst,		[(set VR256:$dst,
(int_x86_vsha512msg1 VR256:$src1, VR128:$src2))]>, VEX_L,		(int_x86_vsha512msg1 VR256:$src1, VR128:$src2))]>, VEX_L,
VEX, T8XD, Sched<[WriteVecIMul]>;		VEX, T8XD, Sched<[WriteVecIMul]>;
		craig.topper: This needs to be indented 1 character more so that it looks nested under the `set`
def VSHA512MSG2rr : I<0xcd, MRMSrcReg, (outs VR256:$dst),		def VSHA512MSG2rr : I<0xcd, MRMSrcReg, (outs VR256:$dst),
(ins VR256:$src1, VR256:$src2),		(ins VR256:$src1, VR256:$src2),
"vsha512msg2\t{$src2, $dst|$dst, $src2}",		"vsha512msg2\t{$src2, $dst|$dst, $src2}",
[(set VR256:$dst,		[(set VR256:$dst,
(int_x86_vsha512msg2 VR256:$src1, VR256:$src2))]>, VEX_L,		(int_x86_vsha512msg2 VR256:$src1, VR256:$src2))]>, VEX_L,
VEX, T8XD, Sched<[WriteVecIMul]>;		VEX, T8XD, Sched<[WriteVecIMul]>;
def VSHA512RNDS2rr : I<0xcb, MRMSrcReg, (outs VR256:$dst),		def VSHA512RNDS2rr : I<0xcb, MRMSrcReg, (outs VR256:$dst),
(ins VR256:$src1, VR256:$src2, VR128:$src3),		(ins VR256:$src1, VR256:$src2, VR128:$src3),
...
def rm : I<0xda, MRMSrcMem, (outs RC:$dst),
Sched<[WriteVecIMul]>;		Sched<[WriteVecIMul]>;
}		}
}		}

defm VSM4KEY4 : SM4_Base<"vsm4key4", VR128, "128", loadv4i32, i128mem>, T8XS, VEX_4V;		defm VSM4KEY4 : SM4_Base<"vsm4key4", VR128, "128", loadv4i32, i128mem>, T8XS, VEX_4V;
defm VSM4KEY4Y : SM4_Base<"vsm4key4", VR256, "256", loadv8i32, i256mem>, T8XS, VEX_L, VEX_4V;		defm VSM4KEY4Y : SM4_Base<"vsm4key4", VR256, "256", loadv8i32, i256mem>, T8XS, VEX_L, VEX_4V;
defm VSM4RNDS4 : SM4_Base<"vsm4rnds4", VR128, "128", loadv4i32, i128mem>, T8XD, VEX_4V;		defm VSM4RNDS4 : SM4_Base<"vsm4rnds4", VR128, "128", loadv4i32, i128mem>, T8XD, VEX_4V;
defm VSM4RNDS4Y : SM4_Base<"vsm4rnds4", VR256, "256", loadv8i32, i256mem>, T8XD, VEX_L, VEX_4V;		defm VSM4RNDS4Y : SM4_Base<"vsm4rnds4", VR256, "256", loadv8i32, i256mem>, T8XD, VEX_L, VEX_4V;

		let Predicates = [HasAVXVNNIINT16], Constraints = "$src1 = $dst" in
		multiclass avx_vnni_int16<bits<8> opc, string OpcodeStr, bit IsCommutable> {
		let isCommutable = IsCommutable in
		def rr : I<opc, MRMSrcReg, (outs VR128:$dst),
		(ins VR128:$src1, VR128:$src2, VR128:$src3),
		!strconcat(OpcodeStr, "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
		[(set VR128:$dst,
		(v4i32 (!cast<Intrinsic>("int_x86_avx2_"#OpcodeStr#"_128")
		VR128:$src1, VR128:$src2, VR128:$src3)))]>,
		VEX_4V, Sched<[SchedWriteVecIMul.XMM]>;

		def rm : I<opc, MRMSrcMem, (outs VR128:$dst),
		(ins VR128:$src1, VR128:$src2, i128mem:$src3),
		!strconcat(OpcodeStr, "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
		[(set VR128:$dst,
		(v4i32 (!cast<Intrinsic>("int_x86_avx2_"#OpcodeStr#"_128")
		VR128:$src1, VR128:$src2, (loadv4i32 addr:$src3))))]>,
		VEX_4V, Sched<[SchedWriteVecIMul.XMM]>;

		let isCommutable = IsCommutable in
		def Yrr : I<opc, MRMSrcReg, (outs VR256:$dst),
		(ins VR256:$src1, VR256:$src2, VR256:$src3),
		!strconcat(OpcodeStr, "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
		[(set VR256:$dst,
		(v8i32 (!cast<Intrinsic>("int_x86_avx2_"#OpcodeStr#"_256")
		VR256:$src1, VR256:$src2, VR256:$src3)))]>,
		VEX_4V, VEX_L, Sched<[SchedWriteVecIMul.YMM]>;

		def Yrm : I<opc, MRMSrcMem, (outs VR256:$dst),
		(ins VR256:$src1, VR256:$src2, i256mem:$src3),
		!strconcat(OpcodeStr, "\t{$src3, $src2, $dst|$dst, $src2, $src3}"),
		[(set VR256:$dst,
		(v8i32 (!cast<Intrinsic>("int_x86_avx2_"#OpcodeStr#"_256")
		VR256:$src1, VR256:$src2, (loadv8i32 addr:$src3))))]>,
		VEX_4V, VEX_L, Sched<[SchedWriteVecIMul.YMM]>;
		}

		defm VPDPWSUD : avx_vnni_int16<0xd2, "vpdpwsud", 0>, T8XS;
		defm VPDPWSUDS : avx_vnni_int16<0xd3, "vpdpwsuds", 0>, T8XS;
		defm VPDPWUSD : avx_vnni_int16<0xd2, "vpdpwusd", 0>, T8PD;
		defm VPDPWUSDS : avx_vnni_int16<0xd3, "vpdpwusds", 0>, T8PD;
		defm VPDPWUUD : avx_vnni_int16<0xd2, "vpdpwuud", 1>, T8PS;
		defm VPDPWUUDS : avx_vnni_int16<0xd3, "vpdpwuuds", 1>, T8PS;
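
In the `defm` lines above, the `IsCommutable` argument is 1 only for the `vpdpwuud`/`vpdpwuuds` forms. A minimal scalar sketch of why that is plausible, assuming the mnemonic suffixes encode operand signedness (`uu` = unsigned x unsigned words, `su` = signed x unsigned words), which this diff does not state explicitly. The helper names are hypothetical and not LLVM APIs, and only the non-saturating forms are modeled:

```cpp
#include <cstdint>

// Each 32-bit source lane holds two 16-bit words (low and high).
inline std::uint32_t lo16(std::uint32_t V) { return V & 0xFFFFu; }
inline std::uint32_t hi16(std::uint32_t V) { return V >> 16; }

// Sign-extend a 16-bit word held in the low bits of W (W <= 0xFFFF).
inline std::int64_t sext16(std::uint32_t W) {
  return static_cast<std::int64_t>(W ^ 0x8000u) - 0x8000;
}

// Assumed model of one non-saturating unsigned x unsigned lane:
// both word sources are zero-extended, so A and B are interchangeable.
inline std::uint32_t dotUU(std::uint32_t Acc, std::uint32_t A, std::uint32_t B) {
  std::uint64_t Sum = std::uint64_t(lo16(A)) * lo16(B) +
                      std::uint64_t(hi16(A)) * hi16(B);
  return static_cast<std::uint32_t>(Acc + Sum); // wraps modulo 2^32
}

// Assumed model of one non-saturating signed x unsigned lane:
// S is sign-extended and U is zero-extended, so swapping them changes the math.
inline std::uint32_t dotSU(std::uint32_t Acc, std::uint32_t S, std::uint32_t U) {
  std::int64_t Sum = sext16(lo16(S)) * std::int64_t(lo16(U)) +
                     sext16(hi16(S)) * std::int64_t(hi16(U));
  return static_cast<std::uint32_t>(
      static_cast<std::uint64_t>(std::int64_t(Acc) + Sum));
}
```

Under this model, swapping the two word sources leaves `dotUU` unchanged, while `dotSU` can give a different result because each word is extended differently depending on which operand it sits in.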

llvm/lib/TargetParser/Host.cpp

#endif
Features["avx512bf16"] = HasLeaf7Subleaf1 && ((EAX >> 5) & 1) && HasAVX512Save;		Features["avx512bf16"] = HasLeaf7Subleaf1 && ((EAX >> 5) & 1) && HasAVX512Save;
Features["amx-fp16"] = HasLeaf7Subleaf1 && ((EAX >> 21) & 1) && HasAMXSave;		Features["amx-fp16"] = HasLeaf7Subleaf1 && ((EAX >> 21) & 1) && HasAMXSave;
Features["cmpccxadd"] = HasLeaf7Subleaf1 && ((EAX >> 7) & 1);		Features["cmpccxadd"] = HasLeaf7Subleaf1 && ((EAX >> 7) & 1);
Features["hreset"] = HasLeaf7Subleaf1 && ((EAX >> 22) & 1);		Features["hreset"] = HasLeaf7Subleaf1 && ((EAX >> 22) & 1);
Features["avxifma"] = HasLeaf7Subleaf1 && ((EAX >> 23) & 1) && HasAVXSave;		Features["avxifma"] = HasLeaf7Subleaf1 && ((EAX >> 23) & 1) && HasAVXSave;
Features["avxvnniint8"] = HasLeaf7Subleaf1 && ((EDX >> 4) & 1) && HasAVXSave;		Features["avxvnniint8"] = HasLeaf7Subleaf1 && ((EDX >> 4) & 1) && HasAVXSave;
Features["avxneconvert"] = HasLeaf7Subleaf1 && ((EDX >> 5) & 1) && HasAVXSave;		Features["avxneconvert"] = HasLeaf7Subleaf1 && ((EDX >> 5) & 1) && HasAVXSave;
Features["amx-complex"] = HasLeaf7Subleaf1 && ((EDX >> 8) & 1) && HasAMXSave;		Features["amx-complex"] = HasLeaf7Subleaf1 && ((EDX >> 8) & 1) && HasAMXSave;
		Features["avxvnniint16"] = HasLeaf7Subleaf1 && ((EDX >> 10) & 1) && HasAVXSave;
Features["prefetchi"] = HasLeaf7Subleaf1 && ((EDX >> 14) & 1);		Features["prefetchi"] = HasLeaf7Subleaf1 && ((EDX >> 14) & 1);

bool HasLeafD = MaxLevel >= 0xd &&		bool HasLeafD = MaxLevel >= 0xd &&
!getX86CpuIDAndInfoEx(0xd, 0x1, &EAX, &EBX, &ECX, &EDX);		!getX86CpuIDAndInfoEx(0xd, 0x1, &EAX, &EBX, &ECX, &EDX);

// Only enable XSAVE if OS has enabled support for saving YMM state.		// Only enable XSAVE if OS has enabled support for saving YMM state.
Features["xsaveopt"] = HasLeafD && ((EAX >> 0) & 1) && HasAVXSave;		Features["xsaveopt"] = HasLeafD && ((EAX >> 0) & 1) && HasAVXSave;
Features["xsavec"] = HasLeafD && ((EAX >> 1) & 1) && HasAVXSave;		Features["xsavec"] = HasLeafD && ((EAX >> 1) & 1) && HasAVXSave;
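
The `Features[...]` assignments in this hunk share one pattern: shift the relevant CPUID register right by the feature's bit index, mask with 1, and gate the result on leaf availability and OS state-save support. A small sketch of that pattern for the new `avxvnniint16` entry (bit 10 of EDX, per the added line). `testBit` and `detectAVXVNNIINT16` are hypothetical names used for illustration, not functions in `Host.cpp`, and the register value is passed in rather than read from hardware:

```cpp
#include <cstdint>

// Hypothetical helper: true if bit `Bit` of `Reg` is set.
inline bool testBit(std::uint32_t Reg, unsigned Bit) {
  return ((Reg >> Bit) & 1u) != 0;
}

// Mirrors the shape of the added Features["avxvnniint16"] expression:
// the leaf must exist, EDX bit 10 must be set, and the OS must save AVX state.
inline bool detectAVXVNNIINT16(bool HasLeaf7Subleaf1, std::uint32_t EDX,
                               bool HasAVXSave) {
  return HasLeaf7Subleaf1 && testBit(EDX, 10) && HasAVXSave;
}
```

Taking the register as a parameter keeps the sketch testable on any host; the real code obtains EDX from its own CPUID wrapper.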

llvm/lib/TargetParser/X86TargetParser.cpp

	constexpr FeatureBitset ImpliedFeaturesAMX_FP16 = FeatureAMX_TILE;			constexpr FeatureBitset ImpliedFeaturesAMX_FP16 = FeatureAMX_TILE;
	constexpr FeatureBitset ImpliedFeaturesAMX_INT8 = FeatureAMX_TILE;			constexpr FeatureBitset ImpliedFeaturesAMX_INT8 = FeatureAMX_TILE;
	constexpr FeatureBitset ImpliedFeaturesAMX_COMPLEX = FeatureAMX_TILE;			constexpr FeatureBitset ImpliedFeaturesAMX_COMPLEX = FeatureAMX_TILE;
	constexpr FeatureBitset ImpliedFeaturesHRESET = {};			constexpr FeatureBitset ImpliedFeaturesHRESET = {};

	constexpr FeatureBitset ImpliedFeaturesPREFETCHI = {};			constexpr FeatureBitset ImpliedFeaturesPREFETCHI = {};
	constexpr FeatureBitset ImpliedFeaturesCMPCCXADD = {};			constexpr FeatureBitset ImpliedFeaturesCMPCCXADD = {};
	constexpr FeatureBitset ImpliedFeaturesRAOINT = {};			constexpr FeatureBitset ImpliedFeaturesRAOINT = {};
				constexpr FeatureBitset ImpliedFeaturesAVXVNNIINT16 = FeatureAVX2;
	constexpr FeatureBitset ImpliedFeaturesAVXVNNIINT8 = FeatureAVX2;			constexpr FeatureBitset ImpliedFeaturesAVXVNNIINT8 = FeatureAVX2;
	constexpr FeatureBitset ImpliedFeaturesAVXIFMA = FeatureAVX2;			constexpr FeatureBitset ImpliedFeaturesAVXIFMA = FeatureAVX2;
	constexpr FeatureBitset ImpliedFeaturesAVXNECONVERT = FeatureAVX2;			constexpr FeatureBitset ImpliedFeaturesAVXNECONVERT = FeatureAVX2;
	constexpr FeatureBitset ImpliedFeaturesSHA512 = FeatureAVX;			constexpr FeatureBitset ImpliedFeaturesSHA512 = FeatureAVX;
	constexpr FeatureBitset ImpliedFeaturesAVX512FP16 =			constexpr FeatureBitset ImpliedFeaturesAVX512FP16 =
	FeatureAVX512BW | FeatureAVX512DQ | FeatureAVX512VL;			FeatureAVX512BW | FeatureAVX512DQ | FeatureAVX512VL;
	// Key Locker Features			// Key Locker Features
	constexpr FeatureBitset ImpliedFeaturesKL = FeatureSSE2;			constexpr FeatureBitset ImpliedFeaturesKL = FeatureSSE2;

llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxvnniint16 | FileCheck %s
				; RUN: llc < %s -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avxvnniint16 | FileCheck %s
				pengfei: `X64,CHECK`
				FreddyYe: This couldn't help merging the CHECKs here. Do we need it?

				pengfei: `X86,CHECK`
				craig.topper: I thought the common prefix had to be first? But I might be wrong
				pengfei: You are right 👍
				define <4 x i32> @test_int_x86_avx2_vpdpwsud_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsud_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwsud %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x72,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwsud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwsud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwsud_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsud_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwsud %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x76,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwsud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwsud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwsuds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsuds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwsuds %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x72,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwsuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwsuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwsuds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsuds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwsuds %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x76,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwsuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwsuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwusd_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusd_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwusd %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x71,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwusd.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwusd.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwusd_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusd_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwusd %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x75,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwusd.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwusd.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwusds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwusds %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x71,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwusds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwusds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwusds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwusds %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x75,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwusds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwusds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwuud_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwuud %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x70,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwuud_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwuud %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x74,0xd2,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwuuds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwuuds %xmm2, %xmm1, %xmm0 # encoding: [0xc4,0xe2,0x70,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}
				declare <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)

				define <8 x i32> @test_int_x86_avx2_vpdpwuuds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vpdpwuuds %ymm2, %ymm1, %ymm0 # encoding: [0xc4,0xe2,0x74,0xd3,0xc2]
				; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}
				declare <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
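
The `# encoding:` comments in the CHECK lines above differ only in the third byte (`0x70` through `0x76`) and the opcode byte (`0xd2` or `0xd3`). A rough sketch of how that third byte breaks down, assuming the standard three-byte VEX layout (W, inverted vvvv, L, pp). The struct and function names are hypothetical and not part of LLVM:

```cpp
#include <cstdint>

// Fields of the last byte of a three-byte (0xC4) VEX prefix.
struct VexByte2 {
  bool W;        // bit 7
  unsigned Vvvv; // bits 6:3, stored inverted in the encoding
  bool L;        // bit 2: 0 = 128-bit, 1 = 256-bit
  unsigned PP;   // bits 1:0: 0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2
};

inline VexByte2 decodeVexByte2(std::uint8_t B) {
  VexByte2 R;
  R.W = ((B >> 7) & 1) != 0;
  R.Vvvv = static_cast<unsigned>(~(B >> 3)) & 0xFu; // undo the inversion
  R.L = ((B >> 2) & 1) != 0;
  R.PP = B & 3u;
  return R;
}
```

Under that layout, `0x72` and `0x76` differ only in L (xmm versus ymm forms), the decoded vvvv value of 1 matches the `%xmm1`/`%ymm1` middle operand, and the low two bits select the prefix that separates the `T8XS`, `T8PD`, and `T8PS` families in the `.td` definitions (assuming those correspond to F3, 66, and no prefix respectively).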

llvm/test/CodeGen/X86/stack-folding-int-avxvnniint16.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -O3 -disable-peephole -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avxvnniint16 | FileCheck %s

				pengfei: Is this required?
				pengfei: Don't need `avx2`
				declare <4 x i32> @llvm.x86.avx2.vpdpwsud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwsud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				declare <4 x i32> @llvm.x86.avx2.vpdpwsuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwsuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				declare <4 x i32> @llvm.x86.avx2.vpdpwusd.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwusd.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				declare <4 x i32> @llvm.x86.avx2.vpdpwusds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwusds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				declare <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				declare <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				declare <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)

				define <4 x i32> @test_int_x86_avx2_vpdpwsud_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsud_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwsud {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x72,0xd2,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwsud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwsud_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsud_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwsud {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x76,0xd2,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwsud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwsuds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsuds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwsuds {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x72,0xd3,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwsuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwsuds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwsuds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwsuds {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x76,0xd3,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwsuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwusd_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusd_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwusd {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x71,0xd2,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwusd.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwusd_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusd_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwusd {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x75,0xd2,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwusd.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwusds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwusds {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x71,0xd3,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwusds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwusds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwusds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwusds {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x75,0xd3,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwusds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwuud_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuud {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x70,0xd2,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwuud_128_commuted(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_128_commuted:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuud {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x70,0xd2,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuud.128(<4 x i32> %A, <4 x i32> %C, <4 x i32> %B)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwuud_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuud {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x74,0xd2,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwuud_256_commuted(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuud_256_commuted:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuud {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x74,0xd2,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuud.256(<8 x i32> %A, <8 x i32> %C, <8 x i32> %B)
				ret <8 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwuuds_128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_128:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuuds {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x70,0xd3,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C)
				ret <4 x i32> %ret
				}

				define <4 x i32> @test_int_x86_avx2_vpdpwuuds_128_commuted(<4 x i32> %A, <4 x i32> %B, <4 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_128_commuted:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovaps %xmm2, {{[-0-9]+}}(%r{{[sb]}}p) # 16-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xf8,0x29,0x54,0x24,0xe8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuuds {{[-0-9]+}}(%r{{[sb]}}p), %xmm1, %xmm0 # 16-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x70,0xd3,0x44,0x24,0xe8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <4 x i32> @llvm.x86.avx2.vpdpwuuds.128(<4 x i32> %A, <4 x i32> %C, <4 x i32> %B)
				ret <4 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwuuds_256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_256:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuuds {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x74,0xd3,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C)
				ret <8 x i32> %ret
				}

				define <8 x i32> @test_int_x86_avx2_vpdpwuuds_256_commuted(<8 x i32> %A, <8 x i32> %B, <8 x i32> %C) {
				; CHECK-LABEL: test_int_x86_avx2_vpdpwuuds_256_commuted:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vmovups %ymm2, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
				; CHECK-NEXT: # encoding: [0xc5,0xfc,0x11,0x54,0x24,0xd8]
				; CHECK-NEXT: #APP
				; CHECK-NEXT: nop # encoding: [0x90]
				; CHECK-NEXT: #NO_APP
				; CHECK-NEXT: vpdpwuuds {{[-0-9]+}}(%r{{[sb]}}p), %ymm1, %ymm0 # 32-byte Folded Reload
				; CHECK-NEXT: # encoding: [0xc4,0xe2,0x74,0xd3,0x44,0x24,0xd8]
				; CHECK-NEXT: retq # encoding: [0xc3]
				%1 = tail call <2 x i64> asm sideeffect "nop", "=x,~{xmm3},~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{xmm16},~{xmm17},~{xmm18},~{xmm19},~{xmm20},~{xmm21},~{xmm22},~{xmm23},~{xmm24},~{xmm25},~{xmm26},~{xmm27},~{xmm28},~{xmm29},~{xmm30},~{xmm31},~{flags}"()
				%ret = call <8 x i32> @llvm.x86.avx2.vpdpwuuds.256(<8 x i32> %A, <8 x i32> %C, <8 x i32> %B)
				ret <8 x i32> %ret
				}

llvm/test/MC/Disassembler/X86/avx-vnni-int16-32.txt

This file was added.

				# RUN: llvm-mc --disassemble %s -triple=i386-unknown-unknown | FileCheck %s --check-prefixes=ATT
				# RUN: llvm-mc --disassemble %s -triple=i386-unknown-unknown --output-asm-variant=1 | FileCheck %s --check-prefixes=INTEL

				# ATT: vpdpwsud %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymm4
				0xc4,0xe2,0x66,0xd2,0xd4

				# ATT: vpdpwsud %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmm4
				0xc4,0xe2,0x62,0xd2,0xd4

				# ATT: vpdpwsud 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x66,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwsud 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x66,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwsud (%eax), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x66,0xd2,0x10

				# ATT: vpdpwsud -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x66,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwsud 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x66,0xd2,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwsud -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwsud ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x66,0xd2,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwsud 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x62,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwsud 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x62,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwsud (%eax), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x62,0xd2,0x10

				# ATT: vpdpwsud -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x62,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwsud 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x62,0xd2,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwsud -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwsud xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x62,0xd2,0x92,0x00,0xf8,0xff,0xff

				# ATT: vpdpwsuds %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymm4
				0xc4,0xe2,0x66,0xd3,0xd4

				# ATT: vpdpwsuds %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmm4
				0xc4,0xe2,0x62,0xd3,0xd4

				# ATT: vpdpwsuds 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x66,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwsuds 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x66,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwsuds (%eax), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x66,0xd3,0x10

				# ATT: vpdpwsuds -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x66,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwsuds 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x66,0xd3,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwsuds -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwsuds ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x66,0xd3,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwsuds 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x62,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwsuds 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x62,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwsuds (%eax), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x62,0xd3,0x10

				# ATT: vpdpwsuds -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x62,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwsuds 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x62,0xd3,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwsuds -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwsuds xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x62,0xd3,0x92,0x00,0xf8,0xff,0xff

				# ATT: vpdpwusd %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymm4
				0xc4,0xe2,0x65,0xd2,0xd4

				# ATT: vpdpwusd %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmm4
				0xc4,0xe2,0x61,0xd2,0xd4

				# ATT: vpdpwusd 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x65,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwusd 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x65,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwusd (%eax), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x65,0xd2,0x10

				# ATT: vpdpwusd -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x65,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwusd 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x65,0xd2,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwusd -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwusd ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x65,0xd2,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwusd 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x61,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwusd 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x61,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwusd (%eax), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x61,0xd2,0x10

				# ATT: vpdpwusd -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x61,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwusd 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x61,0xd2,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwusd -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwusd xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x61,0xd2,0x92,0x00,0xf8,0xff,0xff

				# ATT: vpdpwusds %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymm4
				0xc4,0xe2,0x65,0xd3,0xd4

				# ATT: vpdpwusds %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmm4
				0xc4,0xe2,0x61,0xd3,0xd4

				# ATT: vpdpwusds 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x65,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwusds 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x65,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwusds (%eax), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x65,0xd3,0x10

				# ATT: vpdpwusds -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x65,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwusds 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x65,0xd3,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwusds -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwusds ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x65,0xd3,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwusds 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x61,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwusds 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x61,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwusds (%eax), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x61,0xd3,0x10

				# ATT: vpdpwusds -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x61,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwusds 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x61,0xd3,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwusds -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwusds xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x61,0xd3,0x92,0x00,0xf8,0xff,0xff

				# ATT: vpdpwuud %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymm4
				0xc4,0xe2,0x64,0xd2,0xd4

				# ATT: vpdpwuud %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmm4
				0xc4,0xe2,0x60,0xd2,0xd4

				# ATT: vpdpwuud 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x64,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwuud 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x64,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwuud (%eax), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x64,0xd2,0x10

				# ATT: vpdpwuud -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x64,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwuud 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x64,0xd2,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwuud -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwuud ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x64,0xd2,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwuud 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x60,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwuud 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x60,0xd2,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwuud (%eax), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x60,0xd2,0x10

				# ATT: vpdpwuud -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x60,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwuud 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x60,0xd2,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwuud -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwuud xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x60,0xd2,0x92,0x00,0xf8,0xff,0xff

				# ATT: vpdpwuuds %ymm4, %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymm4
				0xc4,0xe2,0x64,0xd3,0xd4

				# ATT: vpdpwuuds %xmm4, %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmm4
				0xc4,0xe2,0x60,0xd3,0xd4

				# ATT: vpdpwuuds 268435456(%esp,%esi,8), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x64,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwuuds 291(%edi,%eax,4), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x64,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwuuds (%eax), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [eax]
				0xc4,0xe2,0x64,0xd3,0x10

				# ATT: vpdpwuuds -1024(,%ebp,2), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				0xc4,0xe2,0x64,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwuuds 4064(%ecx), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [ecx + 4064]
				0xc4,0xe2,0x64,0xd3,0x91,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwuuds -4096(%edx), %ymm3, %ymm2
				# INTEL: vpdpwuuds ymm2, ymm3, ymmword ptr [edx - 4096]
				0xc4,0xe2,0x64,0xd3,0x92,0x00,0xf0,0xff,0xff

				# ATT: vpdpwuuds 268435456(%esp,%esi,8), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				0xc4,0xe2,0x60,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10

				# ATT: vpdpwuuds 291(%edi,%eax,4), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				0xc4,0xe2,0x60,0xd3,0x94,0x87,0x23,0x01,0x00,0x00

				# ATT: vpdpwuuds (%eax), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [eax]
				0xc4,0xe2,0x60,0xd3,0x10

				# ATT: vpdpwuuds -512(,%ebp,2), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				0xc4,0xe2,0x60,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwuuds 2032(%ecx), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [ecx + 2032]
				0xc4,0xe2,0x60,0xd3,0x91,0xf0,0x07,0x00,0x00

				# ATT: vpdpwuuds -2048(%edx), %xmm3, %xmm2
				# INTEL: vpdpwuuds xmm2, xmm3, xmmword ptr [edx - 2048]
				0xc4,0xe2,0x60,0xd3,0x92,0x00,0xf8,0xff,0xff

llvm/test/MC/Disassembler/X86/avx-vnni-int16-64.txt

This file was added.

				# RUN: llvm-mc --disassemble %s -triple=x86_64 | FileCheck %s --check-prefixes=ATT
				# RUN: llvm-mc --disassemble %s -triple=x86_64 --output-asm-variant=1 | FileCheck %s --check-prefixes=INTEL

				# ATT: vpdpwsud %ymm4, %ymm13, %ymm12
				RKSimon: try to use some x86_64-specific registers to improve test coverage
				# INTEL: vpdpwsud ymm12, ymm13, ymm4
				0xc4,0x62,0x16,0xd2,0xe4

				# ATT: vpdpwsud %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmm4
				0xc4,0x62,0x12,0xd2,0xe4

				# ATT: vpdpwsud 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x16,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwsud 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x16,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwsud (%rip), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x16,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwsud -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x16,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwsud 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x16,0xd2,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwsud -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwsud ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x16,0xd2,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwsud 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x12,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwsud 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x12,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwsud (%rip), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x12,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwsud -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x12,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwsud 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x12,0xd2,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwsud -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwsud xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x12,0xd2,0xa2,0x00,0xf8,0xff,0xff

				# ATT: vpdpwsuds %ymm4, %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymm4
				0xc4,0x62,0x16,0xd3,0xe4

				# ATT: vpdpwsuds %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmm4
				0xc4,0x62,0x12,0xd3,0xe4

				# ATT: vpdpwsuds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x16,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwsuds 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x16,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwsuds (%rip), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x16,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwsuds -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x16,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwsuds 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x16,0xd3,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwsuds -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwsuds ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x16,0xd3,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwsuds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x12,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwsuds 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x12,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwsuds (%rip), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x12,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwsuds -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x12,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwsuds 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x12,0xd3,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwsuds -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwsuds xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x12,0xd3,0xa2,0x00,0xf8,0xff,0xff

				# ATT: vpdpwusd %ymm4, %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymm4
				0xc4,0x62,0x15,0xd2,0xe4

				# ATT: vpdpwusd %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmm4
				0xc4,0x62,0x11,0xd2,0xe4

				# ATT: vpdpwusd 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x15,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwusd 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x15,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwusd (%rip), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x15,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwusd -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x15,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwusd 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x15,0xd2,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwusd -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwusd ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x15,0xd2,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwusd 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x11,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwusd 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x11,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwusd (%rip), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x11,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwusd -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x11,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwusd 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x11,0xd2,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwusd -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwusd xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x11,0xd2,0xa2,0x00,0xf8,0xff,0xff

				# ATT: vpdpwusds %ymm4, %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymm4
				0xc4,0x62,0x15,0xd3,0xe4

				# ATT: vpdpwusds %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmm4
				0xc4,0x62,0x11,0xd3,0xe4

				# ATT: vpdpwusds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x15,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwusds 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x15,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwusds (%rip), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x15,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwusds -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x15,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwusds 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x15,0xd3,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwusds -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwusds ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x15,0xd3,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwusds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x11,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwusds 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x11,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwusds (%rip), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x11,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwusds -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x11,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwusds 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x11,0xd3,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwusds -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwusds xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x11,0xd3,0xa2,0x00,0xf8,0xff,0xff

				# ATT: vpdpwuud %ymm4, %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymm4
				0xc4,0x62,0x14,0xd2,0xe4

				# ATT: vpdpwuud %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmm4
				0xc4,0x62,0x10,0xd2,0xe4

				# ATT: vpdpwuud 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x14,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwuud 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x14,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwuud (%rip), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x14,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwuud -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x14,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwuud 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x14,0xd2,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwuud -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwuud ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x14,0xd2,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwuud 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x10,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwuud 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x10,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwuud (%rip), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x10,0xd2,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwuud -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x10,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwuud 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x10,0xd2,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwuud -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwuud xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x10,0xd2,0xa2,0x00,0xf8,0xff,0xff

				# ATT: vpdpwuuds %ymm4, %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymm4
				0xc4,0x62,0x14,0xd3,0xe4

				# ATT: vpdpwuuds %xmm4, %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmm4
				0xc4,0x62,0x10,0xd3,0xe4

				# ATT: vpdpwuuds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x14,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwuuds 291(%r8,%rax,4), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x14,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwuuds (%rip), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [rip]
				0xc4,0x62,0x14,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwuuds -1024(,%rbp,2), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				0xc4,0x62,0x14,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff

				# ATT: vpdpwuuds 4064(%rcx), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [rcx + 4064]
				0xc4,0x62,0x14,0xd3,0xa1,0xe0,0x0f,0x00,0x00

				# ATT: vpdpwuuds -4096(%rdx), %ymm13, %ymm12
				# INTEL: vpdpwuuds ymm12, ymm13, ymmword ptr [rdx - 4096]
				0xc4,0x62,0x14,0xd3,0xa2,0x00,0xf0,0xff,0xff

				# ATT: vpdpwuuds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				0xc4,0x22,0x10,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10

				# ATT: vpdpwuuds 291(%r8,%rax,4), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				0xc4,0x42,0x10,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00

				# ATT: vpdpwuuds (%rip), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [rip]
				0xc4,0x62,0x10,0xd3,0x25,0x00,0x00,0x00,0x00

				# ATT: vpdpwuuds -512(,%rbp,2), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				0xc4,0x62,0x10,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff

				# ATT: vpdpwuuds 2032(%rcx), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [rcx + 2032]
				0xc4,0x62,0x10,0xd3,0xa1,0xf0,0x07,0x00,0x00

				# ATT: vpdpwuuds -2048(%rdx), %xmm13, %xmm12
				# INTEL: vpdpwuuds xmm12, xmm13, xmmword ptr [rdx - 2048]
				0xc4,0x62,0x10,0xd3,0xa2,0x00,0xf8,0xff,0xff

llvm/test/MC/X86/avx-vnni-int16-32-att.s

This file was added.

				// RUN: llvm-mc -triple i686-unknown-unknown --show-encoding %s | FileCheck %s

				// CHECK: vpdpwsud %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0xd4]
				vpdpwsud %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwsud %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0xd4]
				vpdpwsud %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwsud 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsud 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwsud 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsud 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwsud (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x10]
				vpdpwsud (%eax), %ymm3, %ymm2

				// CHECK: vpdpwsud -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsud -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwsud 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwsud 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwsud -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwsud -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwsud 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsud 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwsud 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsud 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwsud (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x10]
				vpdpwsud (%eax), %xmm3, %xmm2

				// CHECK: vpdpwsud -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsud -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwsud 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwsud 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwsud -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwsud -2048(%edx), %xmm3, %xmm2

				// CHECK: vpdpwsuds %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0xd4]
				vpdpwsuds %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwsuds %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0xd4]
				vpdpwsuds %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwsuds 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsuds 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwsuds 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsuds 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwsuds (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x10]
				vpdpwsuds (%eax), %ymm3, %ymm2

				// CHECK: vpdpwsuds -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsuds -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwsuds 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwsuds 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwsuds -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwsuds -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwsuds 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsuds 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwsuds 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsuds 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwsuds (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x10]
				vpdpwsuds (%eax), %xmm3, %xmm2

				// CHECK: vpdpwsuds -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsuds -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwsuds 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwsuds 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwsuds -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwsuds -2048(%edx), %xmm3, %xmm2

				// CHECK: vpdpwusd %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0xd4]
				vpdpwusd %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwusd %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0xd4]
				vpdpwusd %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwusd 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusd 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwusd 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusd 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwusd (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x10]
				vpdpwusd (%eax), %ymm3, %ymm2

				// CHECK: vpdpwusd -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusd -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwusd 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwusd 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwusd -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwusd -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwusd 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusd 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwusd 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusd 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwusd (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x10]
				vpdpwusd (%eax), %xmm3, %xmm2

				// CHECK: vpdpwusd -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusd -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwusd 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwusd 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwusd -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwusd -2048(%edx), %xmm3, %xmm2

				// CHECK: vpdpwusds %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0xd4]
				vpdpwusds %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwusds %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0xd4]
				vpdpwusds %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwusds 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusds 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwusds 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusds 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwusds (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x10]
				vpdpwusds (%eax), %ymm3, %ymm2

				// CHECK: vpdpwusds -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusds -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwusds 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwusds 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwusds -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwusds -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwusds 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusds 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwusds 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusds 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwusds (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x10]
				vpdpwusds (%eax), %xmm3, %xmm2

				// CHECK: vpdpwusds -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusds -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwusds 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwusds 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwusds -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwusds -2048(%edx), %xmm3, %xmm2

				// CHECK: vpdpwuud %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0xd4]
				vpdpwuud %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwuud %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0xd4]
				vpdpwuud %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwuud 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuud 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwuud 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuud 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwuud (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x10]
				vpdpwuud (%eax), %ymm3, %ymm2

				// CHECK: vpdpwuud -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuud -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwuud 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwuud 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwuud -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwuud -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwuud 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuud 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwuud 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuud 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwuud (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x10]
				vpdpwuud (%eax), %xmm3, %xmm2

				// CHECK: vpdpwuud -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuud -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwuud 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwuud 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwuud -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwuud -2048(%edx), %xmm3, %xmm2

				// CHECK: vpdpwuuds %ymm4, %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0xd4]
				vpdpwuuds %ymm4, %ymm3, %ymm2

				// CHECK: vpdpwuuds %xmm4, %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0xd4]
				vpdpwuuds %xmm4, %xmm3, %xmm2

				// CHECK: vpdpwuuds 268435456(%esp,%esi,8), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuuds 268435456(%esp,%esi,8), %ymm3, %ymm2

				// CHECK: vpdpwuuds 291(%edi,%eax,4), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuuds 291(%edi,%eax,4), %ymm3, %ymm2

				// CHECK: vpdpwuuds (%eax), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x10]
				vpdpwuuds (%eax), %ymm3, %ymm2

				// CHECK: vpdpwuuds -1024(,%ebp,2), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuuds -1024(,%ebp,2), %ymm3, %ymm2

				// CHECK: vpdpwuuds 4064(%ecx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwuuds 4064(%ecx), %ymm3, %ymm2

				// CHECK: vpdpwuuds -4096(%edx), %ymm3, %ymm2
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwuuds -4096(%edx), %ymm3, %ymm2

				// CHECK: vpdpwuuds 268435456(%esp,%esi,8), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuuds 268435456(%esp,%esi,8), %xmm3, %xmm2

				// CHECK: vpdpwuuds 291(%edi,%eax,4), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuuds 291(%edi,%eax,4), %xmm3, %xmm2

				// CHECK: vpdpwuuds (%eax), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x10]
				vpdpwuuds (%eax), %xmm3, %xmm2

				// CHECK: vpdpwuuds -512(,%ebp,2), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuuds -512(,%ebp,2), %xmm3, %xmm2

				// CHECK: vpdpwuuds 2032(%ecx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwuuds 2032(%ecx), %xmm3, %xmm2

				// CHECK: vpdpwuuds -2048(%edx), %xmm3, %xmm2
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwuuds -2048(%edx), %xmm3, %xmm2
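
Reading note on the encodings above: within each register form, the six mnemonics differ only in the third prefix byte and the opcode byte. A minimal Python sketch, assuming the standard 3-byte VEX layout (0xC4, then inverted R/X/B plus a 5-bit map field, then W, inverted vvvv, L, pp), shows how those bytes split into fields. The helper names `decode_vex3` and `mnemonic` are hypothetical and not part of LLVM; the pp-to-mnemonic table is inferred only from the CHECK lines in this test.

```python
def decode_vex3(b0, b1, b2):
    """Split a 3-byte VEX prefix (0xC4, byte1, byte2) into its fields."""
    if b0 != 0xC4:
        raise ValueError("not a 3-byte VEX prefix")
    return {
        "R": ((b1 >> 7) & 1) ^ 1,     # stored inverted
        "X": ((b1 >> 6) & 1) ^ 1,     # stored inverted
        "B": ((b1 >> 5) & 1) ^ 1,     # stored inverted
        "map": b1 & 0x1F,             # 2 selects the 0F 38 opcode map
        "W": (b2 >> 7) & 1,
        "vvvv": (~(b2 >> 3)) & 0xF,   # second source register, stored inverted
        "L": (b2 >> 2) & 1,           # 0 = 128-bit (xmm), 1 = 256-bit (ymm)
        "pp": b2 & 0x3,               # implied prefix: 0 none, 1 = 66, 2 = F3
    }


# Mapping inferred from the CHECK lines in this test file (an assumption of
# this sketch, not taken from LLVM sources).
PP_TO_NAME = {0: "vpdpwuud", 1: "vpdpwusd", 2: "vpdpwsud"}


def mnemonic(encoding):
    """Guess the mnemonic from the first four encoding bytes."""
    fields = decode_vex3(*encoding[:3])
    name = PP_TO_NAME[fields["pp"]]
    # Opcode 0xD3 is the saturating ("s"-suffixed) form, 0xD2 the plain one.
    return name + ("s" if encoding[3] == 0xD3 else "")


fields = decode_vex3(0xC4, 0xE2, 0x66)
print(fields["vvvv"], fields["L"], fields["pp"])
print(mnemonic([0xC4, 0xE2, 0x61, 0xD3, 0xD4]))
```

For the first CHECK line, `0xc4,0xe2,0x66` gives vvvv = 3 (`%ymm3`), L = 1 (256-bit) and pp = 2, which is consistent with `vpdpwsud %ymm4, %ymm3, %ymm2`.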

llvm/test/MC/X86/avx-vnni-int16-32-intel.s

This file was added.

				// RUN: llvm-mc -triple i686-unknown-unknown -x86-asm-syntax=intel -output-asm-variant=1 --show-encoding %s | FileCheck %s

				// CHECK: vpdpwsud ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0xd4]
				vpdpwsud ymm2, ymm3, ymm4

				// CHECK: vpdpwsud xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0xd4]
				vpdpwsud xmm2, xmm3, xmm4

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x10]
				vpdpwsud ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsud ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwsud ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwsud ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwsud ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x10]
				vpdpwsud xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsud xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwsud xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwsud xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwsud xmm2, xmm3, xmmword ptr [edx - 2048]

				// CHECK: vpdpwsuds ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0xd4]
				vpdpwsuds ymm2, ymm3, ymm4

				// CHECK: vpdpwsuds xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0xd4]
				vpdpwsuds xmm2, xmm3, xmm4

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x10]
				vpdpwsuds ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwsuds ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwsuds ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x66,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwsuds ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwsuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwsuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x10]
				vpdpwsuds xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsuds xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwsuds xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwsuds xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x62,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwsuds xmm2, xmm3, xmmword ptr [edx - 2048]

				// CHECK: vpdpwusd ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0xd4]
				vpdpwusd ymm2, ymm3, ymm4

				// CHECK: vpdpwusd xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0xd4]
				vpdpwusd xmm2, xmm3, xmm4

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusd ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusd ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x10]
				vpdpwusd ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusd ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwusd ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwusd ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwusd ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusd xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusd xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x10]
				vpdpwusd xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusd xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwusd xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwusd xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwusd xmm2, xmm3, xmmword ptr [edx - 2048]

				// CHECK: vpdpwusds ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0xd4]
				vpdpwusds ymm2, ymm3, ymm4

				// CHECK: vpdpwusds xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0xd4]
				vpdpwusds xmm2, xmm3, xmm4

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x10]
				vpdpwusds ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusds ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwusds ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwusds ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x65,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwusds ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwusds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwusds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x10]
				vpdpwusds xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusds xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwusds xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwusds xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x61,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwusds xmm2, xmm3, xmmword ptr [edx - 2048]

				// CHECK: vpdpwuud ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0xd4]
				vpdpwuud ymm2, ymm3, ymm4

				// CHECK: vpdpwuud xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0xd4]
				vpdpwuud xmm2, xmm3, xmm4

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuud ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuud ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x10]
				vpdpwuud ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuud ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwuud ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwuud ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd2,0x92,0x00,0xf0,0xff,0xff]
				vpdpwuud ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuud xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuud xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x10]
				vpdpwuud xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuud xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x91,0xf0,0x07,0x00,0x00]
				vpdpwuud xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwuud xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd2,0x92,0x00,0xf8,0xff,0xff]
				vpdpwuud xmm2, xmm3, xmmword ptr [edx - 2048]

				// CHECK: vpdpwuuds ymm2, ymm3, ymm4
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0xd4]
				vpdpwuuds ymm2, ymm3, ymm4

				// CHECK: vpdpwuuds xmm2, xmm3, xmm4
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0xd4]
				vpdpwuuds xmm2, xmm3, xmm4

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuuds ymm2, ymm3, ymmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuuds ymm2, ymm3, ymmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x10]
				vpdpwuuds ymm2, ymm3, ymmword ptr [eax]

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x14,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuuds ymm2, ymm3, ymmword ptr [2*ebp - 1024]

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [ecx + 4064]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x91,0xe0,0x0f,0x00,0x00]
				vpdpwuuds ymm2, ymm3, ymmword ptr [ecx + 4064]

				// CHECK: vpdpwuuds ymm2, ymm3, ymmword ptr [edx - 4096]
				// CHECK: encoding: [0xc4,0xe2,0x64,0xd3,0x92,0x00,0xf0,0xff,0xff]
				vpdpwuuds ymm2, ymm3, ymmword ptr [edx - 4096]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x94,0xf4,0x00,0x00,0x00,0x10]
				vpdpwuuds xmm2, xmm3, xmmword ptr [esp + 8*esi + 268435456]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x94,0x87,0x23,0x01,0x00,0x00]
				vpdpwuuds xmm2, xmm3, xmmword ptr [edi + 4*eax + 291]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [eax]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x10]
				vpdpwuuds xmm2, xmm3, xmmword ptr [eax]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [2*ebp - 512]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x14,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuuds xmm2, xmm3, xmmword ptr [2*ebp - 512]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [ecx + 2032]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x91,0xf0,0x07,0x00,0x00]
				vpdpwuuds xmm2, xmm3, xmmword ptr [ecx + 2032]

				// CHECK: vpdpwuuds xmm2, xmm3, xmmword ptr [edx - 2048]
				// CHECK: encoding: [0xc4,0xe2,0x60,0xd3,0x92,0x00,0xf8,0xff,0xff]
				vpdpwuuds xmm2, xmm3, xmmword ptr [edx - 2048]
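
Reading note on the memory forms above: the bytes after the opcode are a ModRM byte, an optional SIB byte, and a little-endian displacement. The following Python sketch, assuming plain 32-bit addressing (no REX/VEX register extensions, so it applies to the i686 tests only), decodes those trailing bytes into base, index, scale and displacement. The function name `decode_mem_operand` is hypothetical and is not an LLVM API.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mem_operand(data):
    """Decode ModRM (+ optional SIB and displacement) for 32-bit addressing.

    Returns (base, index, scale, disp); base/index are None when absent.
    """
    modrm = data[0]
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        raise ValueError("register operand, not a memory operand")
    pos = 1
    base = index = None
    scale = 1
    disp_size = (0, 1, 4)[mod]
    if rm == 4:
        # rm == 4 means a SIB byte follows the ModRM byte.
        sib = data[pos]
        pos += 1
        scale = 1 << (sib >> 6)
        idx, b = (sib >> 3) & 7, sib & 7
        if idx != 4:              # index field 4 means "no index"
            index = REGS32[idx]
        if b == 5 and mod == 0:   # no base register, disp32 follows
            disp_size = 4
        else:
            base = REGS32[b]
    elif rm == 5 and mod == 0:
        disp_size = 4             # bare disp32, no base register
    else:
        base = REGS32[rm]
    disp = 0
    if disp_size:
        raw = bytes(data[pos:pos + disp_size])
        disp = int.from_bytes(raw, "little", signed=True)
    return base, index, scale, disp


print(decode_mem_operand([0x94, 0xF4, 0x00, 0x00, 0x00, 0x10]))
print(decode_mem_operand([0x14, 0x6D, 0x00, 0xFC, 0xFF, 0xFF]))
```

The two calls correspond to the `[esp + 8*esi + 268435456]` and `[2*ebp - 1024]` operands in the test: the displacement bytes `0x00,0xfc,0xff,0xff` read as a signed little-endian value give -1024.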

llvm/test/MC/X86/avx-vnni-int16-64-att.s

This file was added.

				// RUN: llvm-mc -triple x86_64 --show-encoding %s | FileCheck %s

				// CHECK: vpdpwsud %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xe4]
				vpdpwsud %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwsud %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xe4]
				vpdpwsud %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwsud 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x16,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsud 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwsud 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x16,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsud 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwsud (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwsud (%rip), %ymm13, %ymm12

				// CHECK: vpdpwsud -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsud -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwsud 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwsud 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwsud -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwsud -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwsud 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x12,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsud 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwsud 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x12,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsud 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwsud (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwsud (%rip), %xmm13, %xmm12

				// CHECK: vpdpwsud -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsud -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwsud 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwsud 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwsud -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwsud -2048(%rdx), %xmm13, %xmm12

				// CHECK: vpdpwsuds %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xe4]
				vpdpwsuds %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwsuds %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xe4]
				vpdpwsuds %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwsuds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x16,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsuds 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwsuds 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x16,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsuds 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwsuds (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwsuds (%rip), %ymm13, %ymm12

				// CHECK: vpdpwsuds -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsuds -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwsuds 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwsuds 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwsuds -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwsuds -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwsuds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x12,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsuds 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwsuds 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x12,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsuds 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwsuds (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwsuds (%rip), %xmm13, %xmm12

				// CHECK: vpdpwsuds -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsuds -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwsuds 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwsuds 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwsuds -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwsuds -2048(%rdx), %xmm13, %xmm12

				// CHECK: vpdpwusd %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xe4]
				vpdpwusd %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwusd %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xe4]
				vpdpwusd %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwusd 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x15,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusd 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwusd 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x15,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusd 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwusd (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwusd (%rip), %ymm13, %ymm12

				// CHECK: vpdpwusd -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusd -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwusd 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwusd 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwusd -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwusd -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwusd 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x11,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusd 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwusd 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x11,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusd 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwusd (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwusd (%rip), %xmm13, %xmm12

				// CHECK: vpdpwusd -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusd -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwusd 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwusd 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwusd -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwusd -2048(%rdx), %xmm13, %xmm12

				// CHECK: vpdpwusds %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xe4]
				vpdpwusds %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwusds %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xe4]
				vpdpwusds %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwusds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x15,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusds 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwusds 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x15,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusds 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwusds (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwusds (%rip), %ymm13, %ymm12

				// CHECK: vpdpwusds -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusds -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwusds 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwusds 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwusds -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwusds -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwusds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x11,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusds 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwusds 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x11,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusds 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwusds (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwusds (%rip), %xmm13, %xmm12

				// CHECK: vpdpwusds -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusds -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwusds 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwusds 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwusds -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwusds -2048(%rdx), %xmm13, %xmm12

				// CHECK: vpdpwuud %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xe4]
				vpdpwuud %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwuud %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xe4]
				vpdpwuud %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwuud 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x14,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuud 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwuud 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x14,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuud 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwuud (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwuud (%rip), %ymm13, %ymm12

				// CHECK: vpdpwuud -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuud -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwuud 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwuud 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwuud -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwuud -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwuud 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x10,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuud 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwuud 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x10,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuud 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwuud (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwuud (%rip), %xmm13, %xmm12

				// CHECK: vpdpwuud -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuud -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwuud 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwuud 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwuud -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwuud -2048(%rdx), %xmm13, %xmm12

				// CHECK: vpdpwuuds %ymm4, %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xe4]
				vpdpwuuds %ymm4, %ymm13, %ymm12

				// CHECK: vpdpwuuds %xmm4, %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xe4]
				vpdpwuuds %xmm4, %xmm13, %xmm12

				// CHECK: vpdpwuuds 268435456(%rbp,%r14,8), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x22,0x14,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuuds 268435456(%rbp,%r14,8), %ymm13, %ymm12

				// CHECK: vpdpwuuds 291(%r8,%rax,4), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x42,0x14,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuuds 291(%r8,%rax,4), %ymm13, %ymm12

				// CHECK: vpdpwuuds (%rip), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwuuds (%rip), %ymm13, %ymm12

				// CHECK: vpdpwuuds -1024(,%rbp,2), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuuds -1024(,%rbp,2), %ymm13, %ymm12

				// CHECK: vpdpwuuds 4064(%rcx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwuuds 4064(%rcx), %ymm13, %ymm12

				// CHECK: vpdpwuuds -4096(%rdx), %ymm13, %ymm12
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwuuds -4096(%rdx), %ymm13, %ymm12

				// CHECK: vpdpwuuds 268435456(%rbp,%r14,8), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x22,0x10,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuuds 268435456(%rbp,%r14,8), %xmm13, %xmm12

				// CHECK: vpdpwuuds 291(%r8,%rax,4), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x42,0x10,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuuds 291(%r8,%rax,4), %xmm13, %xmm12

				// CHECK: vpdpwuuds (%rip), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwuuds (%rip), %xmm13, %xmm12

				// CHECK: vpdpwuuds -512(,%rbp,2), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuuds -512(,%rbp,2), %xmm13, %xmm12

				// CHECK: vpdpwuuds 2032(%rcx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwuuds 2032(%rcx), %xmm13, %xmm12

				// CHECK: vpdpwuuds -2048(%rdx), %xmm13, %xmm12
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwuuds -2048(%rdx), %xmm13, %xmm12

llvm/test/MC/X86/avx-vnni-int16-64-intel.s

This file was added.

				// RUN: llvm-mc -triple x86_64 -x86-asm-syntax=intel -output-asm-variant=1 --show-encoding %s | FileCheck %s

				// CHECK: vpdpwsud ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xe4]
				vpdpwsud ymm12, ymm13, ymm4

				// CHECK: vpdpwsud xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xe4]
				vpdpwsud xmm12, xmm13, xmm4

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x16,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x16,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwsud ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsud ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwsud ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwsud ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwsud ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x12,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x12,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwsud xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsud xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwsud xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwsud xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwsud xmm12, xmm13, xmmword ptr [rdx - 2048]

				// CHECK: vpdpwsuds ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xe4]
				vpdpwsuds ymm12, ymm13, ymm4

				// CHECK: vpdpwsuds xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xe4]
				vpdpwsuds xmm12, xmm13, xmm4

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x16,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x16,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwsuds ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwsuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwsuds ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwsuds ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x16,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwsuds ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x12,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwsuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x12,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwsuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwsuds xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwsuds xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwsuds xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwsuds xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x12,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwsuds xmm12, xmm13, xmmword ptr [rdx - 2048]

				// CHECK: vpdpwusd ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xe4]
				vpdpwusd ymm12, ymm13, ymm4

				// CHECK: vpdpwusd xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xe4]
				vpdpwusd xmm12, xmm13, xmm4

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x15,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusd ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x15,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusd ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwusd ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusd ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwusd ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwusd ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwusd ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x11,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusd xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x11,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusd xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwusd xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusd xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwusd xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwusd xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwusd xmm12, xmm13, xmmword ptr [rdx - 2048]

				// CHECK: vpdpwusds ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xe4]
				vpdpwusds ymm12, ymm13, ymm4

				// CHECK: vpdpwusds xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xe4]
				vpdpwusds xmm12, xmm13, xmm4

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x15,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x15,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwusds ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwusds ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwusds ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwusds ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x15,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwusds ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x11,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwusds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x11,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwusds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwusds xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwusds xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwusds xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwusds xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x11,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwusds xmm12, xmm13, xmmword ptr [rdx - 2048]

				// CHECK: vpdpwuud ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xe4]
				vpdpwuud ymm12, ymm13, ymm4

				// CHECK: vpdpwuud xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xe4]
				vpdpwuud xmm12, xmm13, xmm4

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x14,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuud ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x14,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuud ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwuud ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuud ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwuud ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwuud ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd2,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwuud ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x10,0xd2,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuud xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x10,0xd2,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuud xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0x25,0x00,0x00,0x00,0x00]
				vpdpwuud xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuud xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwuud xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwuud xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd2,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwuud xmm12, xmm13, xmmword ptr [rdx - 2048]

				// CHECK: vpdpwuuds ymm12, ymm13, ymm4
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xe4]
				vpdpwuuds ymm12, ymm13, ymm4

				// CHECK: vpdpwuuds xmm12, xmm13, xmm4
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xe4]
				vpdpwuuds xmm12, xmm13, xmm4

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x14,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuuds ymm12, ymm13, ymmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x14,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuuds ymm12, ymm13, ymmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwuuds ymm12, ymm13, ymmword ptr [rip]

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0x24,0x6d,0x00,0xfc,0xff,0xff]
				vpdpwuuds ymm12, ymm13, ymmword ptr [2*rbp - 1024]

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [rcx + 4064]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xa1,0xe0,0x0f,0x00,0x00]
				vpdpwuuds ymm12, ymm13, ymmword ptr [rcx + 4064]

				// CHECK: vpdpwuuds ymm12, ymm13, ymmword ptr [rdx - 4096]
				// CHECK: encoding: [0xc4,0x62,0x14,0xd3,0xa2,0x00,0xf0,0xff,0xff]
				vpdpwuuds ymm12, ymm13, ymmword ptr [rdx - 4096]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]
				// CHECK: encoding: [0xc4,0x22,0x10,0xd3,0xa4,0xf5,0x00,0x00,0x00,0x10]
				vpdpwuuds xmm12, xmm13, xmmword ptr [rbp + 8*r14 + 268435456]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]
				// CHECK: encoding: [0xc4,0x42,0x10,0xd3,0xa4,0x80,0x23,0x01,0x00,0x00]
				vpdpwuuds xmm12, xmm13, xmmword ptr [r8 + 4*rax + 291]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [rip]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0x25,0x00,0x00,0x00,0x00]
				vpdpwuuds xmm12, xmm13, xmmword ptr [rip]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [2*rbp - 512]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0x24,0x6d,0x00,0xfe,0xff,0xff]
				vpdpwuuds xmm12, xmm13, xmmword ptr [2*rbp - 512]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [rcx + 2032]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xa1,0xf0,0x07,0x00,0x00]
				vpdpwuuds xmm12, xmm13, xmmword ptr [rcx + 2032]

				// CHECK: vpdpwuuds xmm12, xmm13, xmmword ptr [rdx - 2048]
				// CHECK: encoding: [0xc4,0x62,0x10,0xd3,0xa2,0x00,0xf8,0xff,0xff]
				vpdpwuuds xmm12, xmm13, xmmword ptr [rdx - 2048]
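
Note on reading the encodings above: every expected byte sequence in these tests starts with the three-byte VEX prefix (0xC4), and the bytes that change between mnemonics are the third prefix byte and the opcode. The following is a minimal sketch, assuming the standard three-byte VEX layout (inverted R/X/B plus map select in byte 1; W, inverted vvvv, L and pp in byte 2). The function name `decode_vex3` and the dictionary keys are hypothetical, chosen for illustration only; they are not LLVM APIs.

```python
# Sketch: split a three-byte VEX prefix (0xC4 form) into its fields.
# Names are illustrative only and do not correspond to anything in LLVM.

PP_PREFIX = {0b00: "none", 0b01: "66", 0b10: "F3", 0b11: "F2"}
MAP_NAME = {0b00001: "0F", 0b00010: "0F38", 0b00011: "0F3A"}


def decode_vex3(encoding):
    """Decode the VEX prefix fields and opcode from a list of encoding bytes."""
    if len(encoding) < 4 or encoding[0] != 0xC4:
        raise ValueError("expected a three-byte VEX prefix starting with 0xC4")
    b1, b2, opcode = encoding[1], encoding[2], encoding[3]
    return {
        # R, X, B and vvvv are stored inverted in the prefix.
        "R": ((b1 >> 7) & 1) ^ 1,
        "X": ((b1 >> 6) & 1) ^ 1,
        "B": ((b1 >> 5) & 1) ^ 1,
        "map": MAP_NAME.get(b1 & 0x1F, "reserved"),
        "W": (b2 >> 7) & 1,
        "vvvv": (~(b2 >> 3)) & 0xF,  # second source register number
        "L": (b2 >> 2) & 1,          # 0 = 128-bit (xmm), 1 = 256-bit (ymm)
        "pp": PP_PREFIX[b2 & 0x3],   # implied legacy prefix
        "opcode": opcode,
    }


if __name__ == "__main__":
    # Bytes taken from the "vpdpwsud ymm12, ymm13, ymm4" check line above.
    print(decode_vex3([0xC4, 0x62, 0x16, 0xD2, 0xE4]))
```

Applied to the check lines in this file, the sketch shows the pattern the tests exercise: the `vpdpwsu*` forms carry pp = F3, the `vpdpwus*` forms pp = 66, and the `vpdpwuu*` forms no implied prefix, while opcode 0xD2 versus 0xD3 separates the plain and saturating (`s`-suffixed) variants.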

llvm/test/TableGen/x86-fold-tables.inc

(4,234 preceding lines not shown)
static const X86MemoryFoldTableEntry MemoryFoldTable3[] = {
 {X86::VPDPWSSDSZ256r, X86::VPDPWSSDSZ256m, 0},
 {X86::VPDPWSSDSZr, X86::VPDPWSSDSZm, 0},
 {X86::VPDPWSSDSrr, X86::VPDPWSSDSrm, 0},
 {X86::VPDPWSSDYrr, X86::VPDPWSSDYrm, 0},
 {X86::VPDPWSSDZ128r, X86::VPDPWSSDZ128m, 0},
 {X86::VPDPWSSDZ256r, X86::VPDPWSSDZ256m, 0},
 {X86::VPDPWSSDZr, X86::VPDPWSSDZm, 0},
 {X86::VPDPWSSDrr, X86::VPDPWSSDrm, 0},
+{X86::VPDPWSUDSYrr, X86::VPDPWSUDSYrm, 0},
+{X86::VPDPWSUDSrr, X86::VPDPWSUDSrm, 0},
+{X86::VPDPWSUDYrr, X86::VPDPWSUDYrm, 0},
+{X86::VPDPWSUDrr, X86::VPDPWSUDrm, 0},
+{X86::VPDPWUSDSYrr, X86::VPDPWUSDSYrm, 0},
+{X86::VPDPWUSDSrr, X86::VPDPWUSDSrm, 0},
+{X86::VPDPWUSDYrr, X86::VPDPWUSDYrm, 0},
+{X86::VPDPWUSDrr, X86::VPDPWUSDrm, 0},
+{X86::VPDPWUUDSYrr, X86::VPDPWUUDSYrm, 0},
+{X86::VPDPWUUDSrr, X86::VPDPWUUDSrm, 0},
+{X86::VPDPWUUDYrr, X86::VPDPWUUDYrm, 0},
+{X86::VPDPWUUDrr, X86::VPDPWUUDrm, 0},
 {X86::VPERMBZ128rrkz, X86::VPERMBZ128rmkz, 0},
 {X86::VPERMBZ256rrkz, X86::VPERMBZ256rmkz, 0},
 {X86::VPERMBZrrkz, X86::VPERMBZrmkz, 0},
 {X86::VPERMDZ256rrkz, X86::VPERMDZ256rmkz, 0},
 {X86::VPERMDZrrkz, X86::VPERMDZrmkz, 0},
 {X86::VPERMI2B128rr, X86::VPERMI2B128rm, 0},
 {X86::VPERMI2B256rr, X86::VPERMI2B256rm, 0},
 {X86::VPERMI2Brr, X86::VPERMI2Brm, 0},
(1,733 following lines not shown)

[X86] Add AVX-VNNI-INT16 instructions.
Closed, Public

Revision Contents

Diff 542318

clang/docs/ReleaseNotes.rst

clang/include/clang/Basic/BuiltinsX86.def

clang/include/clang/Driver/Options.td

clang/lib/Basic/Targets/X86.h

clang/lib/Basic/Targets/X86.cpp

clang/lib/Headers/CMakeLists.txt

clang/lib/Headers/avxvnniint16intrin.h

clang/lib/Headers/immintrin.h

clang/test/CodeGen/X86/avxvnniint16-builtins.c

clang/test/CodeGen/attr-target-x86.c

clang/test/Driver/x86-target-features.c

clang/test/Preprocessor/x86_target_features.c

llvm/docs/ReleaseNotes.rst

llvm/include/llvm/IR/IntrinsicsX86.td

llvm/include/llvm/TargetParser/X86TargetParser.def

llvm/lib/Target/X86/X86.td

llvm/lib/Target/X86/X86InstrInfo.cpp

llvm/lib/Target/X86/X86InstrInfo.td

llvm/lib/Target/X86/X86InstrSSE.td

llvm/lib/TargetParser/Host.cpp

llvm/lib/TargetParser/X86TargetParser.cpp

llvm/test/CodeGen/X86/avxvnniint16-intrinsics.ll

llvm/test/CodeGen/X86/stack-folding-int-avxvnniint16.ll

llvm/test/MC/Disassembler/X86/avx-vnni-int16-32.txt

llvm/test/MC/Disassembler/X86/avx-vnni-int16-64.txt

llvm/test/MC/X86/avx-vnni-int16-32-att.s

llvm/test/MC/X86/avx-vnni-int16-32-intel.s

llvm/test/MC/X86/avx-vnni-int16-64-att.s

llvm/test/MC/X86/avx-vnni-int16-64-intel.s

llvm/test/TableGen/x86-fold-tables.inc
