
[X86] Lowering addus/subus intrinsics to native IR (LLVM part)
ClosedPublic

Authored by tkrupa on Apr 27 2018, 1:35 AM.

Details

Summary

This revision improves on the previous version (rL330322), which was reverted due to crashes.

This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.
Lowering in clang: https://reviews.llvm.org/D46892

Diff Detail

Repository
rL LLVM

Event Timeline


The test case no longer crashes after adding a check of the operands' type before extension and after truncation.
Currently detectAddSubSatPattern sometimes detects the pattern even when both operands are constant. In some cases it would be more efficient not to introduce an X86ISD::ADDS etc. node (such as in the test case chandlerc provided in https://bugs.llvm.org/show_bug.cgi?id=37260). Would it be better to never detect it when both args are constant?

Do we have test coverage for PR37260?

craig.topper added inline comments.Apr 30 2018, 10:56 AM
lib/Target/X86/X86ISelLowering.cpp
36147 ↗(On Diff #144295)

Move this comment into the if body so it doesn't interfere with the closing curly brace from the above 'if'

36153 ↗(On Diff #144295)

Can you use APInt::isSignedIntN/isIntN here?

36177 ↗(On Diff #144295)

What if the SIGN_EXTEND/ZERO_EXTEND was from an even smaller type? You can't just remove it. You need to narrow it. Always emitting an ISD::TRUNCATE may be sufficient. DAG combine should be able to simplify it.

36183 ↗(On Diff #144295)

Oh I see, you covered for that case here. But do we really need to do this? Can we just make smaller extends?

tkrupa marked 3 inline comments as done.May 2 2018, 3:59 AM
tkrupa added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36183 ↗(On Diff #144295)

No - detectSSatPattern and detectUSatPattern check for the min/max values of the destination type (VT here), so we cannot change that.
However, if the element type before extension is smaller than that of VT, overflow/underflow can never occur. Can we just emit ISD::ADD/ISD::SUB in such cases?
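The observation above can be checked with a small Python model (hypothetical helper name; a sketch of the reasoning, not LLVM code): when both operands come from a type strictly narrower than the destination type, their sum can never leave the destination type's range, so a plain add is safe.

```python
def fits_after_widening_add(src_bits: int, dst_bits: int) -> bool:
    # Two unsigned src_bits values sum to at most 2 * (2**src_bits - 1),
    # which fits in dst_bits whenever src_bits < dst_bits.
    max_sum = 2 * (2 ** src_bits - 1)
    return max_sum <= 2 ** dst_bits - 1

# i8 operands added in an i16 destination can never saturate:
assert fits_after_widening_add(8, 16)
# Even one extra bit is enough for a single add:
assert fits_after_widening_add(8, 9)
# Same-width operands can overflow, so saturation is still needed:
assert not fits_after_widening_add(16, 16)
```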

Can you take another look at fixing the psubusb/psubusw fast-isel cases please?

What do we want to do about the concerns raised in https://reviews.llvm.org/D45721 by @sroland ?

tkrupa added a subscriber: DavidKreitzer.EditedMay 9 2018, 9:51 AM

We're currently discussing the possible solutions for the JIT pipeline issue with @DavidKreitzer.

> We're currently discussing the possible solutions for the JIT pipeline issue with @DavidKreitzer.

Thanks for taking a look. I mean, it was quite easy to adapt our code when the removed intrinsics had more or less obvious, uncomplicated fallbacks (which we most likely already had in our code anyway), such as the sse2 pmin/pmax or pabs functions, but it gets really cumbersome with the ones that are rather complex to emulate (with quite some potential for issues, due to the requirement to match exactly what autoupgrade would do).

I'm also concerned that the more complex patterns are easier for other optimizations to simplify a little and break. The simpler things like pmin/pmax or pabs aren't so bad to not match when they get optimized a little.

tkrupa added a comment.EditedMay 10 2018, 1:27 AM

About the test/CodeGen/X86/avx2-intrinsics-fast-isel.ll change - there is a canonical form for the subus pattern I'm using here, different from the adds/addus/subs patterns. While those three use an ext/trunc pattern and fold correctly, subus has only max+sub. If fast-isel is enabled, the sub is not put into the SelectionDAG - that's what prevents it from combining. Instead, it's appended after isel as a lowered node. Is there a machine instruction pass for combining?

> I'm also concerned that the more complex patterns are easier for other optimizations to simplify a little and break. The simpler things like pmin/pmax or pabs aren't so bad to not match when they get optimized a little.

I was wondering about that too.
Also, another concern I actually have is that I believe some of these sequences are very suboptimal. No one (at least when thinking about how to do it fast) would write them that way when emulating this manually.
The subus looks great (we're doing the same as fallback) - it's just cmp/select/sub.
The addus is simply terrible.
We're doing min((unsigned)a, b xor ~0) + b instead - that is only xor/cmp/select/add.
(Another (better) solution, and what is typically used to detect overflow for unsigned adds (apart from the select, of course), is (unsigned)(a + b) < (unsigned)a ? ~0 : a + b - only add/cmp/select.)
We're generally avoiding sext/trunc sequences (typically no good comes from them...), so for signed saturated sub/add we're using some more complex sequences (a couple more cmp/select/sub (or add)). Not entirely sure if it would actually result in better code if you really have to emulate it, but the sext/trunc can be a problem in itself (don't ask me how many shuffle instructions it would generate on sse2 only...), and of course the add/cmp/select really costs twice the instructions, due to running on twice-as-wide vectors in the end. These are really quite complex to emulate.
So maybe for the unsigned saturated add/sub, using optimal patterns wouldn't be too bad, without too much concern about not being able to recognize them again in the end. But I'm not sure about the signed ones.
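The add/cmp/select and max/sub formulations described above can be modelled with a small scalar Python sketch (hypothetical helper names; the actual patch works on vector DAG nodes, not scalars):

```python
MASK8 = 0xFF  # unsigned i8 range

def addus8(a: int, b: int) -> int:
    # Saturating unsigned add via add/cmp/select:
    # if the wrapped sum is smaller than an operand, the add overflowed.
    s = (a + b) & MASK8
    return MASK8 if s < a else s

def subus8(a: int, b: int) -> int:
    # Saturating unsigned sub via max/sub: max(a, b) - b clamps at 0.
    return max(a, b) - b

assert addus8(200, 100) == 255   # 300 saturates to 0xFF
assert addus8(10, 20) == 30
assert subus8(10, 20) == 0       # underflow clamps to 0
assert subus8(20, 5) == 15
```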

During internal review I proposed something like this: (~0 - a) < b ? ~0 : a+b.
It gets rid of the zext/trunc in the addus pattern, but introduces an additional subtraction, which is presumably more costly. Your solution seems much better - I'll change it to that form.

tkrupa updated this revision to Diff 146864.May 15 2018, 10:02 AM
tkrupa added a subscriber: mike.dvoretsky.

As decided, I removed the lowering in AutoUpgrade.cpp. I also moved the tests into the corresponding *target*-intrinsics-canonical.ll files, as discussed with @mike.dvoretsky. I added tests for PR37260 in test/CodeGen/X86/avx2-intrinsics-canonical.ll.
What's your opinion on the fast-isel subus emission and my proposal in the detectAddSubSatPattern function?

FWIW, as said, I think I could live with autoupgrade of at least the unsigned sat ones. These have similar complexity to pabs (pabs is cmp/sel/neg whereas these are add (or sub)/cmp/sel, although the add one also requires a constant), so if there were no concerns about optimizations breaking pabs recognition, there might not be any here either - if it actually works (which doesn't always seem to be the case, according to the test cases?). (Although personally I'd have liked to keep the pabs intrinsics too...)
Though disappearing intrinsics causing a null call in the generated code, rather than an error when compiling, remains a concern - but I certainly know about this one now, so it's easily debugged :-).
Other than that, there's a bug in the logic of the addus; otherwise this looks reasonable to me, though I'm not an expert on llvm internals.

lib/Target/X86/X86ISelLowering.cpp
32275–32277 ↗(On Diff #146864)

This isn't quite right.
For x < x+y this is false when y == 0, but that wouldn't be an overflow.
So it must be x <= x+y ? x+y : ~0 (or some reversed form like x > x+y ? ~0 : x+y - I suppose it doesn't make a difference in the end).
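The y == 0 corner case can be verified with a small Python model of the two predicates (a scalar sketch with hypothetical names, not the actual DAG code):

```python
MASK8 = 0xFF  # unsigned i8 range

def addus_buggy(x: int, y: int) -> int:
    s = (x + y) & MASK8
    # Wrong: when y == 0 the test x < x is false, so the result
    # is misclassified as an overflow and saturated.
    return s if x < s else MASK8

def addus_fixed(x: int, y: int) -> int:
    s = (x + y) & MASK8
    # Correct: equality just means y == 0, which is not an overflow.
    return s if x <= s else MASK8

assert addus_buggy(7, 0) == 255   # misclassified as overflow
assert addus_fixed(7, 0) == 7     # correct result
assert addus_fixed(200, 100) == 255  # real overflow still saturates
```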

test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

Err, so it doesn't actually recognize the pattern?

tkrupa updated this revision to Diff 147000.EditedMay 16 2018, 12:56 AM

I brought back the lowering in AutoUpgrade for the unsigned intrinsics, although I'm not sure there's universal agreement about it. In email correspondence @DavidKreitzer
suggested that it would be better to retain the intrinsics and remove them in a later patch, which Craig disagreed with, arguing that we can lower the non-complex ones (like addus/subus) without much trouble. Is there a consensus?

Fixed the logic error in the addus recognition - thanks for pointing that out. My recognition is more primitive than the existing subus one - should I expand it for those special cases or just leave a TODO?

Could you address the question at lib/Target/X86/X86ISelLowering.cpp:36183?

tkrupa added inline comments.May 16 2018, 12:57 AM
test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

Please look at my comment from Thu, May 10, 10:27 AM.

tkrupa edited the summary of this revision. (Show Details)May 16 2018, 1:20 AM

> Is there a machine instruction pass for combining?

I haven't been following this review, but I saw this question, and the answer is: yes.
See class MachineCombiner (pass name is shown as "Machine InstCombiner").

The addus looks right now, thanks.

test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

Ah sorry I missed that.
I'm not quite sure if that means the pattern can never be folded into a single instruction, but if so I'm very much against autoupgrade. The code might not be much worse, but in 99.99% of all cases it's going to be worse than using the intrinsic.

tkrupa added inline comments.May 16 2018, 9:16 AM
test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

It only fails to fold with fast-isel (one of the non-standard instruction selection options, chosen with a compiler flag); it is still always combined with the default SelectionDAGISel.

sroland added inline comments.May 16 2018, 9:23 AM
test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

Ok, we're not using that, but I believe at some point we considered it. (Compile times are really important with JIT, and that's my other problem with autoupgrade anyway - this is not going to help there, but it might not be significant, and I'm thinking that's just the price you have to pay for the compiler getting smarter.)

About lib/Target/X86/X86ISelLowering.cpp:36183 - does it make sense to emit a normal ADD/SUB without saturation when the element type after truncation is larger than the one before extension?

test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

Can the fast-isel case be left like that, then? If not, I could try to do the MachineCombiner folding (from what I've seen, we don't currently have any combining like that for x86) or abandon the AutoUpgrade for this particular intrinsic.

sroland added inline comments.May 17 2018, 12:04 PM
test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
2585–2587 ↗(On Diff #146864)

From my perspective for mesa, yes, it should be ok.

tkrupa updated this revision to Diff 148890.May 29 2018, 5:47 AM

Added a check for two constant operands. I'm still waiting for an answer to my last question.

craig.topper added inline comments.Jul 16 2018, 3:35 PM
lib/Target/X86/X86ISelLowering.cpp
32275 ↗(On Diff #148890)

Shouldn't this be getSetCCSwappedOperands?

32290 ↗(On Diff #148890)

isEqualTo is overkill here. You can just compare Other->getOperand(0) == CondLHS

And it should be "Other." instead of "Other->" in most of this code. Technically they are equivalent, but we tend to use the . on SDValue.

32301 ↗(On Diff #148890)

Again you don't need isEqualTo.

craig.topper added inline comments.Jul 16 2018, 3:47 PM
lib/Target/X86/X86ISelLowering.cpp
36289 ↗(On Diff #148890)

Why can't you just use APInt::isIntN for unsigned and APInt::isSignedIntN for signed? Why do you need to extract the high bits yourself?
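For reference, the semantics of those APInt helpers can be sketched in Python (a model of the behavior, not the LLVM API itself; the `width` parameter stands in for the APInt's bit width):

```python
def is_int_n(value: int, n: int, width: int = 64) -> bool:
    # Model of APInt::isIntN: the value, read as an unsigned
    # width-bit integer, fits in n bits.
    value &= (1 << width) - 1
    return value < (1 << n)

def is_signed_int_n(value: int, n: int, width: int = 64) -> bool:
    # Model of APInt::isSignedIntN: the value, read as a signed
    # width-bit integer, fits in the signed n-bit range.
    value &= (1 << width) - 1
    if value >= 1 << (width - 1):
        value -= 1 << width          # interpret the top bit as the sign
    return -(1 << (n - 1)) <= value < (1 << (n - 1))

assert is_int_n(255, 8) and not is_int_n(256, 8)
assert is_signed_int_n(-128, 8) and not is_signed_int_n(128, 8)
```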

tkrupa updated this revision to Diff 155836.Jul 17 2018, 3:56 AM

Rebased and implemented suggested changes.

tkrupa updated this revision to Diff 155844.Jul 17 2018, 4:05 AM

Removed unnecessary comment.

This revision is now accepted and ready to land.Jul 17 2018, 10:03 PM
craig.topper requested changes to this revision.Jul 20 2018, 2:23 PM
craig.topper added inline comments.
lib/Target/X86/X86ISelLowering.cpp
36873 ↗(On Diff #155844)

isBuildVectorConstantSDNodes allows undef elements which will fail this cast. You need to add a skip for undef. Please add test cases.

This revision now requires changes to proceed.Jul 20 2018, 2:23 PM
sroland added inline comments.Jul 20 2018, 4:01 PM
include/llvm/IR/IntrinsicsX86.td
367 ↗(On Diff #155844)

FWIW, I don't quite agree this is really a FIXME (not without some appropriate replacement, like a generic llvm intrinsic for saturating arithmetic).
I'd take an intrinsic over tons of simple IR ops (in the vague hope they will get recognized again, unless some other optimization messed them up) any day of the week.

But otherwise looks alright to me.

tkrupa updated this revision to Diff 156996.Jul 24 2018, 4:00 AM

Now the pattern is not detected if there are any undef elements in the constant operand.
I removed the tests added after fixing the bug which caused the reversion of the patch - in the meantime the pattern conditions changed, and the sequence used in those tests is now perfectly valid.
Should the FIXME comments be removed?

If you remove the FIXMEs, you need to replace them with a comment that says not to delete them. I have at various times run a script or grep looking for unused intrinsics that we forgot to delete.

But if we aren't going to delete them, I somewhat feel like we should continue using them in clang and focus on teaching InstCombine how to constant fold them and do other useful optimizations. Having a separate way of doing things for clang and other users of LLVM will lead to inconsistent optimization.

tkrupa updated this revision to Diff 159459.Aug 7 2018, 12:37 AM

Brought back signed intrinsics and added constant folding in InstCombine.

Would it be better to split the signed/unsigned handling into separate patches, since they've diverged so much?

lib/IR/AutoUpgrade.cpp
79 ↗(On Diff #159459)

This is version 8.0 now

tkrupa updated this revision to Diff 159872.Aug 9 2018, 2:15 AM
tkrupa retitled this revision from [X86] Lowering adds/addus/subs/subus intrinsics to native IR (LLVM part) to [X86] Lowering addus/subus intrinsics to native IR (LLVM part).

Split to two different patches for signed and unsigned intrinsics.
Corrected version number.

craig.topper added inline comments.Aug 9 2018, 10:17 AM
lib/Target/X86/X86ISelLowering.cpp
32900 ↗(On Diff #159872)

Why are v16i32 and v8i64 included here? We don't have saturating add/sub for those types do we?

tkrupa updated this revision to Diff 160300.Aug 13 2018, 1:14 AM
tkrupa marked an inline comment as done.
This revision is now accepted and ready to land.Aug 13 2018, 11:49 AM

Is the clang part good to go too?

This revision was automatically updated to reflect the committed changes.
jgorbe added a subscriber: jgorbe.Aug 17 2018, 3:58 AM

Hi,

I think this change is breaking one of our builds. The attached reduced test case fails with the current trunk revision if built with "clang -x c -O2 -mavx -c crash.ii".

> I think this change is breaking one of our builds. The attached reduced test case fails with the current trunk revision if built with "clang -x c -O2 -mavx -c crash.ii".

define void @d() {
entry:

%wide.load = load <4 x i16>, <4 x i16>* undef, align 2
%0 = sext <4 x i16> %wide.load to <4 x i32>
%1 = sub nsw <4 x i32> zeroinitializer, %0
%2 = icmp sgt <4 x i32> %1, <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
%3 = select <4 x i1> %2, <4 x i32> %1, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
%4 = icmp slt <4 x i32> %3, <i32 32767, i32 32767, i32 32767, i32 32767>
%5 = select <4 x i1> %4, <4 x i32> %3, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>
%6 = trunc <4 x i32> %5 to <4 x i16>
store <4 x i16> %6, <4 x i16>* undef, align 2
unreachable

}
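The reduced IR above implements a signed saturating i16 subtract (widen, subtract, clamp to the i16 range, truncate); a scalar Python model of the same sequence, for illustration only:

```python
I16_MIN, I16_MAX = -32768, 32767  # signed i16 range

def ssubs16(a: int, b: int) -> int:
    # Signed saturating i16 subtract, as in the reduced test case:
    # compute exactly in a wider type, then clamp and truncate.
    d = a - b              # exact difference (the i32 sub in the IR)
    d = max(d, I16_MIN)    # icmp sgt / select: lower clamp
    d = min(d, I16_MAX)    # icmp slt / select: upper clamp
    return d

assert ssubs16(0, -32768) == 32767        # 32768 saturates high
assert ssubs16(-32768, 32767) == -32768   # -65535 saturates low
assert ssubs16(100, 30) == 70             # in-range values pass through
```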

I think I see the issue. Working on a fix.

It turns out this patch only tested one of the two new paths added to X86ISelLowering.cpp. I've removed detectAddSubSatPattern, which contained the bug. I think we may want to put it back, but only with appropriate tests.

srj added a subscriber: srj.Aug 23 2018, 5:53 PM
srj added a comment.EditedAug 23 2018, 6:39 PM

I'm trying to update Halide to generate IR that will be recognized by this patch (instead of calling the now-deprecated intrinsics), but I'm having trouble with a somewhat degenerate-but-legal case.

If user code specifies a non-native vector width (e.g., paddusb with 8 lanes on sse2, instead of the native width of 16 lanes), our code handles this by loading at the requested size, then combining vectors up to the native size. So our code formerly emitted something like:

# Do a saturating unsigned subtract on two <8 x i8> vectors
# (widened to the native <16 x i8>, then narrowed back to <8 x i8>)
%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = shufflevector <8 x i8> %20, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%23 = shufflevector <8 x i8> %21, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%24 = call <16 x i8> @llvm.x86.sse2.psubus.b(<16 x i8> %22, <16 x i8> %23) #5
%25 = shufflevector <16 x i8> %24, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>

To work with this patch, I revised our code to emit inline IR that should pattern-match properly (based on the new self-tests for the IR), something like:

%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = shufflevector <8 x i8> %20, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%23 = shufflevector <8 x i8> %21, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
# Here's the inline pattern that should match paddusb
%24 = add <16 x i8> %22, %23
%25 = icmp ugt <16 x i8> %22, %24
%26 = select <16 x i1> %25, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1>, <16 x i8> %24
#
%27 = shufflevector <16 x i8> %26, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>

And, in fact, if I don't run any optimizer passes, this works perfectly. Unfortunately, the LLVM optimizer passes can rearrange it, e.g. into a form something like this:

%20 = load <8 x i8>
%21 = load <8 x i8>
%22 = add <8 x i8> %20, %16
%23 = shufflevector <8 x i8> %22, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%24 = icmp ult <8 x i8> %22, %20
%25 = shufflevector <8 x i1> %24, <8 x i1> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%26 = select <16 x i1> %25, <16 x i8> <i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 -1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i8> %23
%27 = shufflevector <16 x i8> %26, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>

...which no longer gets recognized as a pattern that produces paddusb, since the select no longer refers directly to the result of the compare (but rather to an intermediate shuffle).

Disabling all our optimizer passes 'fixes' this but that's obviously unsuitable as a solution. Could this pattern matching be made more robust?

In D46179#1211902, @srj wrote:

> Disabling all our optimizer passes 'fixes' this but that's obviously unsuitable as a solution. Could this pattern matching be made more robust?

Would it help or hurt if we narrowed the select in IR to match the final return type and original operand types (I made the types smaller from your code just to make this easier to read):

define <2 x i8> @should_narrow_select(<2 x i8> %x, <2 x i1> %cmp) {
  %widex = shufflevector <2 x i8> %x, <2 x i8> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %widecmp = shufflevector <2 x i1> %cmp, <2 x i1> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %widesel = select <4 x i1> %widecmp, <4 x i8> <i8 -1, i8 -1, i8 undef, i8 undef>, <4 x i8> %widex
  %sel = shufflevector <4 x i8> %widesel, <4 x i8> undef, <2 x i32> <i32 0, i32 1>
  ret <2 x i8> %sel
}

-->

define <2 x i8> @should_narrow_select(<2 x i8> %x, <2 x i1> %cmp) {
  %narrowsel = select <2 x i1> %cmp, <2 x i8> <i8 -1, i8 -1>, <2 x i8> %x
  ret <2 x i8> %narrowsel
}
srj added a comment.Aug 24 2018, 10:54 AM

I took the liberty of opening a bug on this to simplify discussion: https://bugs.llvm.org/show_bug.cgi?id=38691

srj added a comment.Aug 24 2018, 11:01 AM

> Would it help or hurt if we narrowed the select in IR to match the final return type and original operand types (I made the types smaller from your code just to make this easier to read):

Not sure I understand the question -- are you suggesting that this is code I should add, or that this is a transformation that LLVM would do under the hood?

If the former, I don't see how it would help, unless I made the function not-inlined (which would seem to compromise efficiency).

If the latter, I'm not qualified to answer as I know very little about the relevant internal bits of LLVM :-)

In D46179#1212714, @srj wrote:

>> Would it help or hurt if we narrowed the select in IR to match the final return type and original operand types (I made the types smaller from your code just to make this easier to read):

> Not sure I understand the question -- are you suggesting that this is code I should add, or that this is a transformation that LLVM would do under the hood?

> If the former, I don't see how it would help, unless I made the function not-inlined (which would seem to compromise efficiency).

> If the latter, I'm not qualified to answer as I know very little about the relevant internal bits of LLVM :-)

Sorry it wasn't clear - I was suggesting the latter. The transforms that are occurring here are in instcombine, but we're missing a narrowing opportunity for the select. It does look like that will allow the backend to match saturation again (at least in my simple tests). Let's continue the discussion in the bug report since this review is closed.

@spatel, would the end result be completely removing all of the widening from v8i8 to v16i8 that was done in the incoming IR? Leaving only v8i8 types?

> @spatel, would the end result be completely removing all of the widening from v8i8 to v16i8 that was done in the incoming IR? Leaving only v8i8 types?

Yes - I'll describe/show the potential transform in the bug report.