This is an archive of the discontinued LLVM Phabricator instance.

Paths

lib/CodeGen/CGBuiltin.cpp (11 inline comments)
test/CodeGen/avx-builtins.c (1 inline comment)
test/CodeGen/avx-cmp-builtins.c
test/CodeGen/avx2-builtins.c
test/CodeGen/avx512f-builtins.c (1 inline comment)
test/CodeGen/avx512vl-builtins.c
Differential D45616

[X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
ClosedPublic

Authored by GBuella on Apr 13 2018, 4:12 AM.

Details

Reviewers

craig.topper
uriel.k
RKSimon
andrew.w.kaylor
spatel
scanon
efriedma
GBuella

Commits

rG716863c820db: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
rL335339: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR
rC335339: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR

Summary

Lower some vector comparison builtins to fcmp IR instructions.
This ignores the signaling behaviour specified in the predicate
argument of said builtins.

Affected AVX512 builtins:

builtin_ia32_cmpps128_mask
builtin_ia32_cmpps256_mask
builtin_ia32_cmpps512_mask
builtin_ia32_cmppd128_mask
builtin_ia32_cmppd256_mask
builtin_ia32_cmppd512_mask

Diff Detail

Repository: rC Clang

Event Timeline

GBuella created this revision. Apr 13 2018, 4:12 AM

Herald added a subscriber: cfe-commits. Apr 13 2018, 4:12 AM

GBuella added reviewers: uriel.k, RKSimon. Apr 13 2018, 8:01 AM

The fcmp opcode has no defined behavior with NaN operands in the comparisons handled in this patch.

Could you describe the problem here in a bit more detail? As far as I know, the LLVM IR fcmp should return the same result as the x86 CMPPD, even without fast-math.

GBuella added a reviewer: andrew.w.kaylor. May 9 2018, 6:48 AM

RKSimon added reviewers: spatel, scanon, efriedma. May 9 2018, 6:52 AM

In D45616#1067492, @efriedma wrote:

The fcmp opcode has no defined behavior with NaN operands in the comparisons handled in this patch.

Could you describe the problem here in a bit more detail? As far as I know, the LLVM IR fcmp should return the same result as the x86 CMPPD, even without fast-math.

So, I'm still looking into this.
What I see is, yes, fcmp just so happens to work the same as x86 CMPPD.
An example:

fcmp olt <2 x double> %x, %y

becomes vcmpltpd.

But this only holds for condition codes 0 - 7.

Where LLVM IR has a single condition "olt" (ordered less-than), x86 cmppd has two corresponding condition codes: 0x01, "less-than (ordered, signaling)", which is "vcmpltpd", and 0x11, "less-than (ordered, nonsignaling)", which is "vcmplt_oqpd".

Now, if the builtin's CC argument is 1 (which refers to vcmpltps), we lower it to "fcmp olt", which then results in "vcmpltps", so we are OK.
But the IR carries no information about whether the user expected "vcmpltps" or "vcmplt_oqps".

Do I understand these tricks right?
If we are ok with this (hard to understand) approach, I can just lower these without fast-math as well, as long as CC < 8, by modifying this condition:

if (CC < 8 && !UsesNonDefaultRounding && getLangOpts().FastMath) {

Although, I'm still looking at what happens with sNaN and qNaN constants once these comparisons are lowered to fcmp.
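
The pairing of condition codes described above can be sketched in portable C++. This is only an illustration that follows the predicate table in this revision's final diff; the helper name `fcmpPredicateForCC` is hypothetical and is not part of clang. It assumes that codes differing only in bit 4 (0x01 and 0x11, for example) name the same comparison with opposite signaling behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the name of the fcmp predicate that a 5-bit
// x86 compare immediate maps to once signaling behaviour is ignored.
inline std::string fcmpPredicateForCC(unsigned cc) {
  assert(cc < 32 && "x86 compare immediates are 5 bits wide");
  static const char *const kPredicates[16] = {
      "oeq", "olt", "ole", "uno", "une", "uge", "ugt", "ord",
      "ueq", "ult", "ule", "false", "one", "oge", "ogt", "true"};
  // Bit 4 only selects signaling vs. quiet behaviour, so it is masked away.
  return kPredicates[cc & 0x0f];
}
```

With this mapping, 0x01 and 0x11 both yield "olt", which is exactly the loss of information being discussed here.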

There is no difference between "signalling" and "non-signalling" unless you're using "#pragma STDC FENV_ACCESS", which is currently not supported. Presumably the work to implement that will include some LLVM IR intrinsic which can encode the difference, but for now we can ignore it.
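
A scalar analogy may help here. In standard C++, `a < b` is the signaling-style comparison and `std::isless(a, b)` is the quiet one; they can differ only in whether FE_INVALID is raised, never in the value returned. The sketch below deliberately checks only the returned values, since observing the flag reliably would need the FENV_ACCESS support mentioned above. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Signaling-style ordered less-than: the built-in operator is allowed to
// raise FE_INVALID when either operand is a NaN.
inline bool lessSignaling(double a, double b) { return a < b; }

// Quiet ordered less-than: std::isless does not raise FE_INVALID for quiet NaNs.
inline bool lessQuiet(double a, double b) { return std::isless(a, b); }

// Both flavours must agree on the returned value for any pair of inputs.
inline bool sameResult(double a, double b) {
  return lessSignaling(a, b) == lessQuiet(a, b);
}

// Small self-check over ordered and unordered inputs.
inline void checkQuietAndSignalingAgree() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(sameResult(1.0, 2.0));
  assert(sameResult(2.0, 1.0));
  assert(sameResult(nan, 1.0));
  assert(sameResult(1.0, nan));
}
```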

In D45616#1093514, @efriedma wrote:

There is no difference between "signalling" and "non-signalling" unless you're using "#pragma STDC FENV_ACCESS", which is currently not supported. Presumably the work to implement that will include some LLVM IR intrinsic which can encode the difference, but for now we can ignore it.

Does that mean it is OK to generate the same vcmpeqps instruction for both of these intrinsic calls:

_mm_cmp_ps_mask(a, b, _CMP_EQ_OQ);
_mm_cmp_ps_mask(a, b, _CMP_EQ_OS);

?
In that case we can lower both of these to fcmp oeq.
I'm still not sure if this is what a user would expect...

GBuella accepted this revision. May 14 2018, 3:44 AM

This comment was removed by GBuella.

This revision is now accepted and ready to land. May 14 2018, 3:44 AM

Oops, accepted this by accident.

Ping @efriedma

I don't see any reason to exactly match the constant specified by the user, as long as the result is semantically equivalent.

Once we have llvm.experimental.constrained.fcmp, this code should be updated to emit it; that will preserve the user's intended exception semantics.

I altered the code to ignore the signaling behaviour, as suggested.
It also handles more vector cmp builtins now.

Ping @efriedma

efriedma added inline comments. Jun 11 2018, 2:26 PM

lib/CodeGen/CGBuiltin.cpp
10071	Given we're ignoring floating-point exceptions, we should also ignore the "rounding mode" operand (__MM_FROUND_NO_EXC only affects exceptions, and the other values are irrelevant because there isn't any actual rounding involved).
10156	Interesting.

GBuella added inline comments. Jun 12 2018, 11:18 AM

lib/CodeGen/CGBuiltin.cpp
10071	Oh, all right. @craig.topper, do you also agree on ignoring all of these, and just lowering all the comparisons to fcmp? That is the easiest path, of course.

craig.topper added inline comments. Jun 12 2018, 4:22 PM

lib/CodeGen/CGBuiltin.cpp
10071	I think I'm fine with ignoring it, but definitely leave a comment because we will probably have to revisit this code in the future as we continue towards supporting FENV_ACCESS.
10143	Move this into the "ReturnsMask" path, and use getVectorFCmpIR for the other path.

spatel added inline comments. Jun 13 2018, 6:44 AM

lib/CodeGen/CGBuiltin.cpp

10090–10100

llvm::Type *ResultType = ConvertType(E->getType());
if (CC == 0x0f || CC == 0x1f)
  return llvm::Constant::getAllOnesValue(ResultType);
if (CC == 0x0b || CC == 0x1b)
  return llvm::Constant::getNullValue(ResultType);

Also, can we use the defined predicate names instead of hex values in this code?

GBuella added inline comments. Jun 13 2018, 10:11 AM

lib/CodeGen/CGBuiltin.cpp
10090–10100	Well, I believe we would need to predefine them first. I only found those in the client header `avxintrin.h`.

Ignoring signaling behaviour, and rounding mode with it.
Also lowering __builtin_ia32_cmpsd and __builtin_ia32_cmpss.

spatel added inline comments. Jun 13 2018, 11:15 AM

lib/CodeGen/CGBuiltin.cpp
10205–10210	On 2nd thought, why are we optimizing when we have matching IR predicates for these? Just translate to FCMP_TRUE / FCMP_FALSE instead of special-casing these values. InstSimplify can handle the constant folding if optimization is on.

GBuella added inline comments. Jun 14 2018, 12:43 AM

lib/CodeGen/CGBuiltin.cpp
10205–10210	I don't know, these TRUE/FALSE cases were already handled here, I only rearranged the code. Does this cause any problems? I mean, if it meant an extra dozen lines of code I would get it, but as it is, does it hurt anything?

spatel added inline comments. Jun 14 2018, 6:50 AM

lib/CodeGen/CGBuiltin.cpp
10205–10210	It's mostly about being consistent. I think it's generally out-of-bounds for clang to optimize code. That's not its job. The potential end user difference is that in unoptimized code, a user might expect to see the vcmpXX asm corresponding to the source-level intrinsic when debugging. I agree that this is changing existing behavior, so it's better if we make this change before or after this patch.
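
To make the FCMP_TRUE / FCMP_FALSE suggestion concrete, here is a scalar sketch of how one lane of an fcmp evaluates, written from the predicate descriptions in the LLVM language reference. The `Pred` enum and `evalFCmp` are hypothetical names used only for illustration. The point is that "true" and "false" are ordinary predicates whose result does not depend on the operands, so no special casing is needed to express them.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The sixteen fcmp predicates.
enum class Pred { False, OEQ, OGT, OGE, OLT, OLE, ONE, ORD,
                  UEQ, UGT, UGE, ULT, ULE, UNE, UNO, True };

// Hypothetical scalar evaluator for one lane of an fcmp.
inline bool evalFCmp(Pred p, double a, double b) {
  const bool uno = std::isnan(a) || std::isnan(b); // "unordered": a NaN is involved
  switch (p) {
  case Pred::False: return false;
  case Pred::OEQ:   return !uno && a == b;
  case Pred::OGT:   return !uno && a > b;
  case Pred::OGE:   return !uno && a >= b;
  case Pred::OLT:   return !uno && a < b;
  case Pred::OLE:   return !uno && a <= b;
  case Pred::ONE:   return !uno && a != b;
  case Pred::ORD:   return !uno;
  case Pred::UEQ:   return uno || a == b;
  case Pred::UGT:   return uno || a > b;
  case Pred::UGE:   return uno || a >= b;
  case Pred::ULT:   return uno || a < b;
  case Pred::ULE:   return uno || a <= b;
  case Pred::UNE:   return uno || a != b;
  case Pred::UNO:   return uno;
  case Pred::True:  return true;
  }
  assert(false && "unknown predicate");
  return false;
}
```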

LGTM with that one comment.

lib/CodeGen/CGBuiltin.cpp
10168	Sink this into the switch as the default case with an llvm_unreachable

This revision is now accepted and ready to land. Jun 17 2018, 6:11 PM

The remaining question is: should we remove and auto-upgrade the LLVM intrinsics that are no longer used, or keep them around for when the signaling behaviour starts to matter?

GBuella updated this revision to Diff 151710. Jun 18 2018, 7:08 AM

We should probably just leave them around. Leave a NOTE so they don't get deleted.

Added __builtin_ia32_cmpsd_mask & __builtin_ia32_cmpss_mask.

craig.topper requested changes to this revision. Jun 19 2018, 10:01 AM

craig.topper added inline comments.

test/CodeGen/avx-builtins.c
241	This doesn't look right. This is a scalar instruction; it should only be comparing a single double. There should be insertelements and extractelements.
test/CodeGen/avx512f-builtins.c
7449	I don't think this is right either.
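
The objection above can be illustrated with a small model of a scalar double compare. This is a sketch under the assumption that the scalar form behaves like the SSE2 scalar compare: only lane 0 is compared, the low 64 bits of the result become an all-ones or all-zeros mask, and the upper lane is passed through from the first operand. The names `cmpLtSdModel` and `laneBits` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V2d = std::array<double, 2>;
using V2bits = std::array<std::uint64_t, 2>;

// Reinterprets a double as its raw 64-bit pattern.
inline std::uint64_t laneBits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Hypothetical model of an ordered less-than scalar compare on 2 x double.
// The result is returned as raw lane bit patterns.
inline V2bits cmpLtSdModel(const V2d &a, const V2d &b) {
  // "extractelement": only lane 0 of each operand takes part in the compare.
  const bool lt = a[0] < b[0];
  // "insertelement": lane 0 receives the mask, lane 1 is copied from a.
  return {lt ? ~std::uint64_t{0} : std::uint64_t{0}, laneBits(a[1])};
}
```

A plain vector fcmp would instead compare both lanes, which is why the generated IR for the scalar builtins looked wrong.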

This revision now requires changes to proceed. Jun 19 2018, 10:01 AM

GBuella updated this revision to Diff 152060. Jun 20 2018, 4:52 AM

GBuella edited the summary of this revision. (Show Details)

I was overzealous with the intrinsics; I now lower only the packed comparisons.
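
For the packed `_mask` builtins that remain lowered, the per-lane fcmp results are turned into an integer mask and combined with the incoming mask operand. The following is a rough model of that last step, not the actual EmitX86MaskedCompareResult implementation; `maskedCompareResult` is a hypothetical name, and the 16-lane limit is an assumption that matches the widest builtin touched here (cmpps512).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: pack per-lane compare results into a bitmask (lane i
// becomes bit i), then keep only the lanes enabled by the incoming mask.
template <std::size_t N>
std::uint16_t maskedCompareResult(const std::array<bool, N> &lanes,
                                  std::uint16_t maskIn) {
  static_assert(N <= 16, "this sketch models at most 16 lanes");
  std::uint16_t packed = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (lanes[i])
      packed |= static_cast<std::uint16_t>(1u << i);
  // Bits above N stay zero, and disabled lanes are cleared by the AND.
  return static_cast<std::uint16_t>(packed & maskIn);
}
```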

LGTM

This revision is now accepted and ready to land. Jun 20 2018, 9:49 AM

Closed by commit rC335339: [X86] Lower _mm[256|512]_cmp[.]_mask intrinsics to native llvm IR (authored by GBuella). Jun 22 2018, 5:03 AM

This revision was automatically updated to reflect the committed changes.

Hello. It seems you were well aware that you are changing the semantics of an FP operation here by ignoring the signaling/quiet portion of the immediate. But what shall the user do now? There was no way to force quiet FP comparison behavior in the C language, so intrinsics and reliance on quiet compare (and the SAE bit in AVX512) were the natural way of forcing it. And now you are taking them out. Is there a switch that could prevent this optimization? I think it could be more tolerable if you only did this under fast-math.

It seems you were well aware that you are changing the semantics of FP operation here by ignoring the signaling/quiet portion of the immediate.

There's ongoing work to support code that accesses the floating point environment (see http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics). There isn't any way to turn it on in clang yet, though. And until that's implemented, LLVM can trigger arbitrary floating-point exceptions anywhere in your code, so there's no point to trying to generate a "quiet" compare.

can trigger arbitrary floating-point exceptions anywhere in your code

I believe this statement reflects the current state of many compilers on the market; I just don't see the reason for making things worse. It seems the original intent of the commit was to add support for masked compares, and that could have been achieved without breaking what already worked.

I hope the patch is ultimately helping some performance optimization, but it is breaking the original intent of some legitimate programs that worked before, and it introduces a correctness regression. So to me it must at least be guarded by a flip-switch.

The reference to constrained floating-point intrinsics work is relevant, but it will obviously take time and extra effort to enable and then to unbreak all the cases that are made broken here. Instead one could postpone lowering of the particular instructions until it was possible without violation of the semantics...

schweitz added a subscriber: schweitz. Nov 8 2018, 8:12 AM

In D45616#1290897, @nastafev wrote:

can trigger arbitrary floating-point exceptions anywhere in your code

I believe this statement reflects the current state of many compilers on the market; I just don't see the reason for making things worse. It seems the original intent of the commit was to add support for masked compares, and that could have been achieved without breaking what already worked.

I hope the patch is ultimately helping some performance optimization, but it is breaking the original intent of some legitimate programs that worked before, and it introduces a correctness regression. So to me it must at least be guarded by a flip-switch.

The reference to constrained floating-point intrinsics work is relevant, but it will obviously take time and extra effort to enable and then to unbreak all the cases that are made broken here. Instead one could postpone lowering of the particular instructions until it was possible without violation of the semantics...

That seems like a legitimate concern to me.

I believe this patch was part of a larger effort to get rid of the dependence on intrinsics. We have a very broad preference for expressing things using native IR whenever possible because (among other reasons) intrinsics block most optimizations and we don't want to teach optimizations to reason about target-specific intrinsics. In this particular case we may have overreached because it isn't strictly possible to express all the semantics of this built-in accurately using native IR.

There is a patch currently active to add constrained fcmp intrinsics, but it doesn't have a masked variant.

Yes, in constrained-fp mode we might need intrinsics, at least short-term. I assume you'll probably add target-independent constrained masked fp vector operations at some point, but that's probably not a priority. But that still leaves two problems. One, clang doesn't currently have any flag that actually makes sense to control this. (I assume it will be added soon, but it doesn't exist yet.) I mean, it's technically possible to gate it under one of the fast-math flags, like @nastafev suggested, but that's not semantically correct. And two, the removed intrinsics didn't have the right semantics for constrained-fp mode anyway: they were marked readnone. So we need new intrinsics anyway.

So yes, it's possible we could revert this patch, and that might fix @nastafev's code for the next few months, but it doesn't help us at all in terms of making constrained fp work correctly in general.

Agreed. Reverting this patch wouldn't move us forward on constrained FP handling. What I'm saying (and what I think @nastafev is saying) is that this patch is taking a built-in that allows the user to express specific signalling behavior and throwing away that aspect of its semantics. Granted we do say that FP environment access is unsupported, so technically unmasking FP exceptions or even reading the status flag is undefined behavior. But it seems like there's a pretty big difference between using that claim to justify enabling some optimization that might do constant folding in a way that assumes the default rounding mode versus using that claim to disregard part of the semantics of a built-in that is modeling a target-specific instruction.

I'm not insisting that we have to revert this patch or anything like that. I'm just saying that we should think about it. My personal preference would actually be to have this code re-implemented using the new constrained fcmp intrinsic when it lands. That still leaves the masking part of this unsettled, but as you say that's probably not a priority right now.

BTW, here's a link to the constrained fcmp review: https://reviews.llvm.org/D54121

Thanks, I agree with @andrew.w.kaylor and his interpretation.
I was trying to convey the message that a programmer working with intrinsics relies on the semantics they carry because there's no other way to express those semantics. Re-optimizing what's already optimized (hand-written code with intrinsics) may be nice, but it is not critical in my view, whereas violating the semantics defeats the purpose: I could have written that same loop around a generic compare myself if that was enough for my purposes. I would not insist on the way you resolve this, and it is not urgent, but I do believe this is a regression and it deserves a fix.

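Revision Contents

Path                                Size
lib/CodeGen/CGBuiltin.cpp           165 lines
test/CodeGen/avx-builtins.c         114 lines
test/CodeGen/avx-cmp-builtins.c     24 lines
test/CodeGen/avx2-builtins.c        4 lines
test/CodeGen/avx512f-builtins.c     152 lines
test/CodeGen/avx512vl-builtins.c    139 lines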

Diff 152456

lib/CodeGen/CGBuiltin.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 10,062 Lines • ▼ Show 20 Lines	case X86::BI__builtin_ia32_divsd_round_mask: {
// If round parameter is not _MM_FROUND_CUR_DIRECTION, don't lower.		// If round parameter is not _MM_FROUND_CUR_DIRECTION, don't lower.
if (cast<llvm::ConstantInt>(Ops[4])->getZExtValue() != (uint64_t)4)		if (cast<llvm::ConstantInt>(Ops[4])->getZExtValue() != (uint64_t)4)
return Builder.CreateCall(Intr, Ops);		return Builder.CreateCall(Intr, Ops);

Value *A = Builder.CreateExtractElement(Ops[0], (uint64_t)0);		Value *A = Builder.CreateExtractElement(Ops[0], (uint64_t)0);
Value *B = Builder.CreateExtractElement(Ops[1], (uint64_t)0);		Value *B = Builder.CreateExtractElement(Ops[1], (uint64_t)0);
Value *C = Builder.CreateExtractElement(Ops[2], (uint64_t)0);		Value *C = Builder.CreateExtractElement(Ops[2], (uint64_t)0);
Value *Mask = Ops[3];		Value *Mask = Ops[3];
Value *Div = Builder.CreateFDiv(A, B);		Value *Div = Builder.CreateFDiv(A, B);
		efriedmaUnsubmitted Not Done Reply Inline Actions Given we're ignoring floating-point exceptions, we should also ignore the "rounding mode" operand (__MM_FROUND_NO_EXC only affects exceptions, and the other values are irrelevant because there isn't any actual rounding involved). efriedma: Given we're ignoring floating-point exceptions, we should also ignore the "rounding mode"…
		GBuellaAuthorUnsubmitted Not Done Reply Inline Actions Oh, alltight. @craig.topper , Do you also agree on ignoring all of these, and just lowering all the comparisions to fcmp? That is the easiest path of course. GBuella: Oh, alltight. @craig.topper , Do you also agree on ignoring all of these, and just lowering all…
		craig.topperUnsubmitted Not Done Reply Inline Actions I think I'm fine with ignoring it, but definitely leave a comment because we will probably have to revisit this code in the future as we continue towards supporting FENV_ACCESS. craig.topper: I think I'm fine with ignoring it, but definitely leave a comment because we will probably have…
llvm::VectorType *MaskTy = llvm::VectorType::get(Builder.getInt1Ty(),		llvm::VectorType *MaskTy = llvm::VectorType::get(Builder.getInt1Ty(),
cast<IntegerType>(Mask->getType())->getBitWidth());		cast<IntegerType>(Mask->getType())->getBitWidth());
Mask = Builder.CreateBitCast(Mask, MaskTy);		Mask = Builder.CreateBitCast(Mask, MaskTy);
Mask = Builder.CreateExtractElement(Mask, (uint64_t)0);		Mask = Builder.CreateExtractElement(Mask, (uint64_t)0);
Value *Select = Builder.CreateSelect(Mask, Div, C);		Value *Select = Builder.CreateSelect(Mask, Div, C);
return Builder.CreateInsertElement(Ops[0], Select, (uint64_t)0);		return Builder.CreateInsertElement(Ops[0], Select, (uint64_t)0);
}		}

// 3DNow!		// 3DNow!
case X86::BI__builtin_ia32_pswapdsf:		case X86::BI__builtin_ia32_pswapdsf:
case X86::BI__builtin_ia32_pswapdsi: {		case X86::BI__builtin_ia32_pswapdsi: {
llvm::Type *MMXTy = llvm::Type::getX86_MMXTy(getLLVMContext());		llvm::Type *MMXTy = llvm::Type::getX86_MMXTy(getLLVMContext());
Ops[0] = Builder.CreateBitCast(Ops[0], MMXTy, "cast");		Ops[0] = Builder.CreateBitCast(Ops[0], MMXTy, "cast");
llvm::Function *F = CGM.getIntrinsic(Intrinsic::x86_3dnowa_pswapd);		llvm::Function *F = CGM.getIntrinsic(Intrinsic::x86_3dnowa_pswapd);
return Builder.CreateCall(F, Ops, "pswapd");		return Builder.CreateCall(F, Ops, "pswapd");
}		}
case X86::BI__builtin_ia32_rdrand16_step:		case X86::BI__builtin_ia32_rdrand16_step:
case X86::BI__builtin_ia32_rdrand32_step:		case X86::BI__builtin_ia32_rdrand32_step:
case X86::BI__builtin_ia32_rdrand64_step:		case X86::BI__builtin_ia32_rdrand64_step:
case X86::BI__builtin_ia32_rdseed16_step:		case X86::BI__builtin_ia32_rdseed16_step:
case X86::BI__builtin_ia32_rdseed32_step:		case X86::BI__builtin_ia32_rdseed32_step:
case X86::BI__builtin_ia32_rdseed64_step: {		case X86::BI__builtin_ia32_rdseed64_step: {
Intrinsic::ID ID;		Intrinsic::ID ID;
switch (BuiltinID) {		switch (BuiltinID) {
default: llvm_unreachable("Unsupported intrinsic!");		default: llvm_unreachable("Unsupported intrinsic!");
case X86::BI__builtin_ia32_rdrand16_step:		case X86::BI__builtin_ia32_rdrand16_step:
ID = Intrinsic::x86_rdrand_16;		ID = Intrinsic::x86_rdrand_16;
break;		break;
case X86::BI__builtin_ia32_rdrand32_step:		case X86::BI__builtin_ia32_rdrand32_step:
		spatelUnsubmitted Not Done Reply Inline Actions llvm::Type ResultType = ConvertType(E->getType()); if (CC == 0x0f \|\| CC == 0x1f) return llvm::Constant::getAllOnesValue(ResultType); if (CC == 0x0b \|\| CC == 0x1b) return llvm::Constant::getNullValue(ResultType); ? Also, can we use the defined predicate names instead of hex values in this code? spatel:* llvm::Type *ResultType = ConvertType(E->getType()); if (CC == 0x0f \|\| CC == 0x1f)…
		GBuellaAuthorUnsubmitted Not Done Reply Inline Actions Well, I believe we would need to predefine them first. I only found those in the client header `avxintrin.h`. GBuella: Well, I believe we would need to predefine them first. I only found those in the client header…
ID = Intrinsic::x86_rdrand_32;		ID = Intrinsic::x86_rdrand_32;
break;		break;
case X86::BI__builtin_ia32_rdrand64_step:		case X86::BI__builtin_ia32_rdrand64_step:
ID = Intrinsic::x86_rdrand_64;		ID = Intrinsic::x86_rdrand_64;
break;		break;
case X86::BI__builtin_ia32_rdseed16_step:		case X86::BI__builtin_ia32_rdseed16_step:
ID = Intrinsic::x86_rdseed_16;		ID = Intrinsic::x86_rdseed_16;
break;		break;
case X86::BI__builtin_ia32_rdseed32_step:		case X86::BI__builtin_ia32_rdseed32_step:
ID = Intrinsic::x86_rdseed_32;		ID = Intrinsic::x86_rdseed_32;
break;		break;
case X86::BI__builtin_ia32_rdseed64_step:		case X86::BI__builtin_ia32_rdseed64_step:
ID = Intrinsic::x86_rdseed_64;		ID = Intrinsic::x86_rdseed_64;
break;		break;
}		}

Value *Call = Builder.CreateCall(CGM.getIntrinsic(ID));		Value *Call = Builder.CreateCall(CGM.getIntrinsic(ID));
Builder.CreateDefaultAlignedStore(Builder.CreateExtractValue(Call, 0),		Builder.CreateDefaultAlignedStore(Builder.CreateExtractValue(Call, 0),
Ops[0]);		Ops[0]);
return Builder.CreateExtractValue(Call, 1);		return Builder.CreateExtractValue(Call, 1);
}		}

case X86::BI__builtin_ia32_cmpps128_mask:		// packed comparison intrinsics
case X86::BI__builtin_ia32_cmpps256_mask:
case X86::BI__builtin_ia32_cmpps512_mask:
case X86::BI__builtin_ia32_cmppd128_mask:
case X86::BI__builtin_ia32_cmppd256_mask:
case X86::BI__builtin_ia32_cmppd512_mask: {
unsigned NumElts = Ops[0]->getType()->getVectorNumElements();
Value *MaskIn = Ops[3];
Ops.erase(&Ops[3]);

Intrinsic::ID ID;
switch (BuiltinID) {
default: llvm_unreachable("Unsupported intrinsic!");
case X86::BI__builtin_ia32_cmpps128_mask:
ID = Intrinsic::x86_avx512_mask_cmp_ps_128;
break;
case X86::BI__builtin_ia32_cmpps256_mask:
ID = Intrinsic::x86_avx512_mask_cmp_ps_256;
break;
case X86::BI__builtin_ia32_cmpps512_mask:
ID = Intrinsic::x86_avx512_mask_cmp_ps_512;
break;
case X86::BI__builtin_ia32_cmppd128_mask:
ID = Intrinsic::x86_avx512_mask_cmp_pd_128;
break;
case X86::BI__builtin_ia32_cmppd256_mask:
ID = Intrinsic::x86_avx512_mask_cmp_pd_256;
break;
case X86::BI__builtin_ia32_cmppd512_mask:
ID = Intrinsic::x86_avx512_mask_cmp_pd_512;
break;
}

Value *Cmp = Builder.CreateCall(CGM.getIntrinsic(ID), Ops);
return EmitX86MaskedCompareResult(*this, Cmp, NumElts, MaskIn);
}

// SSE packed comparison intrinsics
case X86::BI__builtin_ia32_cmpeqps:		case X86::BI__builtin_ia32_cmpeqps:
case X86::BI__builtin_ia32_cmpeqpd:		case X86::BI__builtin_ia32_cmpeqpd:
return getVectorFCmpIR(CmpInst::FCMP_OEQ);		return getVectorFCmpIR(CmpInst::FCMP_OEQ);
case X86::BI__builtin_ia32_cmpltps:		case X86::BI__builtin_ia32_cmpltps:
case X86::BI__builtin_ia32_cmpltpd:		case X86::BI__builtin_ia32_cmpltpd:
return getVectorFCmpIR(CmpInst::FCMP_OLT);		return getVectorFCmpIR(CmpInst::FCMP_OLT);
case X86::BI__builtin_ia32_cmpleps:		case X86::BI__builtin_ia32_cmpleps:
case X86::BI__builtin_ia32_cmplepd:		case X86::BI__builtin_ia32_cmplepd:
return getVectorFCmpIR(CmpInst::FCMP_OLE);		return getVectorFCmpIR(CmpInst::FCMP_OLE);
case X86::BI__builtin_ia32_cmpunordps:		case X86::BI__builtin_ia32_cmpunordps:
case X86::BI__builtin_ia32_cmpunordpd:		case X86::BI__builtin_ia32_cmpunordpd:
return getVectorFCmpIR(CmpInst::FCMP_UNO);		return getVectorFCmpIR(CmpInst::FCMP_UNO);
case X86::BI__builtin_ia32_cmpneqps:		case X86::BI__builtin_ia32_cmpneqps:
case X86::BI__builtin_ia32_cmpneqpd:		case X86::BI__builtin_ia32_cmpneqpd:
return getVectorFCmpIR(CmpInst::FCMP_UNE);		return getVectorFCmpIR(CmpInst::FCMP_UNE);
case X86::BI__builtin_ia32_cmpnltps:		case X86::BI__builtin_ia32_cmpnltps:
case X86::BI__builtin_ia32_cmpnltpd:		case X86::BI__builtin_ia32_cmpnltpd:
return getVectorFCmpIR(CmpInst::FCMP_UGE);		return getVectorFCmpIR(CmpInst::FCMP_UGE);
case X86::BI__builtin_ia32_cmpnleps:		case X86::BI__builtin_ia32_cmpnleps:
case X86::BI__builtin_ia32_cmpnlepd:		case X86::BI__builtin_ia32_cmpnlepd:
		craig.topperUnsubmitted Not Done Reply Inline Actions Move this into the "ReturnsMask" path, and use getVectorFCmpIR for the other path. craig.topper: Move this into the "ReturnsMask" path, and use getVectorFCmpIR for the other path.
return getVectorFCmpIR(CmpInst::FCMP_UGT);		return getVectorFCmpIR(CmpInst::FCMP_UGT);
case X86::BI__builtin_ia32_cmpordps:		case X86::BI__builtin_ia32_cmpordps:
case X86::BI__builtin_ia32_cmpordpd:		case X86::BI__builtin_ia32_cmpordpd:
return getVectorFCmpIR(CmpInst::FCMP_ORD);		return getVectorFCmpIR(CmpInst::FCMP_ORD);
case X86::BI__builtin_ia32_cmpps:		case X86::BI__builtin_ia32_cmpps:
case X86::BI__builtin_ia32_cmpps256:		case X86::BI__builtin_ia32_cmpps256:
case X86::BI__builtin_ia32_cmppd:		case X86::BI__builtin_ia32_cmppd:
case X86::BI__builtin_ia32_cmppd256: {		case X86::BI__builtin_ia32_cmppd256:
		case X86::BI__builtin_ia32_cmpps128_mask:
		case X86::BI__builtin_ia32_cmpps256_mask:
		case X86::BI__builtin_ia32_cmpps512_mask:
		case X86::BI__builtin_ia32_cmppd128_mask:
		case X86::BI__builtin_ia32_cmppd256_mask:
		efriedmaUnsubmitted Not Done Reply Inline Actions interesting. efriedma:* *interesting.
		case X86::BI__builtin_ia32_cmppd512_mask: {
		// Lowering vector comparisons to fcmp instructions, while
		// ignoring signalling behaviour requested
		// ignoring rounding mode requested
		// This is is only possible as long as FENV_ACCESS is not implemented.
		// See also: https://reviews.llvm.org/D45616

		// The third argument is the comparison condition, and integer in the
		// range [0, 31]
unsigned CC = cast<llvm::ConstantInt>(Ops[2])->getZExtValue() & 0x1f;		unsigned CC = cast<llvm::ConstantInt>(Ops[2])->getZExtValue() & 0x1f;
// If this one of the SSE immediates, we can use native IR.
if (CC < 8) {		// Lowering to IR fcmp instruction.
		craig.topperUnsubmitted Not Done Reply Inline Actions Sink this into the switch as the default case with an llvm_unreachable craig.topper: Sink this into the switch as the default case with an llvm_unreachable
		// Ignoring requested signaling behaviour,
		// e.g. both _CMP_GT_OS & _CMP_GT_OQ are translated to FCMP_OGT.
FCmpInst::Predicate Pred;		FCmpInst::Predicate Pred;
switch (CC) {		switch (CC) {
case 0: Pred = FCmpInst::FCMP_OEQ; break;		case 0x00: Pred = FCmpInst::FCMP_OEQ; break;
case 1: Pred = FCmpInst::FCMP_OLT; break;		case 0x01: Pred = FCmpInst::FCMP_OLT; break;
case 2: Pred = FCmpInst::FCMP_OLE; break;		case 0x02: Pred = FCmpInst::FCMP_OLE; break;
case 3: Pred = FCmpInst::FCMP_UNO; break;		case 0x03: Pred = FCmpInst::FCMP_UNO; break;
case 4: Pred = FCmpInst::FCMP_UNE; break;		case 0x04: Pred = FCmpInst::FCMP_UNE; break;
case 5: Pred = FCmpInst::FCMP_UGE; break;		case 0x05: Pred = FCmpInst::FCMP_UGE; break;
case 6: Pred = FCmpInst::FCMP_UGT; break;		case 0x06: Pred = FCmpInst::FCMP_UGT; break;
case 7: Pred = FCmpInst::FCMP_ORD; break;		case 0x07: Pred = FCmpInst::FCMP_ORD; break;
}		case 0x08: Pred = FCmpInst::FCMP_UEQ; break;
return getVectorFCmpIR(Pred);		case 0x09: Pred = FCmpInst::FCMP_ULT; break;
}		case 0x0a: Pred = FCmpInst::FCMP_ULE; break;
		case 0x0c: Pred = FCmpInst::FCMP_ONE; break;
// We can't handle 8-31 immediates with native IR, use the intrinsic.		case 0x0d: Pred = FCmpInst::FCMP_OGE; break;
// Except for predicates that create constants.		case 0x0e: Pred = FCmpInst::FCMP_OGT; break;
Intrinsic::ID ID;		case 0x10: Pred = FCmpInst::FCMP_OEQ; break;
switch (BuiltinID) {		case 0x11: Pred = FCmpInst::FCMP_OLT; break;
default: llvm_unreachable("Unsupported intrinsic!");		case 0x12: Pred = FCmpInst::FCMP_OLE; break;
case X86::BI__builtin_ia32_cmpps:		case 0x13: Pred = FCmpInst::FCMP_UNO; break;
ID = Intrinsic::x86_sse_cmp_ps;		case 0x14: Pred = FCmpInst::FCMP_UNE; break;
break;		case 0x15: Pred = FCmpInst::FCMP_UGE; break;
case X86::BI__builtin_ia32_cmpps256:		case 0x16: Pred = FCmpInst::FCMP_UGT; break;
		case 0x17: Pred = FCmpInst::FCMP_ORD; break;
		case 0x18: Pred = FCmpInst::FCMP_UEQ; break;
		case 0x19: Pred = FCmpInst::FCMP_ULT; break;
		case 0x1a: Pred = FCmpInst::FCMP_ULE; break;
		case 0x1c: Pred = FCmpInst::FCMP_ONE; break;
		case 0x1d: Pred = FCmpInst::FCMP_OGE; break;
		case 0x1e: Pred = FCmpInst::FCMP_OGT; break;
// _CMP_TRUE_UQ, _CMP_TRUE_US produce -1,-1... vector		// _CMP_TRUE_UQ, _CMP_TRUE_US produce -1,-1... vector
// on any input and _CMP_FALSE_OQ, _CMP_FALSE_OS produce 0, 0...		// on any input and _CMP_FALSE_OQ, _CMP_FALSE_OS produce 0, 0...
if (CC == 0xf \|\| CC == 0xb \|\| CC == 0x1b \|\| CC == 0x1f) {		case 0x0b: // FALSE_OQ
Value *Constant = (CC == 0xf \|\| CC == 0x1f) ?		case 0x1b: // FALSE_OS
llvm::Constant::getAllOnesValue(Builder.getInt32Ty()) :		return llvm::Constant::getNullValue(ConvertType(E->getType()));
llvm::Constant::getNullValue(Builder.getInt32Ty());		case 0x0f: // TRUE_UQ
Value *Vec = Builder.CreateVectorSplat(		case 0x1f: // TRUE_US
Ops[0]->getType()->getVectorNumElements(), Constant);		return llvm::Constant::getAllOnesValue(ConvertType(E->getType()));
return Builder.CreateBitCast(Vec, Ops[0]->getType());
		default: llvm_unreachable("Unhandled CC");
		spatelUnsubmitted Not Done Reply Inline Actions On 2nd thought, why are we optimizing when we have matching IR predicates for these? Just translate to FCMP_TRUE / FCMP_FALSE instead of special-casing these values. InstSimplify can handle the constant folding if optimization is on. spatel: On 2nd thought, why are we optimizing when we have matching IR predicates for these? Just…
		GBuellaAuthorUnsubmitted Not Done Reply Inline Actions I don't know, these TRUE/FALSE cases were already handled here, I only rearranged the code. Does this cause any problems? I mean, if it meant an extra dozen lines of code I would get it, but as it is, does it hurt anything? GBuella: I don't know, these TRUE/FALSE cases were already handled here, I only rearranged the code.
		spatelUnsubmitted Not Done Reply Inline Actions It's mostly about being consistent. I think it's generally out-of-bounds for clang to optimize code. That's not its job. The potential end user difference is that in unoptimized code, a user might expect to see the vcmpXX asm corresponding to the source-level intrinsic when debugging. I agree that this is changing existing behavior, so it's better if we make this change before or after this patch. spatel: It's mostly about being consistent. I think it's generally out-of-bounds for clang to optimize…
}		}
ID = Intrinsic::x86_avx_cmp_ps_256;
break;		// Builtins without the _mask suffix return a vector of integers
case X86::BI__builtin_ia32_cmppd:		// of the same width as the input vectors
ID = Intrinsic::x86_sse2_cmp_pd;		switch (BuiltinID) {
break;		case X86::BI__builtin_ia32_cmpps512_mask:
case X86::BI__builtin_ia32_cmppd256:		case X86::BI__builtin_ia32_cmppd512_mask:
// _CMP_TRUE_UQ, _CMP_TRUE_US produce -1,-1... vector		case X86::BI__builtin_ia32_cmpps128_mask:
// on any input and _CMP_FALSE_OQ, _CMP_FALSE_OS produce 0, 0...		case X86::BI__builtin_ia32_cmpps256_mask:
if (CC == 0xf \|\| CC == 0xb \|\| CC == 0x1b \|\| CC == 0x1f) {		case X86::BI__builtin_ia32_cmppd128_mask:
Value *Constant = (CC == 0xf \|\| CC == 0x1f) ?		case X86::BI__builtin_ia32_cmppd256_mask: {
llvm::Constant::getAllOnesValue(Builder.getInt64Ty()) :		unsigned NumElts = Ops[0]->getType()->getVectorNumElements();
llvm::Constant::getNullValue(Builder.getInt64Ty());		Value *Cmp = Builder.CreateFCmp(Pred, Ops[0], Ops[1]);
Value *Vec = Builder.CreateVectorSplat(		return EmitX86MaskedCompareResult(*this, Cmp, NumElts, Ops[3]);
Ops[0]->getType()->getVectorNumElements(), Constant);
return Builder.CreateBitCast(Vec, Ops[0]->getType());
}		}
ID = Intrinsic::x86_avx_cmp_pd_256;		default:
break;		return getVectorFCmpIR(Pred);
}		}

return Builder.CreateCall(CGM.getIntrinsic(ID), Ops);
}		}
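
For readers following the hunk above, the predicate selection can be modelled outside of clang. The sketch below is a standalone illustration and not the code from this patch: `fcmpPredicateFor` is a hypothetical name, the table follows Intel's `_CMP_*` immediate encoding, and bit 4 (which only changes signaling behaviour) is masked off, matching the summary's note that signaling is ignored.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the patch): maps an x86 comparison
// immediate (0..31) to the name of the LLVM fcmp predicate that yields the
// same ordered/unordered result. Bit 4 selects signaling vs. quiet behaviour
// only, so it is dropped before the lookup.
std::string fcmpPredicateFor(unsigned cc) {
  assert(cc < 32 && "the comparison immediate is five bits wide");
  static const char *const Names[16] = {
      "oeq",   // 0x00 _CMP_EQ_OQ
      "olt",   // 0x01 _CMP_LT_OS
      "ole",   // 0x02 _CMP_LE_OS
      "uno",   // 0x03 _CMP_UNORD_Q
      "une",   // 0x04 _CMP_NEQ_UQ
      "uge",   // 0x05 _CMP_NLT_US
      "ugt",   // 0x06 _CMP_NLE_US
      "ord",   // 0x07 _CMP_ORD_Q
      "ueq",   // 0x08 _CMP_EQ_UQ
      "ult",   // 0x09 _CMP_NGE_US
      "ule",   // 0x0a _CMP_NGT_US
      "false", // 0x0b _CMP_FALSE_OQ
      "one",   // 0x0c _CMP_NEQ_OQ
      "oge",   // 0x0d _CMP_GE_OS
      "ogt",   // 0x0e _CMP_GT_OS
      "true",  // 0x0f _CMP_TRUE_UQ
  };
  return Names[cc & 0xf];
}
```

With this table, the immediate 13 (`_CMP_GE_OS`) used by the tests in this revision maps to `oge`, and the four always-true/always-false immediates (0xf, 0x1f, 0xb, 0x1b) map to `true` and `false`.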

// SSE scalar comparison intrinsics		// SSE scalar comparison intrinsics
case X86::BI__builtin_ia32_cmpeqss:		case X86::BI__builtin_ia32_cmpeqss:
return getCmpIntrinsicCall(Intrinsic::x86_sse_cmp_ss, 0);		return getCmpIntrinsicCall(Intrinsic::x86_sse_cmp_ss, 0);
case X86::BI__builtin_ia32_cmpltss:		case X86::BI__builtin_ia32_cmpltss:
return getCmpIntrinsicCall(Intrinsic::x86_sse_cmp_ss, 1);		return getCmpIntrinsicCall(Intrinsic::x86_sse_cmp_ss, 1);
case X86::BI__builtin_ia32_cmpless:		case X86::BI__builtin_ia32_cmpless:

test/CodeGen/avx-builtins.c

__m256 test_mm_ceil_ps(__m256 x) {		__m256 test_mm_ceil_ps(__m256 x) {
// CHECK-LABEL: test_mm_ceil_ps		// CHECK-LABEL: test_mm_ceil_ps
// CHECK: call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %{{.*}}, i32 2)		// CHECK: call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %{{.*}}, i32 2)
return _mm256_ceil_ps(x);		return _mm256_ceil_ps(x);
}		}

__m128d test_mm_cmp_pd(__m128d A, __m128d B) {		__m128d test_mm_cmp_pd(__m128d A, __m128d B) {
// CHECK-LABEL: test_mm_cmp_pd		// CHECK-LABEL: test_mm_cmp_pd
// CHECK: call <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i8 13)		// CHECK: [[CMP:%.*]] = fcmp oge <2 x double> %{{.*}}, %{{.*}}
return _mm_cmp_pd(A, B, _CMP_GE_OS);		return _mm_cmp_pd(A, B, _CMP_GE_OS);
}		}

__m256d test_mm256_cmp_pd(__m256d A, __m256d B) {		__m256d test_mm256_cmp_pd(__m256d A, __m256d B) {
// CHECK-LABEL: test_mm256_cmp_pd		// CHECK-LABEL: test_mm256_cmp_pd
// CHECK: call <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double> %{{.*}}, <4 x double> %{{.*}}, i8 13)		// CHECK: [[CMP:%.*]] = fcmp oge <4 x double> %{{.*}}, %{{.*}}
return _mm256_cmp_pd(A, B, _CMP_GE_OS);		return _mm256_cmp_pd(A, B, _CMP_GE_OS);
}		}

__m128 test_mm_cmp_ps(__m128 A, __m128 B) {		__m128 test_mm_cmp_ps(__m128 A, __m128 B) {
// CHECK-LABEL: test_mm_cmp_ps		// CHECK-LABEL: test_mm_cmp_ps
// CHECK: call <4 x float> @llvm.x86.sse.cmp.ps(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i8 13)		// CHECK: [[CMP:%.*]] = fcmp oge <4 x float> %{{.*}}, %{{.*}}
return _mm_cmp_ps(A, B, _CMP_GE_OS);		return _mm_cmp_ps(A, B, _CMP_GE_OS);
}		}

__m256 test_mm256_cmp_ps(__m256d A, __m256d B) {		__m256 test_mm256_cmp_ps(__m256d A, __m256d B) {
// CHECK-LABEL: test_mm256_cmp_ps		// CHECK-LABEL: test_mm256_cmp_ps
// CHECK: call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %{{.*}}, <8 x float> %{{.*}}, i8 13)		// CHECK: [[CMP:%.*]] = fcmp oge <8 x float> %{{.*}}, %{{.*}}
return _mm256_cmp_ps(A, B, _CMP_GE_OS);		return _mm256_cmp_ps(A, B, _CMP_GE_OS);
}		}

__m128d test_mm_cmp_sd(__m128d A, __m128d B) {		__m128d test_mm_cmp_sd(__m128d A, __m128d B) {
// CHECK-LABEL: test_mm_cmp_sd		// CHECK-LABEL: test_mm_cmp_sd
// CHECK: call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i8 13)		// CHECK: call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i8 13)
		craig.topper: This doesn't look right. This is a scalar instruction; it should only be comparing a single double. There should be insertelements and extractelements.
return _mm_cmp_sd(A, B, _CMP_GE_OS);		return _mm_cmp_sd(A, B, _CMP_GE_OS);
}		}
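
To make the scalar semantics raised in the comment above concrete, here is a small plain C++ model of what a scalar double compare such as `_mm_cmp_sd(A, B, _CMP_GE_OS)` produces. It is an illustration only, not the lowering code: `cmpGeScalarModel`, `V2F64` and `V2I64` are hypothetical names, and the result is kept as raw 64-bit lanes so the all-ones mask can be inspected directly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V2F64 = std::array<double, 2>;        // models a <2 x double> value
using V2I64 = std::array<std::uint64_t, 2>; // raw bits of the result lanes

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "the model assumes 64-bit doubles");

// Hypothetical model of a scalar ordered greater-or-equal compare:
// only lane 0 is compared; lane 1 is copied unchanged from the first operand.
V2I64 cmpGeScalarModel(const V2F64 &a, const V2F64 &b) {
  V2I64 r{};
  // ">=" is false when either operand is NaN, i.e. an ordered comparison.
  r[0] = (a[0] >= b[0]) ? ~std::uint64_t{0} : std::uint64_t{0};
  // The upper lane passes through bit-for-bit from the first operand.
  std::memcpy(&r[1], &a[1], sizeof(double));
  return r;
}
```

A plain vector `fcmp` over both lanes would also overwrite lane 1 with a mask, which is why a scalar lowering needs the extract/insert steps the reviewer mentions.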

__m128 test_mm_cmp_ss(__m128 A, __m128 B) {		__m128 test_mm_cmp_ss(__m128 A, __m128 B) {
// CHECK-LABEL: test_mm_cmp_ss		// CHECK-LABEL: test_mm_cmp_ss
// CHECK: call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i8 13)		// CHECK: call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i8 13)
return _mm_cmp_ss(A, B, _CMP_GE_OS);		return _mm_cmp_ss(A, B, _CMP_GE_OS);
}		}
__m256i test_mm256_zextsi128_si256(__m128i A) {		__m256i test_mm256_zextsi128_si256(__m128i A) {
// CHECK-LABEL: test_mm256_zextsi128_si256		// CHECK-LABEL: test_mm256_zextsi128_si256
// CHECK: store <2 x i64> zeroinitializer		// CHECK: store <2 x i64> zeroinitializer
// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>		// CHECK: shufflevector <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
return _mm256_zextsi128_si256(A);		return _mm256_zextsi128_si256(A);
}		}

double test_mm256_cvtsd_f64(__m256d __a)		double test_mm256_cvtsd_f64(__m256d __a)
{		{
// CHECK-LABEL: @test_mm256_cvtsd_f64		// CHECK-LABEL: @test_mm256_cvtsd_f64
// CHECK: extractelement <4 x double> %{{.*}}, i32 0		// CHECK: extractelement <4 x double> %{{.*}}, i32 0
return _mm256_cvtsd_f64(__a);		return _mm256_cvtsd_f64(__a);
}		}

int test_mm256_cvtsi256_si32(__m256i __a)		int test_mm256_cvtsi256_si32(__m256i __a)
{		{
// CHECK-LABEL: @test_mm256_cvtsi256_si32		// CHECK-LABEL: @test_mm256_cvtsi256_si32
// CHECK: extractelement <8 x i32> %{{.*}}, i32 0		// CHECK: extractelement <8 x i32> %{{.*}}, i32 0
return _mm256_cvtsi256_si32(__a);		return _mm256_cvtsi256_si32(__a);
}		}

float test_mm256_cvtss_f32(__m256 __a)		float test_mm256_cvtss_f32(__m256 __a)
{		{
// CHECK-LABEL: @test_mm256_cvtss_f32		// CHECK-LABEL: @test_mm256_cvtss_f32
// CHECK: extractelement <8 x float> %{{.*}}, i32 0		// CHECK: extractelement <8 x float> %{{.*}}, i32 0
return _mm256_cvtss_f32(__a);		return _mm256_cvtss_f32(__a);
}		}

__m256 test_mm256_cmp_ps_true(__m256 a, __m256 b) {		__m256 test_mm256_cmp_ps_true(__m256 a, __m256 b) {
// CHECK-LABEL: @test_mm256_cmp_ps_true		// CHECK-LABEL: @test_mm256_cmp_ps_true
// CHECK: ret <8 x float> <float 0xFFFFFFFFE0000000,		// CHECK: ret <8 x float> <float 0xFFFFFFFFE0000000,
return _mm256_cmp_ps(a, b, _CMP_TRUE_UQ);		return _mm256_cmp_ps(a, b, _CMP_TRUE_UQ);
}		}
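
The constant `0xFFFFFFFFE0000000` in the CHECK line above can look odd for a `float`. LLVM prints such `float` constants as the bit pattern of the equivalent `double`, so the all-ones 32-bit lane shows up widened. The sketch below reproduces that widening with integer arithmetic; `widenFloatBits` is a hypothetical helper written for this note, and it deliberately does not handle subnormal inputs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: widens IEEE-754 binary32 bits to the binary64 bits of
// the same value, preserving sign, infinities and NaN payloads.
// Subnormal inputs are outside the scope of this sketch.
std::uint64_t widenFloatBits(std::uint32_t f) {
  const std::uint64_t sign = static_cast<std::uint64_t>(f >> 31) << 63;
  const std::uint32_t exp = (f >> 23) & 0xffu;
  // The 23 fraction bits move to the top of the 52-bit double fraction.
  const std::uint64_t frac = static_cast<std::uint64_t>(f & 0x7fffffu) << 29;
  if (exp == 0) {
    assert(frac == 0 && "subnormals are not handled by this sketch");
    return sign; // signed zero
  }
  // Inf/NaN keep an all-ones exponent; everything else is re-biased.
  const std::uint64_t wideExp = (exp == 0xffu) ? 0x7ffu : (exp - 127u + 1023u);
  return sign | (wideExp << 52) | frac;
}
```

Feeding the all-ones lane `0xFFFFFFFF` through this helper gives `0xFFFFFFFFE0000000`, the value the test expects for every element of the `_CMP_TRUE_UQ` result.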

__m256d test_mm256_cmp_pd_true(__m256d a, __m256d b) {		__m256d test_mm256_cmp_pd_true(__m256d a, __m256d b) {
// CHECK-LABEL: @test_mm256_cmp_pd_true		// CHECK-LABEL: @test_mm256_cmp_pd_true
// CHECK: ret <4 x double> <double 0xFFFFFFFFFFFFFFFF,		// CHECK: ret <4 x double> <double 0xFFFFFFFFFFFFFFFF,
return _mm256_cmp_pd(a, b, _CMP_TRUE_UQ);		return _mm256_cmp_pd(a, b, _CMP_TRUE_UQ);
}		}

__m256 test_mm256_cmp_ps_false(__m256 a, __m256 b) {		__m256 test_mm256_cmp_ps_false(__m256 a, __m256 b) {
// CHECK-LABEL: @test_mm256_cmp_ps_false		// CHECK-LABEL: @test_mm256_cmp_ps_false
// CHECK: ret <8 x float> zeroinitializer		// CHECK: ret <8 x float> zeroinitializer
return _mm256_cmp_ps(a, b, _CMP_FALSE_OQ);		return _mm256_cmp_ps(a, b, _CMP_FALSE_OQ);
}		}

__m256d test_mm256_cmp_pd_false(__m256d a, __m256d b) {		__m256d test_mm256_cmp_pd_false(__m256d a, __m256d b) {
// CHECK-LABEL: @test_mm256_cmp_pd_false		// CHECK-LABEL: @test_mm256_cmp_pd_false
// CHECK: ret <4 x double> zeroinitializer		// CHECK: ret <4 x double> zeroinitializer
return _mm256_cmp_pd(a, b, _CMP_FALSE_OQ);		return _mm256_cmp_pd(a, b, _CMP_FALSE_OQ);
}		}

__m256 test_mm256_cmp_ps_strue(__m256 a, __m256 b) {		__m256 test_mm256_cmp_ps_strue(__m256 a, __m256 b) {
// CHECK-LABEL: @test_mm256_cmp_ps_strue		// CHECK-LABEL: @test_mm256_cmp_ps_strue
// CHECK: ret <8 x float> <float 0xFFFFFFFFE0000000,		// CHECK: ret <8 x float> <float 0xFFFFFFFFE0000000,
return _mm256_cmp_ps(a, b, _CMP_TRUE_US);		return _mm256_cmp_ps(a, b, _CMP_TRUE_US);
}		}

__m256d test_mm256_cmp_pd_strue(__m256d a, __m256d b) {		__m256d test_mm256_cmp_pd_strue(__m256d a, __m256d b) {
// CHECK-LABEL: @test_mm256_cmp_pd_strue		// CHECK-LABEL: @test_mm256_cmp_pd_strue
// CHECK: ret <4 x double> <double 0xFFFFFFFFFFFFFFFF,		// CHECK: ret <4 x double> <double 0xFFFFFFFFFFFFFFFF,
return _mm256_cmp_pd(a, b, _CMP_TRUE_US);		return _mm256_cmp_pd(a, b, _CMP_TRUE_US);
}		}

__m256 test_mm256_cmp_ps_sfalse(__m256 a, __m256 b) {		__m256 test_mm256_cmp_ps_sfalse(__m256 a, __m256 b) {
// CHECK-LABEL: @test_mm256_cmp_ps_sfalse		// CHECK-LABEL: @test_mm256_cmp_ps_sfalse
// CHECK: ret <8 x float> zeroinitializer		// CHECK: ret <8 x float> zeroinitializer
return _mm256_cmp_ps(a, b, _CMP_FALSE_OS);		return _mm256_cmp_ps(a, b, _CMP_FALSE_OS);
}		}

__m256d test_mm256_cmp_pd_sfalse(__m256d a, __m256d b) {		__m256d test_mm256_cmp_pd_sfalse(__m256d a, __m256d b) {
// CHECK-LABEL: @test_mm256_cmp_pd_sfalse		// CHECK-LABEL: @test_mm256_cmp_pd_sfalse
// CHECK: ret <4 x double> zeroinitializer		// CHECK: ret <4 x double> zeroinitializer
return _mm256_cmp_pd(a, b, _CMP_FALSE_OS);		return _mm256_cmp_pd(a, b, _CMP_FALSE_OS);
}		}

		__m128 test_mm_cmp_ps_true(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_ps_true
		// CHECK: ret <4 x float> <float 0xFFFFFFFFE0000000,
		return _mm_cmp_ps(a, b, _CMP_TRUE_UQ);
		}

		__m128 test_mm_cmp_pd_true(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_pd_true
		// CHECK: ret <4 x float> <float 0xFFFFFFFFE0000000,
		return _mm_cmp_pd(a, b, _CMP_TRUE_UQ);
		}

		__m128 test_mm_cmp_ps_false(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_ps_false
		// CHECK: ret <4 x float> zeroinitializer
		return _mm_cmp_ps(a, b, _CMP_FALSE_OQ);
		}

		__m128 test_mm_cmp_pd_false(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_pd_false
		// CHECK: ret <4 x float> zeroinitializer
		return _mm_cmp_pd(a, b, _CMP_FALSE_OQ);
		}

		__m128 test_mm_cmp_ps_strue(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_ps_strue
		// CHECK: ret <4 x float> <float 0xFFFFFFFFE0000000,
		return _mm_cmp_ps(a, b, _CMP_TRUE_US);
		}

		__m128 test_mm_cmp_pd_strue(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_pd_strue
		// CHECK: ret <4 x float> <float 0xFFFFFFFFE0000000,
		return _mm_cmp_pd(a, b, _CMP_TRUE_US);
		}

		__m128 test_mm_cmp_ps_sfalse(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_ps_sfalse
		// CHECK: ret <4 x float> zeroinitializer
		return _mm_cmp_ps(a, b, _CMP_FALSE_OS);
		}

		__m128 test_mm_cmp_pd_sfalse(__m128 a, __m128 b) {
		// CHECK-LABEL: @test_mm_cmp_pd_sfalse
		// CHECK: ret <4 x float> zeroinitializer
		return _mm_cmp_pd(a, b, _CMP_FALSE_OS);
		}

test/CodeGen/avx-cmp-builtins.c

	// RUN: %clang_cc1 -ffreestanding %s -O3 -triple=x86_64-apple-darwin -target-feature +avx -emit-llvm -o - | FileCheck %s			// RUN: %clang_cc1 -ffreestanding %s -O3 -triple=x86_64-apple-darwin -target-feature +avx -emit-llvm -o - | FileCheck %s
	// FIXME: The shufflevector instructions in test_cmpgt_sd are relying on O3 here.			// FIXME: The shufflevector instructions in test_cmpgt_sd are relying on O3 here.


	#include <immintrin.h>			#include <immintrin.h>

	//			//
	// Test LLVM IR codegen of cmpXY instructions			// Test LLVM IR codegen of cmpXY instructions
	//			//

	__m128d test_cmp_pd(__m128d a, __m128d b) {
	// Expects that the third argument in LLVM IR is immediate expression
	// CHECK: @llvm.x86.sse2.cmp.pd({{.*}}, i8 13)
	return _mm_cmp_pd(a, b, _CMP_GE_OS);
	}

	__m128d test_cmp_ps(__m128 a, __m128 b) {
	// Expects that the third argument in LLVM IR is immediate expression
	// CHECK: @llvm.x86.sse.cmp.ps({{.*}}, i8 13)
	return _mm_cmp_ps(a, b, _CMP_GE_OS);
	}

	__m256d test_cmp_pd256(__m256d a, __m256d b) {
	// Expects that the third argument in LLVM IR is immediate expression
	// CHECK: @llvm.x86.avx.cmp.pd.256({{.*}}, i8 13)
	return _mm256_cmp_pd(a, b, _CMP_GE_OS);
	}

	__m256d test_cmp_ps256(__m256 a, __m256 b) {
	// Expects that the third argument in LLVM IR is immediate expression
	// CHECK: @llvm.x86.avx.cmp.ps.256({{.*}}, i8 13)
	return _mm256_cmp_ps(a, b, _CMP_GE_OS);
	}

	__m128d test_cmp_sd(__m128d a, __m128d b) {			__m128d test_cmp_sd(__m128d a, __m128d b) {
	// Expects that the third argument in LLVM IR is immediate expression			// Expects that the third argument in LLVM IR is immediate expression
	// CHECK: @llvm.x86.sse2.cmp.sd({{.*}}, i8 13)			// CHECK: @llvm.x86.sse2.cmp.sd({{.*}}, i8 13)
	return _mm_cmp_sd(a, b, _CMP_GE_OS);			return _mm_cmp_sd(a, b, _CMP_GE_OS);
	}			}

	__m128d test_cmp_ss(__m128 a, __m128 b) {			__m128d test_cmp_ss(__m128 a, __m128 b) {
	// Expects that the third argument in LLVM IR is immediate expression			// Expects that the third argument in LLVM IR is immediate expression

test/CodeGen/avx2-builtins.c

	__m128d test_mm_mask_i64gather_pd(__m128d a, double const *b, __m128i c, __m128d d) {			__m128d test_mm_mask_i64gather_pd(__m128d a, double const *b, __m128i c, __m128d d) {
	// CHECK-LABEL: test_mm_mask_i64gather_pd			// CHECK-LABEL: test_mm_mask_i64gather_pd
	// CHECK: call <2 x double> @llvm.x86.avx2.gather.q.pd(<2 x double> %{{.*}}, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x double> %{{.*}}, i8 2)			// CHECK: call <2 x double> @llvm.x86.avx2.gather.q.pd(<2 x double> %{{.*}}, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x double> %{{.*}}, i8 2)
	return _mm_mask_i64gather_pd(a, b, c, d, 2);			return _mm_mask_i64gather_pd(a, b, c, d, 2);
	}			}

	__m256d test_mm256_i64gather_pd(double const *b, __m256i c) {			__m256d test_mm256_i64gather_pd(double const *b, __m256i c) {
	// CHECK-LABEL: test_mm256_i64gather_pd			// CHECK-LABEL: test_mm256_i64gather_pd
	// CHECK: [[CMP:%.*]] = fcmp oeq <4 x double>			// CHECK: fcmp oeq <4 x double> %{{.*}}, %{{.*}}
	// CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i64>
	// CHECK-NEXT: [[BC:%.*]] = bitcast <4 x i64> [[SEXT]] to <4 x double>
	// CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> zeroinitializer, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)			// CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> zeroinitializer, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)
	return _mm256_i64gather_pd(b, c, 2);			return _mm256_i64gather_pd(b, c, 2);
	}			}

	__m256d test_mm256_mask_i64gather_pd(__m256d a, double const *b, __m256i c, __m256d d) {			__m256d test_mm256_mask_i64gather_pd(__m256d a, double const *b, __m256i c, __m256d d) {
	// CHECK-LABEL: test_mm256_mask_i64gather_pd			// CHECK-LABEL: test_mm256_mask_i64gather_pd
	// CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)			// CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)
	return _mm256_mask_i64gather_pd(a, b, c, d, 2);			return _mm256_mask_i64gather_pd(a, b, c, d, 2);

test/CodeGen/avx512f-builtins.c

	{			{
	// CHECK-LABEL: @test_mm512_unpacklo_ps			// CHECK-LABEL: @test_mm512_unpacklo_ps
	// CHECK: shufflevector <16 x float> {{.*}} <i32 0, i32 16, i32 1, i32 17, i32 4, i32 20, i32 5, i32 21, i32 8, i32 24, i32 9, i32 25, i32 12, i32 28, i32 13, i32 29>			// CHECK: shufflevector <16 x float> {{.*}} <i32 0, i32 16, i32 1, i32 17, i32 4, i32 20, i32 5, i32 21, i32 8, i32 24, i32 9, i32 25, i32 12, i32 28, i32 13, i32 29>
	return _mm512_unpacklo_ps(a, b);			return _mm512_unpacklo_ps(a, b);
	}			}

	__mmask16 test_mm512_cmp_round_ps_mask(__m512 a, __m512 b) {			__mmask16 test_mm512_cmp_round_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmp_round_ps_mask			// CHECK-LABEL: @test_mm512_cmp_round_ps_mask
	// CHECK: call <16 x i1> @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp oeq <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmp_round_ps_mask(a, b, 0, _MM_FROUND_CUR_DIRECTION);			return _mm512_cmp_round_ps_mask(a, b, 0, _MM_FROUND_CUR_DIRECTION);
	}			}

	__mmask16 test_mm512_mask_cmp_round_ps_mask(__mmask16 m, __m512 a, __m512 b) {			__mmask16 test_mm512_mask_cmp_round_ps_mask(__mmask16 m, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmp_round_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmp_round_ps_mask
	// CHECK: [[CMP:%.*]] = call <16 x i1> @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp oeq <16 x float> %{{.*}}, %{{.*}}
	// CHECK: and <16 x i1> [[CMP]], {{.*}}			// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmp_round_ps_mask(m, a, b, 0, _MM_FROUND_CUR_DIRECTION);			return _mm512_mask_cmp_round_ps_mask(m, a, b, 0, _MM_FROUND_CUR_DIRECTION);
	}			}
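
The two CHECK lines in the masked test above (an `fcmp` followed by an `and` with the incoming mask) correspond to a simple lane-wise computation. The plain C++ model below illustrates it for the ordered-equal case; it is a sketch for this note, not the code generated by clang, and `maskedCmpEqModel` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a masked 16-lane ordered-equal compare:
// lane i of the comparison becomes bit i of the result, and the result is
// then ANDed with the incoming write mask k.
std::uint16_t maskedCmpEqModel(std::uint16_t k,
                               const std::array<float, 16> &a,
                               const std::array<float, 16> &b) {
  std::uint16_t bits = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    // "==" is false when either lane is NaN, matching an ordered compare.
    if (a[i] == b[i])
      bits = static_cast<std::uint16_t>(bits | (1u << i));
  }
  return static_cast<std::uint16_t>(bits & k);
}
```

Lanes whose mask bit is clear always report 0, regardless of the values compared.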

	__mmask16 test_mm512_cmp_ps_mask(__m512 a, __m512 b) {			__mmask16 test_mm512_cmp_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmp_ps_mask			// CHECK-LABEL: @test_mm512_cmp_ps_mask
	// CHECK: call <16 x i1> @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp oeq <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmp_ps_mask(a, b, 0);			return _mm512_cmp_ps_mask(a, b, 0);
	}			}

				__mmask16 test_mm512_cmp_ps_mask_true_uq(__m512 a, __m512 b) {
				// CHECK-LABEL: @test_mm512_cmp_ps_mask_true_uq
				// CHECK-NOT: call
				// CHECK: ret i16 -1
				return _mm512_cmp_ps_mask(a, b, _CMP_TRUE_UQ);
				}

				__mmask16 test_mm512_cmp_ps_mask_true_us(__m512 a, __m512 b) {
				// CHECK-LABEL: @test_mm512_cmp_ps_mask_true_us
				// CHECK-NOT: call
				// CHECK: ret i16 -1
				return _mm512_cmp_ps_mask(a, b, _CMP_TRUE_US);
				}

				__mmask16 test_mm512_cmp_ps_mask_false_oq(__m512 a, __m512 b) {
				// CHECK-LABEL: @test_mm512_cmp_ps_mask_false_oq
				// CHECK-NOT: call
				// CHECK: ret i16 0
				return _mm512_cmp_ps_mask(a, b, _CMP_FALSE_OQ);
				}

				__mmask16 test_mm512_cmp_ps_mask_false_os(__m512 a, __m512 b) {
				// CHECK-LABEL: @test_mm512_cmp_ps_mask_false_os
				// CHECK-NOT: call
				// CHECK: ret i16 0
				return _mm512_cmp_ps_mask(a, b, _CMP_FALSE_OS);
				}

	__mmask16 test_mm512_mask_cmp_ps_mask(__mmask16 m, __m512 a, __m512 b) {			__mmask16 test_mm512_mask_cmp_ps_mask(__mmask16 m, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmp_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmp_ps_mask
	// CHECK: [[CMP:%.*]] = call <16 x i1> @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp oeq <16 x float> %{{.*}}, %{{.*}}
	// CHECK: and <16 x i1> [[CMP]], {{.*}}			// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmp_ps_mask(m, a, b, 0);			return _mm512_mask_cmp_ps_mask(m, a, b, 0);
	}			}

	__mmask8 test_mm512_cmp_round_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmp_round_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmp_round_pd_mask			// CHECK-LABEL: @test_mm512_cmp_round_pd_mask
	// CHECK: call <8 x i1> @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp oeq <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmp_round_pd_mask(a, b, 0, _MM_FROUND_CUR_DIRECTION);			return _mm512_cmp_round_pd_mask(a, b, 0, _MM_FROUND_CUR_DIRECTION);
	}			}

	__mmask8 test_mm512_mask_cmp_round_pd_mask(__mmask8 m, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmp_round_pd_mask(__mmask8 m, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmp_round_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmp_round_pd_mask
	// CHECK: [[CMP:%.*]] = call <8 x i1> @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp oeq <8 x double> %{{.*}}, %{{.*}}
	// CHECK: and <8 x i1> [[CMP]], {{.*}}			// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmp_round_pd_mask(m, a, b, 0, _MM_FROUND_CUR_DIRECTION);			return _mm512_mask_cmp_round_pd_mask(m, a, b, 0, _MM_FROUND_CUR_DIRECTION);
	}			}

	__mmask8 test_mm512_cmp_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmp_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmp_pd_mask			// CHECK-LABEL: @test_mm512_cmp_pd_mask
	// CHECK: call <8 x i1> @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp oeq <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmp_pd_mask(a, b, 0);			return _mm512_cmp_pd_mask(a, b, 0);
	}			}

				__mmask8 test_mm512_cmp_pd_mask_true_uq(__m512d a, __m512d b) {
				// CHECK-LABEL: @test_mm512_cmp_pd_mask_true_uq
				// CHECK-NOT: call
				// CHECK: ret i8 -1
				return _mm512_cmp_pd_mask(a, b, _CMP_TRUE_UQ);
				}

				__mmask8 test_mm512_cmp_pd_mask_true_us(__m512d a, __m512d b) {
				// CHECK-LABEL: @test_mm512_cmp_pd_mask_true_us
				// CHECK-NOT: call
				// CHECK: ret i8 -1
				return _mm512_cmp_pd_mask(a, b, _CMP_TRUE_US);
				}

				__mmask8 test_mm512_cmp_pd_mask_false_oq(__m512d a, __m512d b) {
				// CHECK-LABEL: @test_mm512_cmp_pd_mask_false_oq
				// CHECK-NOT: call
				// CHECK: ret i8 0
				return _mm512_cmp_pd_mask(a, b, _CMP_FALSE_OQ);
				}

				__mmask8 test_mm512_cmp_pd_mask_false_os(__m512d a, __m512d b) {
				// CHECK-LABEL: @test_mm512_cmp_pd_mask_false_os
				// CHECK-NOT: call
				// CHECK: ret i8 0
				return _mm512_cmp_pd_mask(a, b, _CMP_FALSE_OS);
				}

	__mmask8 test_mm512_mask_cmp_pd_mask(__mmask8 m, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmp_pd_mask(__mmask8 m, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmp_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmp_pd_mask
	// CHECK: [[CMP:%.*]] = call <8 x i1> @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp oeq <8 x double> %{{.*}}, %{{.*}}
	// CHECK: and <8 x i1> [[CMP]], {{.*}}			// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmp_pd_mask(m, a, b, 0);			return _mm512_mask_cmp_pd_mask(m, a, b, 0);
	}			}

	__mmask8 test_mm512_cmpeq_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpeq_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpeq_pd_mask			// CHECK-LABEL: @test_mm512_cmpeq_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp oeq <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpeq_pd_mask(a, b);			return _mm512_cmpeq_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpeq_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpeq_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpeq_ps_mask			// CHECK-LABEL: @test_mm512_cmpeq_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp oeq <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpeq_ps_mask(a, b);			return _mm512_cmpeq_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpeq_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpeq_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpeq_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpeq_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp oeq <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpeq_pd_mask(k, a, b);			return _mm512_mask_cmpeq_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpeq_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpeq_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpeq_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpeq_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp oeq <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpeq_ps_mask(k, a, b);			return _mm512_mask_cmpeq_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmple_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmple_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmple_pd_mask			// CHECK-LABEL: @test_mm512_cmple_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp ole <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmple_pd_mask(a, b);			return _mm512_cmple_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmple_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmple_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmple_ps_mask			// CHECK-LABEL: @test_mm512_cmple_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp ole <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmple_ps_mask(a, b);			return _mm512_cmple_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmple_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmple_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmple_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmple_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp ole <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmple_pd_mask(k, a, b);			return _mm512_mask_cmple_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmple_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmple_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmple_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmple_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp ole <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmple_ps_mask(k, a, b);			return _mm512_mask_cmple_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmplt_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmplt_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmplt_pd_mask			// CHECK-LABEL: @test_mm512_cmplt_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp olt <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmplt_pd_mask(a, b);			return _mm512_cmplt_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmplt_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmplt_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmplt_ps_mask			// CHECK-LABEL: @test_mm512_cmplt_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp olt <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmplt_ps_mask(a, b);			return _mm512_cmplt_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmplt_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmplt_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmplt_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmplt_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp olt <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmplt_pd_mask(k, a, b);			return _mm512_mask_cmplt_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmplt_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmplt_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmplt_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmplt_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp olt <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmplt_ps_mask(k, a, b);			return _mm512_mask_cmplt_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmpneq_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpneq_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpneq_pd_mask			// CHECK-LABEL: @test_mm512_cmpneq_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp une <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpneq_pd_mask(a, b);			return _mm512_cmpneq_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpneq_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpneq_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpneq_ps_mask			// CHECK-LABEL: @test_mm512_cmpneq_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp une <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpneq_ps_mask(a, b);			return _mm512_cmpneq_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpneq_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpneq_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpneq_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpneq_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp une <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpneq_pd_mask(k, a, b);			return _mm512_mask_cmpneq_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpneq_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpneq_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpneq_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpneq_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp une <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpneq_ps_mask(k, a, b);			return _mm512_mask_cmpneq_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmpnle_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpnle_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpnle_pd_mask			// CHECK-LABEL: @test_mm512_cmpnle_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp ugt <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpnle_pd_mask(a, b);			return _mm512_cmpnle_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpnle_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpnle_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpnle_ps_mask			// CHECK-LABEL: @test_mm512_cmpnle_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp ugt <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpnle_ps_mask(a, b);			return _mm512_cmpnle_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpnle_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpnle_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpnle_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpnle_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp ugt <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpnle_pd_mask(k, a, b);			return _mm512_mask_cmpnle_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpnle_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpnle_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpnle_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpnle_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp ugt <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpnle_ps_mask(k, a, b);			return _mm512_mask_cmpnle_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmpnlt_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpnlt_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpnlt_pd_mask			// CHECK-LABEL: @test_mm512_cmpnlt_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp uge <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpnlt_pd_mask(a, b);			return _mm512_cmpnlt_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpnlt_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpnlt_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpnlt_ps_mask			// CHECK-LABEL: @test_mm512_cmpnlt_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp uge <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpnlt_ps_mask(a, b);			return _mm512_cmpnlt_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpnlt_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpnlt_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpnlt_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpnlt_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp uge <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpnlt_pd_mask(k, a, b);			return _mm512_mask_cmpnlt_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpnlt_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpnlt_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpnlt_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpnlt_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp uge <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpnlt_ps_mask(k, a, b);			return _mm512_mask_cmpnlt_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmpord_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpord_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpord_pd_mask			// CHECK-LABEL: @test_mm512_cmpord_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp ord <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpord_pd_mask(a, b);			return _mm512_cmpord_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpord_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpord_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpord_ps_mask			// CHECK-LABEL: @test_mm512_cmpord_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp ord <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpord_ps_mask(a, b);			return _mm512_cmpord_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpord_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpord_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpord_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpord_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp ord <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpord_pd_mask(k, a, b);			return _mm512_mask_cmpord_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpord_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpord_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpord_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpord_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp ord <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpord_ps_mask(k, a, b);			return _mm512_mask_cmpord_ps_mask(k, a, b);
	}			}

	__mmask8 test_mm512_cmpunord_pd_mask(__m512d a, __m512d b) {			__mmask8 test_mm512_cmpunord_pd_mask(__m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_cmpunord_pd_mask			// CHECK-LABEL: @test_mm512_cmpunord_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: fcmp uno <8 x double> %{{.*}}, %{{.*}}
	return _mm512_cmpunord_pd_mask(a, b);			return _mm512_cmpunord_pd_mask(a, b);
	}			}

	__mmask8 test_mm512_cmpunord_ps_mask(__m512 a, __m512 b) {			__mmask8 test_mm512_cmpunord_ps_mask(__m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_cmpunord_ps_mask			// CHECK-LABEL: @test_mm512_cmpunord_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: fcmp uno <16 x float> %{{.*}}, %{{.*}}
	return _mm512_cmpunord_ps_mask(a, b);			return _mm512_cmpunord_ps_mask(a, b);
	}			}

	__mmask8 test_mm512_mask_cmpunord_pd_mask(__mmask8 k, __m512d a, __m512d b) {			__mmask8 test_mm512_mask_cmpunord_pd_mask(__mmask8 k, __m512d a, __m512d b) {
	// CHECK-LABEL: @test_mm512_mask_cmpunord_pd_mask			// CHECK-LABEL: @test_mm512_mask_cmpunord_pd_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.pd.512			// CHECK: [[CMP:%.*]] = fcmp uno <8 x double> %{{.*}}, %{{.*}}
				// CHECK: and <8 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpunord_pd_mask(k, a, b);			return _mm512_mask_cmpunord_pd_mask(k, a, b);
	}			}

	__mmask8 test_mm512_mask_cmpunord_ps_mask(__mmask8 k, __m512 a, __m512 b) {			__mmask8 test_mm512_mask_cmpunord_ps_mask(__mmask8 k, __m512 a, __m512 b) {
	// CHECK-LABEL: @test_mm512_mask_cmpunord_ps_mask			// CHECK-LABEL: @test_mm512_mask_cmpunord_ps_mask
	// CHECK: @llvm.x86.avx512.mask.cmp.ps.512			// CHECK: [[CMP:%.*]] = fcmp uno <16 x float> %{{.*}}, %{{.*}}
				// CHECK: and <16 x i1> [[CMP]], {{.*}}
	return _mm512_mask_cmpunord_ps_mask(k, a, b);			return _mm512_mask_cmpunord_ps_mask(k, a, b);
	}			}

	__m256d test_mm512_extractf64x4_pd(__m512d a)			__m256d test_mm512_extractf64x4_pd(__m512d a)
	{			{
	// CHECK-LABEL: @test_mm512_extractf64x4_pd			// CHECK-LABEL: @test_mm512_extractf64x4_pd
	// CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>			// CHECK: shufflevector <8 x double> %{{.*}}, <8 x double> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
	return _mm512_extractf64x4_pd(a, 1);			return _mm512_extractf64x4_pd(a, 1);
	(5,840 lines not shown)
	__m512i test_mm512_maskz_compress_epi32(__mmask16 __U, __m512i __A) {			__m512i test_mm512_maskz_compress_epi32(__mmask16 __U, __m512i __A) {
	// CHECK-LABEL: @test_mm512_maskz_compress_epi32			// CHECK-LABEL: @test_mm512_maskz_compress_epi32
	// CHECK: @llvm.x86.avx512.mask.compress.d.512			// CHECK: @llvm.x86.avx512.mask.compress.d.512
	return _mm512_maskz_compress_epi32(__U, __A);			return _mm512_maskz_compress_epi32(__U, __A);
	}			}

	__mmask8 test_mm_cmp_round_ss_mask(__m128 __X, __m128 __Y) {			__mmask8 test_mm_cmp_round_ss_mask(__m128 __X, __m128 __Y) {
	// CHECK-LABEL: @test_mm_cmp_round_ss_mask			// CHECK-LABEL: @test_mm_cmp_round_ss_mask
	// CHECK: @llvm.x86.avx512.mask.cmp			// CHECK: @llvm.x86.avx512.mask.cmp
				craig.topper: I don't think this is right either.
	return _mm_cmp_round_ss_mask(__X, __Y, 5, _MM_FROUND_CUR_DIRECTION);			return _mm_cmp_round_ss_mask(__X, __Y, 5, _MM_FROUND_CUR_DIRECTION);
	}			}

	__mmask8 test_mm_mask_cmp_round_ss_mask(__mmask8 __M, __m128 __X, __m128 __Y) {			__mmask8 test_mm_mask_cmp_round_ss_mask(__mmask8 __M, __m128 __X, __m128 __Y) {
	// CHECK-LABEL: @test_mm_mask_cmp_round_ss_mask			// CHECK-LABEL: @test_mm_mask_cmp_round_ss_mask
	// CHECK: @llvm.x86.avx512.mask.cmp			// CHECK: @llvm.x86.avx512.mask.cmp
	return _mm_mask_cmp_round_ss_mask(__M, __X, __Y, 5, _MM_FROUND_CUR_DIRECTION);			return _mm_mask_cmp_round_ss_mask(__M, __X, __Y, 5, _MM_FROUND_CUR_DIRECTION);
	}			}
	(2,171 lines not shown)

test/CodeGen/avx512vl-builtins.c

(1,067 lines not shown)
__m128i test_mm_maskz_xor_epi64 (__mmask8 __U, __m128i __A, __m128i __B) {		__m128i test_mm_maskz_xor_epi64 (__mmask8 __U, __m128i __A, __m128i __B) {
//CHECK-LABEL: @test_mm_maskz_xor_epi64		//CHECK-LABEL: @test_mm_maskz_xor_epi64
//CHECK: xor <2 x i64> %{{.*}}, %{{.*}}		//CHECK: xor <2 x i64> %{{.*}}, %{{.*}}
//CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}		//CHECK: select <2 x i1> %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}
return _mm_maskz_xor_epi64( __U, __A, __B);		return _mm_maskz_xor_epi64( __U, __A, __B);
}		}

__mmask8 test_mm256_cmp_ps_mask(__m256 __A, __m256 __B) {		__mmask8 test_mm256_cmp_ps_mask(__m256 __A, __m256 __B) {
// CHECK-LABEL: @test_mm256_cmp_ps_mask		// CHECK-LABEL: @test_mm256_cmp_ps_mask
// CHECK: call <8 x i1> @llvm.x86.avx512.mask.cmp.ps.256		// CHECK: fcmp oeq <8 x float> %{{.*}}, %{{.*}}
return (__mmask8)_mm256_cmp_ps_mask(__A, __B, 0);		return (__mmask8)_mm256_cmp_ps_mask(__A, __B, 0);
}		}

		__mmask8 test_mm256_cmp_ps_mask_true_uq(__m256 __A, __m256 __B) {
		// CHECK-LABEL: @test_mm256_cmp_ps_mask_true_uq
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm256_cmp_ps_mask(__A, __B, _CMP_TRUE_UQ);
		}

		__mmask8 test_mm256_cmp_ps_mask_true_us(__m256 __A, __m256 __B) {
		// CHECK-LABEL: @test_mm256_cmp_ps_mask_true_us
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm256_cmp_ps_mask(__A, __B, _CMP_TRUE_US);
		}

		__mmask8 test_mm256_cmp_ps_mask_false_oq(__m256 __A, __m256 __B) {
		// CHECK-LABEL: @test_mm256_cmp_ps_mask_false_oq
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm256_cmp_ps_mask(__A, __B, _CMP_FALSE_OQ);
		}

		__mmask8 test_mm256_cmp_ps_mask_false_os(__m256 __A, __m256 __B) {
		// CHECK-LABEL: @test_mm256_cmp_ps_mask_false_os
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm256_cmp_ps_mask(__A, __B, _CMP_FALSE_OS);
		}

__mmask8 test_mm256_mask_cmp_ps_mask(__mmask8 m, __m256 __A, __m256 __B) {		__mmask8 test_mm256_mask_cmp_ps_mask(__mmask8 m, __m256 __A, __m256 __B) {
// CHECK-LABEL: @test_mm256_mask_cmp_ps_mask		// CHECK-LABEL: @test_mm256_mask_cmp_ps_mask
// CHECK: [[CMP:%.*]] = call <8 x i1> @llvm.x86.avx512.mask.cmp.ps.256		// CHECK: fcmp oeq <8 x float> %{{.*}}, %{{.*}}
// CHECK: and <8 x i1> [[CMP]], {{.*}}		// CHECK: and <8 x i1> %{{.*}}, %{{.*}}
return _mm256_mask_cmp_ps_mask(m, __A, __B, 0);		return _mm256_mask_cmp_ps_mask(m, __A, __B, 0);
}		}

__mmask8 test_mm_cmp_ps_mask(__m128 __A, __m128 __B) {		__mmask8 test_mm_cmp_ps_mask(__m128 __A, __m128 __B) {
// CHECK-LABEL: @test_mm_cmp_ps_mask		// CHECK-LABEL: @test_mm_cmp_ps_mask
// CHECK: call <4 x i1> @llvm.x86.avx512.mask.cmp.ps.128		// CHECK: fcmp oeq <4 x float> %{{.*}}, %{{.*}}
return (__mmask8)_mm_cmp_ps_mask(__A, __B, 0);		return (__mmask8)_mm_cmp_ps_mask(__A, __B, 0);
}		}

		__mmask8 test_mm_cmp_ps_mask_true_uq(__m128 __A, __m128 __B) {
		// CHECK-LABEL: @test_mm_cmp_ps_mask_true_uq
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm_cmp_ps_mask(__A, __B, _CMP_TRUE_UQ);
		}

		__mmask8 test_mm_cmp_ps_mask_true_us(__m128 __A, __m128 __B) {
		// CHECK-LABEL: @test_mm_cmp_ps_mask_true_us
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm_cmp_ps_mask(__A, __B, _CMP_TRUE_US);
		}

		__mmask8 test_mm_cmp_ps_mask_false_oq(__m128 __A, __m128 __B) {
		// CHECK-LABEL: @test_mm_cmp_ps_mask_false_oq
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm_cmp_ps_mask(__A, __B, _CMP_FALSE_OQ);
		}

		__mmask8 test_mm_cmp_ps_mask_false_os(__m128 __A, __m128 __B) {
		// CHECK-LABEL: @test_mm_cmp_ps_mask_false_os
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm_cmp_ps_mask(__A, __B, _CMP_FALSE_OS);
		}

__mmask8 test_mm_mask_cmp_ps_mask(__mmask8 m, __m128 __A, __m128 __B) {		__mmask8 test_mm_mask_cmp_ps_mask(__mmask8 m, __m128 __A, __m128 __B) {
// CHECK-LABEL: @test_mm_mask_cmp_ps_mask		// CHECK-LABEL: @test_mm_mask_cmp_ps_mask
// CHECK: [[CMP:%.*]] = call <4 x i1> @llvm.x86.avx512.mask.cmp.ps.128		// CHECK: fcmp oeq <4 x float> %{{.*}}, %{{.*}}
// CHECK: and <4 x i1> [[CMP]], {{.*}}		// CHECK: shufflevector <8 x i1> %{{.*}}, <8 x i1> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		// CHECK: and <4 x i1> %{{.*}}, %{{.*}}
return _mm_mask_cmp_ps_mask(m, __A, __B, 0);		return _mm_mask_cmp_ps_mask(m, __A, __B, 0);
}		}

__mmask8 test_mm256_cmp_pd_mask(__m256d __A, __m256d __B) {		__mmask8 test_mm256_cmp_pd_mask(__m256d __A, __m256d __B) {
// CHECK-LABEL: @test_mm256_cmp_pd_mask		// CHECK-LABEL: @test_mm256_cmp_pd_mask
// CHECK: call <4 x i1> @llvm.x86.avx512.mask.cmp.pd.256		// CHECK: fcmp oeq <4 x double> %{{.*}}, %{{.*}}
return (__mmask8)_mm256_cmp_pd_mask(__A, __B, 0);		return (__mmask8)_mm256_cmp_pd_mask(__A, __B, 0);
}		}

		__mmask8 test_mm256_cmp_pd_mask_true_uq(__m256d __A, __m256d __B) {
		// CHECK-LABEL: @test_mm256_cmp_pd_mask_true_uq
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm256_cmp_pd_mask(__A, __B, _CMP_TRUE_UQ);
		}

		__mmask8 test_mm256_cmp_pd_mask_true_us(__m256d __A, __m256d __B) {
		// CHECK-LABEL: @test_mm256_cmp_pd_mask_true_us
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm256_cmp_pd_mask(__A, __B, _CMP_TRUE_US);
		}

		__mmask8 test_mm256_cmp_pd_mask_false_oq(__m256d __A, __m256d __B) {
		// CHECK-LABEL: @test_mm256_cmp_pd_mask_false_oq
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm256_cmp_pd_mask(__A, __B, _CMP_FALSE_OQ);
		}

		__mmask8 test_mm256_cmp_pd_mask_false_os(__m256d __A, __m256d __B) {
		// CHECK-LABEL: @test_mm256_cmp_pd_mask_false_os
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm256_cmp_pd_mask(__A, __B, _CMP_FALSE_OS);
		}

__mmask8 test_mm256_mask_cmp_pd_mask(__mmask8 m, __m256d __A, __m256d __B) {		__mmask8 test_mm256_mask_cmp_pd_mask(__mmask8 m, __m256d __A, __m256d __B) {
// CHECK-LABEL: @test_mm256_mask_cmp_pd_mask		// CHECK-LABEL: @test_mm256_mask_cmp_pd_mask
// CHECK: [[CMP:%.*]] = call <4 x i1> @llvm.x86.avx512.mask.cmp.pd.256		// CHECK: fcmp oeq <4 x double> %{{.*}}, %{{.*}}
// CHECK: and <4 x i1> [[CMP]], {{.*}}		// CHECK: shufflevector <8 x i1> %{{.*}}, <8 x i1> %{{.*}}, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		// CHECK: and <4 x i1> %{{.*}}, %{{.*}}
return _mm256_mask_cmp_pd_mask(m, __A, __B, 0);		return _mm256_mask_cmp_pd_mask(m, __A, __B, 0);
}		}

__mmask8 test_mm_cmp_pd_mask(__m128d __A, __m128d __B) {		__mmask8 test_mm_cmp_pd_mask(__m128d __A, __m128d __B) {
// CHECK-LABEL: @test_mm_cmp_pd_mask		// CHECK-LABEL: @test_mm_cmp_pd_mask
// CHECK: call <2 x i1> @llvm.x86.avx512.mask.cmp.pd.128		// CHECK: fcmp oeq <2 x double> %{{.*}}, %{{.*}}
return (__mmask8)_mm_cmp_pd_mask(__A, __B, 0);		return (__mmask8)_mm_cmp_pd_mask(__A, __B, 0);
}		}

		__mmask8 test_mm_cmp_pd_mask_true_uq(__m128d __A, __m128d __B) {
		// CHECK-LABEL: @test_mm_cmp_pd_mask_true_uq
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm_cmp_pd_mask(__A, __B, _CMP_TRUE_UQ);
		}

		__mmask8 test_mm_cmp_pd_mask_true_us(__m128d __A, __m128d __B) {
		// CHECK-LABEL: @test_mm_cmp_pd_mask_true_us
		// CHECK-NOT: call
		// CHECK: ret i8 -1
		return (__mmask8)_mm_cmp_pd_mask(__A, __B, _CMP_TRUE_US);
		}

		__mmask8 test_mm_cmp_pd_mask_false_oq(__m128d __A, __m128d __B) {
		// CHECK-LABEL: @test_mm_cmp_pd_mask_false_oq
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm_cmp_pd_mask(__A, __B, _CMP_FALSE_OQ);
		}

		__mmask8 test_mm_cmp_pd_mask_false_os(__m128d __A, __m128d __B) {
		// CHECK-LABEL: @test_mm_cmp_pd_mask_false_os
		// CHECK-NOT: call
		// CHECK: ret i8 0
		return (__mmask8)_mm_cmp_pd_mask(__A, __B, _CMP_FALSE_OS);
		}

__mmask8 test_mm_mask_cmp_pd_mask(__mmask8 m, __m128d __A, __m128d __B) {		__mmask8 test_mm_mask_cmp_pd_mask(__mmask8 m, __m128d __A, __m128d __B) {
// CHECK-LABEL: @test_mm_mask_cmp_pd_mask		// CHECK-LABEL: @test_mm_mask_cmp_pd_mask
// CHECK: [[CMP:%.*]] = call <2 x i1> @llvm.x86.avx512.mask.cmp.pd.128		// CHECK: fcmp oeq <2 x double> %{{.*}}, %{{.*}}
// CHECK: and <2 x i1> [[CMP]], {{.*}}		// CHECK: shufflevector <8 x i1> %{{.*}}, <8 x i1> %{{.*}}, <2 x i32> <i32 0, i32 1>
		// CHECK: and <2 x i1> %{{.*}}, %{{.*}}
return _mm_mask_cmp_pd_mask(m, __A, __B, 0);		return _mm_mask_cmp_pd_mask(m, __A, __B, 0);
}		}

__m128d test_mm_mask_fmadd_pd(__m128d __A, __mmask8 __U, __m128d __B, __m128d __C) {		__m128d test_mm_mask_fmadd_pd(__m128d __A, __mmask8 __U, __m128d __B, __m128d __C) {
// CHECK-LABEL: @test_mm_mask_fmadd_pd		// CHECK-LABEL: @test_mm_mask_fmadd_pd
// CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})		// CHECK: call <2 x double> @llvm.fma.v2f64(<2 x double> %{{.*}}, <2 x double> %{{.*}}, <2 x double> %{{.*}})
// CHECK: bitcast i8 %{{.*}} to <8 x i1>		// CHECK: bitcast i8 %{{.*}} to <8 x i1>
// CHECK: shufflevector <8 x i1> %{{.*}}, <8 x i1> %{{.*}}, <2 x i32> <i32 0, i32 1>		// CHECK: shufflevector <8 x i1> %{{.*}}, <8 x i1> %{{.*}}, <2 x i32> <i32 0, i32 1>
(6,719 lines not shown)

Revision Contents

Diff 152456
