This is an archive of the discontinued LLVM Phabricator instance.

[X86] Replacing X86-specific floor and ceil vector intrinsics with generic LLVM intrinsics
AbandonedPublic

Authored by mike.dvoretsky on Apr 3 2018, 2:12 AM.


Details

Reviewers

craig.topper
spatel
RKSimon

Summary

Currently, X86 floor and ceil intrinsics for vectors are implemented as target-specific intrinsics that use the generic rounding instruction of the corresponding vector processing feature (ROUND* or VRNDSCALE*). This patch replaces those specific cases with calls to target-independent @llvm.floor.* and @llvm.ceil.* intrinsics. This doesn't affect the resulting machine code, as those intrinsics are lowered to the same instructions, but exposes these specific rounding cases to generic optimizations.
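As an illustration of what "generic optimizations" could mean here, the sketch below models a 4-lane floor with plain C++ and checks that applying it twice gives the same result as applying it once. A target-independent pass can exploit that kind of property only when it sees a generic floor rather than an opaque X86 intrinsic. The names `Vec4` and `floorLanes` are hypothetical and only stand in for `@llvm.floor.v4f32`; which folds the optimizer actually performs is not stated in this review.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a <4 x float> value.
using Vec4 = std::array<float, 4>;

// Element-wise floor, modelling the semantics of @llvm.floor.v4f32.
inline Vec4 floorLanes(const Vec4 &v) {
  Vec4 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = std::floor(v[i]);
  return out;
}
```

With this model, `floorLanes(floorLanes(v))` compares equal to `floorLanes(v)` for any finite input, which is the sort of redundancy a generic pass could remove.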

This patch also has an LLVM part, D45203. An alternative InstCombine-based implementation is proposed in D48067.

Diff Detail

Repository: rC Clang

Event Timeline

mike.dvoretsky created this revision. Apr 3 2018, 2:12 AM

Herald added a subscriber: cfe-commits. Apr 3 2018, 2:12 AM

mike.dvoretsky mentioned this in D45203: [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns. Apr 3 2018, 2:13 AM

mike.dvoretsky edited the summary of this revision.

craig.topper added inline comments. Apr 3 2018, 1:05 PM

include/clang/Basic/BuiltinsX86.def
951	I'd prefer CGBuiltin to detect the specific immediates on the rndscale value. Primarily because we should be able to optimize _mm512_roundscale_pd when the ceil/floor immediate is used.

At @craig.topper's suggestion, moved all lowering to CGBuiltin.cpp, with no new builtins added. Instead, the existing builtins are lowered when their immediate values correspond to the generic ceil and floor operations. D45203 is now required to enable the transformations.
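The immediate check described above can be sketched roughly as follows. This is not the code from the revised patch, only a minimal model: `RoundKind` and `classifyRoundImm` are hypothetical names, and the constants mirror the values of `_MM_FROUND_FLOOR` (0x01) and `_MM_FROUND_CEIL` (0x02) from smmintrin.h. As a simplifying assumption, any other immediate, including one with a nonzero rndscale scale field or with `_MM_FROUND_NO_EXC` set, is left to the target-specific path.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a ROUND*/VRNDSCALE* immediate.
enum class RoundKind { Floor, Ceil, Other };

// Same numeric values as _MM_FROUND_FLOOR and _MM_FROUND_CEIL.
constexpr std::uint8_t kFroundFloor = 0x01;
constexpr std::uint8_t kFroundCeil = 0x02;

// Returns Floor or Ceil only for the exact immediates used by the
// floor/ceil intrinsics; everything else stays target-specific.
inline RoundKind classifyRoundImm(std::uint8_t imm) {
  if (imm == kFroundFloor)
    return RoundKind::Floor;
  if (imm == kFroundCeil)
    return RoundKind::Ceil;
  return RoundKind::Other;
}
```

Matching on the immediate rather than on a dedicated builtin means a direct call such as `_mm512_roundscale_pd` with the floor or ceil immediate would be classified the same way as `_mm512_floor_pd` or `_mm512_ceil_pd`.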

What about rndscaless/rndscalesd?

clang/lib/CodeGen/CGBuiltin.cpp
8307 ↗	(On Diff #140972)	I'm not sure we should even try to emit a mask for the legacy scalar intrinsics. Does this get removed well by the middle or backend?
8320 ↗	(On Diff #140972)	Why Int32? That's not the right mask width for the legacy intrinsics.

mike.dvoretsky added inline comments. Apr 5 2018, 7:09 AM

clang/lib/CodeGen/CGBuiltin.cpp
8307 ↗	(On Diff #140972)	The masking is done to represent all operations handled here in a uniform way. D45203 removes it in the backend.
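The uniform masked form can be modelled in plain C++ as below. This is a sketch of the semantics only, not of the IR that CGBuiltin emits: `maskedCeil` is a hypothetical name, and a 32-bit integer stands in for the mask operand. Each lane takes the rounded source when its mask bit is set and the pass-through value otherwise, so a mask of 1 models the scalar intrinsics and an all-ones mask models the packed ones.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Lane i of the result is ceil(src[i]) when bit i of mask is set,
// otherwise passthru[i] (the "select" step after the generic call).
template <std::size_t N>
std::array<float, N> maskedCeil(const std::array<float, N> &src,
                                const std::array<float, N> &passthru,
                                std::uint32_t mask) {
  static_assert(N <= 32, "mask has only 32 bits in this sketch");
  std::array<float, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const bool laneEnabled = ((mask >> i) & 1u) != 0;
    out[i] = laneEnabled ? std::ceil(src[i]) : passthru[i];
  }
  return out;
}
```

For example, `maskedCeil(y, x, 0x1u)` rounds only lane 0 of `y` and copies the remaining lanes from `x`, which is the behaviour documented for `_mm_ceil_ss(X, Y)`.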

But it’s not really consistent because the mask is being removed early for the packed intrinsics, but late for the scalar intrinsics. Doesn’t it also introduce extra code for fast isel?

There's a similar patch for sqrt here https://reviews.llvm.org/D41168 and it uses a scalar sqrt and insert element for the scalar case. I think we need a consistent direction here.

Changed the scalar intrinsic lowering to work via extract-insert. D45203 contains tests for folding the resulting IR patterns.
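A rough model of the extract-insert form for the scalar case, assuming the semantics documented for `_mm_ceil_ss(X, Y)`: lane 0 of `Y` is extracted, rounded with a scalar ceil, and inserted into lane 0 of `X`. The names `Vec4` and `ceilScalarLane0` are hypothetical, and the comments map each step to the IR operation it is meant to mirror.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a <4 x float> value.
using Vec4 = std::array<float, 4>;

inline Vec4 ceilScalarLane0(const Vec4 &x, const Vec4 &y) {
  const float lo = y[0];                // extractelement %y, 0
  const float rounded = std::ceil(lo);  // scalar ceil on the extracted lane
  Vec4 out = x;                         // upper lanes come from x unchanged
  out[0] = rounded;                     // insertelement %x, %rounded, 0
  return out;
}
```

Compared with the masked form, no mask operand or select appears here, which matches the direction taken for the scalar sqrt case mentioned in the thread.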

I'm not sure whether we should be doing this here or in InstCombine. @spatel, what do you think?

In D45202#1126616, @craig.topper wrote:

I'm not sure whether we should be doing this here or in InstCombine. @spatel, what do you think?

It's been a while since I looked at these. Last memory I have is for the conversion from x86 masked ops to the generic LLVM intrinsics, and we did that in InstCombineCalls. I don't know if there was any sound reasoning for that though. If it makes no functional difference, I'd continue with that structure just so we don't become scattered in the transform.

mike.dvoretsky mentioned this in D48067: [InstCombine] Replacing X86-specific rounding intrinsics with generic floor-ceil. Jun 12 2018, 2:59 AM

mike.dvoretsky edited the summary of this revision.

Abandoning this due to D48067 being accepted instead.

Revision Contents

Path	Size
include/clang/Basic/BuiltinsX86.def	12 lines
lib/CodeGen/CGBuiltin.cpp	24 lines
lib/Headers/avx512fintrin.h	40 lines
lib/Headers/avxintrin.h	24 lines
lib/Headers/smmintrin.h	48 lines
test/CodeGen/avx-builtins.c	12 lines
test/CodeGen/avx512f-builtins.c	52 lines
test/CodeGen/sse41-builtins.c	24 lines

Diff 140745

include/clang/Basic/BuiltinsX86.def

	[378 lines not shown]
	TARGET_BUILTIN(__builtin_ia32_pminsd128, "V4iV4iV4i", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_pminsd128, "V4iV4iV4i", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_pminud128, "V4iV4iV4i", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_pminud128, "V4iV4iV4i", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_pminuw128, "V8sV8sV8s", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_pminuw128, "V8sV8sV8s", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_pmuldq128, "V2LLiV4iV4i", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_pmuldq128, "V2LLiV4iV4i", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_roundps, "V4fV4fIi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_roundps, "V4fV4fIi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_roundss, "V4fV4fV4fIi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_roundss, "V4fV4fV4fIi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_roundsd, "V2dV2dV2dIi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_roundsd, "V2dV2dV2dIi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_roundpd, "V2dV2dIi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_roundpd, "V2dV2dIi", "", "sse4.1")
				TARGET_BUILTIN(__builtin_ia32_ceilps_128_mask, "V4fV4fV4fIi", "", "sse4.1")
				TARGET_BUILTIN(__builtin_ia32_ceilpd_128_mask, "V2dV2dV2dIi", "", "sse4.1")
				TARGET_BUILTIN(__builtin_ia32_floorps_128_mask, "V4fV4fV4fIi", "", "sse4.1")
				TARGET_BUILTIN(__builtin_ia32_floorpd_128_mask, "V2dV2dV2dIi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_dpps, "V4fV4fV4fIc", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_dpps, "V4fV4fV4fIc", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_dppd, "V2dV2dV2dIc", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_dppd, "V2dV2dV2dIc", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_ptestz128, "iV2LLiV2LLi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_ptestz128, "iV2LLiV2LLi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_ptestc128, "iV2LLiV2LLi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_ptestc128, "iV2LLiV2LLi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_ptestnzc128, "iV2LLiV2LLi", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_ptestnzc128, "iV2LLiV2LLi", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_mpsadbw128, "V16cV16cV16cIc", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_mpsadbw128, "V16cV16cV16cIc", "", "sse4.1")
	TARGET_BUILTIN(__builtin_ia32_phminposuw128, "V8sV8s", "", "sse4.1")			TARGET_BUILTIN(__builtin_ia32_phminposuw128, "V8sV8s", "", "sse4.1")

	[96 lines not shown]
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_ps256, "V8fV8fV8fIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vperm2f128_si256, "V8iV8iV8iIc", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtpd256, "V4dV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_sqrtps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_rsqrtps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_rsqrtps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_rcpps256, "V8fV8f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_rcpps256, "V8fV8f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_roundpd256, "V4dV4dIi", "", "avx")			TARGET_BUILTIN(__builtin_ia32_roundpd256, "V4dV4dIi", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_roundps256, "V8fV8fIi", "", "avx")			TARGET_BUILTIN(__builtin_ia32_roundps256, "V8fV8fIi", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_floorpd_256_mask, "V4dV4dV4dIi", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_floorps_256_mask, "V8fV8fV4dIi", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_ceilpd_256_mask, "V4dV4dV4dIi", "", "avx")
				TARGET_BUILTIN(__builtin_ia32_ceilps_256_mask, "V8fV8fV4dIi", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestzpd, "iV2dV2d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestzpd, "iV2dV2d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestcpd, "iV2dV2d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestcpd, "iV2dV2d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestnzcpd, "iV2dV2d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestnzcpd, "iV2dV2d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestzps, "iV4fV4f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestzps, "iV4fV4f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestcps, "iV4fV4f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestcps, "iV4fV4f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestnzcps, "iV4fV4f", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestnzcps, "iV4fV4f", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestzpd256, "iV4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestzpd256, "iV4dV4d", "", "avx")
	TARGET_BUILTIN(__builtin_ia32_vtestcpd256, "iV4dV4d", "", "avx")			TARGET_BUILTIN(__builtin_ia32_vtestcpd256, "iV4dV4d", "", "avx")
	[428 lines not shown]
	TARGET_BUILTIN(__builtin_ia32_cmpps256_mask, "UcV8fV8fIiUc", "", "avx512vl")			TARGET_BUILTIN(__builtin_ia32_cmpps256_mask, "UcV8fV8fIiUc", "", "avx512vl")
	TARGET_BUILTIN(__builtin_ia32_cmpps128_mask, "UcV4fV4fIiUc", "", "avx512vl")			TARGET_BUILTIN(__builtin_ia32_cmpps128_mask, "UcV4fV4fIiUc", "", "avx512vl")
	TARGET_BUILTIN(__builtin_ia32_cmppd512_mask, "UcV8dV8dIiUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_cmppd512_mask, "UcV8dV8dIiUcIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_cmppd256_mask, "UcV4dV4dIiUc", "", "avx512vl")			TARGET_BUILTIN(__builtin_ia32_cmppd256_mask, "UcV4dV4dIiUc", "", "avx512vl")
	TARGET_BUILTIN(__builtin_ia32_cmppd128_mask, "UcV2dV2dIiUc", "", "avx512vl")			TARGET_BUILTIN(__builtin_ia32_cmppd128_mask, "UcV2dV2dIiUc", "", "avx512vl")

	TARGET_BUILTIN(__builtin_ia32_rndscaleps_mask, "V16fV16fIiV16fUsIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_rndscaleps_mask, "V16fV16fIiV16fUsIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_rndscalepd_mask, "V8dV8dIiV8dUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_rndscalepd_mask, "V8dV8dIiV8dUcIi", "", "avx512f")
				TARGET_BUILTIN(__builtin_ia32_floorps_mask, "V16fV16fV16fUs", "", "avx512f")
				craig.topper: I'd prefer CGBuiltin to detect the specific immediates on the rndscale value. Primarily because we should be able to optimize _mm512_roundscale_pd when the ceil/floor immediate is used.
				TARGET_BUILTIN(__builtin_ia32_floorpd_mask, "V8dV8dV8dUc", "", "avx512f")
				TARGET_BUILTIN(__builtin_ia32_ceilps_mask, "V16fV16fV16fUs", "", "avx512f")
				TARGET_BUILTIN(__builtin_ia32_ceilpd_mask, "V8dV8dV8dUc", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_cvtps2dq512_mask, "V16iV16fV16iUsIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_cvtps2dq512_mask, "V16iV16fV16iUsIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2dq512_mask, "V8iV8dV8iUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_cvtpd2dq512_mask, "V8iV8dV8iUcIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_cvtps2udq512_mask, "V16iV16fV16iUsIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_cvtps2udq512_mask, "V16iV16fV16iUsIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_cvtpd2udq512_mask, "V8iV8dV8iUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_cvtpd2udq512_mask, "V8iV8dV8iUcIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_minps512_mask, "V16fV16fV16fV16fUsIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_minps512_mask, "V16fV16fV16fV16fUsIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_minpd512_mask, "V8dV8dV8dV8dUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_minpd512_mask, "V8dV8dV8dV8dUcIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_maxps512_mask, "V16fV16fV16fV16fUsIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_maxps512_mask, "V16fV16fV16fV16fUsIi", "", "avx512f")
	TARGET_BUILTIN(__builtin_ia32_maxpd512_mask, "V8dV8dV8dV8dUcIi", "", "avx512f")			TARGET_BUILTIN(__builtin_ia32_maxpd512_mask, "V8dV8dV8dV8dUcIi", "", "avx512f")
	[961 lines not shown]

lib/CodeGen/CGBuiltin.cpp

[8,263 lines not shown]

static Value *EmitX86SExtMask(CodeGenFunction &CGF, Value *Op,		static Value *EmitX86SExtMask(CodeGenFunction &CGF, Value *Op,
llvm::Type *DstTy) {		llvm::Type *DstTy) {
unsigned NumberOfElements = DstTy->getVectorNumElements();		unsigned NumberOfElements = DstTy->getVectorNumElements();
Value *Mask = getMaskVecValue(CGF, Op, NumberOfElements);		Value *Mask = getMaskVecValue(CGF, Op, NumberOfElements);
return CGF.Builder.CreateSExt(Mask, DstTy, "vpmovm2");		return CGF.Builder.CreateSExt(Mask, DstTy, "vpmovm2");
}		}

		static Value *EmitX86FloorCeil(CodeGenFunction &CGF, ArrayRef<Value *> Ops,
		Intrinsic::ID ID) {
		assert((ID == Intrinsic::ceil || ID == Intrinsic::floor) &&
		"Unexpected intrinsic ID");
		Value *F = CGF.CGM.getIntrinsic(ID, Ops[0]->getType());
		Value *Res = CGF.Builder.CreateCall(F, {Ops[0]});
		return EmitX86Select(CGF, Ops[2], Res, Ops[1]);
		}

Value *CodeGenFunction::EmitX86CpuIs(const CallExpr *E) {		Value *CodeGenFunction::EmitX86CpuIs(const CallExpr *E) {
const Expr *CPUExpr = E->getArg(0)->IgnoreParenCasts();		const Expr *CPUExpr = E->getArg(0)->IgnoreParenCasts();
StringRef CPUStr = cast<clang::StringLiteral>(CPUExpr)->getString();		StringRef CPUStr = cast<clang::StringLiteral>(CPUExpr)->getString();
return EmitX86CpuIs(CPUStr);		return EmitX86CpuIs(CPUStr);
}		}

Value *CodeGenFunction::EmitX86CpuIs(StringRef CPUStr) {		Value *CodeGenFunction::EmitX86CpuIs(StringRef CPUStr) {

[614 lines not shown]	#undef INTRINSIC_X86_XSAVE_ID
case X86::BI__builtin_ia32_vplzcntq_256_mask:		case X86::BI__builtin_ia32_vplzcntq_256_mask:
case X86::BI__builtin_ia32_vplzcntq_512_mask: {		case X86::BI__builtin_ia32_vplzcntq_512_mask: {
Function *F = CGM.getIntrinsic(Intrinsic::ctlz, Ops[0]->getType());		Function *F = CGM.getIntrinsic(Intrinsic::ctlz, Ops[0]->getType());
return EmitX86Select(*this, Ops[2],		return EmitX86Select(*this, Ops[2],
Builder.CreateCall(F, {Ops[0],Builder.getInt1(false)}),		Builder.CreateCall(F, {Ops[0],Builder.getInt1(false)}),
Ops[1]);		Ops[1]);
}		}

		case X86::BI__builtin_ia32_floorps_128_mask:
		case X86::BI__builtin_ia32_floorpd_128_mask:
		case X86::BI__builtin_ia32_floorps_256_mask:
		case X86::BI__builtin_ia32_floorpd_256_mask:
		case X86::BI__builtin_ia32_floorps_mask:
		case X86::BI__builtin_ia32_floorpd_mask:
		return EmitX86FloorCeil(*this, Ops, Intrinsic::floor);
		case X86::BI__builtin_ia32_ceilps_128_mask:
		case X86::BI__builtin_ia32_ceilpd_128_mask:
		case X86::BI__builtin_ia32_ceilps_256_mask:
		case X86::BI__builtin_ia32_ceilpd_256_mask:
		case X86::BI__builtin_ia32_ceilps_mask:
		case X86::BI__builtin_ia32_ceilpd_mask:
		return EmitX86FloorCeil(*this, Ops, Intrinsic::ceil);

case X86::BI__builtin_ia32_pabsb128:		case X86::BI__builtin_ia32_pabsb128:
case X86::BI__builtin_ia32_pabsw128:		case X86::BI__builtin_ia32_pabsw128:
case X86::BI__builtin_ia32_pabsd128:		case X86::BI__builtin_ia32_pabsd128:
case X86::BI__builtin_ia32_pabsb256:		case X86::BI__builtin_ia32_pabsb256:
case X86::BI__builtin_ia32_pabsw256:		case X86::BI__builtin_ia32_pabsw256:
case X86::BI__builtin_ia32_pabsd256:		case X86::BI__builtin_ia32_pabsd256:
case X86::BI__builtin_ia32_pabsq128_mask:		case X86::BI__builtin_ia32_pabsq128_mask:
case X86::BI__builtin_ia32_pabsq256_mask:		case X86::BI__builtin_ia32_pabsq256_mask:
[2,059 lines not shown]

lib/Headers/avx512fintrin.h

[1,880 lines not shown]	return (__m128d) __builtin_ia32_rcp14sd_mask ( (__v2df) __A,
(__v2df) __B,		(__v2df) __B,
(__v2df) _mm_setzero_pd (),		(__v2df) _mm_setzero_pd (),
(__mmask8) __U);		(__mmask8) __U);
}		}

static __inline __m512 __DEFAULT_FN_ATTRS		static __inline __m512 __DEFAULT_FN_ATTRS
_mm512_floor_ps(__m512 __A)		_mm512_floor_ps(__m512 __A)
{		{
return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,		return (__m512)__builtin_ia32_floorps_mask((__v16sf)__A, (__v16sf)__A, -1);
_MM_FROUND_FLOOR,
(__v16sf) __A, -1,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline__ __m512 __DEFAULT_FN_ATTRS		static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)		_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
{		{
return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,		return (__m512)__builtin_ia32_floorps_mask((__v16sf)__A, (__v16sf)__W, __U);
_MM_FROUND_FLOOR,
(__v16sf) __W, __U,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline __m512d __DEFAULT_FN_ATTRS		static __inline __m512d __DEFAULT_FN_ATTRS
_mm512_floor_pd(__m512d __A)		_mm512_floor_pd(__m512d __A)
{		{
return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,		return (__m512d)__builtin_ia32_floorpd_mask((__v8df)__A, (__v8df)__A, -1);
_MM_FROUND_FLOOR,
(__v8df) __A, -1,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline__ __m512d __DEFAULT_FN_ATTRS		static __inline__ __m512d __DEFAULT_FN_ATTRS
_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)		_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
{		{
return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,		return (__m512d)__builtin_ia32_floorpd_mask((__v8df)__A, (__v8df)__W, __U);
_MM_FROUND_FLOOR,
(__v8df) __W, __U,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline__ __m512 __DEFAULT_FN_ATTRS		static __inline__ __m512 __DEFAULT_FN_ATTRS
_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)		_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
{		{
return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,		return (__m512)__builtin_ia32_ceilps_mask((__v16sf)__A, (__v16sf)__W, __U);
_MM_FROUND_CEIL,
(__v16sf) __W, __U,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline __m512 __DEFAULT_FN_ATTRS		static __inline __m512 __DEFAULT_FN_ATTRS
_mm512_ceil_ps(__m512 __A)		_mm512_ceil_ps(__m512 __A)
{		{
return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A,		return (__m512)__builtin_ia32_ceilps_mask((__v16sf)__A, (__v16sf)__A, -1);
_MM_FROUND_CEIL,
(__v16sf) __A, -1,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline __m512d __DEFAULT_FN_ATTRS		static __inline __m512d __DEFAULT_FN_ATTRS
_mm512_ceil_pd(__m512d __A)		_mm512_ceil_pd(__m512d __A)
{		{
return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,		return (__m512d)__builtin_ia32_ceilpd_mask((__v8df)__A, (__v8df)__A, -1);
_MM_FROUND_CEIL,
(__v8df) __A, -1,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline__ __m512d __DEFAULT_FN_ATTRS		static __inline__ __m512d __DEFAULT_FN_ATTRS
_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)		_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
{		{
return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A,		return (__m512d)__builtin_ia32_ceilpd_mask((__v8df)__A, (__v8df)__W, __U);
_MM_FROUND_CEIL,
(__v8df) __W, __U,
_MM_FROUND_CUR_DIRECTION);
}		}

static __inline __m512i __DEFAULT_FN_ATTRS		static __inline __m512i __DEFAULT_FN_ATTRS
_mm512_abs_epi64(__m512i __A)		_mm512_abs_epi64(__m512i __A)
{		{
return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,		return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A,
(__v8di)		(__v8di)
_mm512_setzero_si512 (),		_mm512_setzero_si512 (),
[8,270 lines not shown]

lib/Headers/avxintrin.h

	[452 lines not shown]
	/// __m256d _mm256_ceil_pd(__m256d V);			/// __m256d _mm256_ceil_pd(__m256d V);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPD </c> instruction.
	///			///
	/// \param V			/// \param V
	/// A 256-bit vector of [4 x double].			/// A 256-bit vector of [4 x double].
	/// \returns A 256-bit vector of [4 x double] containing the rounded up values.			/// \returns A 256-bit vector of [4 x double] containing the rounded up values.
	#define _mm256_ceil_pd(V) _mm256_round_pd((V), _MM_FROUND_CEIL)			#define _mm256_ceil_pd(V) \
				__extension__({ \
				(__m256) __builtin_ia32_ceilpd_256_mask((__v4df)(__m256)(V), \
				(__v4df)(__m256)(V), -1); \
				})

	/// \brief Rounds down the values stored in a 256-bit vector of [4 x double].			/// \brief Rounds down the values stored in a 256-bit vector of [4 x double].
	/// The source values are rounded down to integer values and returned as			/// The source values are rounded down to integer values and returned as
	/// 64-bit double-precision floating-point values.			/// 64-bit double-precision floating-point values.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m256d _mm256_floor_pd(__m256d V);			/// __m256d _mm256_floor_pd(__m256d V);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPD </c> instruction.
	///			///
	/// \param V			/// \param V
	/// A 256-bit vector of [4 x double].			/// A 256-bit vector of [4 x double].
	/// \returns A 256-bit vector of [4 x double] containing the rounded down			/// \returns A 256-bit vector of [4 x double] containing the rounded down
	/// values.			/// values.
	#define _mm256_floor_pd(V) _mm256_round_pd((V), _MM_FROUND_FLOOR)			#define _mm256_floor_pd(V) \
				__extension__({ \
				(__m256) __builtin_ia32_floorpd_256_mask((__v4df)(__m256)(V), \
				(__v4df)(__m256)(V), -1); \
				})

	/// \brief Rounds up the values stored in a 256-bit vector of [8 x float]. The			/// \brief Rounds up the values stored in a 256-bit vector of [8 x float]. The
	/// source values are rounded up to integer values and returned as			/// source values are rounded up to integer values and returned as
	/// floating-point values.			/// floating-point values.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m256 _mm256_ceil_ps(__m256 V);			/// __m256 _mm256_ceil_ps(__m256 V);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPS </c> instruction.
	///			///
	/// \param V			/// \param V
	/// A 256-bit vector of [8 x float].			/// A 256-bit vector of [8 x float].
	/// \returns A 256-bit vector of [8 x float] containing the rounded up values.			/// \returns A 256-bit vector of [8 x float] containing the rounded up values.
	#define _mm256_ceil_ps(V) _mm256_round_ps((V), _MM_FROUND_CEIL)			#define _mm256_ceil_ps(V) \
				__extension__({ \
				(__m256) __builtin_ia32_ceilps_256_mask((__v8sf)(__m256)(V), \
				(__v8sf)(__m256)(V), -1); \
				})

	/// \brief Rounds down the values stored in a 256-bit vector of [8 x float]. The			/// \brief Rounds down the values stored in a 256-bit vector of [8 x float]. The
	/// source values are rounded down to integer values and returned as			/// source values are rounded down to integer values and returned as
	/// floating-point values.			/// floating-point values.
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m256 _mm256_floor_ps(__m256 V);			/// __m256 _mm256_floor_ps(__m256 V);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPS </c> instruction.
	///			///
	/// \param V			/// \param V
	/// A 256-bit vector of [8 x float].			/// A 256-bit vector of [8 x float].
	/// \returns A 256-bit vector of [8 x float] containing the rounded down values.			/// \returns A 256-bit vector of [8 x float] containing the rounded down values.
	#define _mm256_floor_ps(V) _mm256_round_ps((V), _MM_FROUND_FLOOR)			#define _mm256_floor_ps(V) \
				__extension__({ \
				(__m256) __builtin_ia32_floorps_256_mask((__v4df)(__m256)(V), \
				(__v4df)(__m256)(V), -1); \
				})

	/* Logical */			/* Logical */
	/// \brief Performs a bitwise AND of two 256-bit vectors of [4 x double].			/// \brief Performs a bitwise AND of two 256-bit vectors of [4 x double].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// This intrinsic corresponds to the <c> VANDPD </c> instruction.			/// This intrinsic corresponds to the <c> VANDPD </c> instruction.
	///			///
	[4,640 lines not shown]

lib/Headers/smmintrin.h

	[55 lines not shown]
	/// __m128 _mm_ceil_ps(__m128 X);			/// __m128 _mm_ceil_ps(__m128 X);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPS / ROUNDPS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPS / ROUNDPS </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [4 x float] values to be rounded up.			/// A 128-bit vector of [4 x float] values to be rounded up.
	/// \returns A 128-bit vector of [4 x float] containing the rounded values.			/// \returns A 128-bit vector of [4 x float] containing the rounded values.
	#define _mm_ceil_ps(X) _mm_round_ps((X), _MM_FROUND_CEIL)			#define _mm_ceil_ps(X) \
				__extension__({ \
				(__m128) __builtin_ia32_ceilps_128_mask((__v4sf)(__m128)(X), \
				(__v4sf)(__m128)(X), -1); \
				})

	/// \brief Rounds up each element of the 128-bit vector of [2 x double] to an			/// \brief Rounds up each element of the 128-bit vector of [2 x double] to an
	/// integer and returns the rounded values in a 128-bit vector of			/// integer and returns the rounded values in a 128-bit vector of
	/// [2 x double].			/// [2 x double].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128d _mm_ceil_pd(__m128d X);			/// __m128d _mm_ceil_pd(__m128d X);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPD / ROUNDPD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPD / ROUNDPD </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [2 x double] values to be rounded up.			/// A 128-bit vector of [2 x double] values to be rounded up.
	/// \returns A 128-bit vector of [2 x double] containing the rounded values.			/// \returns A 128-bit vector of [2 x double] containing the rounded values.
	#define _mm_ceil_pd(X) _mm_round_pd((X), _MM_FROUND_CEIL)			#define _mm_ceil_pd(X) \
				__extension__({ \
				(__m128) __builtin_ia32_ceilpd_128_mask((__v2df)(__m128d)(X), \
				(__v2df)(__m128d)(X), -1); \
				})

	/// \brief Copies three upper elements of the first 128-bit vector operand to			/// \brief Copies three upper elements of the first 128-bit vector operand to
	/// the corresponding three upper elements of the 128-bit result vector of			/// the corresponding three upper elements of the 128-bit result vector of
	/// [4 x float]. Rounds up the lowest element of the second 128-bit vector			/// [4 x float]. Rounds up the lowest element of the second 128-bit vector
	/// operand to an integer and copies it to the lowest element of the 128-bit			/// operand to an integer and copies it to the lowest element of the 128-bit
	/// result vector of [4 x float].			/// result vector of [4 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128 _mm_ceil_ss(__m128 X, __m128 Y);			/// __m128 _mm_ceil_ss(__m128 X, __m128 Y);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDSS / ROUNDSS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDSS / ROUNDSS </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [4 x float]. The values stored in bits [127:32] are			/// A 128-bit vector of [4 x float]. The values stored in bits [127:32] are
	/// copied to the corresponding bits of the result.			/// copied to the corresponding bits of the result.
	/// \param Y			/// \param Y
	/// A 128-bit vector of [4 x float]. The value stored in bits [31:0] is			/// A 128-bit vector of [4 x float]. The value stored in bits [31:0] is
	/// rounded up to the nearest integer and copied to the corresponding bits			/// rounded up to the nearest integer and copied to the corresponding bits
	/// of the result.			/// of the result.
	/// \returns A 128-bit vector of [4 x float] containing the copied and rounded			/// \returns A 128-bit vector of [4 x float] containing the copied and rounded
	/// values.			/// values.
	#define _mm_ceil_ss(X, Y) _mm_round_ss((X), (Y), _MM_FROUND_CEIL)			#define _mm_ceil_ss(X, Y) \
				__extension__({ \
				(__m128) __builtin_ia32_ceilps_128_mask((__v4sf)(__m128)(Y), \
				(__v4sf)(__m128)(X), 1); \
				})

	/// \brief Copies the upper element of the first 128-bit vector operand to the			/// \brief Copies the upper element of the first 128-bit vector operand to the
	/// corresponding upper element of the 128-bit result vector of [2 x double].			/// corresponding upper element of the 128-bit result vector of [2 x double].
	/// Rounds up the lower element of the second 128-bit vector operand to an			/// Rounds up the lower element of the second 128-bit vector operand to an
	/// integer and copies it to the lower element of the 128-bit result vector			/// integer and copies it to the lower element of the 128-bit result vector
	/// of [2 x double].			/// of [2 x double].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128d _mm_ceil_sd(__m128d X, __m128d Y);			/// __m128d _mm_ceil_sd(__m128d X, __m128d Y);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDSD / ROUNDSD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDSD / ROUNDSD </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [2 x double]. The value stored in bits [127:64] is			/// A 128-bit vector of [2 x double]. The value stored in bits [127:64] is
	/// copied to the corresponding bits of the result.			/// copied to the corresponding bits of the result.
	/// \param Y			/// \param Y
	/// A 128-bit vector of [2 x double]. The value stored in bits [63:0] is			/// A 128-bit vector of [2 x double]. The value stored in bits [63:0] is
	/// rounded up to the nearest integer and copied to the corresponding bits			/// rounded up to the nearest integer and copied to the corresponding bits
	/// of the result.			/// of the result.
	/// \returns A 128-bit vector of [2 x double] containing the copied and rounded			/// \returns A 128-bit vector of [2 x double] containing the copied and rounded
	/// values.			/// values.
	#define _mm_ceil_sd(X, Y) _mm_round_sd((X), (Y), _MM_FROUND_CEIL)			#define _mm_ceil_sd(X, Y) \
				__extension__({ \
				(__m128) __builtin_ia32_ceilpd_128_mask((__v2df)(__m128d)(Y), \
				(__v2df)(__m128d)(X), 1); \
				})

	/// \brief Rounds down each element of the 128-bit vector of [4 x float] to an			/// \brief Rounds down each element of the 128-bit vector of [4 x float] to an
	/// an integer and returns the rounded values in a 128-bit vector of			/// an integer and returns the rounded values in a 128-bit vector of
	/// [4 x float].			/// [4 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128 _mm_floor_ps(__m128 X);			/// __m128 _mm_floor_ps(__m128 X);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPS / ROUNDPS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPS / ROUNDPS </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [4 x float] values to be rounded down.			/// A 128-bit vector of [4 x float] values to be rounded down.
	/// \returns A 128-bit vector of [4 x float] containing the rounded values.			/// \returns A 128-bit vector of [4 x float] containing the rounded values.
	#define _mm_floor_ps(X) _mm_round_ps((X), _MM_FROUND_FLOOR)			#define _mm_floor_ps(X) \
				__extension__({ \
				(__m128) __builtin_ia32_floorps_128_mask((__v4sf)(__m128)(X), \
				(__v4sf)(__m128)(X), -1); \
				})

	/// \brief Rounds down each element of the 128-bit vector of [2 x double] to an			/// \brief Rounds down each element of the 128-bit vector of [2 x double] to an
	/// integer and returns the rounded values in a 128-bit vector of			/// integer and returns the rounded values in a 128-bit vector of
	/// [2 x double].			/// [2 x double].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128d _mm_floor_pd(__m128d X);			/// __m128d _mm_floor_pd(__m128d X);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDPD / ROUNDPD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDPD / ROUNDPD </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [2 x double].			/// A 128-bit vector of [2 x double].
	/// \returns A 128-bit vector of [2 x double] containing the rounded values.			/// \returns A 128-bit vector of [2 x double] containing the rounded values.
	#define _mm_floor_pd(X) _mm_round_pd((X), _MM_FROUND_FLOOR)			#define _mm_floor_pd(X) \
				__extension__({ \
				(__m128d) __builtin_ia32_floorpd_128_mask((__v2df)(__m128d)(X), \
				(__v2df)(__m128d)(X), -1); \
				})

	/// \brief Copies three upper elements of the first 128-bit vector operand to			/// \brief Copies three upper elements of the first 128-bit vector operand to
	/// the corresponding three upper elements of the 128-bit result vector of			/// the corresponding three upper elements of the 128-bit result vector of
	/// [4 x float]. Rounds down the lowest element of the second 128-bit vector			/// [4 x float]. Rounds down the lowest element of the second 128-bit vector
	/// operand to an integer and copies it to the lowest element of the 128-bit			/// operand to an integer and copies it to the lowest element of the 128-bit
	/// result vector of [4 x float].			/// result vector of [4 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128 _mm_floor_ss(__m128 X, __m128 Y);			/// __m128 _mm_floor_ss(__m128 X, __m128 Y);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDSS / ROUNDSS </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDSS / ROUNDSS </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [4 x float]. The values stored in bits [127:32] are			/// A 128-bit vector of [4 x float]. The values stored in bits [127:32] are
	/// copied to the corresponding bits of the result.			/// copied to the corresponding bits of the result.
	/// \param Y			/// \param Y
	/// A 128-bit vector of [4 x float]. The value stored in bits [31:0] is			/// A 128-bit vector of [4 x float]. The value stored in bits [31:0] is
	/// rounded down to the nearest integer and copied to the corresponding bits			/// rounded down to the nearest integer and copied to the corresponding bits
	/// of the result.			/// of the result.
	/// \returns A 128-bit vector of [4 x float] containing the copied and rounded			/// \returns A 128-bit vector of [4 x float] containing the copied and rounded
	/// values.			/// values.
	#define _mm_floor_ss(X, Y) _mm_round_ss((X), (Y), _MM_FROUND_FLOOR)			#define _mm_floor_ss(X, Y) \
				__extension__({ \
				(__m128) __builtin_ia32_floorps_128_mask((__v4sf)(__m128)(Y), \
				(__v4sf)(__m128)(X), 1); \
				})

	/// \brief Copies the upper element of the first 128-bit vector operand to the			/// \brief Copies the upper element of the first 128-bit vector operand to the
	/// corresponding upper element of the 128-bit result vector of [2 x double].			/// corresponding upper element of the 128-bit result vector of [2 x double].
	/// Rounds down the lower element of the second 128-bit vector operand to an			/// Rounds down the lower element of the second 128-bit vector operand to an
	/// integer and copies it to the lower element of the 128-bit result vector			/// integer and copies it to the lower element of the 128-bit result vector
	/// of [2 x double].			/// of [2 x double].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
	/// \code			/// \code
	/// __m128d _mm_floor_sd(__m128d X, __m128d Y);			/// __m128d _mm_floor_sd(__m128d X, __m128d Y);
	/// \endcode			/// \endcode
	///			///
	/// This intrinsic corresponds to the <c> VROUNDSD / ROUNDSD </c> instruction.			/// This intrinsic corresponds to the <c> VROUNDSD / ROUNDSD </c> instruction.
	///			///
	/// \param X			/// \param X
	/// A 128-bit vector of [2 x double]. The value stored in bits [127:64] is			/// A 128-bit vector of [2 x double]. The value stored in bits [127:64] is
	/// copied to the corresponding bits of the result.			/// copied to the corresponding bits of the result.
	/// \param Y			/// \param Y
	/// A 128-bit vector of [2 x double]. The value stored in bits [63:0] is			/// A 128-bit vector of [2 x double]. The value stored in bits [63:0] is
	/// rounded down to the nearest integer and copied to the corresponding bits			/// rounded down to the nearest integer and copied to the corresponding bits
	/// of the result.			/// of the result.
	/// \returns A 128-bit vector of [2 x double] containing the copied and rounded			/// \returns A 128-bit vector of [2 x double] containing the copied and rounded
	/// values.			/// values.
	#define _mm_floor_sd(X, Y) _mm_round_sd((X), (Y), _MM_FROUND_FLOOR)			#define _mm_floor_sd(X, Y) \
				__extension__({ \
				(__m128d) __builtin_ia32_floorpd_128_mask((__v2df)(__m128d)(Y), \
				(__v2df)(__m128d)(X), 1); \
				})

	/// \brief Rounds each element of the 128-bit vector of [4 x float] to an			/// \brief Rounds each element of the 128-bit vector of [4 x float] to an
	/// integer value according to the rounding control specified by the second			/// integer value according to the rounding control specified by the second
	/// argument and returns the rounded values in a 128-bit vector of			/// argument and returns the rounded values in a 128-bit vector of
	/// [4 x float].			/// [4 x float].
	///			///
	/// \headerfile <x86intrin.h>			/// \headerfile <x86intrin.h>
	///			///
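The rewritten macros above pass `(Y, X, 1)` to the masked builtins for the scalar forms and `(X, X, -1)` for the packed forms. The following is a minimal sketch of the lane behaviour that argument order implies, assuming the builtin selects the rounded source lane where the mask bit is set and the pass-through lane otherwise. The names `vec2d`, `floor_model`, `masked_floor_pd_model`, `floor_sd_model` and `floor_pd_model` are hypothetical and are not part of the patch; `floor_model` is a simplified stand-in for a real floor that only handles finite values in `long long` range.

```c
#include <assert.h>

/* Hypothetical stand-in for __m128d: two double lanes, lane[0] = bits [63:0]. */
typedef struct { double lane[2]; } vec2d;

/* Simplified floor for finite values that fit in long long (an assumption
   made to keep the sketch free of libm; the sign of -0.0 is not preserved). */
static double floor_model(double v) {
    double t = (double)(long long)v;   /* conversion truncates toward zero */
    return (t > v) ? t - 1.0 : t;
}

/* Masked floor: lane i takes floor(src) if mask bit i is set, else passthru. */
static vec2d masked_floor_pd_model(vec2d src, vec2d passthru, unsigned mask) {
    vec2d r;
    for (int i = 0; i < 2; ++i)
        r.lane[i] = ((mask >> i) & 1u) ? floor_model(src.lane[i])
                                       : passthru.lane[i];
    return r;
}

/* Argument order mirrors the macros: (Y, X, 1) for the scalar form. */
static vec2d floor_sd_model(vec2d x, vec2d y) {
    return masked_floor_pd_model(y, x, 1u);
}

/* Packed form: (X, X, -1), i.e. every mask bit set. */
static vec2d floor_pd_model(vec2d x) {
    return masked_floor_pd_model(x, x, ~0u);
}

/* Small self-check of the lane behaviour described in the doc comments. */
static void floor_model_demo(void) {
    vec2d x = {{1.5, 2.5}}, y = {{-0.5, 9.75}};
    vec2d r = floor_sd_model(x, y);
    /* Low lane is floor(Y[0]); high lane is copied from X. */
    assert(r.lane[0] == -1.0 && r.lane[1] == 2.5);
}
```

Under this model the scalar forms differ from the packed forms only in the mask value, which fits the tests expecting a `select` for `_mm_floor_sd` and `_mm_ceil_sd` and none for the packed variants.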

test/CodeGen/avx-builtins.c

	__m128i test_mm256_castsi256_si128(__m256i A) {			__m128i test_mm256_castsi256_si128(__m256i A) {
	// CHECK-LABEL: test_mm256_castsi256_si128			// CHECK-LABEL: test_mm256_castsi256_si128
	// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, <2 x i32> <i32 0, i32 1>			// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, <2 x i32> <i32 0, i32 1>
	return _mm256_castsi256_si128(A);			return _mm256_castsi256_si128(A);
	}			}

	__m256d test_mm256_ceil_pd(__m256d x) {			__m256d test_mm256_ceil_pd(__m256d x) {
	// CHECK-LABEL: test_mm256_ceil_pd			// CHECK-LABEL: test_mm256_ceil_pd
	// CHECK: call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v4f64
				// CHECK-NOT: select
	return _mm256_ceil_pd(x);			return _mm256_ceil_pd(x);
	}			}

	__m256 test_mm_ceil_ps(__m256 x) {			__m256 test_mm_ceil_ps(__m256 x) {
	// CHECK-LABEL: test_mm_ceil_ps			// CHECK-LABEL: test_mm_ceil_ps
	// CHECK: call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v8f32
				// CHECK-NOT: select
	return _mm256_ceil_ps(x);			return _mm256_ceil_ps(x);
	}			}

	__m128d test_mm_cmp_pd(__m128d A, __m128d B) {			__m128d test_mm_cmp_pd(__m128d A, __m128d B) {
	// CHECK-LABEL: test_mm_cmp_pd			// CHECK-LABEL: test_mm_cmp_pd
	// CHECK: call <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i8 13)			// CHECK: call <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i8 13)
	return _mm_cmp_pd(A, B, _CMP_GE_OS);			return _mm_cmp_pd(A, B, _CMP_GE_OS);
	}			}
	__m128i test_mm256_extractf128_si256(__m256i A) {			__m128i test_mm256_extractf128_si256(__m256i A) {
	// CHECK-LABEL: test_mm256_extractf128_si256			// CHECK-LABEL: test_mm256_extractf128_si256
	// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> zeroinitializer, <2 x i32> <i32 2, i32 3>			// CHECK: shufflevector <4 x i64> %{{.*}}, <4 x i64> zeroinitializer, <2 x i32> <i32 2, i32 3>
	return _mm256_extractf128_si256(A, 1);			return _mm256_extractf128_si256(A, 1);
	}			}

	__m256d test_mm256_floor_pd(__m256d x) {			__m256d test_mm256_floor_pd(__m256d x) {
	// CHECK-LABEL: test_mm256_floor_pd			// CHECK-LABEL: test_mm256_floor_pd
	// CHECK: call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v4f64
				// CHECK-NOT: select
	return _mm256_floor_pd(x);			return _mm256_floor_pd(x);
	}			}

	__m256 test_mm_floor_ps(__m256 x) {			__m256 test_mm_floor_ps(__m256 x) {
	// CHECK-LABEL: test_mm_floor_ps			// CHECK-LABEL: test_mm_floor_ps
	// CHECK: call <8 x float> @llvm.x86.avx.round.ps.256(<8 x float> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v8f32
				// CHECK-NOT: select
	return _mm256_floor_ps(x);			return _mm256_floor_ps(x);
	}			}

	__m256d test_mm256_hadd_pd(__m256d A, __m256d B) {			__m256d test_mm256_hadd_pd(__m256d A, __m256d B) {
	// CHECK-LABEL: test_mm256_hadd_pd			// CHECK-LABEL: test_mm256_hadd_pd
	// CHECK: call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %{{.*}}, <4 x double> %{{.*}})			// CHECK: call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %{{.*}}, <4 x double> %{{.*}})
	return _mm256_hadd_pd(A, B);			return _mm256_hadd_pd(A, B);
	}			}

test/CodeGen/avx512f-builtins.c


	__m512 test_mm512_min_round_ps(__m512 __A,__m512 __B)			__m512 test_mm512_min_round_ps(__m512 __A,__m512 __B)
	{			{
	// CHECK-LABEL: @test_mm512_min_round_ps			// CHECK-LABEL: @test_mm512_min_round_ps
	// CHECK: @llvm.x86.avx512.mask.min.ps.512			// CHECK: @llvm.x86.avx512.mask.min.ps.512
	return _mm512_min_round_ps(__A,__B,_MM_FROUND_CUR_DIRECTION);			return _mm512_min_round_ps(__A,__B,_MM_FROUND_CUR_DIRECTION);
	}			}

				__m512 test_mm512_floor_ps(__m512 __A)
				{
				// CHECK-LABEL: @test_mm512_floor_ps
				// CHECK: @llvm.floor.v16f32
				// CHECK-NOT: select
				return _mm512_floor_ps(__A);
				}

				__m512d test_mm512_floor_pd(__m512d __A)
				{
				// CHECK-LABEL: @test_mm512_floor_pd
				// CHECK: @llvm.floor.v8f64
				// CHECK-NOT: select
				return _mm512_floor_pd(__A);
				}

	__m512 test_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)			__m512 test_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A)
	{			{
	// CHECK-LABEL: @test_mm512_mask_floor_ps			// CHECK-LABEL: @test_mm512_mask_floor_ps
	// CHECK: @llvm.x86.avx512.mask.rndscale.ps.512			// CHECK: @llvm.floor.v16f32
				// CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
	return _mm512_mask_floor_ps (__W,__U,__A);			return _mm512_mask_floor_ps (__W,__U,__A);
	}			}

	__m512d test_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)			__m512d test_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A)
	{			{
	// CHECK-LABEL: @test_mm512_mask_floor_pd			// CHECK-LABEL: @test_mm512_mask_floor_pd
	// CHECK: @llvm.x86.avx512.mask.rndscale.pd.512			// CHECK: @llvm.floor.v8f64
				// CHECK: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
	return _mm512_mask_floor_pd (__W,__U,__A);			return _mm512_mask_floor_pd (__W,__U,__A);
	}			}

				__m512 test_mm512_ceil_ps(__m512 __A)
				{
				// CHECK-LABEL: @test_mm512_ceil_ps
				// CHECK: @llvm.ceil.v16f32
				// CHECK-NOT: select
				return _mm512_ceil_ps(__A);
				}

				__m512d test_mm512_ceil_pd(__m512d __A)
				{
				// CHECK-LABEL: @test_mm512_ceil_pd
				// CHECK: @llvm.ceil.v8f64
				// CHECK-NOT: select
				return _mm512_ceil_pd(__A);
				}

	__m512 test_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)			__m512 test_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A)
	{			{
	// CHECK-LABEL: @test_mm512_mask_ceil_ps			// CHECK-LABEL: @test_mm512_mask_ceil_ps
	// CHECK: @llvm.x86.avx512.mask.rndscale.ps.512			// CHECK: @llvm.ceil.v16f32
				// CHECK: select <16 x i1> %{{.*}}, <16 x float> %{{.*}}, <16 x float> %{{.*}}
	return _mm512_mask_ceil_ps (__W,__U,__A);			return _mm512_mask_ceil_ps (__W,__U,__A);
	}			}

	__m512d test_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)			__m512d test_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A)
	{			{
	// CHECK-LABEL: @test_mm512_mask_ceil_pd			// CHECK-LABEL: @test_mm512_mask_ceil_pd
	// CHECK: @llvm.x86.avx512.mask.rndscale.pd.512			// CHECK: @llvm.ceil.v8f64
				// CHECK: select <8 x i1> %{{.*}}, <8 x double> %{{.*}}, <8 x double> %{{.*}}
	return _mm512_mask_ceil_pd (__W,__U,__A);			return _mm512_mask_ceil_pd (__W,__U,__A);
	}			}

	__m512 test_mm512_mask_roundscale_ps(__m512 __W, __mmask16 __U, __m512 __A)			__m512 test_mm512_mask_roundscale_ps(__m512 __W, __mmask16 __U, __m512 __A)
	{			{
	// CHECK-LABEL: @test_mm512_mask_roundscale_ps			// CHECK-LABEL: @test_mm512_mask_roundscale_ps
	// CHECK: @llvm.x86.avx512.mask.rndscale.ps.512			// CHECK: @llvm.x86.avx512.mask.rndscale.ps.512
	return _mm512_mask_roundscale_ps(__W,__U,__A, 1);			return _mm512_mask_roundscale_ps(__W,__U,__A, 1);

test/CodeGen/sse41-builtins.c

	__m128 test_mm_blendv_ps(__m128 V1, __m128 V2, __m128 V3) {			__m128 test_mm_blendv_ps(__m128 V1, __m128 V2, __m128 V3) {
	// CHECK-LABEL: test_mm_blendv_ps			// CHECK-LABEL: test_mm_blendv_ps
	// CHECK: call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})			// CHECK: call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x float> %{{.*}})
	return _mm_blendv_ps(V1, V2, V3);			return _mm_blendv_ps(V1, V2, V3);
	}			}

	__m128d test_mm_ceil_pd(__m128d x) {			__m128d test_mm_ceil_pd(__m128d x) {
	// CHECK-LABEL: test_mm_ceil_pd			// CHECK-LABEL: test_mm_ceil_pd
	// CHECK: call <2 x double> @llvm.x86.sse41.round.pd(<2 x double> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v2f64
				// CHECK-NOT: select
	return _mm_ceil_pd(x);			return _mm_ceil_pd(x);
	}			}

	__m128 test_mm_ceil_ps(__m128 x) {			__m128 test_mm_ceil_ps(__m128 x) {
	// CHECK-LABEL: test_mm_ceil_ps			// CHECK-LABEL: test_mm_ceil_ps
	// CHECK: call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v4f32
				// CHECK-NOT: select
	return _mm_ceil_ps(x);			return _mm_ceil_ps(x);
	}			}

	__m128d test_mm_ceil_sd(__m128d x, __m128d y) {			__m128d test_mm_ceil_sd(__m128d x, __m128d y) {
	// CHECK-LABEL: test_mm_ceil_sd			// CHECK-LABEL: test_mm_ceil_sd
	// CHECK: call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v2f64
				// CHECK: select
	return _mm_ceil_sd(x, y);			return _mm_ceil_sd(x, y);
	}			}

	__m128 test_mm_ceil_ss(__m128 x, __m128 y) {			__m128 test_mm_ceil_ss(__m128 x, __m128 y) {
	// CHECK-LABEL: test_mm_ceil_ss			// CHECK-LABEL: test_mm_ceil_ss
	// CHECK: call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i32 2)			// CHECK: @llvm.ceil.v4f32
				// CHECK: select
	return _mm_ceil_ss(x, y);			return _mm_ceil_ss(x, y);
	}			}

	__m128i test_mm_cmpeq_epi64(__m128i A, __m128i B) {			__m128i test_mm_cmpeq_epi64(__m128i A, __m128i B) {
	// CHECK-LABEL: test_mm_cmpeq_epi64			// CHECK-LABEL: test_mm_cmpeq_epi64
	// CHECK: icmp eq <2 x i64>			// CHECK: icmp eq <2 x i64>
	// CHECK: sext <2 x i1> %{{.*}} to <2 x i64>			// CHECK: sext <2 x i1> %{{.*}} to <2 x i64>
	return _mm_cmpeq_epi64(A, B);			return _mm_cmpeq_epi64(A, B);
	int test_mm_extract_ps(__m128 x) {			int test_mm_extract_ps(__m128 x) {
	// CHECK-LABEL: test_mm_extract_ps			// CHECK-LABEL: test_mm_extract_ps
	// CHECK: extractelement <4 x float> %{{.*}}, i32 1			// CHECK: extractelement <4 x float> %{{.*}}, i32 1
	return _mm_extract_ps(x, 1);			return _mm_extract_ps(x, 1);
	}			}

	__m128d test_mm_floor_pd(__m128d x) {			__m128d test_mm_floor_pd(__m128d x) {
	// CHECK-LABEL: test_mm_floor_pd			// CHECK-LABEL: test_mm_floor_pd
	// CHECK: call <2 x double> @llvm.x86.sse41.round.pd(<2 x double> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v2f64
				// CHECK-NOT: select
	return _mm_floor_pd(x);			return _mm_floor_pd(x);
	}			}

	__m128 test_mm_floor_ps(__m128 x) {			__m128 test_mm_floor_ps(__m128 x) {
	// CHECK-LABEL: test_mm_floor_ps			// CHECK-LABEL: test_mm_floor_ps
	// CHECK: call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v4f32
				// CHECK-NOT: select
	return _mm_floor_ps(x);			return _mm_floor_ps(x);
	}			}

	__m128d test_mm_floor_sd(__m128d x, __m128d y) {			__m128d test_mm_floor_sd(__m128d x, __m128d y) {
	// CHECK-LABEL: test_mm_floor_sd			// CHECK-LABEL: test_mm_floor_sd
	// CHECK: call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v2f64
				// CHECK: select
	return _mm_floor_sd(x, y);			return _mm_floor_sd(x, y);
	}			}

	__m128 test_mm_floor_ss(__m128 x, __m128 y) {			__m128 test_mm_floor_ss(__m128 x, __m128 y) {
	// CHECK-LABEL: test_mm_floor_ss			// CHECK-LABEL: test_mm_floor_ss
	// CHECK: call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}}, i32 1)			// CHECK: @llvm.floor.v4f32
				// CHECK: select
	return _mm_floor_ss(x, y);			return _mm_floor_ss(x, y);
	}			}

	__m128i test_mm_insert_epi8(__m128i x, char b) {			__m128i test_mm_insert_epi8(__m128i x, char b) {
	// CHECK-LABEL: test_mm_insert_epi8			// CHECK-LABEL: test_mm_insert_epi8
	// CHECK: insertelement <16 x i8> %{{.*}}, i8 %{{.*}}, i32 0			// CHECK: insertelement <16 x i8> %{{.*}}, i8 %{{.*}}, i32 0
	return _mm_insert_epi8(x, b, 16);			return _mm_insert_epi8(x, b, 16);
	}			}
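In the test updates above, the old CHECK lines for the `round` intrinsics carry an immediate of `i32 1` for the floor tests and `i32 2` for the ceil tests, and the new CHECK lines expect `@llvm.floor.*` and `@llvm.ceil.*` instead. A minimal sketch of that mapping follows. The names `round_kind` and `classify_round_imm` are hypothetical, and the sketch only covers the two immediates visible in these tests; which other immediate bits the actual lowering inspects is not shown in this diff.

```c
#include <assert.h>

/* Hypothetical classification of a rounding immediate. */
enum round_kind { ROUND_OTHER, ROUND_FLOOR, ROUND_CEIL };

/* Maps the immediates seen in the old CHECK lines to a generic operation:
   1 -> floor, 2 -> ceil. Anything else is left to the target-specific path. */
static enum round_kind classify_round_imm(int imm) {
    switch (imm) {
    case 1:  return ROUND_FLOOR;
    case 2:  return ROUND_CEIL;
    default: return ROUND_OTHER;
    }
}

/* Small self-check using the immediates from the tests. */
static void classify_round_imm_demo(void) {
    assert(classify_round_imm(1) == ROUND_FLOOR);
    assert(classify_round_imm(2) == ROUND_CEIL);
}
```

Any immediate outside those two values falls through to `ROUND_OTHER`, which corresponds to keeping the existing target-specific intrinsic call.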


Revision Contents

Diff 140745

include/clang/Basic/BuiltinsX86.def

lib/CodeGen/CGBuiltin.cpp

lib/Headers/avx512fintrin.h

lib/Headers/avxintrin.h

lib/Headers/smmintrin.h

test/CodeGen/avx-builtins.c

test/CodeGen/avx512f-builtins.c

test/CodeGen/sse41-builtins.c
