This is an archive of the discontinued LLVM Phabricator instance.

Differential D104790

[x86] fix mm_undefined intrinsics to use arbitrary frozen bit pattern
Needs Review · Public

Authored by aqjune on Jun 23 2021, 9:03 AM.

Details

Reviewers

efriedma
spatel
craig.topper
RKSimon

Summary

This changes the lowering of the mm*_undefined* intrinsics to emit freeze poison instead of zeroinitializer.
(mentioned and discussed in D103874)
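To make the motivation concrete, here is a minimal sketch of the distinction the summary relies on: an undef-like value may look different at every use, whereas a frozen value is arbitrary but fixed. This is a toy model in plain C++, not LLVM code; the names UndefLike, Frozen, and xorWithSelf are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Toy model only; these classes are hypothetical and are not LLVM APIs.

// An undef-like source: every read may observe a different bit pattern.
class UndefLike {
public:
  explicit UndefLike(std::uint32_t seed) : rng_(seed) {}
  std::uint32_t read() { return static_cast<std::uint32_t>(rng_()); }

private:
  std::mt19937 rng_;
};

// A frozen value: one arbitrary pattern is picked once and reused afterwards.
class Frozen {
public:
  explicit Frozen(UndefLike &src) : value_(src.read()) {}
  std::uint32_t read() const { return value_; }

private:
  std::uint32_t value_;
};

// x ^ x is guaranteed to be 0 only when both reads see the same bits.
inline std::uint32_t xorWithSelf(const Frozen &f) { return f.read() ^ f.read(); }
```

Under this model, returning all zeros is one valid choice for the frozen pattern, which is why the previous zeroinitializer lowering was correct but more constrained than necessary.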

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune created this revision. Jun 23 2021, 9:03 AM

Herald added a subscriber: pengfei. Jun 23 2021, 9:03 AM

aqjune requested review of this revision. Jun 23 2021, 9:03 AM

Herald added a project: Restricted Project. Jun 23 2021, 9:03 AM

Herald added a subscriber: cfe-commits.

I couldn't find end-to-end tests for checking assembly generation.
To check whether this works, which tests should I write, and what would they look like?

Harbormaster completed remote builds in B110653: Diff 353999. Jun 23 2021, 10:03 AM

In D104790#2836253, @aqjune wrote:

I couldn't find end-to-end tests for checking assembly generation.
To check whether this works, which tests should I write, and what would they look like?

There are tests like test/CodeGen/X86/avx-intrinsics-fast-isel.ll that are supposed to contain the IR the frontend generates. They mostly contain optimized IR, but then run fast-isel in the backend. I don't think all intrinsics are tested this way.

We may want to update getAVX2GatherNode and getGatherNode in X86ISelLowering to replace a freeze+poison Src with a zero vector. We already do this when Src is undef.

Update LLVM's fast-isel tests for the undefined intrinsics to compile freeze(poison)
Update X86ISelLowering's getAVX2GatherNode and getGatherNode to handle freeze(poison) as well
Update DAGCombiner to fold bitcast(freeze(poison)) -> freeze(poison)
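The bitcast fold in the last item can be illustrated on a miniature expression tree. This is only a sketch under stated assumptions: Op, Node, make, isFrozenPoison, and foldBitcast are hypothetical names, not the SelectionDAG API, and the toy does not track uses. Creating a fresh freeze is only safe when the original frozen value has no other user, since a second freeze may pick a different pattern.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature IR; none of these names come from LLVM.
enum class Op { Poison, Freeze, Bitcast, Other };

struct Node {
  Op op;
  std::string type;              // result type, e.g. "<2 x double>"
  std::shared_ptr<Node> operand; // null for leaf nodes
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make(Op op, std::string type, NodePtr operand = nullptr) {
  return std::make_shared<Node>(Node{op, std::move(type), std::move(operand)});
}

bool isFrozenPoison(const NodePtr &n) {
  return n && n->op == Op::Freeze && n->operand &&
         n->operand->op == Op::Poison;
}

// bitcast(freeze(poison)) -> freeze(poison), rebuilt in the bitcast's type.
// Assumes the frozen operand has no other user (not checked in this toy).
NodePtr foldBitcast(const NodePtr &n) {
  if (n && n->op == Op::Bitcast && isFrozenPoison(n->operand))
    return make(Op::Freeze, n->type, make(Op::Poison, n->type));
  return n; // anything else is left untouched
}
```

The same isFrozenPoison-style predicate is the kind of check the gather-node item would need in order to treat a frozen poison source like an undef one.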

Herald added a project: Restricted Project. Jun 26 2021, 7:26 AM

Herald added subscribers: llvm-commits, ecnelises, hiraditya.

Minor fixes

Update llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll as well

In D104790#2836463, @craig.topper wrote:

In D104790#2836253, @aqjune wrote:

I couldn't find end-to-end tests for checking assembly generation.
To check whether this works, which tests should I write, and what would they look like?

There are tests like test/CodeGen/X86/avx-intrinsics-fast-isel.ll that are supposed to contain the IR the frontend generates. They mostly contain optimized IR, but then run fast-isel in the backend. I don't think all intrinsics are tested this way.

Thank you for the info. I updated three *-fast-isel.ll files to check this.

Harbormaster completed remote builds in B111135: Diff 354682. Jun 26 2021, 8:12 AM

Is this actually better in any meaningful way? InstCombine will turn freeze poison into zeroinitializer, and until then this is just a completely opaque value.

In D104790#2842523, @nikic wrote:

Is this actually better in any meaningful way? InstCombine will turn freeze poison into zeroinitializer, and until then this is just a completely opaque value.

I think that, to correctly emit IR for intrinsics like mm256_castsi128_si256 (D103874 has more context), efficient handling of this kind of pattern is necessary:

%v = freeze <n x ty> poison
%w = shufflevector %a, %v, mask

The zeroinitializer folding is done by InstCombine's visitFreeze, which should perhaps be fixed.
I'll play with some patterns and create patches for this.
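As a rough illustration of the pattern above, the following sketch models a shufflevector that widens a two-lane vector to four lanes, with the extra lanes taken from a second operand standing in for the frozen value. It is plain C++, not LLVM code; shuffle, widen, and the mask {0, 1, 2, 3} are assumptions chosen for illustration, since the actual mask is not shown in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy shufflevector: a mask index i < N selects a[i], otherwise v[i - N].
template <std::size_t N, std::size_t M>
std::array<std::int64_t, M> shuffle(const std::array<std::int64_t, N> &a,
                                    const std::array<std::int64_t, N> &v,
                                    const std::array<std::size_t, M> &mask) {
  std::array<std::int64_t, M> out{};
  for (std::size_t i = 0; i < M; ++i)
    out[i] = mask[i] < N ? a[mask[i]] : v[mask[i] - N];
  return out;
}

// Widen 2 lanes to 4. `frozen` stands in for the result of freeze poison:
// any fixed contents are acceptable for the upper lanes.
std::array<std::int64_t, 4> widen(const std::array<std::int64_t, 2> &a,
                                  const std::array<std::int64_t, 2> &frozen) {
  const std::array<std::size_t, 4> mask = {0, 1, 2, 3}; // hypothetical mask
  return shuffle<2, 4>(a, frozen, mask);
}
```

Passing an all-zero `frozen` operand corresponds to the zeroinitializer choice mentioned above; the lower lanes are the same whichever pattern is chosen.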

aqjune mentioned this in D103874: [IR] Rename the shufflevector's undef mask to poison. Sep 26 2021, 7:04 AM

RKSimon mentioned this in rG8c2668032cb1: [X86] Remove unnecessary _mm_undefined_pd() test from avx-intrinsics-fast-isel.. Aug 9 2022, 10:12 AM

aqjune mentioned this in D143287: [Clang][X86] Change X86 cast intrinsics to use __builtin_nondeterministic_value. Mar 8 2023, 12:17 PM

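Revision Contents

Path                                                  Size
clang/lib/CodeGen/CGBuiltin.cpp                       9 lines
clang/test/CodeGen/X86/avx-builtins.c                 11 lines
clang/test/CodeGen/X86/avx2-builtins.c                44 lines
clang/test/CodeGen/X86/avx512f-builtins.c             15 lines
clang/test/CodeGen/X86/sse-builtins.c                 4 lines
clang/test/CodeGen/X86/sse2-builtins.c                7 lines
llvm/include/llvm/CodeGen/SelectionDAGNodes.h         5 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp         4 lines
llvm/lib/Target/X86/X86ISelLowering.cpp               15 lines
llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll     43 lines
llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll     4 lines
llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll    7 lines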

Diff 354682

clang/lib/CodeGen/CGBuiltin.cpp

Value *CodeGenFunction::EmitX86BuiltinExpr(unsigned BuiltinID,
   case X86::BI__builtin_ia32_tzcnt_u32:
   case X86::BI__builtin_ia32_tzcnt_u64: {
     Function *F = CGM.getIntrinsic(Intrinsic::cttz, Ops[0]->getType());
     return Builder.CreateCall(F, {Ops[0], Builder.getInt1(false)});
   }
   case X86::BI__builtin_ia32_undef128:
   case X86::BI__builtin_ia32_undef256:
   case X86::BI__builtin_ia32_undef512:
-    // The x86 definition of "undef" is not the same as the LLVM definition
-    // (PR32176). We leave optimizing away an unnecessary zero constant to the
-    // IR optimizer and backend.
-    // TODO: If we had a "freeze" IR instruction to generate a fixed undef
-    // value, we should use that here instead of a zero.
-    return llvm::Constant::getNullValue(ConvertType(E->getType()));
+    // The x86 definition of "undef" is equivalent to "freeze poison" in LLVM
+    // (PR32176).
+    return Builder.CreateFreeze(PoisonValue::get(ConvertType(E->getType())));
   case X86::BI__builtin_ia32_vec_init_v8qi:
   case X86::BI__builtin_ia32_vec_init_v4hi:
   case X86::BI__builtin_ia32_vec_init_v2si:
     return Builder.CreateBitCast(BuildVector(Ops),
                                  llvm::Type::getX86_MMXTy(getLLVMContext()));
   case X86::BI__builtin_ia32_vec_ext_v2si:
   case X86::BI__builtin_ia32_vec_ext_v16qi:
   case X86::BI__builtin_ia32_vec_ext_v8hi:

clang/test/CodeGen/X86/avx-builtins.c

 int test_mm256_testz_si256(__m256i A, __m256i B) {
   // CHECK-LABEL: test_mm256_testz_si256
   // CHECK: call i32 @llvm.x86.avx.ptestz.256(<4 x i64> %{{.*}}, <4 x i64> %{{.*}})
   return _mm256_testz_si256(A, B);
 }

 __m256 test_mm256_undefined_ps() {
   // CHECK-LABEL: test_mm256_undefined_ps
-  // CHECK: ret <8 x float> zeroinitializer
+  // CHECK: freeze <4 x double> poison
+  // CHECK: bitcast <4 x double> %{{.*}} to <8 x float>
+  // CHECK: ret <8 x float> %{{.*}}
   return _mm256_undefined_ps();
 }

 __m256d test_mm256_undefined_pd() {
   // CHECK-LABEL: test_mm256_undefined_pd
-  // CHECK: ret <4 x double> zeroinitializer
+  // CHECK: freeze <4 x double> poison
+  // CHECK: ret <4 x double> %{{.*}}
   return _mm256_undefined_pd();
 }

 __m256i test_mm256_undefined_si256() {
   // CHECK-LABEL: test_mm256_undefined_si256
-  // CHECK: ret <4 x i64> zeroinitializer
+  // CHECK: freeze <4 x double> poison
+  // CHECK: bitcast <4 x double> %{{.*}} to <4 x i64>
+  // CHECK: ret <4 x i64> %{{.*}}
   return _mm256_undefined_si256();
 }

 __m256d test_mm256_unpackhi_pd(__m256d A, __m256d B) {
   // CHECK-LABEL: test_mm256_unpackhi_pd
   // CHECK: shufflevector <4 x double> %{{.*}}, <4 x double> %{{.*}}, <4 x i32> <i32 1, i32 5, i32 3, i32 7>
   return _mm256_unpackhi_pd(A, B);
 }

clang/test/CodeGen/X86/avx2-builtins.c

 __m256i test_mm256_mask_i32gather_epi32(__m256i a, int const *b, __m256i c, __m256i d) {
   // CHECK-LABEL: test_mm256_mask_i32gather_epi32
   // CHECK: call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> %{{.*}}, i8* %{{.*}}, <8 x i32> %{{.*}}, <8 x i32> %{{.*}}, i8 2)
   return _mm256_mask_i32gather_epi32(a, b, c, d, 2);
 }

 __m128i test_mm_i32gather_epi64(long long const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i32gather_epi64
-  // CHECK: call <2 x i64> @llvm.x86.avx2.gather.d.q(<2 x i64> zeroinitializer, i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <2 x i64>
+  // CHECK: call <2 x i64> @llvm.x86.avx2.gather.d.q(<2 x i64> %[[FR_BC]], i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
   return _mm_i32gather_epi64(b, c, 2);
 }

 __m128i test_mm_mask_i32gather_epi64(__m128i a, long long const *b, __m128i c, __m128i d) {
   // CHECK-LABEL: test_mm_mask_i32gather_epi64
   // CHECK: call <2 x i64> @llvm.x86.avx2.gather.d.q(<2 x i64> %{{.*}}, i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
   return _mm_mask_i32gather_epi64(a, b, c, d, 2);
 }

 __m256i test_mm256_i32gather_epi64(long long const *b, __m128i c) {
   // CHECK-LABEL: test_mm256_i32gather_epi64
-  // CHECK: call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> zeroinitializer, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
+  // CHECK: %[[FR:.*]] = freeze <4 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <4 x double> %[[FR]] to <4 x i64>
+  // CHECK: call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> %[[FR_BC]], i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
   return _mm256_i32gather_epi64(b, c, 2);
 }

 __m256i test_mm256_mask_i32gather_epi64(__m256i a, long long const *b, __m128i c, __m256i d) {
   // CHECK-LABEL: test_mm256_mask_i32gather_epi64
   // CHECK: call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> %{{.*}}, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
   return _mm256_mask_i32gather_epi64(a, b, c, d, 2);
 }

 __m128d test_mm_i32gather_pd(double const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i32gather_pd
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
   // CHECK: [[CMP:%.*]] = fcmp oeq <2 x double>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
-  // CHECK: call <2 x double> @llvm.x86.avx2.gather.d.pd(<2 x double> zeroinitializer, i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x double> %{{.*}}, i8 2)
+  // CHECK: call <2 x double> @llvm.x86.avx2.gather.d.pd(<2 x double> %[[FR]], i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x double> %{{.*}}, i8 2)
   return _mm_i32gather_pd(b, c, 2);
 }

 __m128d test_mm_mask_i32gather_pd(__m128d a, double const *b, __m128i c, __m128d d) {
   // CHECK-LABEL: test_mm_mask_i32gather_pd
   // CHECK: call <2 x double> @llvm.x86.avx2.gather.d.pd(<2 x double> %{{.*}}, i8* %{{.*}}, <4 x i32> %{{.*}}, <2 x double> %{{.*}}, i8 2)
   return _mm_mask_i32gather_pd(a, b, c, d, 2);
 }

 __m256d test_mm256_i32gather_pd(double const *b, __m128i c) {
   // CHECK-LABEL: test_mm256_i32gather_pd
+  // CHECK: %[[FR:.*]] = freeze <4 x double> poison
   // CHECK: [[CMP:%.*]] = fcmp oeq <4 x double>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i64>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <4 x i64> [[SEXT]] to <4 x double>
-  // CHECK: call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> zeroinitializer, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x double> %{{.*}}, i8 2)
+  // CHECK: call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> %[[FR]], i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x double> %{{.*}}, i8 2)
   return _mm256_i32gather_pd(b, c, 2);
 }

 __m256d test_mm256_mask_i32gather_pd(__m256d a, double const *b, __m128i c, __m256d d) {
   // CHECK-LABEL: test_mm256_mask_i32gather_pd
   // CHECK: call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> %{{.*}}, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x double> %{{.*}}, i8 2)
   return _mm256_mask_i32gather_pd(a, b, c, d, 2);
 }

 __m128 test_mm_i32gather_ps(float const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i32gather_ps
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <4 x float>
   // CHECK: [[CMP:%.*]] = fcmp oeq <4 x float>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
-  // CHECK: call <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> zeroinitializer, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x float> %{{.*}}, i8 2)
+  // CHECK: call <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> %[[FR_BC]], i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm_i32gather_ps(b, c, 2);
 }

 __m128 test_mm_mask_i32gather_ps(__m128 a, float const *b, __m128i c, __m128 d) {
   // CHECK-LABEL: test_mm_mask_i32gather_ps
   // CHECK: call <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> %{{.*}}, i8* %{{.*}}, <4 x i32> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm_mask_i32gather_ps(a, b, c, d, 2);
 }

 __m256 test_mm256_i32gather_ps(float const *b, __m256i c) {
   // CHECK-LABEL: test_mm256_i32gather_ps
+  // CHECK: %[[FR:.*]] = freeze <4 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <4 x double> %[[FR]] to <8 x float>
   // CHECK: [[CMP:%.*]] = fcmp oeq <8 x float>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <8 x i1> [[CMP]] to <8 x i32>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <8 x i32> [[SEXT]] to <8 x float>
-  // CHECK: call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> zeroinitializer, i8* %{{.*}}, <8 x i32> %{{.*}}, <8 x float> %{{.*}}, i8 2)
+  // CHECK: call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> %[[FR_BC]], i8* %{{.*}}, <8 x i32> %{{.*}}, <8 x float> %{{.*}}, i8 2)
   return _mm256_i32gather_ps(b, c, 2);
 }

 __m256 test_mm256_mask_i32gather_ps(__m256 a, float const *b, __m256i c, __m256 d) {
   // CHECK-LABEL: test_mm256_mask_i32gather_ps
   // CHECK: call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> %{{.*}}, i8* %{{.*}}, <8 x i32> %{{.*}}, <8 x float> %{{.*}}, i8 2)
   return _mm256_mask_i32gather_ps(a, b, c, d, 2);
 }
[... 19 unchanged lines not shown ...]
 __m128i test_mm256_mask_i64gather_epi32(__m128i a, int const *b, __m256i c, __m128i d) {
   // CHECK-LABEL: test_mm256_mask_i64gather_epi32
   // CHECK: call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x i32> %{{.*}}, i8 2)
   return _mm256_mask_i64gather_epi32(a, b, c, d, 2);
 }

 __m128i test_mm_i64gather_epi64(long long const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i64gather_epi64
-  // CHECK: call <2 x i64> @llvm.x86.avx2.gather.q.q(<2 x i64> zeroinitializer, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <2 x i64>
+  // CHECK: call <2 x i64> @llvm.x86.avx2.gather.q.q(<2 x i64> %[[FR_BC]], i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
   return _mm_i64gather_epi64(b, c, 2);
 }

 __m128i test_mm_mask_i64gather_epi64(__m128i a, long long const *b, __m128i c, __m128i d) {
   // CHECK-LABEL: test_mm_mask_i64gather_epi64
   // CHECK: call <2 x i64> @llvm.x86.avx2.gather.q.q(<2 x i64> %{{.*}}, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x i64> %{{.*}}, i8 2)
   return _mm_mask_i64gather_epi64(a, b, c, d, 2);
 }

 __m256i test_mm256_i64gather_epi64(long long const *b, __m256i c) {
   // CHECK-LABEL: test_mm256_i64gather_epi64
-  // CHECK: call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> zeroinitializer, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
+  // CHECK: %[[FR:.*]] = freeze <4 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <4 x double> %[[FR]] to <4 x i64>
+  // CHECK: call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> %[[FR_BC]], i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
   return _mm256_i64gather_epi64(b, c, 2);
 }

 __m256i test_mm256_mask_i64gather_epi64(__m256i a, long long const *b, __m256i c, __m256i d) {
   // CHECK-LABEL: test_mm256_mask_i64gather_epi64
   // CHECK: call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x i64> %{{.*}}, i8 2)
   return _mm256_mask_i64gather_epi64(a, b, c, d, 2);
 }

 __m128d test_mm_i64gather_pd(double const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i64gather_pd
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
   // CHECK: [[CMP:%.*]] = fcmp oeq <2 x double>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <2 x i1> [[CMP]] to <2 x i64>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[SEXT]] to <2 x double>
-  // CHECK: call <2 x double> @llvm.x86.avx2.gather.q.pd(<2 x double> zeroinitializer, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x double> %{{.*}}, i8 2)
+  // CHECK: call <2 x double> @llvm.x86.avx2.gather.q.pd(<2 x double> %[[FR]], i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x double> %{{.*}}, i8 2)
   return _mm_i64gather_pd(b, c, 2);
 }

 __m128d test_mm_mask_i64gather_pd(__m128d a, double const *b, __m128i c, __m128d d) {
   // CHECK-LABEL: test_mm_mask_i64gather_pd
   // CHECK: call <2 x double> @llvm.x86.avx2.gather.q.pd(<2 x double> %{{.*}}, i8* %{{.*}}, <2 x i64> %{{.*}}, <2 x double> %{{.*}}, i8 2)
   return _mm_mask_i64gather_pd(a, b, c, d, 2);
 }

 __m256d test_mm256_i64gather_pd(double const *b, __m256i c) {
   // CHECK-LABEL: test_mm256_i64gather_pd
+  // CHECK: %[[FR:.*]] = freeze <4 x double> poison
   // CHECK: fcmp oeq <4 x double> %{{.*}}, %{{.*}}
-  // CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> zeroinitializer, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)
+  // CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %[[FR]], i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)
   return _mm256_i64gather_pd(b, c, 2);
 }

 __m256d test_mm256_mask_i64gather_pd(__m256d a, double const *b, __m256i c, __m256d d) {
   // CHECK-LABEL: test_mm256_mask_i64gather_pd
   // CHECK: call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x double> %{{.*}}, i8 2)
   return _mm256_mask_i64gather_pd(a, b, c, d, 2);
 }

 __m128 test_mm_i64gather_ps(float const *b, __m128i c) {
   // CHECK-LABEL: test_mm_i64gather_ps
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <4 x float>
   // CHECK: [[CMP:%.*]] = fcmp oeq <4 x float>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
-  // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps(<4 x float> zeroinitializer, i8* %{{.*}}, <2 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
+  // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps(<4 x float> %[[FR_BC]], i8* %{{.*}}, <2 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm_i64gather_ps(b, c, 2);
 }

 __m128 test_mm_mask_i64gather_ps(__m128 a, float const *b, __m128i c, __m128 d) {
   // CHECK-LABEL: test_mm_mask_i64gather_ps
   // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps(<4 x float> %{{.*}}, i8* %{{.*}}, <2 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm_mask_i64gather_ps(a, b, c, d, 2);
 }

 __m128 test_mm256_i64gather_ps(float const *b, __m256i c) {
   // CHECK-LABEL: test_mm256_i64gather_ps
+  // CHECK: %[[FR:.*]] = freeze <2 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <4 x float>
   // CHECK: [[CMP:%.*]] = fcmp oeq <4 x float>
   // CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> [[CMP]] to <4 x i32>
   // CHECK-NEXT: [[BC:%.*]] = bitcast <4 x i32> [[SEXT]] to <4 x float>
-  // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> zeroinitializer, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
+  // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> %[[FR_BC]], i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm256_i64gather_ps(b, c, 2);
 }

 __m128 test_mm256_mask_i64gather_ps(__m128 a, float const *b, __m256i c, __m128 d) {
   // CHECK-LABEL: test_mm256_mask_i64gather_ps
   // CHECK: call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> %{{.*}}, i8* %{{.*}}, <4 x i64> %{{.*}}, <4 x float> %{{.*}}, i8 2)
   return _mm256_mask_i64gather_ps(a, b, c, d, 2);
 }

clang/test/CodeGen/X86/avx512f-builtins.c

 __m128d test_mm_maskz_min_sd(__mmask8 __U, __m128d __A, __m128d __B) {
   // CHECK-LABEL: @test_mm_maskz_min_sd
   // CHECK: @llvm.x86.avx512.mask.min.sd.round
   return _mm_maskz_min_sd(__U,__A,__B);
 }

 __m512 test_mm512_undefined() {
   // CHECK-LABEL: @test_mm512_undefined
-  // CHECK: ret <16 x float> zeroinitializer
+  // CHECK: %[[FR:.*]] = freeze <8 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <8 x double> %[[FR]] to <16 x float>
+  // CHECK: ret <16 x float> %[[FR_BC]]
   return _mm512_undefined();
 }

 __m512 test_mm512_undefined_ps() {
   // CHECK-LABEL: @test_mm512_undefined_ps
-  // CHECK: ret <16 x float> zeroinitializer
+  // CHECK: %[[FR:.*]] = freeze <8 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <8 x double> %[[FR]] to <16 x float>
+  // CHECK: ret <16 x float> %[[FR_BC]]
   return _mm512_undefined_ps();
 }

 __m512d test_mm512_undefined_pd() {
   // CHECK-LABEL: @test_mm512_undefined_pd
-  // CHECK: ret <8 x double> zeroinitializer
+  // CHECK: %[[FR:.*]] = freeze <8 x double> poison
+  // CHECK: ret <8 x double> %[[FR]]
   return _mm512_undefined_pd();
 }

 __m512i test_mm512_undefined_epi32() {
   // CHECK-LABEL: @test_mm512_undefined_epi32
-  // CHECK: ret <8 x i64> zeroinitializer
+  // CHECK: %[[FR:.*]] = freeze <8 x double> poison
+  // CHECK: %[[FR_BC:.*]] = bitcast <8 x double> %[[FR]] to <8 x i64>
+  // CHECK: ret <8 x i64> %[[FR_BC]]
   return _mm512_undefined_epi32();
 }

 __m512i test_mm512_cvtepi8_epi32(__m128i __A) {
   // CHECK-LABEL: @test_mm512_cvtepi8_epi32
   // CHECK: sext <16 x i8> %{{.*}} to <16 x i32>
   return _mm512_cvtepi8_epi32(__A);
 }

clang/test/CodeGen/X86/sse-builtins.c

	int test_mm_ucomineq_ss(__m128 A, __m128 B) {			int test_mm_ucomineq_ss(__m128 A, __m128 B) {
	// CHECK-LABEL: test_mm_ucomineq_ss			// CHECK-LABEL: test_mm_ucomineq_ss
	// CHECK: call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}})			// CHECK: call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %{{.*}}, <4 x float> %{{.*}})
	return _mm_ucomineq_ss(A, B);			return _mm_ucomineq_ss(A, B);
	}			}

	__m128 test_mm_undefined_ps() {			__m128 test_mm_undefined_ps() {
	// CHECK-LABEL: test_mm_undefined_ps			// CHECK-LABEL: test_mm_undefined_ps
	// CHECK: ret <4 x float> zeroinitializer			// CHECK: %[[FR:.*]] = freeze <2 x double> poison
				// CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <4 x float>
				// CHECK: ret <4 x float> %[[FR_BC]]
	return _mm_undefined_ps();			return _mm_undefined_ps();
	}			}

	__m128 test_mm_unpackhi_ps(__m128 A, __m128 B) {			__m128 test_mm_unpackhi_ps(__m128 A, __m128 B) {
	// CHECK-LABEL: test_mm_unpackhi_ps			// CHECK-LABEL: test_mm_unpackhi_ps
	// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x i32> <i32 2, i32 6, i32 3, i32 7>			// CHECK: shufflevector <4 x float> %{{.*}}, <4 x float> %{{.*}}, <4 x i32> <i32 2, i32 6, i32 3, i32 7>
	return _mm_unpackhi_ps(A, B);			return _mm_unpackhi_ps(A, B);
	}			}

clang/test/CodeGen/X86/sse2-builtins.c

	int test_mm_ucomineq_sd(__m128d A, __m128d B) {			int test_mm_ucomineq_sd(__m128d A, __m128d B) {
	// CHECK-LABEL: test_mm_ucomineq_sd			// CHECK-LABEL: test_mm_ucomineq_sd
	// CHECK: call i32 @llvm.x86.sse2.ucomineq.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}})			// CHECK: call i32 @llvm.x86.sse2.ucomineq.sd(<2 x double> %{{.*}}, <2 x double> %{{.*}})
	return _mm_ucomineq_sd(A, B);			return _mm_ucomineq_sd(A, B);
	}			}

	__m128d test_mm_undefined_pd() {			__m128d test_mm_undefined_pd() {
	// CHECK-LABEL: test_mm_undefined_pd			// CHECK-LABEL: test_mm_undefined_pd
	// CHECK: ret <2 x double> zeroinitializer			// CHECK: %[[FR:.*]] = freeze <2 x double> poison
				// CHECK: ret <2 x double> %[[FR]]
	return _mm_undefined_pd();			return _mm_undefined_pd();
	}			}

	__m128i test_mm_undefined_si128() {			__m128i test_mm_undefined_si128() {
	// CHECK-LABEL: test_mm_undefined_si128			// CHECK-LABEL: test_mm_undefined_si128
	// CHECK: ret <2 x i64> zeroinitializer			// CHECK: %[[FR:.*]] = freeze <2 x double> poison
				// CHECK: %[[FR_BC:.*]] = bitcast <2 x double> %[[FR]] to <2 x i64>
				// CHECK: ret <2 x i64> %[[FR_BC]]
	return _mm_undefined_si128();			return _mm_undefined_si128();
	}			}

	__m128i test_mm_unpackhi_epi8(__m128i A, __m128i B) {			__m128i test_mm_unpackhi_epi8(__m128i A, __m128i B) {
	// CHECK-LABEL: test_mm_unpackhi_epi8			// CHECK-LABEL: test_mm_unpackhi_epi8
	// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i32> <i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>			// CHECK: shufflevector <16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i32> <i32 8, i32 24, i32 9, i32 25, i32 10, i32 26, i32 11, i32 27, i32 12, i32 28, i32 13, i32 29, i32 14, i32 30, i32 15, i32 31>
	return _mm_unpackhi_epi8(A, B);			return _mm_unpackhi_epi8(A, B);
	}			}

llvm/include/llvm/CodeGen/SelectionDAGNodes.h

Show First 20 Lines • Show All 201 Lines • ▼ Show 20 Lines	public:
inline unsigned getNumOperands() const;		inline unsigned getNumOperands() const;
inline const SDValue &getOperand(unsigned i) const;		inline const SDValue &getOperand(unsigned i) const;
inline uint64_t getConstantOperandVal(unsigned i) const;		inline uint64_t getConstantOperandVal(unsigned i) const;
inline const APInt &getConstantOperandAPInt(unsigned i) const;		inline const APInt &getConstantOperandAPInt(unsigned i) const;
inline bool isTargetMemoryOpcode() const;		inline bool isTargetMemoryOpcode() const;
inline bool isTargetOpcode() const;		inline bool isTargetOpcode() const;
inline bool isMachineOpcode() const;		inline bool isMachineOpcode() const;
inline bool isUndef() const;		inline bool isUndef() const;
		inline bool isFreezeUndef() const;
inline unsigned getMachineOpcode() const;		inline unsigned getMachineOpcode() const;
inline const DebugLoc &getDebugLoc() const;		inline const DebugLoc &getDebugLoc() const;
inline void dump() const;		inline void dump() const;
inline void dump(const SelectionDAG *G) const;		inline void dump(const SelectionDAG *G) const;
inline void dumpr() const;		inline void dumpr() const;
inline void dumpr(const SelectionDAG *G) const;		inline void dumpr(const SelectionDAG *G) const;

/// Return true if this operand (which must be a chain) reaches the		/// Return true if this operand (which must be a chain) reaches the
▲ Show 20 Lines • Show All 927 Lines • ▼ Show 20 Lines
inline unsigned SDValue::getMachineOpcode() const {		inline unsigned SDValue::getMachineOpcode() const {
return Node->getMachineOpcode();		return Node->getMachineOpcode();
}		}

inline bool SDValue::isUndef() const {		inline bool SDValue::isUndef() const {
return Node->isUndef();		return Node->isUndef();
}		}

		inline bool SDValue::isFreezeUndef() const {
		return Node->getOpcode() == ISD::FREEZE && Node->getOperand(0).isUndef();
		}

inline bool SDValue::use_empty() const {		inline bool SDValue::use_empty() const {
return !Node->hasAnyUseOfValue(ResNo);		return !Node->hasAnyUseOfValue(ResNo);
}		}

inline bool SDValue::hasOneUse() const {		inline bool SDValue::hasOneUse() const {
return Node->hasNUsesOfValue(1, ResNo);		return Node->hasNUsesOfValue(1, ResNo);
}		}


llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

	SDValue DAGCombiner::visitBITCAST(SDNode *N) {			SDValue DAGCombiner::visitBITCAST(SDNode *N) {
	SDValue N0 = N->getOperand(0);			SDValue N0 = N->getOperand(0);
	EVT VT = N->getValueType(0);			EVT VT = N->getValueType(0);

	if (N0.isUndef())			if (N0.isUndef())
	return DAG.getUNDEF(VT);			return DAG.getUNDEF(VT);

				// bitcast (freeze undef) -> freeze undef
				if (N0.isFreezeUndef() && N0.hasOneUse())
				return DAG.getFreeze(DAG.getUNDEF(VT));

	// If the input is a BUILD_VECTOR with all constant elements, fold this now.			// If the input is a BUILD_VECTOR with all constant elements, fold this now.
	// Only do this before legalize types, unless both types are integer and the			// Only do this before legalize types, unless both types are integer and the
	// scalar type is legal. Only do this before legalize ops, since the target			// scalar type is legal. Only do this before legalize ops, since the target
	// maybe depending on the bitcast.			// maybe depending on the bitcast.
	// First check to see if this is all constant.			// First check to see if this is all constant.
	// TODO: Support FP bitcasts after legalize types.			// TODO: Support FP bitcasts after legalize types.
	if (VT.isVector() &&			if (VT.isVector() &&
	(!LegalTypes ||			(!LegalTypes ||
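A note on the `hasOneUse()` guard in the new `bitcast (freeze undef)` fold: a frozen value may be any bit pattern, but every use of the same freeze node has to observe the same pattern. Rewriting the bitcast into a newly created `freeze undef` is therefore only obviously safe when nothing else reads the original node. The following is a minimal sketch of that reasoning in plain C++. `ToyFrozenUndef` and the helper functions are hypothetical names invented for illustration; they are not LLVM classes or APIs, and the fixed `Bits` field only stands in for "whatever arbitrary value the backend happens to pick".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one `freeze undef` node (not an LLVM type).
// The bits are arbitrary, but they are chosen once per node.
struct ToyFrozenUndef {
  uint64_t Bits; // models the arbitrary pattern this node settled on
  explicit ToyFrozenUndef(uint64_t B) : Bits(B) {}
  uint64_t read() const { return Bits; }
};

// Two reads of the same frozen node always agree.
inline bool usesOfOneFreezeAgree(uint64_t Arbitrary) {
  ToyFrozenUndef F(Arbitrary);
  return F.read() == F.read();
}

// Two independently created frozen nodes are free to disagree.
inline bool separateFreezesAgree(uint64_t ArbitraryA, uint64_t ArbitraryB) {
  ToyFrozenUndef A(ArbitraryA), B(ArbitraryB);
  return A.read() == B.read();
}

// Mirrors the shape of the guard in the patch: replacing a frozen node with
// a fresh one is only treated as safe when the old node has a single use.
inline bool canReplaceWithFreshFreeze(unsigned NumUses) {
  return NumUses == 1;
}
```

With two users of the original node, replacing only the bitcast's operand would leave the other user reading a different frozen node, which in this toy model corresponds to `separateFreezesAgree` returning false.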

llvm/lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 26,005 Lines • ▼ Show 20 Lines	static SDValue getAVX2GatherNode(unsigned Opc, SDValue Op, SelectionDAG &DAG,
// Scale must be constant.		// Scale must be constant.
if (!C)		if (!C)
return SDValue();		return SDValue();
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue Scale = DAG.getTargetConstant(C->getZExtValue(), dl,		SDValue Scale = DAG.getTargetConstant(C->getZExtValue(), dl,
TLI.getPointerTy(DAG.getDataLayout()));		TLI.getPointerTy(DAG.getDataLayout()));
EVT MaskVT = Mask.getValueType().changeVectorElementTypeToInteger();		EVT MaskVT = Mask.getValueType().changeVectorElementTypeToInteger();
SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::Other);		SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::Other);
// If source is undef or we know it won't be used, use a zero vector		// If source is undef, frozen undef with one use only, or we
// to break register dependency.		// know it won't be used, use a zero vector to break register dependency.
// TODO: use undef instead and let BreakFalseDeps deal with it?		// TODO: use undef instead and let BreakFalseDeps deal with it?
if (Src.isUndef() || ISD::isBuildVectorAllOnes(Mask.getNode()))		if (Src.isUndef() || (Src.isFreezeUndef() && Src.hasOneUse()) ||
		ISD::isBuildVectorAllOnes(Mask.getNode()))
Src = getZeroVector(Op.getSimpleValueType(), Subtarget, DAG, dl);		Src = getZeroVector(Op.getSimpleValueType(), Subtarget, DAG, dl);

// Cast mask to an integer type.		// Cast mask to an integer type.
Mask = DAG.getBitcast(MaskVT, Mask);		Mask = DAG.getBitcast(MaskVT, Mask);

MemIntrinsicSDNode *MemIntr = cast<MemIntrinsicSDNode>(Op);		MemIntrinsicSDNode *MemIntr = cast<MemIntrinsicSDNode>(Op);

SDValue Ops[] = {Chain, Src, Mask, Base, Index, Scale };		SDValue Ops[] = {Chain, Src, Mask, Base, Index, Scale };
Show All 21 Lines	static SDValue getGatherNode(SDValue Op, SelectionDAG &DAG,
MVT MaskVT = MVT::getVectorVT(MVT::i1, MinElts);		MVT MaskVT = MVT::getVectorVT(MVT::i1, MinElts);

// We support two versions of the gather intrinsics. One with scalar mask and		// We support two versions of the gather intrinsics. One with scalar mask and
// one with vXi1 mask. Convert scalar to vXi1 if necessary.		// one with vXi1 mask. Convert scalar to vXi1 if necessary.
if (Mask.getValueType() != MaskVT)		if (Mask.getValueType() != MaskVT)
Mask = getMaskNode(Mask, MaskVT, Subtarget, DAG, dl);		Mask = getMaskNode(Mask, MaskVT, Subtarget, DAG, dl);

SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::Other);		SDVTList VTs = DAG.getVTList(Op.getValueType(), MVT::Other);
// If source is undef or we know it won't be used, use a zero vector		// If source is undef, frozen undef with one use only, or we
// to break register dependency.		// know it won't be used, use a zero vector to break register dependency.
// TODO: use undef instead and let BreakFalseDeps deal with it?		// TODO: use undef instead and let BreakFalseDeps deal with it?
if (Src.isUndef() || ISD::isBuildVectorAllOnes(Mask.getNode()))		// TODO: use undef instead and let BreakFalseDeps deal with it?
		if (Src.isUndef() || (Src.isFreezeUndef() && Src.hasOneUse()) ||
		ISD::isBuildVectorAllOnes(Mask.getNode()))
Src = getZeroVector(Op.getSimpleValueType(), Subtarget, DAG, dl);		Src = getZeroVector(Op.getSimpleValueType(), Subtarget, DAG, dl);

MemIntrinsicSDNode *MemIntr = cast<MemIntrinsicSDNode>(Op);		MemIntrinsicSDNode *MemIntr = cast<MemIntrinsicSDNode>(Op);

SDValue Ops[] = {Chain, Src, Mask, Base, Index, Scale };		SDValue Ops[] = {Chain, Src, Mask, Base, Index, Scale };
SDValue Res =		SDValue Res =
DAG.getMemIntrinsicNode(X86ISD::MGATHER, dl, VTs, Ops,		DAG.getMemIntrinsicNode(X86ISD::MGATHER, dl, VTs, Ops,
MemIntr->getMemoryVT(), MemIntr->getMemOperand());		MemIntr->getMemoryVT(), MemIntr->getMemOperand());

llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll

	; CHECK-NEXT: sete %al			; CHECK-NEXT: sete %al
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: ret{{[l|q]}}			; CHECK-NEXT: ret{{[l|q]}}
	%res = call i32 @llvm.x86.avx.ptestz.256(<4 x i64> %a0, <4 x i64> %a1)			%res = call i32 @llvm.x86.avx.ptestz.256(<4 x i64> %a0, <4 x i64> %a1)
	ret i32 %res			ret i32 %res
	}			}
	declare i32 @llvm.x86.avx.ptestz.256(<4 x i64>, <4 x i64>) nounwind readnone			declare i32 @llvm.x86.avx.ptestz.256(<4 x i64>, <4 x i64>) nounwind readnone

	define <2 x double> @test_mm_undefined_pd() nounwind {
	; CHECK-LABEL: test_mm_undefined_pd:
	; CHECK: # %bb.0:
	; CHECK-NEXT: ret{{[l|q]}}
	ret <2 x double> undef
	}

	define <4 x double> @test_mm256_undefined_pd() nounwind {			define <4 x double> @test_mm256_undefined_pd() nounwind {
	; CHECK-LABEL: test_mm256_undefined_pd:			; CHECK-LABEL: test_mm256_undefined_pd:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: ret{{[l|q]}}			; CHECK-NEXT: ret{{[l|q]}}
	ret <4 x double> undef			%v = freeze <4 x double> poison
				ret <4 x double> %v
	}			}

	define <8 x float> @test_mm256_undefined_ps() nounwind {			define <8 x float> @test_mm256_undefined_ps() nounwind {
	; CHECK-LABEL: test_mm256_undefined_ps:			; CHECK-LABEL: test_mm256_undefined_ps:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: ret{{[l|q]}}			; CHECK-NEXT: ret{{[l|q]}}
	ret <8 x float> undef			%v = freeze <4 x double> poison
				%w = bitcast <4 x double> %v to <8 x float>
				ret <8 x float> %w
	}			}

	define <4 x i64> @test_mm256_undefined_si256() nounwind {			define <4 x i64> @test_mm256_undefined_si256() nounwind {
	; CHECK-LABEL: test_mm256_undefined_si256:			; CHECK-LABEL: test_mm256_undefined_si256:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: ret{{[l|q]}}			; CHECK-NEXT: ret{{[l|q]}}
	ret <4 x i64> undef			%v = freeze <4 x double> poison
				%w = bitcast <4 x double> %v to <4 x i64>
				ret <4 x i64> %w
				}

				define <16 x float> @test_mm512_undefined() nounwind {
				; CHECK-LABEL: test_mm512_undefined:
				; CHECK: # %bb.0:
				; CHECK-NEXT: ret{{[l|q]}}
				%v = freeze <8 x double> poison
				%w = bitcast <8 x double> %v to <16 x float>
				ret <16 x float> %w
				}

				define <8 x double> @test_mm512_undefined_pd() nounwind {
				; CHECK-LABEL: test_mm512_undefined_pd:
				; CHECK: # %bb.0:
				; CHECK-NEXT: ret{{[l|q]}}
				%v = freeze <8 x double> poison
				ret <8 x double> %v
				}

				define <8 x i64> @test_mm512_undefined_epi32() nounwind {
				; CHECK-LABEL: test_mm512_undefined_epi32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: ret{{[l|q]}}
				%v = freeze <8 x i64> poison
				ret <8 x i64> %v
	}			}

	define <4 x double> @test_mm256_unpackhi_pd(<4 x double> %a0, <4 x double> %a1) nounwind {			define <4 x double> @test_mm256_unpackhi_pd(<4 x double> %a0, <4 x double> %a1) nounwind {
	; CHECK-LABEL: test_mm256_unpackhi_pd:			; CHECK-LABEL: test_mm256_unpackhi_pd:
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]			; CHECK-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
	; CHECK-NEXT: ret{{[l|q]}}			; CHECK-NEXT: ret{{[l|q]}}
	%res = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>			%res = shufflevector <4 x double> %a0, <4 x double> %a1, <4 x i32> <i32 1, i32 5, i32 3, i32 7>

llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll

Show First 20 Lines • Show All 3,507 Lines • ▼ Show 20 Lines	; AVX512-NEXT: ret{{[l|q]}} # encoding: [0xc3]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone

define <4 x float> @test_mm_undefined_ps() {		define <4 x float> @test_mm_undefined_ps() {
; CHECK-LABEL: test_mm_undefined_ps:		; CHECK-LABEL: test_mm_undefined_ps:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]		; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
ret <4 x float> undef		%v = freeze <2 x double> poison
		%w = bitcast <2 x double> %v to <4 x float>
		ret <4 x float> %w
}		}

define <4 x float> @test_mm_unpackhi_ps(<4 x float> %a0, <4 x float> %a1) nounwind {		define <4 x float> @test_mm_unpackhi_ps(<4 x float> %a0, <4 x float> %a1) nounwind {
; SSE-LABEL: test_mm_unpackhi_ps:		; SSE-LABEL: test_mm_unpackhi_ps:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: unpckhps %xmm1, %xmm0 # encoding: [0x0f,0x15,0xc1]		; SSE-NEXT: unpckhps %xmm1, %xmm0 # encoding: [0x0f,0x15,0xc1]
; SSE-NEXT: # xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; SSE-NEXT: # xmm0 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE-NEXT: ret{{[l|q]}} # encoding: [0xc3]		; SSE-NEXT: ret{{[l|q]}} # encoding: [0xc3]

llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll

Show First 20 Lines • Show All 6,384 Lines • ▼ Show 20 Lines	; AVX512-NEXT: ret{{[l|q]}} # encoding: [0xc3]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.ucomineq.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.ucomineq.sd(<2 x double>, <2 x double>) nounwind readnone

define <2 x double> @test_mm_undefined_pd() {		define <2 x double> @test_mm_undefined_pd() {
; CHECK-LABEL: test_mm_undefined_pd:		; CHECK-LABEL: test_mm_undefined_pd:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]		; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
ret <2 x double> undef		%v = freeze <2 x double> poison
		ret <2 x double> %v
}		}

define <2 x i64> @test_mm_undefined_si128() {		define <2 x i64> @test_mm_undefined_si128() {
; CHECK-LABEL: test_mm_undefined_si128:		; CHECK-LABEL: test_mm_undefined_si128:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]		; CHECK-NEXT: ret{{[l|q]}} # encoding: [0xc3]
ret <2 x i64> undef		%v = freeze <2 x double> poison
		%w = bitcast <2 x double> %v to <2 x i64>
		ret <2 x i64> %w
}		}

define <2 x i64> @test_mm_unpackhi_epi8(<2 x i64> %a0, <2 x i64> %a1) {		define <2 x i64> @test_mm_unpackhi_epi8(<2 x i64> %a0, <2 x i64> %a1) {
; SSE-LABEL: test_mm_unpackhi_epi8:		; SSE-LABEL: test_mm_unpackhi_epi8:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: punpckhbw %xmm1, %xmm0 # encoding: [0x66,0x0f,0x68,0xc1]		; SSE-NEXT: punpckhbw %xmm1, %xmm0 # encoding: [0x66,0x0f,0x68,0xc1]
; SSE-NEXT: # xmm0 = xmm0[8],xmm1[8],xmm0[9],xmm1[9],xmm0[10],xmm1[10],xmm0[11],xmm1[11],xmm0[12],xmm1[12],xmm0[13],xmm1[13],xmm0[14],xmm1[14],xmm0[15],xmm1[15]		; SSE-NEXT: # xmm0 = xmm0[8],xmm1[8],xmm0[9],xmm1[9],xmm0[10],xmm1[10],xmm0[11],xmm1[11],xmm0[12],xmm1[12],xmm0[13],xmm1[13],xmm0[14],xmm1[14],xmm0[15],xmm1[15]
; SSE-NEXT: ret{{[l|q]}} # encoding: [0xc3]		; SSE-NEXT: ret{{[l|q]}} # encoding: [0xc3]

Revision Contents

Diff 354682
