This is an archive of the discontinued LLVM Phabricator instance.

PR28129 expand vector operation to an IR constant.
ClosedPublic

Authored by dtemirbulatov on May 22 2017, 5:01 AM.

Details

Reviewers

spatel
RKSimon
craig.topper
hfinkel

Event Timeline

dtemirbulatov created this revision. May 22 2017, 5:01 AM

dtemirbulatov added a subscriber: cfe-commits. May 22 2017, 5:06 AM

dtemirbulatov added a reviewer: craig.topper. May 22 2017, 5:18 AM

Test _mm256_cmp_pd as well?

lib/CodeGen/CGBuiltin.cpp
7934	You need a comment here - explain what the constant represents and what the transform does.
test/CodeGen/avx-builtins.c
1434	Use _CMP_TRUE_UQ here instead of 0xf?

spatel added inline comments. May 22 2017, 6:42 AM

lib/CodeGen/CGBuiltin.cpp
7949–7959	1. Should we handle the 'pd256' version the same way?
2. How about the 0xb ('false') constant? It should produce a zero here?
3. Can or should we deal with the signalling versions (0x1b, 0x1f) too?

Added the _mm256_cmp_pd (double) version.
Added comments in lib/CodeGen/CGBuiltin.cpp.
Replaced 0xf with _CMP_TRUE_UQ in avx-builtins.c.

Should we handle the 'pd256' version the same way?
How about the 0xb ('false') constant? It should produce a zero here?
Can or should we deal with the signalling versions (0x1b, 0x1f) too?

Hm, it looks like 0xb (_CMP_FALSE_OQ) is ordered, so it is not possible, and 0x1b or 0x1f might emit a signal.

spatel added subscribers: scanon, andrew.w.kaylor. May 24 2017, 9:59 AM

spatel added inline comments.

lib/CodeGen/CGBuiltin.cpp
7949–7959	I didn't follow this reasoning.
1. The 0xB compare predicate will return 'false' (all zeros) no matter what the inputs are. "Ordered" in this definition is irrelevant; just like "unordered" is irrelevant for predicate 0xF (TRUE_UQ). It's probably helpful to run the program attached to PR28110 (https://bugs.llvm.org/show_bug.cgi?id=28110) to confirm or deny if these predicates behave like you expect.
2. Another possibly misleading wording: "non-signaling" does not actually mean non-signaling for all values. It means "non-signaling for QNAN, but still signaling for SNAN". Therefore, I think we're changing SNAN behavior by folding any of these preds to constant values.
3. We should've asked this first: is that fold allowed in the default FPENV state that we assume that clang is operating in? (cc'ing @andrew.w.kaylor and @scanon for advice)
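
To make the point about input-independent predicates concrete, here is a minimal scalar sketch of one float lane of the compare. It is not taken from the patch: `cmpLane` is a made-up name, only a handful of immediates are modelled, and FP exception (signaling) behaviour, which is the open question in this comment, is deliberately left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar model of a single float lane of the AVX compare, restricted to a
// few predicate immediates. Illustrative only: cmpLane is a hypothetical
// name, and FP exception (signaling) behaviour is not modelled at all.
std::uint32_t cmpLane(float a, float b, unsigned cc) {
  const bool unordered = std::isnan(a) || std::isnan(b);
  bool result = false;
  switch (cc & 0x1f) {
  case 0x00: result = !unordered && a == b; break; // _CMP_EQ_OQ
  case 0x03: result = unordered; break;            // _CMP_UNORD_Q
  case 0x0b:                                       // _CMP_FALSE_OQ
  case 0x1b: result = false; break;                // _CMP_FALSE_OS
  case 0x0f:                                       // _CMP_TRUE_UQ
  case 0x1f: result = true; break;                 // _CMP_TRUE_US
  default: assert(false && "predicate not modelled in this sketch");
  }
  // A true lane is an all-ones mask, a false lane is all zeros.
  return result ? 0xFFFFFFFFu : 0u;
}
```

In this model the 0x0b/0x1b and 0x0f/0x1f cases never look at `a` or `b`, so a NaN operand cannot change the result, whereas 0x00 and 0x03 do depend on whether the operands are ordered.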

We should've asked this first: is that fold allowed in the default FPENV state that we assume that clang is operating in?

I suppose it is FE_ALL_EXCEPT.

Ping. [andrew.w.kaylor, scanon] Is it OK to assume that FP exceptions are off by default and to allow this transformation to constants in the IR, given that we would get an exception with "1.00 -nan" for _mm256_cmp_ps(a, b, 15) if FP exceptions were enabled?

Updated after http://lists.llvm.org/pipermail/llvm-dev/2017-June/114120.html. Added handling for 0x1b (_CMP_FALSE_OS) and 0x1f (_CMP_TRUE_US).
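
The shape of the fold after this update can be sketched in standalone C++, outside of clang. This is only an analogue under stated assumptions: `foldCmpPs256` is a hypothetical helper, a `std::array` stands in for the 8 x float vector, and `std::memcpy` plays the role of the integer-to-float bitcast.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of clang. Returns false when cc is not one
// of the four constant-producing immediates; the real code would then fall
// through to the x86 compare intrinsic.
bool foldCmpPs256(unsigned cc, std::array<float, 8> &out) {
  if (cc != 0x0f && cc != 0x0b && cc != 0x1b && cc != 0x1f)
    return false;
  // Integer lane value: all ones for the TRUE predicates, zero for FALSE.
  const std::uint32_t lane = (cc == 0x0f || cc == 0x1f) ? 0xFFFFFFFFu : 0u;
  // Splat the lane value across all eight integer lanes.
  std::array<std::uint32_t, 8> splat;
  splat.fill(lane);
  // Reinterpret the integer lanes as floats without changing any bits.
  static_assert(sizeof(out) == sizeof(splat), "lane sizes must match");
  std::memcpy(out.data(), splat.data(), sizeof(splat));
  return true;
}
```

Building the constant as integers and reinterpreting it afterwards avoids having to spell an all-ones bit pattern as a floating-point literal, since that pattern is a NaN.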

Functionally, I think this is correct and complete now. See inline for some nits.

lib/CodeGen/CGBuiltin.cpp
7925–7926	Fix comment to something like: "Except for predicates that create constants, ..."
7934	would produce --> produces
7940	Formatting: over 80-col limit.
7950	would produce --> produces
7956	Formatting: over 80-col limit.

Update formatting, comments

LGTM.

This revision is now accepted and ready to land. Jun 15 2017, 3:49 PM

rL305551

Revision Contents

Path	Size
lib/CodeGen/CGBuiltin.cpp	21 lines
test/CodeGen/avx-builtins.c	48 lines

Diff 102717

lib/CodeGen/CGBuiltin.cpp

  if (CC < 8) {
    [...]
    case 4: Pred = FCmpInst::FCMP_UNE; break;
    case 5: Pred = FCmpInst::FCMP_UGE; break;
    case 6: Pred = FCmpInst::FCMP_UGT; break;
    case 7: Pred = FCmpInst::FCMP_ORD; break;
    }
    return getVectorFCmpIR(Pred);
  }

  // We can't handle 8-31 immediates with native IR, use the intrinsic.
+ // Except for predicates that create constants.
  Intrinsic::ID ID;
  switch (BuiltinID) {
  default: llvm_unreachable("Unsupported intrinsic!");
  case X86::BI__builtin_ia32_cmpps:
    ID = Intrinsic::x86_sse_cmp_ps;
    break;
  case X86::BI__builtin_ia32_cmpps256:
+   // _CMP_TRUE_UQ, _CMP_TRUE_US produce -1,-1... vector
+   // on any input and _CMP_FALSE_OQ, _CMP_FALSE_OS produce 0, 0...
+   if (CC == 0xf || CC == 0xb || CC == 0x1b || CC == 0x1f) {
+     Value *Constant = (CC == 0xf || CC == 0x1f) ?
+         llvm::Constant::getAllOnesValue(Builder.getInt32Ty()) :
+         llvm::Constant::getNullValue(Builder.getInt32Ty());
+     Value *Vec = Builder.CreateVectorSplat(
+         Ops[0]->getType()->getVectorNumElements(), Constant);
+     return Builder.CreateBitCast(Vec, Ops[0]->getType());
+   }
    ID = Intrinsic::x86_avx_cmp_ps_256;
    break;
  case X86::BI__builtin_ia32_cmppd:
    ID = Intrinsic::x86_sse2_cmp_pd;
    break;
  case X86::BI__builtin_ia32_cmppd256:
+   // _CMP_TRUE_UQ, _CMP_TRUE_US produce -1,-1... vector
+   // on any input and _CMP_FALSE_OQ, _CMP_FALSE_OS produce 0, 0...
+   if (CC == 0xf || CC == 0xb || CC == 0x1b || CC == 0x1f) {
+     Value *Constant = (CC == 0xf || CC == 0x1f) ?
+         llvm::Constant::getAllOnesValue(Builder.getInt64Ty()) :
+         llvm::Constant::getNullValue(Builder.getInt64Ty());
+     Value *Vec = Builder.CreateVectorSplat(
+         Ops[0]->getType()->getVectorNumElements(), Constant);
+     return Builder.CreateBitCast(Vec, Ops[0]->getType());
+   }
    ID = Intrinsic::x86_avx_cmp_pd_256;
    break;
  }

  return Builder.CreateCall(CGM.getIntrinsic(ID), Ops);
  }

  // SSE scalar comparison intrinsics
  [...]

test/CodeGen/avx-builtins.c

  [...]
  }

  float test_mm256_cvtss_f32(__m256 __a)
  {
    // CHECK-LABEL: @test_mm256_cvtss_f32
    // CHECK: extractelement <8 x float> %{{.*}}, i32 0
    return _mm256_cvtss_f32(__a);
  }
+
+ __m256 test_mm256_cmp_ps_true(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_ps_true
+   // CHECK: store <8 x float> <float 0xFFFFFFFFE0000000,
+   return _mm256_cmp_ps(a, b, _CMP_TRUE_UQ);
+ }
+
+ __m256 test_mm256_cmp_pd_true(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_pd_true
+   // CHECK: store <4 x double> <double 0xFFFFFFFFFFFFFFFF,
+   return _mm256_cmp_pd(a, b, _CMP_TRUE_UQ);
+ }
+
+ __m256 test_mm256_cmp_ps_false(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_ps_false
+   // CHECK: store <8 x float> zeroinitializer, <8 x float>* %tmp, align 32
+   return _mm256_cmp_ps(a, b, _CMP_FALSE_OQ);
+ }
+
+ __m256 test_mm256_cmp_pd_false(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_pd_false
+   // CHECK: store <4 x double> zeroinitializer, <4 x double>* %tmp, align 32
+   return _mm256_cmp_pd(a, b, _CMP_FALSE_OQ);
+ }
+
+ __m256 test_mm256_cmp_ps_strue(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_ps_strue
+   // CHECK: store <8 x float> <float 0xFFFFFFFFE0000000,
+   return _mm256_cmp_ps(a, b, _CMP_TRUE_US);
+ }
+
+ __m256 test_mm256_cmp_pd_strue(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_pd_strue
+   // CHECK: store <4 x double> <double 0xFFFFFFFFFFFFFFFF,
+   return _mm256_cmp_pd(a, b, _CMP_TRUE_US);
+ }
+
+ __m256 test_mm256_cmp_ps_sfalse(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_ps_sfalse
+   // CHECK: store <8 x float> zeroinitializer, <8 x float>* %tmp, align 32
+   return _mm256_cmp_ps(a, b, _CMP_FALSE_OS);
+ }
+
+ __m256 test_mm256_cmp_pd_sfalse(__m256 a, __m256 b) {
+   // CHECK-LABEL: @test_mm256_cmp_pd_sfalse
+   // CHECK: store <4 x double> zeroinitializer, <4 x double>* %tmp, align 32
+   return _mm256_cmp_pd(a, b, _CMP_FALSE_OS);
+ }
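
The ps tests expect `float 0xFFFFFFFFE0000000` rather than `0xFFFFFFFF` because LLVM IR prints hexadecimal float constants using the 64-bit double encoding of the value. The sketch below reproduces that widening with integer arithmetic only; `widenFloatBits` is a hypothetical helper written for illustration, and it assumes subnormal inputs never occur. The pd tests need no such step, since an all-ones 64-bit lane is already `0xFFFFFFFFFFFFFFFF`.

```cpp
#include <cassert>
#include <cstdint>

// Widen IEEE-754 binary32 bits to binary64 bits without touching the FPU.
// Hypothetical helper for illustration; subnormal inputs are not handled.
std::uint64_t widenFloatBits(std::uint32_t bits) {
  const std::uint64_t sign = bits >> 31;
  const std::uint32_t exp = (bits >> 23) & 0xFFu;
  const std::uint64_t mant = bits & 0x7FFFFFu;
  std::uint64_t wideExp;
  if (exp == 0xFFu) {
    wideExp = 0x7FFu; // infinity or NaN keeps an all-ones exponent
  } else if (exp == 0) {
    assert(mant == 0 && "subnormals are outside this sketch");
    wideExp = 0; // signed zero
  } else {
    wideExp = exp + (1023u - 127u); // rebias the exponent
  }
  // The 23 mantissa bits move to the top of the 52-bit double mantissa.
  return (sign << 63) | (wideExp << 52) | (mant << 29);
}
```

For the all-ones input the sign bit, the 11 exponent bits, and the top 23 mantissa bits are set, which gives 35 leading ones followed by 29 zeros.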