This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics
ClosedPublic

Authored by foad on May 28 2020, 2:30 AM.

Details

Reviewers

arsenm
rampitec
b-sumner

Commits

rGc823cfde21b2: [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. May 28 2020, 2:30 AM

Herald added a project: Restricted Project. May 28 2020, 2:31 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 8 others.

foad marked an inline comment as done. May 28 2020, 2:34 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1918	This folds amdgcn.sin(1.0) to a very small value that is not exactly 0.0. Should I add an explicit check for all of the quarter-integer values that should fold to exactly -1.0 or 0.0 or +1.0 ?
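
The effect described in this comment can be reproduced outside LLVM. The following is only a minimal sketch: `naiveSinTurns` and `Pi` are hypothetical names, with `Pi` standing in for `numbers::pi`. Because the double nearest to 2π is not exactly 2π, the host `sin` returns a tiny nonzero value for an operand of one full turn.

```cpp
#include <cmath>

// Stand-in for llvm::numbers::pi so the sketch builds without LLVM headers.
constexpr double Pi = 3.14159265358979323846;

// Hypothetical helper (not an LLVM API): evaluates sine for an operand given
// in full turns by converting to radians and calling the host libm.
double naiveSinTurns(double Turns) { return std::sin(Turns * 2.0 * Pi); }
```

Calling `naiveSinTurns(1.0)` yields a value very close to, but not equal to, 0.0, which is the case the question about quarter-integer inputs is concerned with.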

Harbormaster failed remote builds in B58183: Diff 266785! May 28 2020, 4:51 AM

arsenm added inline comments. May 28 2020, 6:44 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	I've never been sure what exactly the policy should be for folding these sorts of intrinsics. I think nobody is happy with relying on the host libm call to begin with, plus the hardware instructions don't give the exact same results. We're already constant folding more precise results for rcp than the hardware instruction gives though (which is probably more useful in general). I'm not sure this is how this will work going forward, but you could also consider strictfp on the call site (it may happen already, but you should add a test where certain folds are skipped). Also, you should use numbers::pi rather than relying on M_PI (and I strongly prefer using an explicit 2.0 rather than 2).
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175	Also should test snan -> qnan (this also won't happen for !ieee_mode, so this case should depend on strictfp also)
178	Add some cases where strictfp calls are skipped

foad marked 2 inline comments as done. May 28 2020, 7:14 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1918	> you could also consider strictfp on the call site
I thought:
- strictfp is to do with whether certain math functions are guaranteed to respect the ambient rounding mode etc
- the definition of amdgcn.cos is "do what the hardware instruction does"
- the hardware instruction doesn't behave differently depending on the rounding mode
- so there's no need to worry about strictfp
But perhaps I'm wrong about some or all of this.
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175	All of the FP constant folding code is skipped for infinities and nans so there's not much to check. (I didn't really mean to add any of these inf and nan checks here, just carelessly cut n pasted from another test.)

arsenm added inline comments. May 28 2020, 7:39 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	I'm pretty sure the nan quieting behavior will change based on the ieee_mode bit. AMDGPU has a wider set of FP environment properties. Plus, these support exceptions anyway
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175	That would be a bug, since I would expect them to fold too

rampitec added a reviewer: b-sumner. May 28 2020, 9:51 AM

b-sumner added inline comments. May 28 2020, 10:13 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	I don't understand the first comment above where the argument is 1.0. These intrinsics take radians, and not the scaled values the ISA expects, correct? Similarly, if any bounds checking is done, shouldn't it be on radians?

arsenm added inline comments. May 28 2020, 10:32 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	From the lowering it looks like they directly feed the subtarget instruction. The lowering for llvm.sin/llvm.cos inserts the scale or not depending on the target (I'm not sure why you would ever need the raw amdgcn intrinsic form)

arsenm added inline comments. May 28 2020, 10:35 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	The device libraries have one function that uses both in the raw form, and doesn't consider the two subtarget behaviors

foad marked an inline comment as done. May 29 2020, 1:58 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1918	LLPC sometimes generates the amdgcn.sin/cos intrinsics. The problem with introducing the multiply when llvm.sin/cos is lowered is that it doesn't get folded away if the argument was already something multiplied or divided by a constant, which is pretty common.

Force exact results for quarter-integer inputs
Use numbers::pi
Add strictfp tests

foad marked 2 inline comments as done. May 29 2020, 2:34 AM

foad added inline comments.

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
175	A missed optimization perhaps, but it's always been like that for all math intrinsics, so I don't plan to fix it as part of this patch.

Harbormaster failed remote builds in B58390: Diff 267149! May 29 2020, 3:45 AM

arsenm added inline comments. May 29 2020, 7:30 AM

llvm/lib/Analysis/ConstantFolding.cpp
1918	It should in the DAG lowering. We're not propagating fast math flags now in the sin lowering, maybe that's your problem?

arsenm accepted this revision. Jun 2 2020, 5:16 PM

This revision is now accepted and ready to land. Jun 2 2020, 5:16 PM

Closed by commit rGc823cfde21b2: [AMDGPU] Fold llvm.amdgcn.cos and llvm.amdgcn.sin intrinsics (authored by foad). Jun 3 2020, 1:36 AM

This revision was automatically updated to reflect the committed changes.

foad marked an inline comment as done.

thakis added a subscriber: thakis. Jun 3 2020, 3:07 AM

thakis added inline comments.

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
139	Looks like this ends in BCD instead of BCC on Windows (with clang-cl as host compiler at least): http://45.33.8.238/win/16679/step_11.txt
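
A last-digit difference like this is what two host math libraries produce when they round the same result to adjacent doubles. The following is a rough sketch for inspecting the raw bit pattern a given host produces. `doubleBits`, `sinTurnsBits` and `Pi` are hypothetical names, and the 0.125 input mentioned below is only an assumed example, since the comment does not say which test value is involved.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Stand-in for llvm::numbers::pi so the sketch builds without LLVM headers.
constexpr double Pi = 3.14159265358979323846;

// Returns the raw IEEE-754 bit pattern of a double.
std::uint64_t doubleBits(double D) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}

// Hypothetical helper: bit pattern of the host-libm sine of an operand given
// in full turns. The exact result may vary by one unit in the last place
// between host toolchains.
std::uint64_t sinTurnsBits(double Turns) {
  return doubleBits(std::sin(Turns * 2.0 * Pi));
}
```

Printing `sinTurnsBits(0.125)` in hexadecimal on two different hosts is one way to check whether they agree in the final digit.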

foad marked 2 inline comments as done. Jun 3 2020, 3:41 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1918	Good point, thanks, see D80813. Perhaps there is no good reason for LLPC to use the amdgcn intrinsics after all.
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
139	Thanks, should be fixed in rGc27214c23446e423ec2e7eb8650a65cc5f0a16aa.

Revision Contents

Path	Size
llvm/lib/Analysis/ConstantFolding.cpp	21 lines
llvm/test/Analysis/ConstantFolding/AMDGPU/cos.ll	243 lines
llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll	243 lines

Diff 267149

llvm/lib/Analysis/ConstantFolding.cpp

 bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
   case Intrinsic::sin:
   case Intrinsic::cos:
   case Intrinsic::pow:
   case Intrinsic::powi:
   case Intrinsic::fma:
   case Intrinsic::fmuladd:
   case Intrinsic::convert_from_fp16:
   case Intrinsic::convert_to_fp16:
+  case Intrinsic::amdgcn_cos:
   case Intrinsic::amdgcn_cubeid:
   case Intrinsic::amdgcn_cubema:
   case Intrinsic::amdgcn_cubesc:
   case Intrinsic::amdgcn_cubetc:
   case Intrinsic::amdgcn_fmul_legacy:
   case Intrinsic::amdgcn_fract:
   case Intrinsic::amdgcn_ldexp:
+  case Intrinsic::amdgcn_sin:
   // The intrinsics below depend on rounding mode in MXCSR.
   case Intrinsic::x86_sse_cvtss2si:
   case Intrinsic::x86_sse_cvtss2si64:
   case Intrinsic::x86_sse_cvttss2si:
   case Intrinsic::x86_sse_cvttss2si64:
   case Intrinsic::x86_sse2_cvtsd2si:
   case Intrinsic::x86_sse2_cvtsd2si64:
   case Intrinsic::x86_sse2_cvttsd2si:
   case Intrinsic::x86_sse2_cvttsd2si64:
 switch (IntrinsicID) {
     // Fold exp2(x) as pow(2, x), in case the host lacks a C99 library.
     return ConstantFoldBinaryFP(pow, 2.0, V, Ty);
   case Intrinsic::sin:
     return ConstantFoldFP(sin, V, Ty);
   case Intrinsic::cos:
     return ConstantFoldFP(cos, V, Ty);
   case Intrinsic::sqrt:
     return ConstantFoldFP(sqrt, V, Ty);
+  case Intrinsic::amdgcn_cos:
+  case Intrinsic::amdgcn_sin:
+    if (V < -256.0 || V > 256.0)
+      // The gfx8 and gfx9 architectures handle arguments outside the range
+      // [-256, 256] differently. This should be a rare case so bail out
+      // rather than trying to handle the difference.
+      return nullptr;
+    bool IsCos = IntrinsicID == Intrinsic::amdgcn_cos;
+    double V4 = V * 4.0;
+    if (V4 == floor(V4)) {
+      // Force exact results for quarter-integer inputs.
+      const double SinVals[4] = { 0.0, 1.0, 0.0, -1.0 };
+      V = SinVals[((int)V4 + (IsCos ? 1 : 0)) & 3];
+    } else {
+      V = (IsCos ? cos : sin)(V * 2.0 * numbers::pi);
+    }
+    return GetConstantFoldFPValue(V, Ty);
   }

   if (!TLI)
     return nullptr;

   LibFunc Func = NotLibFunc;
   TLI->getLibFunc(Name, Func);
   switch (Func) {
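
The folding logic added in this hunk can be tried outside of LLVM. The following is only a sketch under stated assumptions: `foldAmdgcnSinCos` and `Pi` are hypothetical names (`Pi` stands in for `numbers::pi`), `std::nullopt` plays the role of `return nullptr`, and the explicit finite check models the thread's remark that FP constant folding is skipped for infinities and NaNs before this code is reached.

```cpp
#include <cmath>
#include <optional>

// Stand-in for llvm::numbers::pi so the sketch builds without LLVM headers.
constexpr double Pi = 3.14159265358979323846;

// Hypothetical helper (not an LLVM API): models the fold for
// llvm.amdgcn.sin / llvm.amdgcn.cos, whose operand is in units of full
// turns. Returns std::nullopt when the fold is skipped.
std::optional<double> foldAmdgcnSinCos(double V, bool IsCos) {
  // Assumption: infinities and NaNs never reach the fold.
  if (!std::isfinite(V))
    return std::nullopt;
  // gfx8 and gfx9 handle operands outside [-256, 256] differently, so skip.
  if (V < -256.0 || V > 256.0)
    return std::nullopt;
  double V4 = V * 4.0;
  if (V4 == std::floor(V4)) {
    // Quarter-integer operand: take the exact value from a table.
    // Cosine is sine shifted by a quarter turn, hence the +1 on the index.
    static const double SinVals[4] = {0.0, 1.0, 0.0, -1.0};
    return SinVals[(static_cast<int>(V4) + (IsCos ? 1 : 0)) & 3];
  }
  // Otherwise convert turns to radians and use the host libm.
  double Radians = V * 2.0 * Pi;
  return IsCos ? std::cos(Radians) : std::sin(Radians);
}
```

With this model, `foldAmdgcnSinCos(1.0, false)` gives exactly 0.0 and `foldAmdgcnSinCos(1000.0, false)` gives no value, which lines up with the folded and unfolded calls in the tests below.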

llvm/test/Analysis/ConstantFolding/AMDGPU/cos.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instsimplify -S | FileCheck %s

				declare half @llvm.amdgcn.cos.f16(half) #0
				declare float @llvm.amdgcn.cos.f32(float) #0
				declare double @llvm.amdgcn.cos.f64(double) #0

				define void @test_f16(half* %p) {
				; CHECK-LABEL: @test_f16(
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P:%.*]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH39A8, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH39A8, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xHBC00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xHBC00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: [[P1000:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xH63D0)
				; CHECK-NEXT: store volatile half [[P1000]], half* [[P]], align 2
				; CHECK-NEXT: [[N1000:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xHE3D0)
				; CHECK-NEXT: store volatile half [[N1000]], half* [[P]], align 2
				; CHECK-NEXT: [[PINF:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xH7C00)
				; CHECK-NEXT: store volatile half [[PINF]], half* [[P]], align 2
				; CHECK-NEXT: [[NINF:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xHFC00)
				; CHECK-NEXT: store volatile half [[NINF]], half* [[P]], align 2
				; CHECK-NEXT: [[NAN:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xH7E00)
				; CHECK-NEXT: store volatile half [[NAN]], half* [[P]], align 2
				; CHECK-NEXT: ret void
				;
				%p0 = call half @llvm.amdgcn.cos.f16(half +0.0)
				store volatile half %p0, half* %p
				%n0 = call half @llvm.amdgcn.cos.f16(half -0.0)
				store volatile half %n0, half* %p
				%p0125 = call half @llvm.amdgcn.cos.f16(half +0.125)
				store volatile half %p0125, half* %p
				%n0125 = call half @llvm.amdgcn.cos.f16(half -0.125)
				store volatile half %n0125, half* %p
				%p025 = call half @llvm.amdgcn.cos.f16(half +0.25)
				store volatile half %p025, half* %p
				%n025 = call half @llvm.amdgcn.cos.f16(half -0.25)
				store volatile half %n025, half* %p
				%p05 = call half @llvm.amdgcn.cos.f16(half +0.5)
				store volatile half %p05, half* %p
				%n05 = call half @llvm.amdgcn.cos.f16(half -0.5)
				store volatile half %n05, half* %p
				%p1 = call half @llvm.amdgcn.cos.f16(half +1.0)
				store volatile half %p1, half* %p
				%n1 = call half @llvm.amdgcn.cos.f16(half -1.0)
				store volatile half %n1, half* %p
				%p256 = call half @llvm.amdgcn.cos.f16(half +256.0)
				store volatile half %p256, half* %p
				%n256 = call half @llvm.amdgcn.cos.f16(half -256.0)
				store volatile half %n256, half* %p
				%p1000 = call half @llvm.amdgcn.cos.f16(half +1000.0)
				store volatile half %p1000, half* %p
				%n1000 = call half @llvm.amdgcn.cos.f16(half -1000.0)
				store volatile half %n1000, half* %p
				%pinf = call half @llvm.amdgcn.cos.f16(half 0xH7C00) ; +inf
				store volatile half %pinf, half* %p
				%ninf = call half @llvm.amdgcn.cos.f16(half 0xHFC00) ; -inf
				store volatile half %ninf, half* %p
				%nan = call half @llvm.amdgcn.cos.f16(half 0xH7E00) ; nan
				store volatile half %nan, half* %p
				ret void
				}

				define void @test_f32(float* %p) {
				; CHECK-LABEL: @test_f32(
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P:%.*]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0x3FE6A09E60000000, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0x3FE6A09E60000000, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float -1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float -1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: [[P1000:%.*]] = call float @llvm.amdgcn.cos.f32(float 1.000000e+03)
				; CHECK-NEXT: store volatile float [[P1000]], float* [[P]], align 4
				; CHECK-NEXT: [[N1000:%.*]] = call float @llvm.amdgcn.cos.f32(float -1.000000e+03)
				; CHECK-NEXT: store volatile float [[N1000]], float* [[P]], align 4
				; CHECK-NEXT: [[PINF:%.*]] = call float @llvm.amdgcn.cos.f32(float 0x7FF0000000000000)
				; CHECK-NEXT: store volatile float [[PINF]], float* [[P]], align 4
				; CHECK-NEXT: [[NINF:%.*]] = call float @llvm.amdgcn.cos.f32(float 0xFFF0000000000000)
				; CHECK-NEXT: store volatile float [[NINF]], float* [[P]], align 4
				; CHECK-NEXT: [[NAN:%.*]] = call float @llvm.amdgcn.cos.f32(float 0x7FF8000000000000)
				; CHECK-NEXT: store volatile float [[NAN]], float* [[P]], align 4
				; CHECK-NEXT: ret void
				;
				%p0 = call float @llvm.amdgcn.cos.f32(float +0.0)
				store volatile float %p0, float* %p
				%n0 = call float @llvm.amdgcn.cos.f32(float -0.0)
				store volatile float %n0, float* %p
				%p0125 = call float @llvm.amdgcn.cos.f32(float +0.125)
				store volatile float %p0125, float* %p
				%n0125 = call float @llvm.amdgcn.cos.f32(float -0.125)
				store volatile float %n0125, float* %p
				%p025 = call float @llvm.amdgcn.cos.f32(float +0.25)
				store volatile float %p025, float* %p
				%n025 = call float @llvm.amdgcn.cos.f32(float -0.25)
				store volatile float %n025, float* %p
				%p05 = call float @llvm.amdgcn.cos.f32(float +0.5)
				store volatile float %p05, float* %p
				%n05 = call float @llvm.amdgcn.cos.f32(float -0.5)
				store volatile float %n05, float* %p
				%p1 = call float @llvm.amdgcn.cos.f32(float +1.0)
				store volatile float %p1, float* %p
				%n1 = call float @llvm.amdgcn.cos.f32(float -1.0)
				store volatile float %n1, float* %p
				%p256 = call float @llvm.amdgcn.cos.f32(float +256.0)
				store volatile float %p256, float* %p
				%n256 = call float @llvm.amdgcn.cos.f32(float -256.0)
				store volatile float %n256, float* %p
				%p1000 = call float @llvm.amdgcn.cos.f32(float +1000.0)
				store volatile float %p1000, float* %p
				%n1000 = call float @llvm.amdgcn.cos.f32(float -1000.0)
				store volatile float %n1000, float* %p
				%pinf = call float @llvm.amdgcn.cos.f32(float 0x7FF0000000000000) ; +inf
				store volatile float %pinf, float* %p
				%ninf = call float @llvm.amdgcn.cos.f32(float 0xFFF0000000000000) ; -inf
				store volatile float %ninf, float* %p
				%nan = call float @llvm.amdgcn.cos.f32(float 0x7FF8000000000000) ; nan
				store volatile float %nan, float* %p
				ret void
				}

				define void @test_f64(double* %p) {
				; CHECK-LABEL: @test_f64(
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P:%.*]], align 8
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0x3FE6A09E667F3BCD, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0x3FE6A09E667F3BCD, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double -1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double -1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: [[P1000:%.*]] = call double @llvm.amdgcn.cos.f64(double 1.000000e+03)
				; CHECK-NEXT: store volatile double [[P1000]], double* [[P]], align 8
				; CHECK-NEXT: [[N1000:%.*]] = call double @llvm.amdgcn.cos.f64(double -1.000000e+03)
				; CHECK-NEXT: store volatile double [[N1000]], double* [[P]], align 8
				; CHECK-NEXT: [[PINF:%.*]] = call double @llvm.amdgcn.cos.f64(double 0x7FF0000000000000)
				; CHECK-NEXT: store volatile double [[PINF]], double* [[P]], align 8
				; CHECK-NEXT: [[NINF:%.*]] = call double @llvm.amdgcn.cos.f64(double 0xFFF0000000000000)
				; CHECK-NEXT: store volatile double [[NINF]], double* [[P]], align 8
				; CHECK-NEXT: [[NAN:%.*]] = call double @llvm.amdgcn.cos.f64(double 0x7FF8000000000000)
				; CHECK-NEXT: store volatile double [[NAN]], double* [[P]], align 8
				; CHECK-NEXT: ret void
				;
				%p0 = call double @llvm.amdgcn.cos.f64(double +0.0)
				store volatile double %p0, double* %p
				%n0 = call double @llvm.amdgcn.cos.f64(double -0.0)
				store volatile double %n0, double* %p
				%p0125 = call double @llvm.amdgcn.cos.f64(double +0.125)
				store volatile double %p0125, double* %p
				%n0125 = call double @llvm.amdgcn.cos.f64(double -0.125)
				store volatile double %n0125, double* %p
				%p025 = call double @llvm.amdgcn.cos.f64(double +0.25)
				store volatile double %p025, double* %p
				%n025 = call double @llvm.amdgcn.cos.f64(double -0.25)
				store volatile double %n025, double* %p
				%p05 = call double @llvm.amdgcn.cos.f64(double +0.5)
				store volatile double %p05, double* %p
				%n05 = call double @llvm.amdgcn.cos.f64(double -0.5)
				store volatile double %n05, double* %p
				%p1 = call double @llvm.amdgcn.cos.f64(double +1.0)
				store volatile double %p1, double* %p
				%n1 = call double @llvm.amdgcn.cos.f64(double -1.0)
				store volatile double %n1, double* %p
				%p256 = call double @llvm.amdgcn.cos.f64(double +256.0)
				store volatile double %p256, double* %p
				%n256 = call double @llvm.amdgcn.cos.f64(double -256.0)
				store volatile double %n256, double* %p
				%p1000 = call double @llvm.amdgcn.cos.f64(double +1000.0)
				store volatile double %p1000, double* %p
				%n1000 = call double @llvm.amdgcn.cos.f64(double -1000.0)
				store volatile double %n1000, double* %p
				%pinf = call double @llvm.amdgcn.cos.f64(double 0x7FF0000000000000) ; +inf
				store volatile double %pinf, double* %p
				%ninf = call double @llvm.amdgcn.cos.f64(double 0xFFF0000000000000) ; -inf
				store volatile double %ninf, double* %p
				%nan = call double @llvm.amdgcn.cos.f64(double 0x7FF8000000000000) ; nan
				store volatile double %nan, double* %p
				ret void
				}

				define void @test_f16_strictfp(half* %p) #1 {
				; CHECK-LABEL: @test_f16_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xH0000) #1
				; CHECK-NEXT: store volatile half [[P0]], half* [[P:%.*]], align 2
				; CHECK-NEXT: [[P025:%.*]] = call half @llvm.amdgcn.cos.f16(half 0xH3400) #1
				; CHECK-NEXT: store volatile half [[P025]], half* [[P]], align 2
				; CHECK-NEXT: ret void
				;
				%p0 = call half @llvm.amdgcn.cos.f16(half +0.0) #1
				store volatile half %p0, half* %p
				%p025 = call half @llvm.amdgcn.cos.f16(half +0.25) #1
				store volatile half %p025, half* %p
				ret void
				}

				define void @test_f32_strictfp(float* %p) #1 {
				; CHECK-LABEL: @test_f32_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call float @llvm.amdgcn.cos.f32(float 0.000000e+00) #1
				; CHECK-NEXT: store volatile float [[P0]], float* [[P:%.*]], align 4
				; CHECK-NEXT: [[P025:%.*]] = call float @llvm.amdgcn.cos.f32(float 2.500000e-01) #1
				; CHECK-NEXT: store volatile float [[P025]], float* [[P]], align 4
				; CHECK-NEXT: ret void
				;
				%p0 = call float @llvm.amdgcn.cos.f32(float +0.0) #1
				store volatile float %p0, float* %p
				%p025 = call float @llvm.amdgcn.cos.f32(float +0.25) #1
				store volatile float %p025, float* %p
				ret void
				}

				define void @test_f64_strictfp(double* %p) #1 {
				; CHECK-LABEL: @test_f64_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call double @llvm.amdgcn.cos.f64(double 0.000000e+00) #1
				; CHECK-NEXT: store volatile double [[P0]], double* [[P:%.*]], align 8
				; CHECK-NEXT: [[P025:%.*]] = call double @llvm.amdgcn.cos.f64(double 2.500000e-01) #1
				; CHECK-NEXT: store volatile double [[P025]], double* [[P]], align 8
				; CHECK-NEXT: ret void
				;
				%p0 = call double @llvm.amdgcn.cos.f64(double +0.0) #1
				store volatile double %p0, double* %p
				%p025 = call double @llvm.amdgcn.cos.f64(double +0.25) #1
				store volatile double %p025, double* %p
				ret void
				}

				attributes #0 = { nounwind readnone speculatable }
				attributes #1 = { strictfp }
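
The f16 CHECK lines in these tests spell half-precision constants as raw bit patterns after a `0xH` prefix (for example `0xH3C00` and `0xH39A8`). As a reading aid, here is a minimal sketch of a decoder for the IEEE-754 binary16 layout. `halfBitsToFloat` is a hypothetical helper written for this illustration, not part of LLVM.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decodes an IEEE-754 binary16 bit pattern (the digits
// after the 0xH prefix) into a float.
float halfBitsToFloat(std::uint16_t Bits) {
  const int Sign = (Bits >> 15) & 0x1;
  const int Exp = (Bits >> 10) & 0x1F; // 5-bit exponent, bias 15
  const int Mant = Bits & 0x3FF;       // 10-bit fraction
  float Magnitude;
  if (Exp == 0x1F) {
    // All-ones exponent: infinity if the fraction is zero, otherwise NaN.
    Magnitude = Mant ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else if (Exp == 0) {
    // Zero or subnormal: no implicit leading 1.
    Magnitude = std::ldexp(static_cast<float>(Mant), -24);
  } else {
    // Normal number: implicit leading 1 plus the 10-bit fraction.
    Magnitude = std::ldexp(1.0f + Mant / 1024.0f, Exp - 15);
  }
  return Sign ? -Magnitude : Magnitude;
}
```

Under this decoding, `0xH3C00` is 1.0, `0xHBC00` is -1.0, `0xH39A8` is 0.70703125 (the half nearest to cos of an eighth of a turn), and `0xH63D0` is 1000.0.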

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instsimplify -S | FileCheck %s

				declare half @llvm.amdgcn.sin.f16(half) #0
				declare float @llvm.amdgcn.sin.f32(float) #0
				declare double @llvm.amdgcn.sin.f64(double) #0

				define void @test_f16(half* %p) {
				; CHECK-LABEL: @test_f16(
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P:%.*]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH39A8, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xHB9A8, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH3C00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xHBC00, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]], align 2
				; CHECK-NEXT: [[P1000:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xH63D0)
				; CHECK-NEXT: store volatile half [[P1000]], half* [[P]], align 2
				; CHECK-NEXT: [[N1000:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xHE3D0)
				; CHECK-NEXT: store volatile half [[N1000]], half* [[P]], align 2
				; CHECK-NEXT: [[PINF:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xH7C00)
				; CHECK-NEXT: store volatile half [[PINF]], half* [[P]], align 2
				; CHECK-NEXT: [[NINF:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xHFC00)
				; CHECK-NEXT: store volatile half [[NINF]], half* [[P]], align 2
				; CHECK-NEXT: [[NAN:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xH7E00)
				; CHECK-NEXT: store volatile half [[NAN]], half* [[P]], align 2
				; CHECK-NEXT: ret void
				;
				%p0 = call half @llvm.amdgcn.sin.f16(half +0.0)
				store volatile half %p0, half* %p
				%n0 = call half @llvm.amdgcn.sin.f16(half -0.0)
				store volatile half %n0, half* %p
				%p0125 = call half @llvm.amdgcn.sin.f16(half +0.125)
				store volatile half %p0125, half* %p
				%n0125 = call half @llvm.amdgcn.sin.f16(half -0.125)
				store volatile half %n0125, half* %p
				%p025 = call half @llvm.amdgcn.sin.f16(half +0.25)
				store volatile half %p025, half* %p
				%n025 = call half @llvm.amdgcn.sin.f16(half -0.25)
				store volatile half %n025, half* %p
				%p05 = call half @llvm.amdgcn.sin.f16(half +0.5)
				store volatile half %p05, half* %p
				%n05 = call half @llvm.amdgcn.sin.f16(half -0.5)
				store volatile half %n05, half* %p
				%p1 = call half @llvm.amdgcn.sin.f16(half +1.0)
				store volatile half %p1, half* %p
				%n1 = call half @llvm.amdgcn.sin.f16(half -1.0)
				store volatile half %n1, half* %p
				%p256 = call half @llvm.amdgcn.sin.f16(half +256.0)
				store volatile half %p256, half* %p
				%n256 = call half @llvm.amdgcn.sin.f16(half -256.0)
				store volatile half %n256, half* %p
				%p1000 = call half @llvm.amdgcn.sin.f16(half +1000.0)
				store volatile half %p1000, half* %p
				%n1000 = call half @llvm.amdgcn.sin.f16(half -1000.0)
				store volatile half %n1000, half* %p
				%pinf = call half @llvm.amdgcn.sin.f16(half 0xH7C00) ; +inf
				store volatile half %pinf, half* %p
				%ninf = call half @llvm.amdgcn.sin.f16(half 0xHFC00) ; -inf
				store volatile half %ninf, half* %p
				%nan = call half @llvm.amdgcn.sin.f16(half 0xH7E00) ; nan
				store volatile half %nan, half* %p
				ret void
				}

				define void @test_f32(float* %p) {
				; CHECK-LABEL: @test_f32(
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P:%.*]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0x3FE6A09E60000000, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0xBFE6A09E60000000, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float -1.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]], align 4
				; CHECK-NEXT: [[P1000:%.*]] = call float @llvm.amdgcn.sin.f32(float 1.000000e+03)
				; CHECK-NEXT: store volatile float [[P1000]], float* [[P]], align 4
				; CHECK-NEXT: [[N1000:%.*]] = call float @llvm.amdgcn.sin.f32(float -1.000000e+03)
				; CHECK-NEXT: store volatile float [[N1000]], float* [[P]], align 4
				; CHECK-NEXT: [[PINF:%.*]] = call float @llvm.amdgcn.sin.f32(float 0x7FF0000000000000)
				; CHECK-NEXT: store volatile float [[PINF]], float* [[P]], align 4
				; CHECK-NEXT: [[NINF:%.*]] = call float @llvm.amdgcn.sin.f32(float 0xFFF0000000000000)
				; CHECK-NEXT: store volatile float [[NINF]], float* [[P]], align 4
				; CHECK-NEXT: [[NAN:%.*]] = call float @llvm.amdgcn.sin.f32(float 0x7FF8000000000000)
				; CHECK-NEXT: store volatile float [[NAN]], float* [[P]], align 4
				; CHECK-NEXT: ret void
				;
				%p0 = call float @llvm.amdgcn.sin.f32(float +0.0)
				store volatile float %p0, float* %p
				%n0 = call float @llvm.amdgcn.sin.f32(float -0.0)
				store volatile float %n0, float* %p
				%p0125 = call float @llvm.amdgcn.sin.f32(float +0.125)
				store volatile float %p0125, float* %p
				%n0125 = call float @llvm.amdgcn.sin.f32(float -0.125)
				store volatile float %n0125, float* %p
				%p025 = call float @llvm.amdgcn.sin.f32(float +0.25)
				store volatile float %p025, float* %p
				%n025 = call float @llvm.amdgcn.sin.f32(float -0.25)
				store volatile float %n025, float* %p
				%p05 = call float @llvm.amdgcn.sin.f32(float +0.5)
				store volatile float %p05, float* %p
				%n05 = call float @llvm.amdgcn.sin.f32(float -0.5)
				store volatile float %n05, float* %p
				%p1 = call float @llvm.amdgcn.sin.f32(float +1.0)
				store volatile float %p1, float* %p
				%n1 = call float @llvm.amdgcn.sin.f32(float -1.0)
				store volatile float %n1, float* %p
				%p256 = call float @llvm.amdgcn.sin.f32(float +256.0)
				store volatile float %p256, float* %p
				%n256 = call float @llvm.amdgcn.sin.f32(float -256.0)
				store volatile float %n256, float* %p
				%p1000 = call float @llvm.amdgcn.sin.f32(float +1000.0)
				store volatile float %p1000, float* %p
				%n1000 = call float @llvm.amdgcn.sin.f32(float -1000.0)
				store volatile float %n1000, float* %p
				%pinf = call float @llvm.amdgcn.sin.f32(float 0x7FF0000000000000) ; +inf
				store volatile float %pinf, float* %p
				%ninf = call float @llvm.amdgcn.sin.f32(float 0xFFF0000000000000) ; -inf
				store volatile float %ninf, float* %p
				%nan = call float @llvm.amdgcn.sin.f32(float 0x7FF8000000000000) ; nan
				store volatile float %nan, float* %p
				ret void
				}

				define void @test_f64(double* %p) {
				; CHECK-LABEL: @test_f64(
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P:%.*]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0x3FE6A09E667F3BCC, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0xBFE6A09E667F3BCC, double* [[P]], align 8
				thakis: Looks like this ends in BCD instead of BCC on Windows (with clang-cl as host compiler at least): http://45.33.8.238/win/16679/step_11.txt
				foad: Thanks, should be fixed in rGc27214c23446e423ec2e7eb8650a65cc5f0a16aa.
				; CHECK-NEXT: store volatile double 1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double -1.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]], align 8
				; CHECK-NEXT: [[P1000:%.*]] = call double @llvm.amdgcn.sin.f64(double 1.000000e+03)
				; CHECK-NEXT: store volatile double [[P1000]], double* [[P]], align 8
				; CHECK-NEXT: [[N1000:%.*]] = call double @llvm.amdgcn.sin.f64(double -1.000000e+03)
				; CHECK-NEXT: store volatile double [[N1000]], double* [[P]], align 8
				; CHECK-NEXT: [[PINF:%.*]] = call double @llvm.amdgcn.sin.f64(double 0x7FF0000000000000)
				; CHECK-NEXT: store volatile double [[PINF]], double* [[P]], align 8
				; CHECK-NEXT: [[NINF:%.*]] = call double @llvm.amdgcn.sin.f64(double 0xFFF0000000000000)
				; CHECK-NEXT: store volatile double [[NINF]], double* [[P]], align 8
				; CHECK-NEXT: [[NAN:%.*]] = call double @llvm.amdgcn.sin.f64(double 0x7FF8000000000000)
				; CHECK-NEXT: store volatile double [[NAN]], double* [[P]], align 8
				; CHECK-NEXT: ret void
				;
				%p0 = call double @llvm.amdgcn.sin.f64(double +0.0)
				store volatile double %p0, double* %p
				%n0 = call double @llvm.amdgcn.sin.f64(double -0.0)
				store volatile double %n0, double* %p
				%p0125 = call double @llvm.amdgcn.sin.f64(double +0.125)
				store volatile double %p0125, double* %p
				%n0125 = call double @llvm.amdgcn.sin.f64(double -0.125)
				store volatile double %n0125, double* %p
				%p025 = call double @llvm.amdgcn.sin.f64(double +0.25)
				store volatile double %p025, double* %p
				%n025 = call double @llvm.amdgcn.sin.f64(double -0.25)
				store volatile double %n025, double* %p
				%p05 = call double @llvm.amdgcn.sin.f64(double +0.5)
				store volatile double %p05, double* %p
				%n05 = call double @llvm.amdgcn.sin.f64(double -0.5)
				store volatile double %n05, double* %p
				arsenm: Also should test snan -> qnan (this also won't happen for !ieee_mode, so this case should depend on strictfp also)
				foad: All of the FP constant folding code is skipped for infinities and nans so there's not much to check. (I didn't really mean to add any of these inf and nan checks here, just carelessly cut n pasted from another test.)
				arsenm: That would be a bug, since I would expect them to fold too
				foad: A missed optimization perhaps, but it's always been like that for all math intrinsics, so I don't plan to fix it as part of this patch.
				%p1 = call double @llvm.amdgcn.sin.f64(double +1.0)
				store volatile double %p1, double* %p
				%n1 = call double @llvm.amdgcn.sin.f64(double -1.0)
				arsenm: Add some cases where strictfp calls are skipped
				store volatile double %n1, double* %p
				%p256 = call double @llvm.amdgcn.sin.f64(double +256.0)
				store volatile double %p256, double* %p
				%n256 = call double @llvm.amdgcn.sin.f64(double -256.0)
				store volatile double %n256, double* %p
				%p1000 = call double @llvm.amdgcn.sin.f64(double +1000.0)
				store volatile double %p1000, double* %p
				%n1000 = call double @llvm.amdgcn.sin.f64(double -1000.0)
				store volatile double %n1000, double* %p
				%pinf = call double @llvm.amdgcn.sin.f64(double 0x7FF0000000000000) ; +inf
				store volatile double %pinf, double* %p
				%ninf = call double @llvm.amdgcn.sin.f64(double 0xFFF0000000000000) ; -inf
				store volatile double %ninf, double* %p
				%nan = call double @llvm.amdgcn.sin.f64(double 0x7FF8000000000000) ; nan
				store volatile double %nan, double* %p
				ret void
				}
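
The BCC/BCD report in the inline comments on @test_f64 describes a difference in the last hex digit of the folded double, that is, one unit in the last place. A minimal sketch of why such a difference would show up only in the f64 checks, assuming the f32 result is obtained by rounding the host's double result to float (the helper names below are hypothetical and are not taken from ConstantFolding.cpp):

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double; memcpy avoids aliasing problems.
double doubleFromBits(std::uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof D);
  return D;
}

// Reinterpret a double as its 64-bit pattern.
std::uint64_t bitsFromDouble(double D) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}

// Round a double to float and widen it back to double. LLVM IR prints float
// constants as the hex pattern of the equivalent double, so the result is
// directly comparable with the f32 CHECK lines.
std::uint64_t roundTripThroughFloat(std::uint64_t Bits) {
  float F = static_cast<float>(doubleFromBits(Bits));
  return bitsFromDouble(static_cast<double>(F));
}
```

Both 0x3FE6A09E667F3BCC and 0x3FE6A09E667F3BCD round to the float that prints as 0x3FE6A09E60000000, the value checked in @test_f32, so a one-ulp disagreement between host math libraries is visible only at double precision.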

				define void @test_f16_strictfp(half* %p) #1 {
				; CHECK-LABEL: @test_f16_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xH0000) #1
				; CHECK-NEXT: store volatile half [[P0]], half* [[P:%.*]], align 2
				; CHECK-NEXT: [[P025:%.*]] = call half @llvm.amdgcn.sin.f16(half 0xH3400) #1
				; CHECK-NEXT: store volatile half [[P025]], half* [[P]], align 2
				; CHECK-NEXT: ret void
				;
				%p0 = call half @llvm.amdgcn.sin.f16(half +0.0) #1
				store volatile half %p0, half* %p
				%p025 = call half @llvm.amdgcn.sin.f16(half +0.25) #1
				store volatile half %p025, half* %p
				ret void
				}

				define void @test_f32_strictfp(float* %p) #1 {
				; CHECK-LABEL: @test_f32_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call float @llvm.amdgcn.sin.f32(float 0.000000e+00) #1
				; CHECK-NEXT: store volatile float [[P0]], float* [[P:%.*]], align 4
				; CHECK-NEXT: [[P025:%.*]] = call float @llvm.amdgcn.sin.f32(float 2.500000e-01) #1
				; CHECK-NEXT: store volatile float [[P025]], float* [[P]], align 4
				; CHECK-NEXT: ret void
				;
				%p0 = call float @llvm.amdgcn.sin.f32(float +0.0) #1
				store volatile float %p0, float* %p
				%p025 = call float @llvm.amdgcn.sin.f32(float +0.25) #1
				store volatile float %p025, float* %p
				ret void
				}

				define void @test_f64_strictfp(double* %p) #1 {
				; CHECK-LABEL: @test_f64_strictfp(
				; CHECK-NEXT: [[P0:%.*]] = call double @llvm.amdgcn.sin.f64(double 0.000000e+00) #1
				; CHECK-NEXT: store volatile double [[P0]], double* [[P:%.*]], align 8
				; CHECK-NEXT: [[P025:%.*]] = call double @llvm.amdgcn.sin.f64(double 2.500000e-01) #1
				; CHECK-NEXT: store volatile double [[P025]], double* [[P]], align 8
				; CHECK-NEXT: ret void
				;
				%p0 = call double @llvm.amdgcn.sin.f64(double +0.0) #1
				store volatile double %p0, double* %p
				%p025 = call double @llvm.amdgcn.sin.f64(double +0.25) #1
				store volatile double %p025, double* %p
				ret void
				}

				attributes #0 = { nounwind readnone speculatable }
				attributes #1 = { strictfp }
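
The CHECK lines in this test imply a folding policy that can be sketched in a few lines of standalone C++. This is an illustration inferred from the expected outputs, not the code in ConstantFolding.cpp: the function name `foldAmdgcnSin`, the `IsStrictFP` parameter, and the exact [-256, 256] bound are assumptions (the test only shows that ±256.0 folds and ±1000.0 does not). The input is measured in turns, so 0.25 corresponds to a quarter period.

```cpp
#include <cmath>
#include <optional>

// Hypothetical stand-in for the folding decision exercised by sin.ll.
// Returns the folded value, or std::nullopt when the call is left in place.
std::optional<double> foldAmdgcnSin(double V, bool IsStrictFP = false) {
  // strictfp call sites are not folded (see the *_strictfp tests).
  if (IsStrictFP)
    return std::nullopt;
  // Infinities and NaNs are not folded (the +inf, -inf and nan cases).
  if (!std::isfinite(V))
    return std::nullopt;
  // Assumed range limit: 256.0 folds in the test, 1000.0 does not.
  if (V < -256.0 || V > 256.0)
    return std::nullopt;
  // Quarter-integer inputs get exact results instead of whatever the host
  // libm returns for sin(k * pi / 2).
  double V4 = V * 4.0;
  if (V4 == std::floor(V4)) {
    static const double SinVals[4] = {0.0, 1.0, 0.0, -1.0};
    // |V4| <= 1024 here, so the conversion to int is exact; masking with 3
    // selects the quadrant, including for negative inputs.
    return SinVals[static_cast<int>(V4) & 3];
  }
  // Everything else goes through the host sin, scaled from turns to radians.
  const double Pi = 3.14159265358979323846;
  return std::sin(V * 2.0 * Pi);
}
```

With this sketch, `foldAmdgcnSin(0.25)` yields exactly 1.0 and `foldAmdgcnSin(1.0)` yields exactly 0.0, matching the checks above, while `foldAmdgcnSin(1000.0)` and any strictfp call return no value. Only the non-quarter-integer path depends on the host math library, which is where the last-digit discrepancy reported for 0.125 can arise.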

Revision Contents

Diff 267149

llvm/lib/Analysis/ConstantFolding.cpp

llvm/test/Analysis/ConstantFolding/AMDGPU/cos.ll

llvm/test/Analysis/ConstantFolding/AMDGPU/sin.ll
