This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][ConstantFolding] Fold llvm.amdgcn.fract intrinsic
Closed, Public

Authored by foad on Feb 26 2020, 7:17 AM.

Details

Reviewers

nhaehnle
arsenm
rampitec

Commits

rG5900d3f2e94f: [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fract intrinsic

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

foad created this revision. Feb 26 2020, 7:17 AM

Herald added a project: Restricted Project. Feb 26 2020, 7:17 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 7 others.

arsenm added inline comments. Feb 26 2020, 7:27 AM

llvm/lib/Analysis/ConstantFolding.cpp
1810	This should match the instruction behavior (although I guess we can ignore the bug on SI)
1811–1814	The specs aren't necessarily relevant here, since this just needs to match the instruction behavior. Talking about the min when it isn't here is potentially confusing

foad marked an inline comment as done. Feb 26 2020, 7:46 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1810	Is there a good public reference for that? The Vega ISA Reference Guide doesn't go into much detail.

arsenm added inline comments. Feb 26 2020, 7:56 AM

llvm/lib/Analysis/ConstantFolding.cpp

1810

This is always a problem, and no. I just go by this comment:

// V_FRACT is buggy on SI, so the F32 version is never used and (x-floor(x)) is
// used instead. However, SI doesn't have V_FLOOR_F64, so the most efficient
// way to implement it is using V_FRACT_F64.
// The workaround for the V_FRACT bug is:
//    fract(x) = isnan(x) ? x : min(V_FRACT(x), 0.99999999999999999)

// Convert floor(x) to (x - fract(x))
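A note on the constant in the quoted workaround: read as an IEEE-754 double literal, 0.99999999999999999 rounds to exactly 1.0, so it is best understood as shorthand for "the largest value below 1.0" rather than as a usable clamp. The following is a minimal sketch in standard C++ with no LLVM dependency. The names `largest_below_one` and `fract_workaround_sketch` are hypothetical, and the raw V_FRACT_F64 result is passed in as a parameter because the buggy hardware value cannot be reproduced on the host.

```cpp
#include <cassert>
#include <cmath>

// Largest double strictly below 1.0 (bit pattern 0x3FEFFFFFFFFFFFFF).
inline double largest_below_one() { return std::nextafter(1.0, 0.0); }

// Hypothetical model of the quoted workaround:
//   fract(x) = isnan(x) ? x : min(V_FRACT(x), <largest value below 1.0>)
// `raw_fract` stands in for whatever V_FRACT_F64 produced.
inline double fract_workaround_sketch(double x, double raw_fract) {
  if (std::isnan(x))
    return x;
  return std::fmin(raw_fract, largest_below_one());
}

// A few host-side checks of the sketch.
inline void check_workaround_sketch() {
  // The decimal literal from the comment is not below 1.0 as a double.
  assert(0.99999999999999999 == 1.0);
  assert(largest_below_one() < 1.0);
  // A raw result of 1.0 is clamped to just below 1.0.
  assert(fract_workaround_sketch(-1e-30, 1.0) == largest_below_one());
}
```

Using `std::nextafter(1.0, 0.0)` avoids depending on how a long decimal literal happens to round.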

Harbormaster completed remote builds in B47305: Diff 246718. Feb 26 2020, 8:42 AM

Implement the OpenCL-like behaviour.

foad marked 3 inline comments as done. Feb 26 2020, 9:33 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1810	OK, so it sounds like the (non-buggy) hardware uses the same trick as the OpenCL definition, to avoid ever returning 1.0. I'll try to confirm that on some real hardware.
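The behaviour described here (x - floor(x), clamped so the result never reaches 1.0) can be sketched on the host in standard C++. This is an illustration only, not the code from the revision: `fract_sketch` is a hypothetical name, it covers float and double but not half, and it assumes the default round-to-nearest mode. One difference from a naive translation matters: C's `fmin` drops a NaN operand, so the sketch returns NaN explicitly to match the NaN results expected for infinite and NaN inputs.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Host-side sketch of the folded semantics for float or double.
template <typename T>
T fract_sketch(T x) {
  const T almost_one = std::nextafter(T(1), T(0)); // largest value < 1.0
  const T diff = x - std::floor(x);
  // inf - inf (and NaN inputs) give NaN; return it instead of letting
  // fmin silently pick the non-NaN operand.
  if (std::isnan(diff))
    return diff;
  return std::fmin(diff, almost_one);
}

// A few host-side checks of the sketch.
inline void check_fract_sketch() {
  assert(fract_sketch(2.25f) == 0.25f);
  assert(fract_sketch(-6.125) == 0.875);
  // Without the clamp, -tiny would round up to exactly 1.0.
  assert(fract_sketch(-FLT_MIN) == std::nextafter(1.0f, 0.0f));
  assert(std::isnan(fract_sketch(INFINITY)));
}
```

The explicit NaN check is the design point to keep in mind when comparing this sketch with a min operation that propagates NaN.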

Harbormaster completed remote builds in B47331: Diff 246767. Feb 26 2020, 11:14 AM

foad marked 2 inline comments as done. Feb 27 2020, 6:16 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1810	I've confirmed this for f16 and f32 types, on some real gfx9 hardware.

arsenm accepted this revision. Feb 27 2020, 6:26 AM

This revision is now accepted and ready to land. Feb 27 2020, 6:26 AM

foad marked an inline comment as done. Feb 27 2020, 6:36 AM

foad added inline comments.

llvm/lib/Analysis/ConstantFolding.cpp
1810	... and confirmed for f64 too.

Closed by commit rG5900d3f2e94f: [AMDGPU][ConstantFolding] Fold llvm.amdgcn.fract intrinsic (authored by foad). Feb 27 2020, 9:50 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Analysis/ConstantFolding.cpp	16 lines
llvm/test/Analysis/ConstantFolding/AMDGPU/fract.ll	126 lines

Diff 247005

llvm/lib/Analysis/ConstantFolding.cpp

@@ ... @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
   case Intrinsic::ssub_sat:
   case Intrinsic::usub_sat:
   case Intrinsic::smul_fix:
   case Intrinsic::smul_fix_sat:
   case Intrinsic::convert_from_fp16:
   case Intrinsic::convert_to_fp16:
   case Intrinsic::bitreverse:
   case Intrinsic::amdgcn_fmul_legacy:
+  case Intrinsic::amdgcn_fract:
   case Intrinsic::x86_sse_cvtss2si:
   case Intrinsic::x86_sse_cvtss2si64:
   case Intrinsic::x86_sse_cvttss2si:
   case Intrinsic::x86_sse_cvttss2si64:
   case Intrinsic::x86_sse2_cvtsd2si:
   case Intrinsic::x86_sse2_cvtsd2si64:
   case Intrinsic::x86_sse2_cvttsd2si:
   case Intrinsic::x86_sse2_cvttsd2si64:
@@ ... @@ if (IntrinsicID == Intrinsic::trunc) {
       return ConstantFP::get(Ty->getContext(), U);
     }

     if (IntrinsicID == Intrinsic::fabs) {
       U.clearSign();
       return ConstantFP::get(Ty->getContext(), U);
     }

+    if (IntrinsicID == Intrinsic::amdgcn_fract) {
+      // The v_fract instruction behaves like the OpenCL spec, which defines
+      // fract(x) as fmin(x - floor(x), 0x1.fffffep-1f): "The min() operator is
+      // there to prevent fract(-small) from returning 1.0. It returns the
+      // largest positive floating-point number less than 1.0."
+      APFloat FloorU(U);
+      FloorU.roundToIntegral(APFloat::rmTowardNegative);
+      APFloat FractU(U - FloorU);
+      APFloat AlmostOne(U.getSemantics(), 1);
+      AlmostOne.next(/*nextDown*/ true);
+      return ConstantFP::get(Ty->getContext(), minimum(FractU, AlmostOne));
+    }
+
     /// We only fold functions with finite arguments. Folding NaN and inf is
     /// likely to be aborted with an exception anyway, and some host libms
     /// have known errors raising exceptions.
-    if (Op->getValueAPF().isNaN() || Op->getValueAPF().isInfinity())
+    if (!U.isFinite())
       return nullptr;

     /// Currently APFloat versions of these functions do not exist, so we use
     /// the host native double versions. Float versions are not called
     /// directly but for all these it is true (float)(f((double)arg)) ==
     /// f(arg). Long double not supported yet.
     double V = getValueAsDouble(Op);

     switch (IntrinsicID) {
       default: break;
       case Intrinsic::log:
         return ConstantFoldFP(log, V, Ty);
       case Intrinsic::log2:
         // TODO: What about hosts that lack a C99 library?
         return ConstantFoldFP(Log2, V, Ty);
       case Intrinsic::log10:

llvm/test/Analysis/ConstantFolding/AMDGPU/fract.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instsimplify -S | FileCheck %s

				declare half @llvm.amdgcn.fract.f16(half)
				declare float @llvm.amdgcn.fract.f32(float)
				declare double @llvm.amdgcn.fract.f64(double)

				define void @test_f16(half* %p) {
				; CHECK-LABEL: @test_f16(
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P:%.*]]
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH0000, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH3400, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH3B00, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH0400, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH3BFF, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH7E00, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH7E00, half* [[P]]
				; CHECK-NEXT: store volatile half 0xH7E00, half* [[P]]
				; CHECK-NEXT: ret void
				;
				%p0 = call half @llvm.amdgcn.fract.f16(half +0.0)
				store volatile half %p0, half* %p
				%n0 = call half @llvm.amdgcn.fract.f16(half -0.0)
				store volatile half %n0, half* %p
				%p1 = call half @llvm.amdgcn.fract.f16(half +1.0)
				store volatile half %p1, half* %p
				%n1 = call half @llvm.amdgcn.fract.f16(half -1.0)
				store volatile half %n1, half* %p
				%p225 = call half @llvm.amdgcn.fract.f16(half +2.25)
				store volatile half %p225, half* %p
				%n6125 = call half @llvm.amdgcn.fract.f16(half -6.125)
				store volatile half %n6125, half* %p
				%ptiny = call half @llvm.amdgcn.fract.f16(half 0xH0400) ; +min normal
				store volatile half %ptiny, half* %p
				%ntiny = call half @llvm.amdgcn.fract.f16(half 0xH8400) ; -min normal
				store volatile half %ntiny, half* %p
				%pinf = call half @llvm.amdgcn.fract.f16(half 0xH7C00) ; +inf
				store volatile half %pinf, half* %p
				%ninf = call half @llvm.amdgcn.fract.f16(half 0xHFC00) ; -inf
				store volatile half %ninf, half* %p
				%nan = call half @llvm.amdgcn.fract.f16(half 0xH7E00) ; nan
				store volatile half %nan, half* %p
				ret void
				}

				define void @test_f32(float* %p) {
				; CHECK-LABEL: @test_f32(
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P:%.*]]
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]]
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]]
				; CHECK-NEXT: store volatile float 0.000000e+00, float* [[P]]
				; CHECK-NEXT: store volatile float 2.500000e-01, float* [[P]]
				; CHECK-NEXT: store volatile float 8.750000e-01, float* [[P]]
				; CHECK-NEXT: store volatile float 0x3810000000000000, float* [[P]]
				; CHECK-NEXT: store volatile float 0x3FEFFFFFE0000000, float* [[P]]
				; CHECK-NEXT: store volatile float 0x7FF8000000000000, float* [[P]]
				; CHECK-NEXT: store volatile float 0x7FF8000000000000, float* [[P]]
				; CHECK-NEXT: store volatile float 0x7FF8000000000000, float* [[P]]
				; CHECK-NEXT: ret void
				;
				%p0 = call float @llvm.amdgcn.fract.f32(float +0.0)
				store volatile float %p0, float* %p
				%n0 = call float @llvm.amdgcn.fract.f32(float -0.0)
				store volatile float %n0, float* %p
				%p1 = call float @llvm.amdgcn.fract.f32(float +1.0)
				store volatile float %p1, float* %p
				%n1 = call float @llvm.amdgcn.fract.f32(float -1.0)
				store volatile float %n1, float* %p
				%p225 = call float @llvm.amdgcn.fract.f32(float +2.25)
				store volatile float %p225, float* %p
				%n6125 = call float @llvm.amdgcn.fract.f32(float -6.125)
				store volatile float %n6125, float* %p
				%ptiny = call float @llvm.amdgcn.fract.f32(float 0x3810000000000000) ; +min normal
				store volatile float %ptiny, float* %p
				%ntiny = call float @llvm.amdgcn.fract.f32(float 0xB810000000000000) ; -min normal
				store volatile float %ntiny, float* %p
				%pinf = call float @llvm.amdgcn.fract.f32(float 0x7FF0000000000000) ; +inf
				store volatile float %pinf, float* %p
				%ninf = call float @llvm.amdgcn.fract.f32(float 0xFFF0000000000000) ; -inf
				store volatile float %ninf, float* %p
				%nan = call float @llvm.amdgcn.fract.f32(float 0x7FF8000000000000) ; nan
				store volatile float %nan, float* %p
				ret void
				}

				define void @test_f64(double* %p) {
				; CHECK-LABEL: @test_f64(
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P:%.*]]
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]]
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]]
				; CHECK-NEXT: store volatile double 0.000000e+00, double* [[P]]
				; CHECK-NEXT: store volatile double 2.500000e-01, double* [[P]]
				; CHECK-NEXT: store volatile double 8.750000e-01, double* [[P]]
				; CHECK-NEXT: store volatile double 2.000000e-308, double* [[P]]
				; CHECK-NEXT: store volatile double 0x3FEFFFFFFFFFFFFF, double* [[P]]
				; CHECK-NEXT: store volatile double 0x7FF8000000000000, double* [[P]]
				; CHECK-NEXT: store volatile double 0x7FF8000000000000, double* [[P]]
				; CHECK-NEXT: store volatile double 0x7FF8000000000000, double* [[P]]
				; CHECK-NEXT: ret void
				;
				%p0 = call double @llvm.amdgcn.fract.f64(double +0.0)
				store volatile double %p0, double* %p
				%n0 = call double @llvm.amdgcn.fract.f64(double -0.0)
				store volatile double %n0, double* %p
				%p1 = call double @llvm.amdgcn.fract.f64(double +1.0)
				store volatile double %p1, double* %p
				%n1 = call double @llvm.amdgcn.fract.f64(double -1.0)
				store volatile double %n1, double* %p
				%p225 = call double @llvm.amdgcn.fract.f64(double +2.25)
				store volatile double %p225, double* %p
				%n6125 = call double @llvm.amdgcn.fract.f64(double -6.125)
				store volatile double %n6125, double* %p
				%ptiny = call double @llvm.amdgcn.fract.f64(double +2.0e-308) ; +min normal
				store volatile double %ptiny, double* %p
				%ntiny = call double @llvm.amdgcn.fract.f64(double -2.0e-308) ; -min normal
				store volatile double %ntiny, double* %p
				%pinf = call double @llvm.amdgcn.fract.f64(double 0x7FF0000000000000) ; +inf
				store volatile double %pinf, double* %p
				%ninf = call double @llvm.amdgcn.fract.f64(double 0xFFF0000000000000) ; -inf
				store volatile double %ninf, double* %p
				%nan = call double @llvm.amdgcn.fract.f64(double 0x7FF8000000000000) ; nan
				store volatile double %nan, double* %p
				ret void
				}
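The hexadecimal constants in the f32 checks can look surprising at first, because LLVM IR text writes a float constant that has no exact short decimal form as the 64-bit pattern of the same value widened to double. The sketch below is an illustration in standard C++ with a hypothetical helper name, `widened_bits`; it is not part of the test file. It reproduces three of the patterns that appear in `@test_f32`.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Bit pattern of a float value after widening it to double
// (float-to-double conversion is exact).
inline std::uint64_t widened_bits(float v) {
  const double d = v;
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Compare against constants that appear in the f32 test above.
inline void check_widened_bits() {
  // Largest float below 1.0: the expected result for fract(-min normal).
  assert(widened_bits(std::nextafter(1.0f, 0.0f)) == 0x3FEFFFFFE0000000ULL);
  // +min normal and -min normal float inputs.
  assert(widened_bits(FLT_MIN) == 0x3810000000000000ULL);
  assert(widened_bits(-FLT_MIN) == 0xB810000000000000ULL);
}
```

The f64 check uses 0x3FEFFFFFFFFFFFFF directly, which is the largest double below 1.0, and the f16 check uses 0xH3BFF, the largest half below 1.0.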
