This is an archive of the discontinued LLVM Phabricator instance.

Fix asserts in AMDGCN fmed3 folding by handling more cases of NaN
ClosedPublic

Authored by alan-baker on Jun 25 2018, 12:04 PM.

Details

Reviewers

arsenm
tpr
dstuttard

Summary

Better NaN handling for AMDGCN fmed3.

All operands are checked for NaN now. The checks were moved before the canonicalization to provide a better mapping from fclamp. Changed the behaviour of fmed3(x,y,NaN) to return max(x,y) instead of min(x,y) in light of this. Updated tests as a result and added some new cases to cover the fix.
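The NaN rules in the summary can be sketched as a scalar model. This is only an illustration, not the InstCombine code: the name `fmed3FoldModel` is hypothetical, `std::fmin`/`std::fmax` stand in for `llvm.minnum`/`llvm.maxnum` (both return the non-NaN operand when exactly one operand is NaN), and undef operands are not modelled.

```cpp
#include <cmath>

// Hypothetical scalar model of the fold described in the summary.
// A NaN in operand 0 or 1 folds to a min of the other two operands;
// a NaN in operand 2 folds to a max of the first two.
float fmed3FoldModel(float x, float y, float z) {
  if (std::isnan(x)) return std::fmin(y, z); // fmed3(NaN, y, z) -> minnum(y, z)
  if (std::isnan(y)) return std::fmin(x, z); // fmed3(x, NaN, z) -> minnum(x, z)
  if (std::isnan(z)) return std::fmax(x, y); // fmed3(x, y, NaN) -> maxnum(x, y)
  // No NaN operand: plain median of three.
  return std::fmax(std::fmin(x, y), std::fmin(std::fmax(x, y), z));
}
```

Under these assumptions, `fmed3FoldModel(NAN, 0.0f, 1.0f)` and `fmed3FoldModel(0.0f, NAN, 1.0f)` evaluate to 0.0, and `fmed3FoldModel(0.0f, 1.0f, NAN)` evaluates to 1.0, which lines up with the constant-folding tests added in this revision.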

Diff Detail

Repository: rL LLVM

Event Timeline

alan-baker created this revision. Jun 25 2018, 12:04 PM

Herald added subscribers: llvm-commits, nhaehnle, wdng. Jun 25 2018, 12:04 PM

arsenm added inline comments. Jun 25 2018, 12:51 PM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432	This comment is sort of confusing me. Is it saying there is still a problem? It probably shouldn't mention fclamp, as no operation called fclamp is involved here.
3447–3452	I don't really like repeating all of these so many times, but it's probably clearer than the alternatives. Should this use FastMathFlagGuard to avoid repeating the copyFastMathFlags at least?
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
1337–1357	Since the undef value check is repeated so many times, it would be better to add versions of these with undef instead of NaN to make sure those work.

alan-baker added inline comments. Jun 25 2018, 1:36 PM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432	If fmed3(x,y,z) is used to model fclamp(x,y,z) (i.e. fmin(fmax(x,y), z)), and x or y is determined to be a constant before one of the operands is determined to be a NaN, then the canonicalization will swap operands around. If an operand is later determined to be a NaN, the evaluation of the fmed3 may no longer match the original fclamp operation. So there is a potential problem if not all the constants are determined before an invocation of InstCombine, but are by the time of a subsequent invocation. This problem is specific to modelling fclamp, which is why I mentioned it here. I can remove the comment if you prefer, but LLPC maps fclamp directly to fmed3 (and I'm not sure I see a performant alternative without introducing an fclamp intrinsic).
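To see why operand order matters here, the fclamp expression quoted in this comment can be modelled in a few lines. This is a sketch under assumptions: `fclampModel` is a hypothetical name, and `std::fmin`/`std::fmax` are used as stand-ins for NaN-ignoring min/max.

```cpp
#include <cmath>

// Hypothetical model of fclamp(x, y, z) as written in the comment:
// fmin(fmax(x, y), z). With one NaN operand the result depends on
// which position the NaN occupies.
float fclampModel(float x, float y, float z) {
  return std::fmin(std::fmax(x, y), z);
}
```

With these definitions, `fclampModel(0.0f, 1.0f, NAN)` reduces to `fmax(0, 1)`, while `fclampModel(NAN, 1.0f, 0.0f)` reduces to `fmin(1, 0)`. The same three values give different results once the NaN moves, so swapping operands before the NaN is known can change what the fold produces.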

alan-baker added inline comments. Jun 25 2018, 1:48 PM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3447–3452	Calls don't automatically inherit fast math flags from the builder so creating the guard doesn't work. If it's preferable, I could refactor to create the call instruction first and in a subsequent if take the name and copy the fast math flags.

Adding extra tests.

alan-baker marked an inline comment as done. Jun 25 2018, 1:57 PM

arsenm added inline comments. Jun 26 2018, 2:01 AM

lib/Transforms/InstCombine/InstCombineCalls.cpp
3429–3432	There should be a comment; it just needs to be reworded. As is, I think it relies on knowing the history of you adding this code here before the existing code.
3447–3452	I guess that's sort of a bug in the IRBuilder, but I can sort of see why it is that way. I guess factoring it that way may be better.

Refactor NaN checks to reduce duplication. Reword comment.

alan-baker marked an inline comment as done. Jun 26 2018, 6:17 AM

LGTM

This revision is now accepted and ready to land. Jun 27 2018, 5:15 AM

I don't have commit access to LLVM, could you please land this patch on my behalf?

r336375

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombineCalls.cpp	25 lines
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll	48 lines

Diff 152877

lib/Transforms/InstCombine/InstCombineCalls.cpp

Context not available.
   Value *Src1 = II->getArgOperand(1);
   Value *Src2 = II->getArgOperand(2);
 
+  // Checking for NaN before canonicalization provides better fidelity when
+  // mapping other operations onto fmed3 since the order of operands is
+  // unchanged.
+  CallInst *NewCall = nullptr;
+  if (match(Src0, m_NaN()) || isa<UndefValue>(Src0)) {
+    NewCall = Builder.CreateMinNum(Src1, Src2);
+  } else if (match(Src1, m_NaN()) || isa<UndefValue>(Src1)) {
+    NewCall = Builder.CreateMinNum(Src0, Src2);
+  } else if (match(Src2, m_NaN()) || isa<UndefValue>(Src2)) {
+    NewCall = Builder.CreateMaxNum(Src0, Src1);
+  }
+
+  if (NewCall) {
+    NewCall->copyFastMathFlags(II);
+    NewCall->takeName(II);
+    return replaceInstUsesWith(*II, NewCall);
+  }
+
   bool Swap = false;
   // Canonicalize constants to RHS operands.
   //
Context not available.
     return II;
   }
 
-  if (match(Src2, m_NaN()) || isa<UndefValue>(Src2)) {
-    CallInst *NewCall = Builder.CreateMinNum(Src0, Src1);
-    NewCall->copyFastMathFlags(II);
-    NewCall->takeName(II);
-    return replaceInstUsesWith(*II, NewCall);
-  }
-
   if (const ConstantFP *C0 = dyn_cast<ConstantFP>(Src0)) {
     if (const ConstantFP *C1 = dyn_cast<ConstantFP>(Src1)) {
       if (const ConstantFP *C2 = dyn_cast<ConstantFP>(Src2)) {
Context not available.

test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll

Context not available.
 }
 
 ; CHECK-LABEL: @fmed3_x_y_undef_f32(
-; CHECK: call float @llvm.minnum.f32(float %x, float %y)
+; CHECK: call float @llvm.maxnum.f32(float %x, float %y)
 define float @fmed3_x_y_undef_f32(float %x, float %y) {
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float undef)
   ret float %med3
Context not available.
 }
 
 ; CHECK-LABEL: @fmed3_x_y_qnan0_f32(
-; CHECK: call float @llvm.minnum.f32(float %x, float %y)
+; CHECK: call float @llvm.maxnum.f32(float %x, float %y)
 define float @fmed3_x_y_qnan0_f32(float %x, float %y) {
   %med3 = call float @llvm.amdgcn.fmed3.f32(float %x, float %y, float 0x7FF8000000000000)
   ret float %med3
Context not available.
 
 ; This can return any of the qnans.
 ; CHECK-LABEL: @fmed3_qnan0_qnan1_qnan2_f32(
-; CHECK: ret float 0x7FF8002000000000
+; CHECK: ret float 0x7FF8030000000000
 define float @fmed3_qnan0_qnan1_qnan2_f32(float %x, float %y) {
   %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8000100000000, float 0x7FF8002000000000, float 0x7FF8030000000000)
   ret float %med3
Context not available.
   ret float %med3
 }
 
+; CHECK-LABEL: @fmed3_nan_0_1_f32(
+; CHECK: ret float 0.0
+define float @fmed3_nan_0_1_f32() {
+  %med3 = call float @llvm.amdgcn.fmed3.f32(float 0x7FF8001000000000, float 0.0, float 1.0)
+  ret float %med3
+}
+
+; CHECK-LABEL: @fmed3_0_nan_1_f32(
+; CHECK: ret float 0.0
+define float @fmed3_0_nan_1_f32() {
+  %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 0x7FF8001000000000, float 1.0)
+  ret float %med
+}
+
+; CHECK-LABEL: @fmed3_0_1_nan_f32(
+; CHECK: ret float 1.0
+define float @fmed3_0_1_nan_f32() {
+  %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float 0x7FF8001000000000)
+  ret float %med
+}
+
+; CHECK-LABEL: @fmed3_undef_0_1_f32(
+; CHECK: ret float 0.0
+define float @fmed3_undef_0_1_f32() {
+  %med3 = call float @llvm.amdgcn.fmed3.f32(float undef, float 0.0, float 1.0)
+  ret float %med3
+}
+
+; CHECK-LABEL: @fmed3_0_undef_1_f32(
+; CHECK: ret float 0.0
+define float @fmed3_0_undef_1_f32() {
+  %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float undef, float 1.0)
+  ret float %med
+}
+
+; CHECK-LABEL: @fmed3_0_1_undef_f32(
+; CHECK: ret float 1.0
+define float @fmed3_0_1_undef_f32() {
+  %med = call float @llvm.amdgcn.fmed3.f32(float 0.0, float 1.0, float undef)
+  ret float %med
+}
+
 ; --------------------------------------------------------------------
 ; llvm.amdgcn.icmp
 ; --------------------------------------------------------------------
Context not available.