This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] fix crash in getNegatedExpression() by ignoring transient fadd
Abandoned · Public

Authored by spatel on Mar 23 2020, 12:56 PM.

Details

Reviewers

steven.zhang
RKSimon
bjope
dstuttard

Summary

This is an alternative proposal to D76439 to fix the crashing test. This is similar to a bailout that already exists for "fmul X, 2.0".

An fadd with fneg operand should always be reduced to fsub, so if we encounter that pattern while running getNegatibleCost(), just ignore it until we have a chance to run combines on that fadd node.
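As a rough illustration of why an fadd with an fneg operand can always be rewritten as an fsub, the sketch below compares the two forms bit for bit. It assumes default IEEE-754 float semantics (no fast-math flags on the C++ compiler); the helper names `bits`, `faddOfFneg`, `fsubForm`, and `sameResult` are hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit pattern of a float, so that +0.0 and -0.0 compare as different.
static std::uint32_t bits(float F) {
  std::uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

// (fadd (fneg X), Y): the non-canonical form discussed in the summary.
static float faddOfFneg(float X, float Y) { return -X + Y; }

// (fsub Y, X): the form the combiner is expected to produce.
static float fsubForm(float X, float Y) { return Y - X; }

// True if both forms give bit-identical results for X and Y.
// NaN inputs are deliberately not considered in this sketch.
static bool sameResult(float X, float Y) {
  return bits(faddOfFneg(X, Y)) == bits(fsubForm(X, Y));
}
```

Calling `sameResult` with a handful of values, including signed zeros, is enough to see that the rewrite does not need any fast-math flag, which is why the pattern is treated as transient here.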

In the x86 tests, this produces slightly better code - we form an expression with an fadd rather than all fsub. It does not make a real difference on the examples here, but we prefer the commutability of fadd because that gives register allocation more flexibility to handle destructive SSE ops without extra register moves.

Diff Detail

Event Timeline

spatel created this revision. Mar 23 2020, 12:56 PM

Herald added subscribers: hiraditya, mcrosier. Mar 23 2020, 12:56 PM

spatel mentioned this in D76439: [SDAG] fix crash in getNegatedExpression() from altered number of uses. Mar 23 2020, 12:57 PM

spatel mentioned this in D76585: [PowerPC] Require NSZ flag for c-a*b to FNMSUB. Mar 26 2020, 6:53 AM

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

In D76638#1956952, @spatel wrote:

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

I have posted a patch (https://reviews.llvm.org/D77319) to remove getNegatibleCost. It is almost ready now. I am not sure it is the best approach, so comments are welcome. Regarding the opportunities for this patch, can you double-check it together with my patch to see if there is any improvement?

In D76638#1957541, @steven.zhang wrote:

I have posted a patch (https://reviews.llvm.org/D77319) to remove getNegatibleCost. It is almost ready now. I am not sure it is the best approach, so comments are welcome. Regarding the opportunities for this patch, can you double-check it together with my patch to see if there is any improvement?

Thanks for improving it!
I'm seeing something interesting with that patch applied for the test that is currently crashing - we re-use a common term in the equation, so that eliminates an instruction. For PPC, it looks like this:

xssubsp f0, f1, f2
xsmulsp f2, f0, f3
xssubsp f0, f3, f0
xsresp f4, f2
xsmulsp f1, f0, f4
xsnmsubasp f0, f2, f1
xsmaddasp f1, f4, f0

But this seems to just be a lucky case because the other test (which is the same math except the fdiv operands are reversed) is not affected:

xssubsp f0, f1, f2
xssubsp f1, f2, f1
xsmulsp f0, f0, f3
xsaddsp f3, f1, f3
xsresp f2, f0
xsmulsp f1, f3, f2
xsnmsubasp f3, f0, f1
xsmaddasp f1, f2, f3
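The "common term" being reused is %sub1: in the test IR, %add3 = %a1 + (-%a0) is just -%sub1, so %sub4 can be computed as %a2 - %sub1 without a second subtraction of the inputs. A minimal sketch of that identity, assuming default IEEE-754 float semantics and using the hypothetical helper names `sub4AsWritten` and `sub4ReusingSub1` (they mirror the IR value names but do not exist in LLVM):

```cpp
#include <cassert>

// %sub4 as written in the test IR:
//   %neg  = fneg %a0
//   %add3 = fadd %a1, %neg
//   %sub4 = fadd %add3, %a2
static float sub4AsWritten(float A0, float A1, float A2) {
  float Neg = -A0;
  float Add3 = A1 + Neg;
  return Add3 + A2;
}

// %sub4 rewritten to reuse %sub1 = fsub %a0, %a1:
//   %sub4 = fsub %a2, %sub1
static float sub4ReusingSub1(float A0, float A1, float A2) {
  float Sub1 = A0 - A1;
  return A2 - Sub1;
}
```

Both functions perform two floating-point operations, but the second shares `Sub1` with the multiply that feeds the other fdiv operand, which is where the saved instruction comes from.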

steven.zhang mentioned this in D77319: [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression. Apr 3 2020, 1:00 AM

In D76638#1958124, @spatel wrote:

Thanks for improving it!
I'm seeing something interesting with that patch applied for the test that is currently crashing - we re-use a common term in the equation, so that eliminates an instruction. For PPC, it looks like this:
xssubsp f0, f1, f2
xsmulsp f2, f0, f3
xssubsp f0, f3, f0
xsresp f4, f2
xsmulsp f1, f0, f4
xsnmsubasp f0, f2, f1
xsmaddasp f1, f4, f0
But this seems to just be a lucky case because the other test (which is the same math except the fdiv operands are reversed) is not affected:
xssubsp f0, f1, f2
xssubsp f1, f2, f1
xsmulsp f0, f0, f3
xsaddsp f3, f1, f3
xsresp f2, f0
xsmulsp f1, f3, f2
xsnmsubasp f3, f0, f1
xsmaddasp f1, f2, f3

Yeah, it is sensitive to how we combine the nodes. I tried your patch (adding the FNEG check for FADD) and see two extra instructions generated on X86, as you mentioned. So we need some positive tests for this patch if it helps the codegen. What do you think?

In D76638#1959075, @steven.zhang wrote:

Yeah, it is sensitive to how we combine the nodes. I tried your patch (adding the FNEG check for FADD) and see two extra instructions generated on X86, as you mentioned. So we need some positive tests for this patch if it helps the codegen. What do you think?

Yes, I see that this patch would regress that case - the 2 instructions are the extra fadd/fsub and a move - but it still gives the slight improvement on the 1st test (fsub becomes fadd).
I'm going to step through the new code with a few examples to get a better idea of the possibilities.

If it's not necessary to add complexity, then I'll abandon this patch. It's a question of how often we would realistically expect to see non-canonical patterns by the time we reach here.

The (fadd (fneg X), Y) pattern should not occur in optimized IR (instcombine should fold it), so as long as we're not crashing on it, we are probably ok. It would be a nice enhancement to -reassociate and/or -instcombine to optimize the negated common term from the equation in IR.

Abandoning - I don't think this pattern is likely with optimized IR, and we have a better way to solve the crashing bug with D77319.

steven.zhang mentioned this in rG3c44c441db0f: [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with…. May 10 2020, 8:13 PM

steven.zhang mentioned this in rG2b59e9f1bdd8: [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with…. May 19 2020, 7:16 PM

Revision Contents

Path                                                Size
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp    5 lines
llvm/test/CodeGen/X86/fdiv.ll                       8 lines
llvm/test/CodeGen/X86/neg_fp.ll                     29 lines

Diff 252124

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

TargetLowering::getNegatibleCost(SDValue Op, SelectionDAG &DAG,
   case ISD::FADD: {
     if (!Options.NoSignedZerosFPMath && !Flags.hasNoSignedZeros())
       return NegatibleCost::Expensive;

     // After operation legalization, it might not be legal to create new FSUBs.
     if (LegalOperations && !isOperationLegalOrCustom(ISD::FSUB, VT))
       return NegatibleCost::Expensive;

+    // Ignore fadd with fneg because that will be canonicalized to fsub.
+    if (Op.getOperand(0).getOpcode() == ISD::FNEG ||
+        Op.getOperand(1).getOpcode() == ISD::FNEG)
+      return NegatibleCost::Expensive;
+
     // fold (fneg (fadd A, B)) -> (fsub (fneg A), B)
     NegatibleCost V0 = getNegatibleCost(Op.getOperand(0), DAG, LegalOperations,
                                         ForCodeSize, Depth + 1);
     if (V0 != NegatibleCost::Expensive)
       return V0;
     // fold (fneg (fadd A, B)) -> (fsub (fneg B), A)
     return getNegatibleCost(Op.getOperand(1), DAG, LegalOperations, ForCodeSize,
                             Depth + 1);
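The first bailout in this case exists because the fold (fneg (fadd A, B)) -> (fsub (fneg A), B) is not sign-exact when the sum is zero. The sketch below shows one such input, assuming default IEEE-754 round-to-nearest float semantics; `negOfFadd` and `fsubOfFneg` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// (fneg (fadd A, B)): the expression before the fold.
static float negOfFadd(float A, float B) { return -(A + B); }

// (fsub (fneg A), B): the expression after the fold.
static float fsubOfFneg(float A, float B) { return -A - B; }

// True if the two forms disagree on the sign bit of the result.
static bool signDiffers(float A, float B) {
  return std::signbit(negOfFadd(A, B)) != std::signbit(fsubOfFneg(A, B));
}
```

With A = +0.0 and B = -0.0, the sum A + B is +0.0, so the negated sum is -0.0, while (-A) - B evaluates to +0.0. For ordinary nonzero results the two forms agree, which is why the fold only needs the no-signed-zeros flag (or the global option) rather than full fast-math.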

llvm/test/CodeGen/X86/fdiv.ll

 ; clang/gcc), due to order of argument evaluation not being well defined. We
 ; ended up hitting llvm_unreachable in getNegatedExpression when building with
 ; gcc. Just make sure that we get a deterministic result.
 define float @fdiv_fneg_combine(float %a0, float %a1, float %a2) #0 {
 ; CHECK-LABEL: fdiv_fneg_combine:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movaps %xmm0, %xmm3
 ; CHECK-NEXT: subss %xmm1, %xmm3
+; CHECK-NEXT: mulss %xmm2, %xmm3
 ; CHECK-NEXT: subss %xmm0, %xmm1
-; CHECK-NEXT: mulss %xmm2, %xmm1
-; CHECK-NEXT: subss %xmm2, %xmm3
-; CHECK-NEXT: divss %xmm3, %xmm1
-; CHECK-NEXT: movaps %xmm1, %xmm0
+; CHECK-NEXT: addss %xmm2, %xmm1
+; CHECK-NEXT: divss %xmm1, %xmm3
+; CHECK-NEXT: movaps %xmm3, %xmm0
 ; CHECK-NEXT: retq
   %sub1 = fsub fast float %a0, %a1
   %mul2 = fmul fast float %sub1, %a2
   %neg = fneg fast float %a0
   %add3 = fadd fast float %a1, %neg
   %sub4 = fadd fast float %add3, %a2
   %div5 = fdiv fast float %mul2, %sub4
   ret float %div5
 }

 attributes #0 = { "unsafe-fp-math"="false" }

llvm/test/CodeGen/X86/neg_fp.ll

 ; CHECK-NEXT: retl
   %t11 = fmul double %t, %t
   %t13 = fsub double %t11, %t
   %t14 = fneg double %t
   %t15 = fmul double %t, %t14
   %t16 = fmul double %t, %t15
   %t18 = fadd double %t16, %t7
   ret double %t18
 }
+
+; This would crash because the negated expression for %sub4
+; creates a new use of %sub1 and that alters the negated cost
+
+define float @fdiv_extra_use_changes_cost(float %a0, float %a1, float %a2) nounwind {
+; CHECK-LABEL: fdiv_extra_use_changes_cost:
+; CHECK: # %bb.0:
+; CHECK-NEXT: pushl %eax
+; CHECK-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
+; CHECK-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
+; CHECK-NEXT: movss {{.*#+}} xmm2 = mem[0],zero,zero,zero
+; CHECK-NEXT: movaps %xmm2, %xmm3
+; CHECK-NEXT: subss %xmm1, %xmm3
+; CHECK-NEXT: mulss %xmm0, %xmm3
+; CHECK-NEXT: subss %xmm2, %xmm1
+; CHECK-NEXT: addss %xmm0, %xmm1
+; CHECK-NEXT: divss %xmm3, %xmm1
+; CHECK-NEXT: movss %xmm1, (%esp)
+; CHECK-NEXT: flds (%esp)
+; CHECK-NEXT: popl %eax
+; CHECK-NEXT: retl
+  %sub1 = fsub fast float %a0, %a1
+  %mul2 = fmul fast float %sub1, %a2
+  %neg = fneg fast float %a0
+  %add3 = fadd fast float %a1, %neg
+  %sub4 = fadd fast float %add3, %a2
+  %div5 = fdiv fast float %sub4, %mul2
+  ret float %div5
+}