Page MenuHomePhabricator

[SDAG] fix crash in getNegatedExpression() by ignoring transient fadd
Needs ReviewPublic

Authored by spatel on Mon, Mar 23, 12:56 PM.

Details

Summary

This is an alternative proposal to D76439 to fix the crashing test. This is similar to a bailout that already exists for "fmul X, 2.0".

An fadd with fneg operand should always be reduced to fsub, so if we encounter that pattern while running getNegatibleCost(), just ignore it until we have a chance to run combines on that fadd node.

In the x86 tests, this produces slightly better code - we form an expression with an fadd rather than all fsub. It does not make a real difference on the examples here, but we prefer the commutability of fadd because that gives register allocation more flexibility to handle destructive SSE ops without extra register moves.

Diff Detail

Event Timeline

spatel created this revision.Mon, Mar 23, 12:56 PM
spatel added a comment.Thu, Apr 2, 4:38 AM

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

I have posted a patch(https://reviews.llvm.org/D77319) to remove the getNegatibleCost. It is almost ready now. I am not sure if it is the best way , and welcome for the comments in advance. Regarding to the opportunities for this patch, can you double confirm it together with my patch to see if there is any improvement ?

spatel added a comment.Thu, Apr 2, 2:24 PM

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

I have posted a patch(https://reviews.llvm.org/D77319) to remove the getNegatibleCost. It is almost ready now. I am not sure if it is the best way , and welcome for the comments in advance. Regarding to the opportunities for this patch, can you double confirm it together with my patch to see if there is any improvement ?

Thanks for improving it!
I'm seeing something interesting with that patch applied for the test that is currently crashing - we re-use a common term in the equation, so that eliminates an instruction. For PPC, it looks like this:

xssubsp f0, f1, f2
xsmulsp f2, f0, f3
xssubsp f0, f3, f0
xsresp f4, f2
xsmulsp f1, f0, f4
xsnmsubasp f0, f2, f1
xsmaddasp f1, f4, f0

But this seems to just be a lucky case because the other test (which is the same math except the fdiv operands are reversed) is not affected:

xssubsp f0, f1, f2
xssubsp f1, f2, f1
xsmulsp f0, f0, f3
xsaddsp f3, f1, f3
xsresp f2,f 0
xsmulsp f1, f3, f2
xsnmsubasp f3, f0, f1
xsmaddasp f1, f2, f3

Ping. I think this change is useful regardless of whether/how we make a larger fix for getNegatibleCost+getNegatedExpression.

I have posted a patch(https://reviews.llvm.org/D77319) to remove the getNegatibleCost. It is almost ready now. I am not sure if it is the best way , and welcome for the comments in advance. Regarding to the opportunities for this patch, can you double confirm it together with my patch to see if there is any improvement ?

Thanks for improving it!
I'm seeing something interesting with that patch applied for the test that is currently crashing - we re-use a common term in the equation, so that eliminates an instruction. For PPC, it looks like this:

xssubsp f0, f1, f2
xsmulsp f2, f0, f3
xssubsp f0, f3, f0
xsresp f4, f2
xsmulsp f1, f0, f4
xsnmsubasp f0, f2, f1
xsmaddasp f1, f4, f0

But this seems to just be a lucky case because the other test (which is the same math except the fdiv operands are reversed) is not affected:

xssubsp f0, f1, f2
xssubsp f1, f2, f1
xsmulsp f0, f0, f3
xsaddsp f3, f1, f3
xsresp f2,f 0
xsmulsp f1, f3, f2
xsnmsubasp f3, f0, f1
xsmaddasp f1, f2, f3

Yeah, it is sensitive to how we combine the nodes. I tried your patch(add FENG check for FADD), and see two extra instructions generated as you mentioned, on X86. So, we need some positive tests for this patch if it helps the codegen. What do you think ?

spatel added a comment.Fri, Apr 3, 5:26 AM

! In D76638#1959075, @steven.zhang wrote:
Yeah, it is sensitive to how we combine the nodes. I tried your patch(add FENG check for FADD), and see two extra instructions generated as you mentioned, on X86. So, we need some positive tests for this patch if it helps the codegen. What do you think ?

Yes, I see that this patch would regress that case - the 2 instructions are the extra fadd/fsub and a move - but it still gives the slight improvement on the 1st test (fsub becomes fadd).
I'm going to step through the new code with a few examples to get a better idea of the possibilities.

If it's not necessary to add complexity, then I'll abandon this patch. It's a question of how often would we realistically expect to see non-canonical patterns by the time we reach here.

The (fadd (fneg X), Y) pattern should not occur in optimized IR (instcombine should fold it), so as long as we're not crashing on it, we are probably ok. It would be a nice enhancement to -reassociate and/or -instcombine to optimize the negated common term from the equation in IR.