Page MenuHomePhabricator

Utilize new SDNode flag functionality to expand current support for fdiv

Authored by mcberg2017 on Jun 8 2018, 10:50 AM.

Diff Detail


Event Timeline

mcberg2017 created this revision.Jun 8 2018, 10:50 AM

Note, I will unify the multiple/divide context as as before if needed once one or the other is checked in.

spatel added inline comments.Jun 11 2018, 10:11 AM
732–734 ↗(On Diff #150542)

See comments in D47911 - we're adding FMF constraints where there were none before?

221 ↗(On Diff #150542)

I don't know enough about AMDGPU to understand what this diff means.

I'd prefer that we add a test for some target where this kind of test currently produces fdiv asm, and this patch will allow it to become an fmul.

Be advised that I am splitting the remaining fmul, fdiv and fadd reviews out for negation context, it will have its own review shortly. The old reviews will remain but only under visit wrt context.

Removed negation context...

221 ↗(On Diff #150542)

Working on it, should have a new test in bit...

arsenm added inline comments.Jun 11 2018, 2:36 PM
221 ↗(On Diff #150542)

Changing the value of the constant here is pretty suspicious

mcberg2017 added inline comments.Jun 11 2018, 7:47 PM
221 ↗(On Diff #150542)

So I walked through the APFloat interface, it does return the correct 16bit value, 0x2E66, which is then converted to F32, and is 0x3dccc000. The old value bypassed conversion in visitFDIV and we given the decimal value of 1/10 which is in 32bit is 0x3dcccccd. The new behavior is more correct than what used to happen since we are being asked to rcp divide approximate a 16bit value.

I will see about adding a fdiv const test to the fmf flag tests as well

221 ↗(On Diff #150542)

Matt and I have discussed this offline noting that the new behavior is correct wrt to the changed constants in the this test.

Added test for fmf path.

Tighter binding of the folded constant to the multiply produced in the check patterns. Please note the long constant 1036828672 is the hex value 0x3DCCC000 and is represented as float 0.0999755859, this shows the same propagation as the AMDGPU test, first folding the constant 1/10 to F16 and then converting it to F32.

spatel accepted this revision.Jun 15 2018, 12:33 PM


12–15 ↗(On Diff #151429)

This is an interesting difference. Before this patch, we wouldn't recognize that we could use an estimate from the fdiv node, but we did recognize that we could use an estimate on the square root itself. But at that point, we don't realize that we're actually calculating a reciprocal square root (as opposed to a plain square root estimate) we generate the safety check for a 0.0 input.

This revision is now accepted and ready to land.Jun 15 2018, 12:33 PM
This revision was automatically updated to reflect the committed changes.