This patch originated from D46562 and is a proper subset, with some issues addressed.

# Details

# Diff Detail

### Event Timeline

Note, I will unify the multiple/divide context as as before if needed once one or the other is checked in.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|

743–745 | See comments in D47911 - we're adding FMF constraints where there were none before? | |

test/CodeGen/AMDGPU/fdiv.f16.ll | ||

221 | I don't know enough about AMDGPU to understand what this diff means. I'd prefer that we add a test for some target where this kind of test currently produces fdiv asm, and this patch will allow it to become an fmul. |

Be advised that I am splitting the remaining fmul, fdiv and fadd reviews out for negation context, it will have its own review shortly. The old reviews will remain but only under visit wrt context.

Removed negation context...

test/CodeGen/AMDGPU/fdiv.f16.ll | ||
---|---|---|

221 | Working on it, should have a new test in bit... |

test/CodeGen/AMDGPU/fdiv.f16.ll | ||
---|---|---|

221 | Changing the value of the constant here is pretty suspicious |

test/CodeGen/AMDGPU/fdiv.f16.ll | ||
---|---|---|

221 | So I walked through the APFloat interface, it does return the correct 16bit value, 0x2E66, which is then converted to F32, and is 0x3dccc000. The old value bypassed conversion in visitFDIV and we given the decimal value of 1/10 which is in 32bit is 0x3dcccccd. The new behavior is more correct than what used to happen since we are being asked to rcp divide approximate a 16bit value. |

I will see about adding a fdiv const test to the fmf flag tests as well

test/CodeGen/AMDGPU/fdiv.f16.ll | ||
---|---|---|

221 | Matt and I have discussed this offline noting that the new behavior is correct wrt to the changed constants in the this test. |

Tighter binding of the folded constant to the multiply produced in the check patterns. Please note the long constant 1036828672 is the hex value 0x3DCCC000 and is represented as float 0.0999755859, this shows the same propagation as the AMDGPU test, first folding the constant 1/10 to F16 and then converting it to F32.

LGTM.

test/CodeGen/X86/fmf-flags.ll | ||
---|---|---|

12–16 | This is an interesting difference. Before this patch, we wouldn't recognize that we could use an estimate from the fdiv node, but we did recognize that we could use an estimate on the square root itself. But at that point, we don't realize that we're actually calculating a reciprocal square root (as opposed to a plain square root estimate)...so we generate the safety check for a 0.0 input. |