This introduces a generic instruction for computing the floating point square root of a value. Right now, we can't select @llvm.sqrt, so this is working towards fixing that.
Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that we can be clear which cases in which this ought to be used.
Reading up on this a bit more, I found this (12-year-old) thread: http://lists.llvm.org/pipermail/llvm-dev/2007-August/010253.html
"Also, this makes llvm.sqrt side-effect free, where sqrt potentially has side-effects (making it harder to CSE, hoist out of loops etc)."
If that's true today, then I guess the llvm.sqrt case should be handled separately.
I think I need to think about this a little bit more, but it sounds like a reasonable solution to me.
Wouldn't G_INTRINSIC_SQRT be the side-effect-free one though?
I was saying both will be side effect free. For the AMDGPU usage I want a no-errno-ever-normal-instruction with no side effects that returns NaN, which legalizations will use. We'll also pattern match to it from the intrinsic.
We currently do silly things like recognize x < 0 ? nan : sqrt(x) -> sqrt(x) in DAGCombiner. It would be slightly less concerning looking if we were matching to a different opcode.
This probably needs more thought, but I hope we can improve the global-isel situation from the current confusion. The current situation in the IR that SelectionDAGBuilder recognizes are:
The intrinsic and readnone lib call seem to go through the same path on x86 and use the chainless ISD::FSQRT. The not-readnone lib call somehow end up producing a conditional branch on x86, but also involves the no-side effect ISD::FSQRT. I've never been able to keep this entire situation straight, so I'm not sure what exactly is going on there for errno handling. Given that there's no chain on ISD::FSQRT there seems to be no expectation of errno setting, and somewhere x86 avoids it by inserting the branch.
So the readnone lib calls come from the PartiallyInlineLibCalls pass, and that pass emits two lib calls with the readnone function attribute one being selected into an instruction, and the vanilla lib call remaining a function call. The idea is that the common case of the param >= 0 uses the fast path of a native instruction, followed by a check and conditional branch if the result shows that a real lib call is actually needed. I think this optimisation is orthogonal to the discussion since it happens at the IR level.
It's really scattered around. The exact rules are split between various mailing list posts and scattered around the sources, so I hope we can clearly document all the rules for whatever goes in here.
The combine to undo the 0 check and select:
If I disable PartiallyInlineLibcalls, ISD::FSQRT isn't used, so whatever instructions are introduced here I don't think should be worrying about errno.
Right, let's forget about errno because if that behaviour is required it will come through as a standard function call and none of this code will affect it.
To be clear, the proposed way forward is for the G_FSQRT op to be specified to return NaN for the negative operand case. It also returns NaN if the operand is NaN.
Since the negative operand case is undefined for llvm.sqrt, then it will always be correct to translate it to G_FSQRT, and if for some reason someone wants to have undefined semantics, then that can be a separate opcode.
On the issue of pattern matching x < 0 ? nan : llvm.sqrt(x), I don't think we can fix that issue in codegen. The semantics of llvm.sqrt are fixed and we will have to try to optimise it anyway.
Finally, we don't try to model the fp environment.