This introduces a generic instruction, G_FSQRT, for computing the floating-point square root of a value. Right now, we can't select @llvm.sqrt, so this is a step towards fixing that.
include/llvm/Target/GenericOpcodes.td
572–577: Needs a comment about the behavior for inputs < 0. One particularly awful detail of SelectionDAG is that ISD::FSQRT's behavior differs from that of the llvm.sqrt intrinsic.
572–577: To be more specific, the IR definition claims the result is undefined, which is the wrong answer. This should return a NaN.
572–577: Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that it is clear in which cases it ought to be used. Reading up on this a bit more, I found this (12-year-old) thread: http://lists.llvm.org/pipermail/llvm-dev/2007-August/010253.html It says: "Also, this makes llvm.sqrt side-effect free, where sqrt potentially has side-effects (making it harder to CSE, hoist out of loops etc)." If that's still true today, then I guess the llvm.sqrt case should be handled separately.
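As a rough illustration of the side effect the quoted thread refers to, the sketch below (plain C++, not LLVM code) clears errno, calls the C library square root, and reports what errno holds afterwards. The helper name errno_after_libm_sqrt is hypothetical, and whether errno is written at all depends on the platform's math_errhandling. A call that may write errno is observable beyond its return value, which is what makes CSE and loop hoisting harder.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>

// Hypothetical helper: clears errno, calls the libm square root, stores the
// result, and returns the errno value left behind by the call. A nonzero
// return means the call had a side effect beyond producing its result.
int errno_after_libm_sqrt(double x, double &result) {
  errno = 0;
  result = std::sqrt(x);
  return errno;
}
```

For a negative input the result is NaN; errno is set to EDOM only when math_errhandling includes MATH_ERRNO.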
572–577: I think having both would be helpful. Maybe say G_FSQRT is side-effect-free and returns NaN for negative inputs, while G_INTRINSIC_SQRT is undefined for them?
572–577: I need to think about this a bit more, but it sounds like a reasonable solution to me. Wouldn't G_INTRINSIC_SQRT be the side-effect-free one, though?
572–577: Do you have a source reference for what the semantics of ISD::FSQRT are, then?
572–577: I was saying both would be side-effect-free. For the AMDGPU usage, I want a normal instruction with no side effects that never sets errno and returns NaN for negative inputs, which legalizations will use. We'll also pattern-match to it from the intrinsic. We currently do silly things like recognizing x < 0 ? nan : sqrt(x) -> sqrt(x) in DAGCombiner; it would look slightly less concerning if we were matching to a different opcode. This probably needs more thought, but I hope we can improve the GlobalISel situation over the current confusion. The forms in the IR that SelectionDAGBuilder currently recognizes are:
- the llvm.sqrt intrinsic
- a readnone sqrt lib call
- a sqrt lib call that is not readnone
The intrinsic and the readnone lib call seem to go through the same path on x86 and use the chainless ISD::FSQRT. The non-readnone lib call somehow ends up producing a conditional branch on x86, but it also involves the side-effect-free ISD::FSQRT. I've never been able to keep this entire situation straight, so I'm not sure exactly what is going on there for errno handling. Given that ISD::FSQRT has no chain, there seems to be no expectation that it sets errno, and somewhere x86 avoids the problem by inserting the branch.
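The DAGCombiner fold mentioned above, x < 0 ? nan : sqrt(x) -> sqrt(x), is only sound if the square root already yields NaN for negative inputs. A minimal sketch in plain C++, assuming a hypothetical fsqrt_nan that models those NaN-returning semantics, checks that dropping the guard preserves the result, including the sign of -0.0:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of a square root that returns NaN for negative inputs
// and otherwise behaves like IEEE 754 sqrt (so sqrt(-0.0) is -0.0).
// std::sqrt is only reached when x is not less than zero, so it reports no
// domain error.
double fsqrt_nan(double x) {
  if (x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::sqrt(x);
}

// The guarded pattern that the combine recognizes: x < 0 ? nan : sqrt(x).
double guarded_sqrt(double x) {
  return x < 0.0 ? std::numeric_limits<double>::quiet_NaN() : fsqrt_nan(x);
}

// True if folding guarded_sqrt(x) to fsqrt_nan(x) keeps the same result:
// both are NaN, or both compare equal and have the same sign bit.
bool fold_preserves_result(double x) {
  double guarded = guarded_sqrt(x);
  double folded = fsqrt_nan(x);
  if (std::isnan(guarded) || std::isnan(folded))
    return std::isnan(guarded) && std::isnan(folded);
  return guarded == folded && std::signbit(guarded) == std::signbit(folded);
}
```

If the square root were instead undefined for negative inputs, the guard would carry information and could not simply be dropped.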
572–577: So the readnone lib calls come from the PartiallyInlineLibCalls pass. That pass emits two lib calls: one with the readnone function attribute, which is selected into an instruction, and a vanilla lib call, which remains a function call. The idea is that the common case, where the parameter is >= 0, takes the fast path of a native instruction, followed by a check and a conditional branch to the real lib call if the result shows that one is actually needed. I think this optimisation is orthogonal to the discussion, since it happens at the IR level.
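The control flow described above can be pictured in plain C++. This is only a sketch of its shape, not the pass's output: native_sqrt is a hypothetical stand-in for the readnone call that gets selected into an instruction (assumed to return NaN for negative inputs without touching errno), and std::sqrt plays the vanilla lib call that may set errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the native square-root instruction: returns NaN
// for negative inputs and never writes errno, because std::sqrt is only
// reached for inputs on which it reports no domain error.
static double native_sqrt(double x) {
  if (x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  return std::sqrt(x);
}

// Shape of the partially inlined call: take the fast path, check the result,
// and branch to the real library call only when the result is NaN.
double sqrt_partially_inlined(double x) {
  double r = native_sqrt(x);
  if (!std::isnan(r))
    return r;           // common case: no lib call, errno untouched
  return std::sqrt(x);  // slow path: the lib call may set errno as usual
}
```

Callers that read errno still see EDOM for a negative input on platforms where math_errhandling includes MATH_ERRNO, while non-negative inputs never pay for the call.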
572–577: It's really scattered around. The exact rules are split between various mailing list posts and the sources, so I hope we can clearly document all the rules for whatever goes in here. The combine to undo the < 0 check and select: If I disable PartiallyInlineLibCalls, ISD::FSQRT isn't used, so I don't think whatever instructions are introduced here should have to worry about errno.
572–577: Right, let's forget about errno: if that behaviour is required, it will come through as a standard function call and none of this code will affect it. To be clear, the proposed way forward is to specify that G_FSQRT returns NaN for a negative operand. It also returns NaN if the operand is NaN. Since the negative-operand case is undefined for llvm.sqrt, it will always be correct to translate llvm.sqrt to G_FSQRT; if for some reason someone wants undefined semantics, that can be a separate opcode. On pattern matching x < 0 ? nan : llvm.sqrt(x), I don't think we can fix that issue in codegen: the semantics of llvm.sqrt are fixed, and we will have to try to optimise the pattern anyway. Finally, we don't try to model the FP environment. Agreed?
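To make the proposed semantics concrete, a hypothetical reference model of G_FSQRT on double might look like the following; g_fsqrt_ref is an illustrative name, not an LLVM API. It returns NaN for negative and NaN operands and never writes errno. One detail the sketch has to pick: -0.0 compares equal to zero, so it falls through to std::sqrt and yields -0.0, as IEEE 754 specifies, rather than NaN.

```cpp
#include <cassert>
#include <cerrno>
#include <cmath>
#include <limits>

// Hypothetical reference model of the proposed G_FSQRT semantics on double:
// NaN for negative or NaN operands, IEEE 754 sqrt otherwise, and no write to
// errno on any input.
double g_fsqrt_ref(double x) {
  if (std::isnan(x) || x < 0.0)
    return std::numeric_limits<double>::quiet_NaN();
  // x is +0.0, -0.0, positive, or +inf here, so std::sqrt reports no domain
  // error and leaves errno alone.
  return std::sqrt(x);
}
```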
572–577: That sounds fine to me. I do want it documented explicitly here, though, that G_FSQRT returns NaN for negative inputs and does not touch errno.
So, is the consensus here that we:
- don't split this into two opcodes (for now), and
- document the behaviour of G_FSQRT with respect to negative values and errno?
Sounds right to me. Since the intrinsic definition was fixed, I don't see a reason to ever have two opcodes.
573: Should also mention that it returns NaN for inputs < 0.