Page MenuHomePhabricator

[GlobalISel] Introduce a G_FSQRT generic instruction
ClosedPublic

Authored by paquette on Jan 28 2019, 2:35 PM.

Details

Summary

This introduces a generic instruction for computing the floating point square root of a value. Right now, we can't select @llvm.sqrt, so this is working towards fixing that.

Diff Detail

Event Timeline

paquette created this revision.Jan 28 2019, 2:35 PM
volkan accepted this revision.Jan 28 2019, 2:40 PM
volkan added a reviewer: volkan.

LGTM

This revision is now accepted and ready to land.Jan 28 2019, 2:40 PM
arsenm added inline comments.Jan 28 2019, 2:42 PM
include/llvm/Target/GenericOpcodes.td
572–577

Needs a comment about the < 0 behavior. One particularly awful detail of SelectionDAG is ISD::FSQRT 's behavior differs from the llvm.sqrt intrinsic.

arsenm added inline comments.Jan 28 2019, 2:48 PM
include/llvm/Target/GenericOpcodes.td
572–577

To be more specific, the IR claims this is undefined, which is the wrong answer. This should return a NAN

paquette marked an inline comment as done.Jan 28 2019, 2:58 PM
paquette added inline comments.
include/llvm/Target/GenericOpcodes.td
572–577

Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that we can be clear which cases in which this ought to be used.

Reading up on this a bit more, I found this (12-year-old) thread: http://lists.llvm.org/pipermail/llvm-dev/2007-August/010253.html

"Also, this makes llvm.sqrt side-effect free, where sqrt potentially has side-effects (making it harder to CSE, hoist out of loops etc)."

If that's true today, then I guess the llvm.sqrt case should be handled separately.

arsenm added inline comments.Jan 28 2019, 3:01 PM
include/llvm/Target/GenericOpcodes.td
572–577

I think having both would be helpful. Maybe say G_FSQRT is side effect free and defined for NAN, but G_INTRINSIC_SQRT is undefined for NAN?

paquette marked an inline comment as done.Jan 28 2019, 4:14 PM
paquette added inline comments.
include/llvm/Target/GenericOpcodes.td
572–577

I think I need to think about this a little bit more, but it sounds like a reasonable solution to me.

Wouldn't G_INTRINSIC_SQRT be the side-effect-free one though?

aemerson added inline comments.Jan 28 2019, 4:30 PM
include/llvm/Target/GenericOpcodes.td
572–577

Do you have a source reference what the semantics of ISD::FSQRT is then?

arsenm added inline comments.
include/llvm/Target/GenericOpcodes.td
572–577

I was saying both will be side effect free. For the AMDGPU usage I want a no-errno-ever-normal-instruction with no side effects that returns NaN, which legalizations will use. We'll also pattern match to it from the intrinsic.

We currently do silly things like recognize x < 0 ? nan : sqrt(x) -> sqrt(x) in DAGCombiner. It would be slightly less concerning looking if we were matching to a different opcode.

This probably needs more thought, but I hope we can improve the global-isel situation from the current confusion. The current situation in the IR that SelectionDAGBuilder recognizes are:

  • llvm.sqrt
  • lib call to sqrt that is readnone
  • lib call to sqrt that is not readnone

The intrinsic and readnone lib call seem to go through the same path on x86 and use the chainless ISD::FSQRT. The not-readnone lib call somehow end up producing a conditional branch on x86, but also involves the no-side effect ISD::FSQRT. I've never been able to keep this entire situation straight, so I'm not sure what exactly is going on there for errno handling. Given that there's no chain on ISD::FSQRT there seems to be no expectation of errno setting, and somewhere x86 avoids it by inserting the branch.

aemerson added inline comments.Jan 28 2019, 4:41 PM
include/llvm/Target/GenericOpcodes.td
572–577

So the readnone lib calls come from the PartiallyInlineLibCalls pass, and that pass emits two lib calls with the readnone function attribute one being selected into an instruction, and the vanilla lib call remaining a function call. The idea is that the common case of the param >= 0 uses the fast path of a native instruction, followed by a check and conditional branch if the result shows that a real lib call is actually needed. I think this optimisation is orthogonal to the discussion since it happens at the IR level.

arsenm added inline comments.Jan 28 2019, 4:46 PM
include/llvm/Target/GenericOpcodes.td
572–577

It's really scattered around. The exact rules are split between various mailing list posts and scattered around the sources, so I hope we can clearly document all the rules for whatever goes in here.

The combine to undo the 0 check and select:
https://github.com/llvm-mirror/llvm/blob/3b68b45b637e85e7db2aae4d5ed50bb75ff0155f/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18277

If I disable PartiallyInlineLibcalls, ISD::FSQRT isn't used, so whatever instructions are introduced here I don't think should be worrying about errno.

aemerson added inline comments.Jan 29 2019, 1:37 PM
include/llvm/Target/GenericOpcodes.td
572–577

Right, let's forget about errno because if that behaviour is required it will come through as a standard function call and none of this code will affect it.

To be clear, the proposed way forward is for the G_FSQRT op to be specified to return NaN for the negative operand case. It also returns NaN if the operand is NaN.

Since the negative operand case is undefined for llvm.sqrt, then it will always be correct to translate it to G_FSQRT, and if for some reason someone wants to have undefined semantics, then that can be a separate opcode.

On the issue of pattern matching x < 0 ? nan : llvm.sqrt(x), I don't think we can fix that issue in codegen. The semantics of llvm.sqrt are fixed and we will have to try to optimise it anyway.

Finally, we don't try to model the fp environment.

Agreed?

arsenm added inline comments.Jan 29 2019, 1:46 PM
include/llvm/Target/GenericOpcodes.td
572–577

That sounds fine to me. I do want it explicitly documented here though that G_FSQRT returns NAN here, and does not touch errno.

Uhhh...from what I can tell it looks like llvm.sqrt's semantics have been changed to match the libm semantics, except for errno.

@spatel can you confirm that the change from D39304 indeed made llvm.sqrt(-x) == NaN?

Uhhh...from what I can tell it looks like llvm.sqrt's semantics have been changed to match the libm semantics, except for errno.

@spatel can you confirm that the change from D39304 indeed made llvm.sqrt(-x) == NaN?

That was just a cosmetic change to the docs. The real changes were made in:
D28797
D28928
D28929
D39204

So yes, llvm.sqrt() is supposed to have conformed to IEEE/libm sqrt for about 2 years now. If anyone sees misbehavior/missed optimizations, please let me know.

So, is the consensus here that we

  • Don't split this into two opcodes (for now)
  • Document the behaviour of G_FSQRT wrt negative values and errno

?

So, is the consensus here that we

  • Don't split this into two opcodes (for now)
  • Document the behaviour of G_FSQRT wrt negative values and errno ?

Sounds right to me. Since the intrinsic definition was fixed, I don't see a reason to ever have two opcodes.

paquette updated this revision to Diff 184331.Jan 30 2019, 10:39 AM

Updated patch to document errno behaviour.

arsenm added inline comments.Jan 30 2019, 10:40 AM
include/llvm/Target/GenericOpcodes.td
573

Should also mention it returns NAN for < 0

paquette updated this revision to Diff 184340.Jan 30 2019, 11:12 AM
paquette marked an inline comment as done.

Add some more detail to comment wrt behaviour in comparison to libm sqrt()

arsenm accepted this revision.Jan 30 2019, 12:06 PM

LGTM

paquette closed this revision.Jan 30 2019, 12:50 PM

Committed in r352668. Thanks!