This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/llvm/
-
llvm/
-
Support/
-
TargetOpcodes.def
-
Target/
3/12
GenericOpcodes.td
-
GlobalISel/
-
SelectionDAGCompat.td
-
test/CodeGen/AArch64/GlobalISel/
-
CodeGen/
-
AArch64/
-
GlobalISel/
-
legalizer-info-validation.mir

Differential D57359

[GlobalISel] Introduce a G_FSQRT generic instruction
ClosedPublic

Authored by paquette on Jan 28 2019, 2:35 PM.

Download Raw Diff

Details

Reviewers

aemerson
dsanders
arsenm
volkan
hfinkel

Summary

This introduces a generic instruction for computing the floating point square root of a value. Right now, we can't select @llvm.sqrt, so this is working towards fixing that.

Diff Detail

Event Timeline

paquette created this revision.Jan 28 2019, 2:35 PM

Herald added subscribers: Petar.Avramovic, volkan, javed.absar and 3 others. · View Herald TranscriptJan 28 2019, 2:35 PM

paquette added a child revision: D57360: [GlobalISel] Add IRTranslator support for @llvm.sqrt -> G_FSQRT.Jan 28 2019, 2:38 PM

LGTM

This revision is now accepted and ready to land.Jan 28 2019, 2:40 PM

paquette added a child revision: D57361: [GlobalISel][AArch64] Add instruction selection support for @llvm.sqrt.Jan 28 2019, 2:41 PM

arsenm added inline comments.Jan 28 2019, 2:42 PM

include/llvm/Target/GenericOpcodes.td
572–577	Needs a comment about the < 0 behavior. One particularly awful detail of SelectionDAG is ISD::FSQRT 's behavior differs from the llvm.sqrt intrinsic.

arsenm added inline comments.Jan 28 2019, 2:48 PM

include/llvm/Target/GenericOpcodes.td
572–577	To be more specific, the IR claims this is undefined, which is the wrong answer. This should return a NAN

paquette marked an inline comment as done.Jan 28 2019, 2:58 PM

paquette added inline comments.

include/llvm/Target/GenericOpcodes.td
572–577	Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that we can be clear which cases in which this ought to be used. Reading up on this a bit more, I found this (12-year-old) thread: http://lists.llvm.org/pipermail/llvm-dev/2007-August/010253.html "Also, this makes llvm.sqrt side-effect free, where sqrt potentially has side-effects (making it harder to CSE, hoist out of loops etc)." If that's true today, then I guess the llvm.sqrt case should be handled separately.

arsenm added inline comments.Jan 28 2019, 3:01 PM

include/llvm/Target/GenericOpcodes.td
572–577	I think having both would be helpful. Maybe say G_FSQRT is side effect free and defined for NAN, but G_INTRINSIC_SQRT is undefined for NAN?

paquette marked an inline comment as done.Jan 28 2019, 4:14 PM

paquette added inline comments.

include/llvm/Target/GenericOpcodes.td
572–577	I think I need to think about this a little bit more, but it sounds like a reasonable solution to me. Wouldn't G_INTRINSIC_SQRT be the side-effect-free one though?

aemerson added inline comments.Jan 28 2019, 4:30 PM

include/llvm/Target/GenericOpcodes.td
572–577	Do you have a source reference what the semantics of ISD::FSQRT is then?

arsenm added a reviewer: hfinkel.Jan 28 2019, 4:32 PM

arsenm added inline comments.

include/llvm/Target/GenericOpcodes.td
572–577	I was saying both will be side effect free. For the AMDGPU usage I want a no-errno-ever-normal-instruction with no side effects that returns NaN, which legalizations will use. We'll also pattern match to it from the intrinsic. We currently do silly things like recognize x < 0 ? nan : sqrt(x) -> sqrt(x) in DAGCombiner. It would be slightly less concerning looking if we were matching to a different opcode. This probably needs more thought, but I hope we can improve the global-isel situation from the current confusion. The current situation in the IR that SelectionDAGBuilder recognizes are: llvm.sqrt lib call to sqrt that is readnone lib call to sqrt that is not readnone The intrinsic and readnone lib call seem to go through the same path on x86 and use the chainless ISD::FSQRT. The not-readnone lib call somehow end up producing a conditional branch on x86, but also involves the no-side effect ISD::FSQRT. I've never been able to keep this entire situation straight, so I'm not sure what exactly is going on there for errno handling. Given that there's no chain on ISD::FSQRT there seems to be no expectation of errno setting, and somewhere x86 avoids it by inserting the branch.

aemerson added inline comments.Jan 28 2019, 4:41 PM

include/llvm/Target/GenericOpcodes.td
572–577	So the readnone lib calls come from the PartiallyInlineLibCalls pass, and that pass emits two lib calls with the readnone function attribute one being selected into an instruction, and the vanilla lib call remaining a function call. The idea is that the common case of the param >= 0 uses the fast path of a native instruction, followed by a check and conditional branch if the result shows that a real lib call is actually needed. I think this optimisation is orthogonal to the discussion since it happens at the IR level.

arsenm added inline comments.Jan 28 2019, 4:46 PM

include/llvm/Target/GenericOpcodes.td
572–577	It's really scattered around. The exact rules are split between various mailing list posts and scattered around the sources, so I hope we can clearly document all the rules for whatever goes in here. The combine to undo the 0 check and select: https://github.com/llvm-mirror/llvm/blob/3b68b45b637e85e7db2aae4d5ed50bb75ff0155f/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18277 If I disable PartiallyInlineLibcalls, ISD::FSQRT isn't used, so whatever instructions are introduced here I don't think should be worrying about errno.

aemerson added inline comments.Jan 29 2019, 1:37 PM

include/llvm/Target/GenericOpcodes.td
572–577	Right, let's forget about errno because if that behaviour is required it will come through as a standard function call and none of this code will affect it. To be clear, the proposed way forward is for the G_FSQRT op to be specified to return NaN for the negative operand case. It also returns NaN if the operand is NaN. Since the negative operand case is undefined for llvm.sqrt, then it will always be correct to translate it to G_FSQRT, and if for some reason someone wants to have undefined semantics, then that can be a separate opcode. On the issue of pattern matching `x < 0 ? nan : llvm.sqrt(x)`, I don't think we can fix that issue in codegen. The semantics of llvm.sqrt are fixed and we will have to try to optimise it anyway. Finally, we don't try to model the fp environment. Agreed?

arsenm added inline comments.Jan 29 2019, 1:46 PM

include/llvm/Target/GenericOpcodes.td
572–577	That sounds fine to me. I do want it explicitly documented here though that G_FSQRT returns NAN here, and does not touch errno.

Uhhh...from what I can tell it looks like llvm.sqrt's semantics have been changed to match the libm semantics, except for errno.

@spatel can you confirm that the change from D39304 indeed made llvm.sqrt(-x) == NaN?

In D57359#1376446, @aemerson wrote:

Uhhh...from what I can tell it looks like llvm.sqrt's semantics have been changed to match the libm semantics, except for errno.

@spatel can you confirm that the change from D39304 indeed made llvm.sqrt(-x) == NaN?

That was just a cosmetic change to the docs. The real changes were made in:
D28797
D28928
D28929
D39204

So yes, llvm.sqrt() is supposed to have conformed to IEEE/libm sqrt for about 2 years now. If anyone sees misbehavior/missed optimizations, please let me know.

So, is the consensus here that we

Don't split this into two opcodes (for now)
Document the behaviour of G_FSQRT wrt negative values and errno

In D57359#1377345, @paquette wrote:

So, is the consensus here that we

Don't split this into two opcodes (for now)

Document the behaviour of G_FSQRT wrt negative values and errno

?

Sounds right to me. Since the intrinsic definition was fixed, I don't see a reason to ever have two opcodes.

Updated patch to document errno behaviour.

arsenm added inline comments.Jan 30 2019, 10:40 AM

include/llvm/Target/GenericOpcodes.td
573	Should also mention it returns NAN for < 0

Add some more detail to comment wrt behaviour in comparison to libm sqrt()

LGTM

Committed in r352668. Thanks!

Revision Contents

Path

Size

include/

llvm/

Support/

TargetOpcodes.def

3 lines

Target/

GenericOpcodes.td

10 lines

GlobalISel/

SelectionDAGCompat.td

1 line

test/

CodeGen/

AArch64/

GlobalISel/

legalizer-info-validation.mir

3 lines

Diff 184340

include/llvm/Support/TargetOpcodes.def

	Show First 20 Lines • Show All 520 Lines • ▼ Show 20 Lines
	HANDLE_TARGET_OPCODE(G_FCEIL)			HANDLE_TARGET_OPCODE(G_FCEIL)

	/// Floating point cosine.			/// Floating point cosine.
	HANDLE_TARGET_OPCODE(G_FCOS)			HANDLE_TARGET_OPCODE(G_FCOS)

	/// Floating point sine.			/// Floating point sine.
	HANDLE_TARGET_OPCODE(G_FSIN)			HANDLE_TARGET_OPCODE(G_FSIN)

				/// Floating point square root.
				HANDLE_TARGET_OPCODE(G_FSQRT)

	/// Generic AddressSpaceCast.			/// Generic AddressSpaceCast.
	HANDLE_TARGET_OPCODE(G_ADDRSPACE_CAST)			HANDLE_TARGET_OPCODE(G_ADDRSPACE_CAST)

	/// Generic block address			/// Generic block address
	HANDLE_TARGET_OPCODE(G_BLOCK_ADDR)			HANDLE_TARGET_OPCODE(G_BLOCK_ADDR)

	// TODO: Add more generic opcodes as we move along.			// TODO: Add more generic opcodes as we move along.

	/// Marker for the end of the generic opcode.			/// Marker for the end of the generic opcode.
	/// This is used to check if an opcode is in the range of the			/// This is used to check if an opcode is in the range of the
	/// generic opcodes.			/// generic opcodes.
	HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_BLOCK_ADDR)			HANDLE_TARGET_OPCODE_MARKER(PRE_ISEL_GENERIC_OPCODE_END, G_BLOCK_ADDR)

	/// BUILTIN_OP_END - This must be the last enum value in this list.			/// BUILTIN_OP_END - This must be the last enum value in this list.
	/// The target-specific post-isel opcode values start here.			/// The target-specific post-isel opcode values start here.
	HANDLE_TARGET_OPCODE_MARKER(GENERIC_OP_END, PRE_ISEL_GENERIC_OPCODE_END)			HANDLE_TARGET_OPCODE_MARKER(GENERIC_OP_END, PRE_ISEL_GENERIC_OPCODE_END)

include/llvm/Target/GenericOpcodes.td

	Show First 20 Lines • Show All 563 Lines • ▼ Show 20 Lines

	// Floating point sine of a value.			// Floating point sine of a value.
	def G_FSIN : GenericInstruction {			def G_FSIN : GenericInstruction {
	let OutOperandList = (outs type0:$dst);			let OutOperandList = (outs type0:$dst);
	let InOperandList = (ins type0:$src1);			let InOperandList = (ins type0:$src1);
	let hasSideEffects = 0;			let hasSideEffects = 0;
	}			}

				// Floating point square root of a value.
				// This returns NaN for negative nonzero values.
				arsenmUnsubmitted Done Reply Inline Actions Should also mention it returns NAN for < 0 arsenm: Should also mention it returns NAN for < 0
				// NOTE: Unlike libm sqrt(), this never sets errno. In all other respects it's
				// libm-conformant.
				def G_FSQRT : GenericInstruction {
				let OutOperandList = (outs type0:$dst);
				arsenmUnsubmitted Not Done Reply Inline Actions Needs a comment about the < 0 behavior. One particularly awful detail of SelectionDAG is ISD::FSQRT 's behavior differs from the llvm.sqrt intrinsic. arsenm: Needs a comment about the < 0 behavior. One particularly awful detail of SelectionDAG is ISD…
				arsenmUnsubmitted Not Done Reply Inline Actions To be more specific, the IR claims this is undefined, which is the wrong answer. This should return a NAN arsenm: To be more specific, the IR claims this is undefined, which is the wrong answer. This should…
				paquetteAuthorUnsubmitted Done Reply Inline Actions Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that we can be clear which cases in which this ought to be used. Reading up on this a bit more, I found this (12-year-old) thread: http://lists.llvm.org/pipermail/llvm-dev/2007-August/010253.html "Also, this makes llvm.sqrt side-effect free, where sqrt potentially has side-effects (making it harder to CSE, hoist out of loops etc)." If that's true today, then I guess the llvm.sqrt case should be handled separately. paquette: Hmm, actually, I'm thinking maybe this should be G_INTRINSIC_SQRT, so that we can be clear…
				arsenmUnsubmitted Not Done Reply Inline Actions I think having both would be helpful. Maybe say G_FSQRT is side effect free and defined for NAN, but G_INTRINSIC_SQRT is undefined for NAN? arsenm: I think having both would be helpful. Maybe say G_FSQRT is side effect free and defined for NAN…
				paquetteAuthorUnsubmitted Done Reply Inline Actions I think I need to think about this a little bit more, but it sounds like a reasonable solution to me. Wouldn't G_INTRINSIC_SQRT be the side-effect-free one though? paquette: I think I need to think about this a little bit more, but it sounds like a reasonable solution…
				arsenmUnsubmitted Not Done Reply Inline Actions I was saying both will be side effect free. For the AMDGPU usage I want a no-errno-ever-normal-instruction with no side effects that returns NaN, which legalizations will use. We'll also pattern match to it from the intrinsic. We currently do silly things like recognize x < 0 ? nan : sqrt(x) -> sqrt(x) in DAGCombiner. It would be slightly less concerning looking if we were matching to a different opcode. This probably needs more thought, but I hope we can improve the global-isel situation from the current confusion. The current situation in the IR that SelectionDAGBuilder recognizes are: llvm.sqrt lib call to sqrt that is readnone lib call to sqrt that is not readnone The intrinsic and readnone lib call seem to go through the same path on x86 and use the chainless ISD::FSQRT. The not-readnone lib call somehow end up producing a conditional branch on x86, but also involves the no-side effect ISD::FSQRT. I've never been able to keep this entire situation straight, so I'm not sure what exactly is going on there for errno handling. Given that there's no chain on ISD::FSQRT there seems to be no expectation of errno setting, and somewhere x86 avoids it by inserting the branch. arsenm: I was saying both will be side effect free. For the AMDGPU usage I want a no-errno-ever-normal…
				aemersonUnsubmitted Not Done Reply Inline Actions Do you have a source reference what the semantics of ISD::FSQRT is then? aemerson: Do you have a source reference what the semantics of ISD::FSQRT is then?
				aemersonUnsubmitted Not Done Reply Inline Actions So the readnone lib calls come from the PartiallyInlineLibCalls pass, and that pass emits two lib calls with the readnone function attribute one being selected into an instruction, and the vanilla lib call remaining a function call. The idea is that the common case of the param >= 0 uses the fast path of a native instruction, followed by a check and conditional branch if the result shows that a real lib call is actually needed. I think this optimisation is orthogonal to the discussion since it happens at the IR level. aemerson: So the readnone lib calls come from the PartiallyInlineLibCalls pass, and that pass emits two…
				arsenmUnsubmitted Not Done Reply Inline Actions It's really scattered around. The exact rules are split between various mailing list posts and scattered around the sources, so I hope we can clearly document all the rules for whatever goes in here. The combine to undo the 0 check and select: https://github.com/llvm-mirror/llvm/blob/3b68b45b637e85e7db2aae4d5ed50bb75ff0155f/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L18277 If I disable PartiallyInlineLibcalls, ISD::FSQRT isn't used, so whatever instructions are introduced here I don't think should be worrying about errno. arsenm: It's really scattered around. The exact rules are split between various mailing list posts and…
				aemersonUnsubmitted Not Done Reply Inline Actions Right, let's forget about errno because if that behaviour is required it will come through as a standard function call and none of this code will affect it. To be clear, the proposed way forward is for the G_FSQRT op to be specified to return NaN for the negative operand case. It also returns NaN if the operand is NaN. Since the negative operand case is undefined for llvm.sqrt, then it will always be correct to translate it to G_FSQRT, and if for some reason someone wants to have undefined semantics, then that can be a separate opcode. On the issue of pattern matching `x < 0 ? nan : llvm.sqrt(x)`, I don't think we can fix that issue in codegen. The semantics of llvm.sqrt are fixed and we will have to try to optimise it anyway. Finally, we don't try to model the fp environment. Agreed? aemerson: Right, let's forget about errno because if that behaviour is required it will come through as a…
				arsenmUnsubmitted Not Done Reply Inline Actions That sounds fine to me. I do want it explicitly documented here though that G_FSQRT returns NAN here, and does not touch errno. arsenm: That sounds fine to me. I do want it explicitly documented here though that G_FSQRT returns NAN…
				let InOperandList = (ins type0:$src1);
				let hasSideEffects = 0;
				}

	//------------------------------------------------------------------------------			//------------------------------------------------------------------------------
	// Opcodes for LLVM Intrinsics			// Opcodes for LLVM Intrinsics
	//------------------------------------------------------------------------------			//------------------------------------------------------------------------------
	def G_INTRINSIC_TRUNC : GenericInstruction {			def G_INTRINSIC_TRUNC : GenericInstruction {
	let OutOperandList = (outs type0:$dst);			let OutOperandList = (outs type0:$dst);
	let InOperandList = (ins type0:$src1);			let InOperandList = (ins type0:$src1);
	let hasSideEffects = 0;			let hasSideEffects = 0;
	}			}
	▲ Show 20 Lines • Show All 225 Lines • Show Last 20 Lines

include/llvm/Target/GlobalISel/SelectionDAGCompat.td

	Show First 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	def : GINodeEquiv<G_CTTZ, cttz>;			def : GINodeEquiv<G_CTTZ, cttz>;
	def : GINodeEquiv<G_CTLZ_ZERO_UNDEF, ctlz_zero_undef>;			def : GINodeEquiv<G_CTLZ_ZERO_UNDEF, ctlz_zero_undef>;
	def : GINodeEquiv<G_CTTZ_ZERO_UNDEF, cttz_zero_undef>;			def : GINodeEquiv<G_CTTZ_ZERO_UNDEF, cttz_zero_undef>;
	def : GINodeEquiv<G_CTPOP, ctpop>;			def : GINodeEquiv<G_CTPOP, ctpop>;
	def : GINodeEquiv<G_EXTRACT_VECTOR_ELT, vector_extract>;			def : GINodeEquiv<G_EXTRACT_VECTOR_ELT, vector_extract>;
	def : GINodeEquiv<G_FCEIL, fceil>;			def : GINodeEquiv<G_FCEIL, fceil>;
	def : GINodeEquiv<G_FCOS, fcos>;			def : GINodeEquiv<G_FCOS, fcos>;
	def : GINodeEquiv<G_FSIN, fsin>;			def : GINodeEquiv<G_FSIN, fsin>;
				def : GINodeEquiv<G_FSQRT, fsqrt>;

	// Broadly speaking G_LOAD is equivalent to ISD::LOAD but there are some			// Broadly speaking G_LOAD is equivalent to ISD::LOAD but there are some
	// complications that tablegen must take care of. For example, Predicates such			// complications that tablegen must take care of. For example, Predicates such
	// as isSignExtLoad require that this is not a perfect 1:1 mapping since a			// as isSignExtLoad require that this is not a perfect 1:1 mapping since a
	// sign-extending load is (G_SEXTLOAD x) in GlobalISel. Additionally,			// sign-extending load is (G_SEXTLOAD x) in GlobalISel. Additionally,
	// G_LOAD handles both atomic and non-atomic loads where as SelectionDAG had			// G_LOAD handles both atomic and non-atomic loads where as SelectionDAG had
	// separate nodes for them. This GINodeEquiv maps the non-atomic loads to			// separate nodes for them. This GINodeEquiv maps the non-atomic loads to
	// G_LOAD with a non-atomic MachineMemOperand.			// G_LOAD with a non-atomic MachineMemOperand.
	Show All 38 Lines

test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir

	Show First 20 Lines • Show All 332 Lines • ▼ Show 20 Lines
	# DEBUG-NEXT: G_FCEIL (opcode {{[0-9]+}}): 1 type index			# DEBUG-NEXT: G_FCEIL (opcode {{[0-9]+}}): 1 type index
	# DEBUG: .. type index coverage check SKIPPED: user-defined predicate detected			# DEBUG: .. type index coverage check SKIPPED: user-defined predicate detected
	#			#
	# DEBUG-NEXT: G_FCOS (opcode {{[0-9]+}}): 1 type index			# DEBUG-NEXT: G_FCOS (opcode {{[0-9]+}}): 1 type index
	# DEBUG: .. the first uncovered type index: 1, OK			# DEBUG: .. the first uncovered type index: 1, OK
	#			#
	# DEBUG-NEXT: G_FSIN (opcode {{[0-9]+}}): 1 type index			# DEBUG-NEXT: G_FSIN (opcode {{[0-9]+}}): 1 type index
	# DEBUG: .. the first uncovered type index: 1, OK			# DEBUG: .. the first uncovered type index: 1, OK
				#
				# DEBUG-NEXT: G_FSQRT (opcode {{[0-9]+}}): 1 type index
				# DEBUG: .. the first uncovered type index: 1, OK

	# CHECK-NOT: ill-defined			# CHECK-NOT: ill-defined

	---			---
	name: dummy			name: dummy
	body: \|			body: \|
	bb.0:			bb.0:
	...			...