This is an archive of the discontinued LLVM Phabricator instance.

Add ‘llvm.experimental.constrained.fma‘ Intrinsic
ClosedPublic

Authored by wdng on Aug 4 2017, 12:10 PM.

Details

Reviewers

arsenm
b-sumner
andrew.w.kaylor
craig.topper

Commits

rGa131d3fb29eb: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
rL311629: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.

Summary

Add ‘llvm.experimental.constrained.fma‘ Intrinsic.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision.Aug 4 2017, 12:10 PM

Needs tests

b-sumner added inline comments.Aug 4 2017, 12:23 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6651	This is a ternary operation. Code below assumes unary or binary.

An update to docs/LangRef.rst is needed.

Add missing lit tests.

In D36335#832367, @b-sumner wrote:

An update to docs/LangRef.rst is needed.

Sure, will do.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6651	In Intrinsics.td, we have defined fma as a ternary operator. Here it only mutates STRICT_FMA to FMA, and IsUnary is false by default. So we may not need to specify whether it is unary or binary here?

b-sumner added inline comments.Aug 4 2017, 2:29 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6651	Please take a look at lines 6676 - 6680 below. Do you not need to pass a 3 element list to MorphNodeTo for the FMA case?

andrew.w.kaylor requested changes to this revision.Aug 4 2017, 3:09 PM

andrew.w.kaylor added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6651	You definitely need to add code to handle the third argument.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6015	This code also needs to be updated to handle the case of three value operands.
lib/IR/Verifier.cpp
3985	The implementation of this function assumes only 1 or 2 value operands. It will need to be updated.
test/Feature/fp-intrinsics.ll
236	If you checked the arguments here it should reveal the problems in the code. There's also a test at llvm/test/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic?
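
To make the operand-count concern in the comments above concrete, here is a minimal standalone sketch. It is not LLVM code: `ToyNode`, `Opcode`, `numValueOperands`, `relaxedOpcode`, and `mutateStrictToRelaxed` are hypothetical names invented for this illustration, and the toy assumes that a strict node stores its chain as operand 0 followed by its value operands. The point is only that a mutation step written for one or two value operands silently drops the third operand of an fma unless the operand count is handled generally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for DAG opcodes; this is not the real ISD enum.
enum class Opcode { StrictFSqrt, StrictFAdd, StrictFMA, FSqrt, FAdd, FMA };

// Toy node: a strict node holds a chain id as operand 0, then value operands.
struct ToyNode {
  Opcode Op;
  std::vector<int> Operands;
};

// Number of value operands each strict opcode expects in this toy model.
std::size_t numValueOperands(Opcode Op) {
  switch (Op) {
  case Opcode::StrictFSqrt: return 1;
  case Opcode::StrictFAdd:  return 2;
  case Opcode::StrictFMA:   return 3;
  default:                  return 0; // not a strict opcode
  }
}

// Map a strict opcode to its unconstrained counterpart.
Opcode relaxedOpcode(Opcode Op) {
  switch (Op) {
  case Opcode::StrictFSqrt: return Opcode::FSqrt;
  case Opcode::StrictFAdd:  return Opcode::FAdd;
  case Opcode::StrictFMA:   return Opcode::FMA;
  default:                  return Op; // already unconstrained
  }
}

// Drop the chain and keep every value operand, however many there are.
ToyNode mutateStrictToRelaxed(const ToyNode &N) {
  const std::size_t NumValues = numValueOperands(N.Op);
  assert(NumValues != 0 && "expected a strict opcode");
  assert(N.Operands.size() == NumValues + 1 && "chain plus value operands");
  ToyNode Result{relaxedOpcode(N.Op), {}};
  Result.Operands.assign(N.Operands.begin() + 1, N.Operands.end());
  return Result;
}
```

Copying the operand range instead of naming operands 1 and 2 explicitly is the design choice that keeps the unary, binary, and ternary cases on one path in this toy.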

This revision now requires changes to proceed.Aug 4 2017, 3:09 PM

Could you also add a use of this new intrinsic to llvm/test/Verifier/fp-intrinsics.ll?

Address code reviews.

Upload correct diff.

b-sumner added inline comments.Aug 11 2017, 2:06 PM

docs/LangRef.rst
13035	Too much cut and paste from frem
13043	rounding only once
lib/Target/X86/X86ISelDAGToDAG.cpp
2017 ↗	(On Diff #110808)	Did you run clang-format?

Code changes based on Brian's comments.

wdng marked 3 inline comments as done.Aug 11 2017, 3:31 PM

arsenm added inline comments.Aug 11 2017, 3:33 PM

test/CodeGen/X86/fp-intrinsics.ll
2	Missing -check-prefix=CHECK
246	You need a separate check-label for the FMAless run line
test/Feature/fp-intrinsics.ll
244	Should also test for the other FP types

andrew.w.kaylor added inline comments.Aug 14 2017, 10:20 AM

docs/LangRef.rst
13035	I'm not sure it's clear what the comment "Note that the rounding happens only once here" means in this context. The rounding mode argument provides information to the optimizer and does not have any functional effect. I hope that this is straightforward enough with the other intrinsics that the terse comments there were sufficient. In the case of the constrained fma intrinsic, it is worth mentioning that any actions the optimizer performs on the intrinsic must be consistent with the rounding behavior of an fma instruction. For instance, the optimizer cannot perform constant folding where a rounded multiply is performed followed by a rounded add -- the rounding must be atomic. Perhaps that is what you intended to say here. If so, I believe a more verbose statement is needed.
13037	I think it would be a good idea to discuss here the circumstances under which this intrinsic can be formed. Specifically, what is the relationship between rounding mode control and the fp-contract setting? If strict rounding behavior is required within a scope, but fusing is enabled globally within the compilation unit, does the rounding requirement override the fp-contract setting? I think it should. Also, what are the expected exception semantics? If a scope is governed by strict exception behavior, how will the FP status flags be handled if a multiply and an add are fused? I believe what is required is that if either operation would have set an FP status flag then the fused operation must also set that flag, and no flag should be set by the fused operation that would not have been set by either of the two operations separately.
lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA.
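
As an illustration of the rounding point raised on docs/LangRef.rst line 13035 above, the sketch below compares a single-rounding `std::fma` with a multiply that is rounded to `double` before a rounded add. It is standalone C++, not LLVM code. The helper names `fused`, `twoStep`, and `checkRoundingDiffers` and the chosen inputs are assumptions made for this example, and it assumes IEEE-754 binary64 `double` in the default round-to-nearest mode. The `volatile` temporary is only there to stop a compiler from contracting the two-step version into an fma.

```cpp
#include <cassert>
#include <cmath>

// Single rounding: a*b + c is computed exactly, then rounded once.
double fused(double a, double b, double c) { return std::fma(a, b, c); }

// Two roundings: the product is rounded to double before the add.
// The volatile store keeps the compiler from fusing the two operations.
double twoStep(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// (1 + 2^-27) * (1 - 2^-27) is exactly 1 - 2^-54, a tie between
// 1 - 2^-53 and 1.0 that rounds to 1.0 under round-to-nearest-even.
void checkRoundingDiffers() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double b = 1.0 - std::ldexp(1.0, -27);
  // Fused: the exact residual -2^-54 survives the single rounding.
  assert(fused(a, b, -1.0) == -std::ldexp(1.0, -54));
  // Two-step: the residual is lost when the product rounds to 1.0.
  assert(twoStep(a, b, -1.0) == 0.0);
}
```

A constant folder that evaluated the intrinsic as in `twoStep` would therefore produce a different value than an fma instruction for these inputs, which is the case the comment asks the documentation to rule out.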

Address code reviews.

wdng added inline comments.Aug 15 2017, 3:42 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	No, it doesn't. It looks like X86 doesn't handle ISD::FMA automatically unless the -mattr=+fma option is given. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection.

andrew.w.kaylor added inline comments.Aug 15 2017, 3:58 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	I still don't understand. What happens when -mattr=+fma is used? The CodeGen/X86/fma.ll test uses that option. This case should work in the same way.

b-sumner added inline comments.Aug 15 2017, 4:15 PM

docs/LangRef.rst
13023	How about "...returns the result of a fused-multiply-add operation on its operands."?
13043	How about "The result produced is the product of the first two operands added to the third operand computed with infinite precision, and then rounded to the target precision."

wdng added inline comments.Aug 16 2017, 9:07 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	I think I made a mistake when describing the problem in my earlier comments. Let me rephrase and explain it here.

Without -mattr=+fma, an FMA libcall will be generated. With -mattr=+fma, we expect the corresponding FMA instruction to be generated.

In fma.ll, none of the fma tests are constrained FP operations. During the X86ISelLowering phase, the FMA node is lowered to X86ISD::FMADD, so there is no ISD::FMA left by the time instruction selection starts. Please refer to the following dump.

	(gdb) p CurDAG->dump()
	SelectionDAG has 12 nodes:
	  t0: ch = EntryToken
	  t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0
	  t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1
	  t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2
	  t12: f64 = X86ISD::FMADD t2, t4, t6
	  t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12
	  t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1

However, for the constrained fma, we use the mutateStrictFPToFP() function to mutate constrained_fma to a normal fma, namely ISD::FMA, before instruction selection starts. The X86 backend cannot recognize ISD::FMA, so we have to add code to convert ISD::FMA to X86ISD::FMADD during instruction selection.

Update LangRef.rst based on comments.

Update LangRef.rst: put more accurate descriptions into the constrained.fma semantic section.

b-sumner added inline comments.Aug 16 2017, 2:22 PM

docs/LangRef.rst
13040	Extra period

Remove extra period. Thanks!

Ping.

andrew.w.kaylor added inline comments.Aug 17 2017, 11:17 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else. Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP() and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA? My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

wdng added inline comments.Aug 18 2017, 11:55 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP() and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA?

--> Yes.

My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

---> I once tried moving the mutateStrictFPToFP() call to the LegalizeDAG phase, as the following code shows. I found that it works, and there is then no need to add code to the X86 backend instruction selector:

	switch (Action) {
	case TargetLowering::Legal:
	  if (Node->isStrictFPOpcode())
	    Node = DAG.mutateStrictFPToFP(Node);
	  return;

So once those strict FP operators have been legalized, we can directly mutate them to their corresponding normal FP operators. However, there is a problem: non-default FP exception behaviors (for constrained FP operations) are target-specific, which means we have to leave it to each sub-target's selector to handle them. So I would not suggest mutating those instructions at that point. What do you think?

Ping.

andrew.w.kaylor added inline comments.Aug 18 2017, 2:52 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015 ↗	(On Diff #110823)	I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection. Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there? Also, have you considered how non-X86 architectures need to handle this case?

Do we want to give the target any chance to use FMSUB/FNMADD/FNMSUB if any of the arguments are negated?

In D36335#846018, @craig.topper wrote:

Do we want to give the target any chance to use FMSUB/FNMADD/FNMSUB if any of the arguments are negated?

That's exactly the kind of thing I was afraid of missing by not channeling this through the normal path that ISD::FMA takes.

I'm wondering if X86 really needs X86ISD::FMADD opcode at all. We definitely need FNMADD, FMSUB, and FNMSUB. But I don't think there's any real difference between X86ISD::FMADD and ISD::FMA.
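
For readers unfamiliar with the variants named here, the following standalone sketch spells out how negated operands map onto them. It is an illustration only, not the X86 backend's selection logic: `FmaKind`, `selectFmaKind`, and `evalFma` are hypothetical names, and the reference semantics are expressed with `std::fma` rather than with any LLVM API.

```cpp
#include <cassert>
#include <cmath>

// The four FMA flavours discussed in the thread.
enum class FmaKind { FMADD, FMSUB, FNMADD, FNMSUB };

// Pick a flavour for (+/-)(a*b) (+/-) c: negating the product selects
// the FNM* forms, negating the addend selects the *SUB forms.
FmaKind selectFmaKind(bool NegProduct, bool NegAddend) {
  if (NegProduct)
    return NegAddend ? FmaKind::FNMSUB : FmaKind::FNMADD;
  return NegAddend ? FmaKind::FMSUB : FmaKind::FMADD;
}

// Reference semantics of each flavour, each with a single rounding.
double evalFma(FmaKind Kind, double A, double B, double C) {
  switch (Kind) {
  case FmaKind::FMADD:  return std::fma(A, B, C);   //  (a*b) + c
  case FmaKind::FMSUB:  return std::fma(A, B, -C);  //  (a*b) - c
  case FmaKind::FNMADD: return std::fma(-A, B, C);  // -(a*b) + c
  case FmaKind::FNMSUB: return std::fma(-A, B, -C); // -(a*b) - c
  }
  assert(false && "unknown FmaKind");
  return 0.0;
}
```

Because negation is exact in floating point, folding a negated operand into the choice of variant does not change the rounded result, which is why a target would want the chance to make that choice.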

include/llvm/CodeGen/ISDOpcodes.h
266	Please keep the blank line here.
lib/Target/X86/X86ISelDAGToDAG.cpp
2016 ↗	(On Diff #111428)	Add a comment that this is here because STRICT_FMA is turned into FMA after legalization and DAG combine.

craig.topper mentioned this in D36983: [X86] Remove X86ISD::FMADD in favor ISD::FMA.Aug 21 2017, 3:04 PM

Address code reviews. Thanks a lot!

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

In D36335#849535, @andrew.w.kaylor wrote:

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

Sure, thanks!

Diffusion mentioned this in rL311568: [X86] Remove X86ISD::FMADD in favor ISD::FMA.Aug 23 2017, 9:29 AM

Patch update after the patch [X86] Remove X86ISD::FMADD in favor ISD::FMA has been upstreamed.

wdng added a reviewer: craig.topper.Aug 23 2017, 11:28 AM

Fixed a format issue.

andrew.w.kaylor added inline comments.Aug 23 2017, 11:49 AM

test/CodeGen/X86/fp-intrinsics.ll
245	These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors.
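
The distinction drawn in this comment can be checked numerically. The sketch below is standalone C++ and uses a hypothetical helper, `productIsExact`, to test whether a product is exactly representable as a `double`. It relies on the fact that `std::fma(a, b, -p)` returns the rounding error of `p = a * b` exactly when no overflow or underflow occurs. The inputs 2.0 and 3.0 are stand-ins chosen for this example; the thread does not say which constants the test originally used.

```cpp
#include <cassert>
#include <cmath>

// True if A*B is exactly representable as a double, i.e. rounding the
// product to double loses nothing.
bool productIsExact(double A, double B) {
  assert(std::isfinite(A) && std::isfinite(B));
  volatile double Rounded = A * B; // rounded product, kept out of any fusion
  const double P = Rounded;
  // fma computes A*B - P with one rounding; the true difference is
  // itself representable, so the result is the exact rounding error.
  return std::fma(A, B, -P) == 0.0;
}
```

With this helper, `productIsExact(2.0, 3.0)` is true while `productIsExact(42.1, 42.1)` is false, so only the latter kind of input can distinguish a folded multiply-then-add from a true fused operation.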

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

In D36335#850574, @andrew.w.kaylor wrote:

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

Hi, Andrew, no problem at all. I will provide an updated full patch for this. Thanks a lot!

Address code reviews. Thanks!

LGTM

This revision is now accepted and ready to land.Aug 23 2017, 4:19 PM

Closed by commit rL311629: Add ‘llvm.experimental.constrained.fma‘ Intrinsic. (authored by wdng). Aug 23 2017, 9:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                              Size
docs/LangRef.rst                                  127 lines
include/llvm/CodeGen/ISDOpcodes.h                 1 line
include/llvm/CodeGen/SelectionDAGNodes.h          3 lines
include/llvm/IR/IntrinsicInst.h                   2 lines
include/llvm/IR/Intrinsics.td                     7 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp          8 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp         6 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp  13 lines
lib/IR/IntrinsicInst.cpp                          20 lines
lib/IR/Verifier.cpp                               6 lines
test/CodeGen/X86/fp-intrinsics.ll                 81 lines
test/Feature/fp-intrinsics.ll                     16 lines

Diff 112408

docs/LangRef.rst

Context not available.
	int i; // offset 0
	float f; // offset 4
	};

	struct Outer {
	float f; // offset 0
	double d; // offset 4
	struct Inner inner_a; // offset 12
	};

	void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
	outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0)
	outer->inner_a.i = 0; // tag1: (OuterStructTy, IntScalarTy, 12)
Context not available.
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	The ``invariant.group`` metadata may be attached to ``load``/``store`` instructions.
	The existence of the ``invariant.group`` metadata on the instruction tells
	the optimizer that every ``load`` and ``store`` to the same pointer operand
	within the same invariant group can be assumed to load or store the same
	value (but see the ``llvm.invariant.group.barrier`` intrinsic which affects
	when two pointers are considered the same). Pointers returned by bitcast or
	getelementptr with only zero indices are considered the same.

Context not available.
	%ptr = alloca i8
	store i8 42, i8* %ptr, !invariant.group !0
	call void @foo(i8* %ptr)

	%a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
	call void @foo(i8* %ptr)
	%b = load i8, i8* %ptr, !invariant.group !1 ; Can't assume anything, because group changed

	%newPtr = call i8* @getPointer(i8* %ptr)
	%c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr

	%unknownValue = load i8, i8* @unknownPtr
	store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42

	call void @foo(i8* %ptr)
	%newPtr2 = call i8* @llvm.invariant.group.barrier(i8* %ptr)
	%d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through invariant.group.barrier to get value of %ptr

	...
	declare void @foo(i8*)
	declare i8* @getPointer(i8*)
	declare i8* @llvm.invariant.group.barrier(i8*)

	!0 = !{!"magic ptr"}
	!1 = !{!"other ptr"}

Context not available.
	to the SSA value of the pointer operand.

	.. code-block:: llvm

	%v = load i8, i8* %x, !invariant.group !0
	; if %x mustalias %y then we can replace the above instruction with
	%v = load i8, i8* %y
Context not available.

	Note that unsigned integer remainder and signed integer remainder are
	distinct operations; for signed integer remainder, use '``srem``'.

	Taking the remainder of a division by zero is undefined behavior.
	For vectors, if any element of the divisor is zero, the operation has
	undefined behavior.

	Example:
Context not available.
	distinct operations; for unsigned integer remainder, use '``urem``'.

	Taking the remainder of a division by zero is undefined behavior.
	For vectors, if any element of the divisor is zero, the operation has
	undefined behavior.
	Overflow also leads to undefined behavior; this is a rare case, but can
	occur, for example, by taking the remainder of a 32-bit division of
Context not available.
	instructions to save cache bandwidth, such as the ``MOVNT`` instruction on
	x86.

	The optional ``!invariant.group`` metadata must reference a
	single metadata name ``<index>``. See ``invariant.group`` metadata.

	Semantics:
Context not available.
	to operate on, a value to compare to the value currently be at that
	address, and a new value to place at that address if the compared values
	are equal. The type of '<cmp>' must be an integer or pointer type whose
	bit width is a power of two greater than or equal to eight and less
	than or equal to a target-specific size limit. '<cmp>' and '<new>' must
	have the same type, and the type of '<pointer>' must be a pointer to
	that type. If the ``cmpxchg`` is marked as ``volatile``, then the
	optimizer is not allowed to modify the number or order of execution of
	this ``cmpxchg`` with other :ref:`volatile operations <volatile>`.

Context not available.
	``tail`` or ``musttail`` markers to the call. It is used to prevent tail
	call optimization from being performed on the call.

	#. The optional ``fast-math flags`` marker indicates that the call has one or more
	:ref:`fast-math flags <fastmath>`, which are optimization hints to enable
	otherwise unsafe floating-point optimizations. Fast-math flags are only valid
	for calls that return a floating-point scalar or vector type.
Context not available.
	Overview:
	"""""""""

	The '``llvm.invariant.group.barrier``' intrinsic can be used when an invariant
	established by invariant.group metadata no longer holds, to obtain a new pointer
	value that does not carry the invariant information.

Context not available.
	Semantics:
	""""""""""

	Returns another pointer that aliases its argument but which is considered different
	for the purposes of ``load``/``store`` ``invariant.group`` metadata.

	Constrained Floating Point Intrinsics
Context not available.
	Any FP exception that would have been raised by the original code must be raised
	by the transformed code, and the transformed code must not raise any FP
	exceptions that would not have been raised by the original code. This is the
	exception behavior argument that will be used if the code being compiled reads
	the FP exception status flags, but this mode can also be used with code that
	unmasks FP exceptions.

Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	The value produced is the floating point remainder from the division of the two
	value operands and has the same type as the operands. The remainder has the
	same sign as the dividend.

		'``llvm.experimental.constrained.fma``' Intrinsic
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""

		::

		declare <type>
		@llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>,
		metadata <rounding mode>,
		metadata <exception behavior>)

		Overview:
		"""""""""

		The '``llvm.experimental.constrained.fma``' intrinsic returns the result of a
		fused-multiply-add operation on its operands.

		Arguments:
		""""""""""

		The first three arguments to the '``llvm.experimental.constrained.fma``'
		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
		<t_vector>` of floating point values. All arguments must have identical types.

		The fourth and fifth arguments specify the rounding mode and exception behavior
		as described above.

		Semantics:
		""""""""""

		The result produced is the product of the first two operands added to the third
		operand computed with infinite precision, and then rounded to the target
		precision.

	Constrained libm-equivalent Intrinsics
	--------------------------------------
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.sqrt(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.sin(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.cos(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.exp(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.exp2(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.log(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.log10(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.log2(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.rint(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.

	::

	declare <type>
	@llvm.experimental.constrained.nearbyint(<type> <op1>,
	metadata <rounding mode>,
	metadata <exception behavior>)
Context not available.
	memory from the source location to the destination location. These locations are not
	allowed to overlap. The memory copy is performed as a sequence of load/store operations	allowed to overlap. The memory copy is performed as a sequence of load/store operations
	where each access is guaranteed to be a multiple of ``element_size`` bytes wide and	where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
	aligned at an ``element_size`` boundary.	aligned at an ``element_size`` boundary.

	The order of the copy is unspecified. The same value may be read from the source	The order of the copy is unspecified. The same value may be read from the source
	buffer many times, but only one write is issued to the destination buffer per	buffer many times, but only one write is issued to the destination buffer per
Context not available.
	of memory from the source location to the destination location. These locations	of memory from the source location to the destination location. These locations
	are allowed to overlap. The memory copy is performed as a sequence of load/store	are allowed to overlap. The memory copy is performed as a sequence of load/store
	operations where each access is guaranteed to be a multiple of ``element_size``	operations where each access is guaranteed to be a multiple of ``element_size``
	bytes wide and aligned at an ``element_size`` boundary.	bytes wide and aligned at an ``element_size`` boundary.

	The order of the copy is unspecified. The same value may be read from the source	The order of the copy is unspecified. The same value may be read from the source
	buffer many times, but only one write is issued to the destination buffer per	buffer many times, but only one write is issued to the destination buffer per
Context not available.
	The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of	The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of
	memory starting at the destination location to the given ``value``. The memory is	memory starting at the destination location to the given ``value``. The memory is
	set with a sequence of store operations where each access is guaranteed to be a	set with a sequence of store operations where each access is guaranteed to be a
	multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.	multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.

	The order of the assignment is unspecified. Only one write is issued to the	The order of the assignment is unspecified. Only one write is issued to the
	destination buffer per element. It is well defined to have concurrent reads and	destination buffer per element. It is well defined to have concurrent reads and
Context not available.

include/llvm/CodeGen/ISDOpcodes.h

Context not available.
	/// They are used to limit optimizations while the DAG is being	/// They are used to limit optimizations while the DAG is being
	/// optimized.	/// optimized.
	STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,	STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,
		STRICT_FMA,
		craig.topper: Please keep the blank line here.

	/// Constrained versions of libm-equivalent floating point intrinsics.	/// Constrained versions of libm-equivalent floating point intrinsics.
	/// These will be lowered to the equivalent non-constrained pseudo-op	/// These will be lowered to the equivalent non-constrained pseudo-op
Context not available.

include/llvm/CodeGen/SelectionDAGNodes.h

Context not available.
	/// Test if this node is a strict floating point pseudo-op.	/// Test if this node is a strict floating point pseudo-op.
	bool isStrictFPOpcode() {	bool isStrictFPOpcode() {
	switch (NodeType) {	switch (NodeType) {
	default:	default:
	return false;	return false;
	case ISD::STRICT_FADD:	case ISD::STRICT_FADD:
	case ISD::STRICT_FSUB:	case ISD::STRICT_FSUB:
	case ISD::STRICT_FMUL:	case ISD::STRICT_FMUL:
	case ISD::STRICT_FDIV:	case ISD::STRICT_FDIV:
	case ISD::STRICT_FREM:	case ISD::STRICT_FREM:
		case ISD::STRICT_FMA:
	case ISD::STRICT_FSQRT:	case ISD::STRICT_FSQRT:
	case ISD::STRICT_FPOW:	case ISD::STRICT_FPOW:
	case ISD::STRICT_FPOWI:	case ISD::STRICT_FPOWI:
Context not available.

include/llvm/IR/IntrinsicInst.h

Context not available.
	};	};

	bool isUnaryOp() const;	bool isUnaryOp() const;
		bool isTernaryOp() const;
	RoundingMode getRoundingMode() const;	RoundingMode getRoundingMode() const;
	ExceptionBehavior getExceptionBehavior() const;	ExceptionBehavior getExceptionBehavior() const;

Context not available.
	case Intrinsic::experimental_constrained_fmul:	case Intrinsic::experimental_constrained_fmul:
	case Intrinsic::experimental_constrained_fdiv:	case Intrinsic::experimental_constrained_fdiv:
	case Intrinsic::experimental_constrained_frem:	case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
	case Intrinsic::experimental_constrained_sqrt:	case Intrinsic::experimental_constrained_sqrt:
	case Intrinsic::experimental_constrained_pow:	case Intrinsic::experimental_constrained_pow:
	case Intrinsic::experimental_constrained_powi:	case Intrinsic::experimental_constrained_powi:
Context not available.

include/llvm/IR/Intrinsics.td

Context not available.
	llvm_metadata_ty,	llvm_metadata_ty,
	llvm_metadata_ty ]>;	llvm_metadata_ty ]>;

		def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
		[ LLVMMatchType<0>,
		LLVMMatchType<0>,
		LLVMMatchType<0>,
		llvm_metadata_ty,
		llvm_metadata_ty ]>;

	// These intrinsics are sensitive to the rounding mode so we need constrained	// These intrinsics are sensitive to the rounding mode so we need constrained
	// versions of each of them. When strict rounding and exception control are	// versions of each of them. When strict rounding and exception control are
	// not required the non-constrained versions of these intrinsics should be	// not required the non-constrained versions of these intrinsics should be
Context not available.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Context not available.
	case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;	case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;
	case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;	case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;
	case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;	case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;
		case ISD::STRICT_FMA: EqOpc = ISD::FMA; break;
	case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;	case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;
	case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;	case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;
	case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;	case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;
Context not available.
	}	}
	break;	break;
	case ISD::STRICT_FSQRT:	case ISD::STRICT_FSQRT:
		case ISD::STRICT_FMA:
	case ISD::STRICT_FPOW:	case ISD::STRICT_FPOW:
	case ISD::STRICT_FPOWI:	case ISD::STRICT_FPOWI:
	case ISD::STRICT_FSIN:	case ISD::STRICT_FSIN:
Context not available.
	// If the index is dependent on the store we will introduce a cycle when	// If the index is dependent on the store we will introduce a cycle when
	// creating the load (the load uses the index, and by replacing the chain	// creating the load (the load uses the index, and by replacing the chain
	// we will make the index dependent on the load). Also, the store might be	// we will make the index dependent on the load). Also, the store might be
	// dependent on the extractelement and introduce a cycle when creating	// dependent on the extractelement and introduce a cycle when creating
	// the load.	// the load.
	if (SDNode::hasPredecessorHelper(ST, Visited, Worklist) ||	if (SDNode::hasPredecessorHelper(ST, Visited, Worklist) ||
	ST->hasPredecessor(Op.getNode()))	ST->hasPredecessor(Op.getNode()))
Context not available.
	Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,	Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,
	RTLIB::FMA_F80, RTLIB::FMA_F128,	RTLIB::FMA_F80, RTLIB::FMA_F128,
	RTLIB::FMA_PPCF128));	RTLIB::FMA_PPCF128));
		case ISD::STRICT_FMA:
		Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,
		RTLIB::FMA_F80, RTLIB::FMA_F128,
		RTLIB::FMA_PPCF128));
	break;	break;
	case ISD::FADD:	case ISD::FADD:
	Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64,	Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64,
Context not available.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Context not available.
	unsigned OrigOpc = Node->getOpcode();	unsigned OrigOpc = Node->getOpcode();
	unsigned NewOpc;	unsigned NewOpc;
	bool IsUnary = false;	bool IsUnary = false;
		bool IsTernary = false;
	switch (OrigOpc) {	switch (OrigOpc) {
	default:	default:
	llvm_unreachable("mutateStrictFPToFP called with unexpected opcode!");	llvm_unreachable("mutateStrictFPToFP called with unexpected opcode!");
Context not available.
	case ISD::STRICT_FMUL: NewOpc = ISD::FMUL; break;	case ISD::STRICT_FMUL: NewOpc = ISD::FMUL; break;
	case ISD::STRICT_FDIV: NewOpc = ISD::FDIV; break;	case ISD::STRICT_FDIV: NewOpc = ISD::FDIV; break;
	case ISD::STRICT_FREM: NewOpc = ISD::FREM; break;	case ISD::STRICT_FREM: NewOpc = ISD::FREM; break;
		case ISD::STRICT_FMA: NewOpc = ISD::FMA; IsTernary = true; break;
	case ISD::STRICT_FSQRT: NewOpc = ISD::FSQRT; IsUnary = true; break;	case ISD::STRICT_FSQRT: NewOpc = ISD::FSQRT; IsUnary = true; break;
	case ISD::STRICT_FPOW: NewOpc = ISD::FPOW; break;	case ISD::STRICT_FPOW: NewOpc = ISD::FPOW; break;
	case ISD::STRICT_FPOWI: NewOpc = ISD::FPOWI; break;	case ISD::STRICT_FPOWI: NewOpc = ISD::FPOWI; break;
Context not available.
	SDNode *Res = nullptr;	SDNode *Res = nullptr;
	if (IsUnary)	if (IsUnary)
	Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });	Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });
		else if (IsTernary)
		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
		Node->getOperand(2),
		Node->getOperand(3)});
	else	else
	Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),	Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
	Node->getOperand(2) });	Node->getOperand(2) });
Context not available.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Context not available.
	case Intrinsic::experimental_constrained_fmul:	case Intrinsic::experimental_constrained_fmul:
	case Intrinsic::experimental_constrained_fdiv:	case Intrinsic::experimental_constrained_fdiv:
	case Intrinsic::experimental_constrained_frem:	case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
	case Intrinsic::experimental_constrained_sqrt:	case Intrinsic::experimental_constrained_sqrt:
	case Intrinsic::experimental_constrained_pow:	case Intrinsic::experimental_constrained_pow:
	case Intrinsic::experimental_constrained_powi:	case Intrinsic::experimental_constrained_powi:
Context not available.
	case Intrinsic::experimental_constrained_frem:	case Intrinsic::experimental_constrained_frem:
	Opcode = ISD::STRICT_FREM;	Opcode = ISD::STRICT_FREM;
	break;	break;
		case Intrinsic::experimental_constrained_fma:
		Opcode = ISD::STRICT_FMA;
		break;
	case Intrinsic::experimental_constrained_sqrt:	case Intrinsic::experimental_constrained_sqrt:
	Opcode = ISD::STRICT_FSQRT;	Opcode = ISD::STRICT_FSQRT;
	break;	break;
Context not available.
	SDVTList VTs = DAG.getVTList(ValueVTs);	SDVTList VTs = DAG.getVTList(ValueVTs);
	SDValue Result;	SDValue Result;
	if (FPI.isUnaryOp())	if (FPI.isUnaryOp())
		andrew.w.kaylor: This code also needs to be updated to handle the case of three value operands.
	Result = DAG.getNode(Opcode, sdl, VTs,	Result = DAG.getNode(Opcode, sdl, VTs,
	{ Chain, getValue(FPI.getArgOperand(0)) });	{ Chain, getValue(FPI.getArgOperand(0)) });
		else if (FPI.isTernaryOp())
		Result = DAG.getNode(Opcode, sdl, VTs,
		{ Chain, getValue(FPI.getArgOperand(0)),
		getValue(FPI.getArgOperand(1)),
		getValue(FPI.getArgOperand(2)) });
	else	else
	Result = DAG.getNode(Opcode, sdl, VTs,	Result = DAG.getNode(Opcode, sdl, VTs,
	{ Chain, getValue(FPI.getArgOperand(0)),	{ Chain, getValue(FPI.getArgOperand(0)),
	getValue(FPI.getArgOperand(1)) });	getValue(FPI.getArgOperand(1)) });

Context not available.

lib/IR/IntrinsicInst.cpp

Context not available.
	// are all subclasses of the CallInst class. Note that none of these classes	// are all subclasses of the CallInst class. Note that none of these classes
	// has state or virtual methods, which is an important part of this gross/neat	// has state or virtual methods, which is an important part of this gross/neat
	// hack working.	// hack working.
	//	//
	// In some cases, arguments to intrinsics need to be generic and are defined as	// In some cases, arguments to intrinsics need to be generic and are defined as
	// type pointer to empty struct { }*. To access the real item of interest the	// type pointer to empty struct { }*. To access the real item of interest the
	// cast instruction needs to be stripped away.	// cast instruction needs to be stripped away.
	//	//
	//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//

Context not available.
	ConstrainedFPIntrinsic::RoundingMode	ConstrainedFPIntrinsic::RoundingMode
	ConstrainedFPIntrinsic::getRoundingMode() const {	ConstrainedFPIntrinsic::getRoundingMode() const {
	unsigned NumOperands = getNumArgOperands();	unsigned NumOperands = getNumArgOperands();
	Metadata *MD =	Metadata *MD =
	dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();	dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();
	if (!MD || !isa<MDString>(MD))	if (!MD || !isa<MDString>(MD))
	return rmInvalid;	return rmInvalid;
Context not available.
	ConstrainedFPIntrinsic::ExceptionBehavior	ConstrainedFPIntrinsic::ExceptionBehavior
	ConstrainedFPIntrinsic::getExceptionBehavior() const {	ConstrainedFPIntrinsic::getExceptionBehavior() const {
	unsigned NumOperands = getNumArgOperands();	unsigned NumOperands = getNumArgOperands();
	Metadata *MD =	Metadata *MD =
	dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();	dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();
	if (!MD || !isa<MDString>(MD))	if (!MD || !isa<MDString>(MD))
	return ebInvalid;	return ebInvalid;
Context not available.

	bool ConstrainedFPIntrinsic::isUnaryOp() const {	bool ConstrainedFPIntrinsic::isUnaryOp() const {
	switch (getIntrinsicID()) {	switch (getIntrinsicID()) {
	default:	default:
	return false;	return false;
	case Intrinsic::experimental_constrained_sqrt:	case Intrinsic::experimental_constrained_sqrt:
	case Intrinsic::experimental_constrained_sin:	case Intrinsic::experimental_constrained_sin:
Context not available.
	return true;	return true;
	}	}
	}	}

		bool ConstrainedFPIntrinsic::isTernaryOp() const {
		switch (getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::experimental_constrained_fma:
		return true;
		}
		}

Context not available.

lib/IR/Verifier.cpp

Context not available.
	case Intrinsic::experimental_constrained_fmul:	case Intrinsic::experimental_constrained_fmul:
	case Intrinsic::experimental_constrained_fdiv:	case Intrinsic::experimental_constrained_fdiv:
	case Intrinsic::experimental_constrained_frem:	case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
	case Intrinsic::experimental_constrained_sqrt:	case Intrinsic::experimental_constrained_sqrt:
	case Intrinsic::experimental_constrained_pow:	case Intrinsic::experimental_constrained_pow:
	case Intrinsic::experimental_constrained_powi:	case Intrinsic::experimental_constrained_powi:
		andrew.w.kaylor: The implementation of this function assumes only 1 or 2 value operands. It will need to be updated.
Context not available.

	void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {	void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {
	unsigned NumOperands = FPI.getNumArgOperands();	unsigned NumOperands = FPI.getNumArgOperands();
	Assert(((NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)),	Assert(((NumOperands == 5 && FPI.isTernaryOp()) ||
	"invalid arguments for constrained FP intrinsic", &FPI);	(NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)),
		"invalid arguments for constrained FP intrinsic", &FPI);
	Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)),	Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)),
	"invalid exception behavior argument", &FPI);	"invalid exception behavior argument", &FPI);
	Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-2)),	Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-2)),
Context not available.

test/CodeGen/X86/fp-intrinsics.ll

	; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck %s	; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck --check-prefix=COMMON --check-prefix=NO-FMA --check-prefix=FMACALL64 --check-prefix=FMACALL32 %s
		; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+fma < %s | FileCheck -check-prefix=COMMON --check-prefix=HAS-FMA --check-prefix=FMA64 --check-prefix=FMA32 %s
		arsenm: Missing -check-prefix=CHECK

	; Verify that constants aren't folded to inexact results when the rounding mode	; Verify that constants aren't folded to inexact results when the rounding mode
	; is unknown.	; is unknown.
Context not available.
	; }	; }
	;	;
	; CHECK-LABEL: f1	; CHECK-LABEL: f1
	; CHECK: divsd	; COMMON: divsd
	define double @f1() {	define double @f1() {
	entry:	entry:
	%div = call double @llvm.experimental.constrained.fdiv.f64(	%div = call double @llvm.experimental.constrained.fdiv.f64(
Context not available.
	; }	; }
	;	;
	; CHECK-LABEL: f2	; CHECK-LABEL: f2
	; CHECK: subsd	; COMMON: subsd
	define double @f2(double %a) {	define double @f2(double %a) {
	entry:	entry:
	%div = call double @llvm.experimental.constrained.fsub.f64(	%div = call double @llvm.experimental.constrained.fsub.f64(
Context not available.
	; }	; }
	;	;
	; CHECK-LABEL: f3:	; CHECK-LABEL: f3:
	; CHECK: subsd	; COMMON: subsd
	; CHECK: mulsd	; COMMON: mulsd
	; CHECK: subsd	; COMMON: subsd
	define double @f3(double %a, double %b) {	define double @f3(double %a, double %b) {
	entry:	entry:
	%sub = call double @llvm.experimental.constrained.fsub.f64(	%sub = call double @llvm.experimental.constrained.fsub.f64(
Context not available.
	; return a;	; return a;
	; }	; }
	;	;
	;	;
	; CHECK-LABEL: f4:	; CHECK-LABEL: f4:
	; CHECK: testl	; COMMON: testl
	; CHECK: jle	; COMMON: jle
	; CHECK: addsd	; COMMON: addsd
	define double @f4(i32 %n, double %a) {	define double @f4(i32 %n, double %a) {
	entry:	entry:
	%cmp = icmp sgt i32 %n, 0	%cmp = icmp sgt i32 %n, 0
Context not available.

	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f5	; CHECK-LABEL: f5
	; CHECK: sqrtsd	; COMMON: sqrtsd
	define double @f5() {	define double @f5() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,	%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,
Context not available.

	; Verify that pow(42.1, 3.0) isn't simplified when the rounding mode is unknown.	; Verify that pow(42.1, 3.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f6	; CHECK-LABEL: f6
	; CHECK: pow	; COMMON: pow
	define double @f6() {	define double @f6() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.pow.f64(double 42.1,	%result = call double @llvm.experimental.constrained.pow.f64(double 42.1,
Context not available.

	; Verify that powi(42.1, 3) isn't simplified when the rounding mode is unknown.	; Verify that powi(42.1, 3) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f7	; CHECK-LABEL: f7
	; CHECK: powi	; COMMON: powi
	define double @f7() {	define double @f7() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.powi.f64(double 42.1,	%result = call double @llvm.experimental.constrained.powi.f64(double 42.1,
Context not available.

	; Verify that sin(42.0) isn't simplified when the rounding mode is unknown.	; Verify that sin(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f8	; CHECK-LABEL: f8
	; CHECK: sin	; COMMON: sin
	define double @f8() {	define double @f8() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.sin.f64(double 42.0,	%result = call double @llvm.experimental.constrained.sin.f64(double 42.0,
Context not available.

	; Verify that cos(42.0) isn't simplified when the rounding mode is unknown.	; Verify that cos(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f9	; CHECK-LABEL: f9
	; CHECK: cos	; COMMON: cos
	define double @f9() {	define double @f9() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.cos.f64(double 42.0,	%result = call double @llvm.experimental.constrained.cos.f64(double 42.0,
Context not available.

	; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.	; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f10	; CHECK-LABEL: f10
	; CHECK: exp	; COMMON: exp
	define double @f10() {	define double @f10() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.exp.f64(double 42.0,	%result = call double @llvm.experimental.constrained.exp.f64(double 42.0,
Context not available.

	; Verify that exp2(42.1) isn't simplified when the rounding mode is unknown.	; Verify that exp2(42.1) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f11	; CHECK-LABEL: f11
	; CHECK: exp2	; COMMON: exp2
	define double @f11() {	define double @f11() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,	%result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,
Context not available.

	; Verify that log(42.0) isn't simplified when the rounding mode is unknown.	; Verify that log(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f12	; CHECK-LABEL: f12
	; CHECK: log	; COMMON: log
	define double @f12() {	define double @f12() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.log.f64(double 42.0,	%result = call double @llvm.experimental.constrained.log.f64(double 42.0,
Context not available.

	; Verify that log10(42.0) isn't simplified when the rounding mode is unknown.	; Verify that log10(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f13	; CHECK-LABEL: f13
	; CHECK: log10	; COMMON: log10
	define double @f13() {	define double @f13() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.log10.f64(double 42.0,	%result = call double @llvm.experimental.constrained.log10.f64(double 42.0,
Context not available.

	; Verify that log2(42.0) isn't simplified when the rounding mode is unknown.	; Verify that log2(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f14	; CHECK-LABEL: f14
	; CHECK: log2	; COMMON: log2
	define double @f14() {	define double @f14() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.log2.f64(double 42.0,	%result = call double @llvm.experimental.constrained.log2.f64(double 42.0,
Context not available.

	; Verify that rint(42.1) isn't simplified when the rounding mode is unknown.	; Verify that rint(42.1) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f15	; CHECK-LABEL: f15
	; CHECK: rint	; NO-FMA: rint
		; HAS-FMA: vroundsd
	define double @f15() {	define double @f15() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.rint.f64(double 42.1,	%result = call double @llvm.experimental.constrained.rint.f64(double 42.1,
Context not available.
	; Verify that nearbyint(42.1) isn't simplified when the rounding mode is	; Verify that nearbyint(42.1) isn't simplified when the rounding mode is
	; unknown.	; unknown.
	; CHECK-LABEL: f16	; CHECK-LABEL: f16
	; CHECK: nearbyint	; NO-FMA: nearbyint
		; HAS-FMA: vroundsd
	define double @f16() {	define double @f16() {
	entry:	entry:
	%result = call double @llvm.experimental.constrained.nearbyint.f64(	%result = call double @llvm.experimental.constrained.nearbyint.f64(
Context not available.
	ret double %result	ret double %result
	}	}

		; Verify that fma(1.0) isn't simplified when the rounding mode is
		andrew.w.kaylor: These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors.
		; unknown.
		arsenm: You need a separate check-label for the FMAless run line
		; CHECK-LABEL: f17
		; FMACALL32: jmp fmaf # TAILCALL
		; FMA32: vfmadd213ss
		define float @f17() {
		entry:
		%result = call float @llvm.experimental.constrained.fma.f32(
		float 1.000000e+00,
		float 2.000000e+00,
		float 3.000000e+00,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict")
		ret float %result
		}

		; Verify that fma(42.1) isn't simplified when the rounding mode is
		; unknown.
		; CHECK-LABEL: f18
		; FMACALL64: jmp fma # TAILCALL
		; FMA64: vfmadd213sd
		define double @f18() {
		entry:
		%result = call double @llvm.experimental.constrained.fma.f64(
		double 42.1,
		double 42.1,
		double 42.1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict")
		ret double %result
		}
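
The review note on f17 (exactly representable operands can be folded without rounding, while 42.1 cannot) can be checked with a small C++ sketch. It evaluates `std::fma` under two rounding modes and reports whether the results differ. The helper names `fma_in_mode` and `rounding_sensitive` are hypothetical, and the sketch assumes the target supports `FE_UPWARD` and `FE_DOWNWARD` and that the C library's `fma` honors the dynamic rounding mode.

```cpp
#include <cfenv>
#include <cmath>

// Evaluate fma(a, b, c) with the dynamic rounding mode set to `mode`.
// The volatile accesses keep the operation between the two fesetround calls
// and stop the compiler from folding it at compile time.
double fma_in_mode(int mode, double a, double b, double c) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double x = a, y = b, z = c;
  volatile double result = std::fma(x, y, z);
  std::fesetround(saved);
  return result;
}

// True when the result depends on the rounding mode, i.e. the exact value
// of a*b + c is not representable as a double.
bool rounding_sensitive(double a, double b, double c) {
  return fma_in_mode(FE_UPWARD, a, b, c) != fma_in_mode(FE_DOWNWARD, a, b, c);
}
```

Under these assumptions, `rounding_sensitive(1.0, 2.0, 3.0)` is false because the exact result 5.0 is representable, whereas `rounding_sensitive(42.1, 42.1, 42.1)` is true, which is why 42.1 makes the stronger test input.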

	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"
	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
Context not available.
	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
		declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
Context not available.

test/Feature/fp-intrinsics.ll

Context not available.
	; return a;	; return a;
	; }	; }
	;	;
	;	;
	; CHECK-LABEL: @f4	; CHECK-LABEL: @f4
	; CHECK-NOT: select	; CHECK-NOT: select
	; CHECK: br i1 %cmp	; CHECK: br i1 %cmp
Context not available.
	ret double %a.0	ret double %a.0
	}	}


	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f5	; CHECK-LABEL: f5
	; CHECK: call double @llvm.experimental.constrained.sqrt	; CHECK: call double @llvm.experimental.constrained.sqrt
Context not available.
	ret double %result	ret double %result
	}	}

		; Verify that fma(42.1) isn't simplified when the rounding mode is
		; unknown.
		; CHECK-LABEL: f17
		; CHECK: call double @llvm.experimental.constrained.fma
		andrew.w.kaylor: If you checked the arguments here it should reveal the problems in the code. There's also a test at llvm/tests/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic?
		define double @f17() {
		entry:
		%result = call double @llvm.experimental.constrained.fma.f64(double 42.1, double 42.1, double 42.1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict")
		ret double %result
		}

		arsenm: Should also test for the other FP types
	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"
	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
Context not available.
	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
		declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
Context not available.


Diff 112408
