This is an archive of the discontinued LLVM Phabricator instance.

Add ‘llvm.experimental.constrained.fma‘ Intrinsic
ClosedPublic

Authored by wdng on Aug 4 2017, 12:10 PM.

Details

Reviewers

arsenm
b-sumner
andrew.w.kaylor
craig.topper

Commits

rGa131d3fb29eb: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.
rL311629: Add ‘llvm.experimental.constrained.fma‘ Intrinsic.

Summary

Add ‘llvm.experimental.constrained.fma‘ Intrinsic.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Aug 4 2017, 12:10 PM

Needs tests

b-sumner added inline comments. Aug 4 2017, 12:23 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652	This is a ternary operation. Code below assumes unary or binary.

An update to docs/LangRef.rst is needed.

Add missing lit tests.

In D36335#832367, @b-sumner wrote:

An update to docs/LangRef.rst is needed.

Sure, will do.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652	In Intrinsics.td, fma is defined as a ternary operator. Here it only mutates STRICT_FMA to FMA, and IsUnary is false by default. So we may not need to specify whether it is unary or binary here?

b-sumner added inline comments. Aug 4 2017, 2:29 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652	Please take a look at lines 6676 - 6680 below. Do you not need to pass a 3 element list to MorphNodeTo for the FMA case?

andrew.w.kaylor requested changes to this revision. Aug 4 2017, 3:09 PM

andrew.w.kaylor added inline comments.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6652	You definitely need to add code to handle the third argument.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6015	This code also needs to be updated to handle the case of three value operands.
lib/IR/Verifier.cpp
3985	The implementation of this function assumes only 1 or 2 value operands. It will need to be updated.
test/Feature/fp-intrinsics.ll
236	If you checked the arguments here it should reveal the problems in the code. There's also a test at llvm/test/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic?
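
The operand-count concern raised in the inline comments above can be illustrated with a toy model. This is a minimal sketch, not LLVM code: `Op`, `Node`, `numValueOperands`, `relaxedOpcode`, and `mutateStrictToRelaxed` are hypothetical stand-ins, and the sketch assumes a strict node carries its chain as the leading operand, followed by one, two, or three value operands.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes; these are not the real ISD opcodes.
enum class Op { StrictFSqrt, StrictFAdd, StrictFMA, FSqrt, FAdd, FMA };

// Hypothetical node: operand 0 is assumed to be the chain, the rest are values.
struct Node {
  Op opcode;
  std::vector<int> operands;
};

// Number of value operands carried by each strict opcode in this model.
std::size_t numValueOperands(Op op) {
  switch (op) {
  case Op::StrictFSqrt: return 1;  // unary
  case Op::StrictFAdd:  return 2;  // binary
  case Op::StrictFMA:   return 3;  // ternary
  default: throw std::invalid_argument("not a strict opcode");
  }
}

// Maps a strict opcode to its unconstrained counterpart.
Op relaxedOpcode(Op op) {
  switch (op) {
  case Op::StrictFSqrt: return Op::FSqrt;
  case Op::StrictFAdd:  return Op::FAdd;
  case Op::StrictFMA:   return Op::FMA;
  default: throw std::invalid_argument("not a strict opcode");
  }
}

// Drops the chain and keeps every value operand, however many there are.
Node mutateStrictToRelaxed(const Node &n) {
  const std::size_t count = numValueOperands(n.opcode);
  if (n.operands.size() != count + 1)
    throw std::invalid_argument("unexpected operand count");
  Node result{relaxedOpcode(n.opcode), {}};
  result.operands.assign(n.operands.begin() + 1, n.operands.end());
  return result;
}
```

In this model, a node built as `Node{Op::StrictFMA, {0, 10, 20, 30}}` comes back with opcode `Op::FMA` and all three value operands intact. Code that copies only the first one or two value operands would silently drop the addend.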

This revision now requires changes to proceed. Aug 4 2017, 3:09 PM

Could you also add a use of this new intrinsic to llvm/test/Verifier/fp-intrinsics.ll?

Address code reviews.

Upload correct diff.

b-sumner added inline comments. Aug 11 2017, 2:06 PM

docs/LangRef.rst
13035	Too much cut and paste from frem
13039	rounding only once
lib/Target/X86/X86ISelDAGToDAG.cpp
2017	Did you run clang-format?

Code changes based on Brian's comments.

wdng marked 3 inline comments as done. Aug 11 2017, 3:31 PM

arsenm added inline comments. Aug 11 2017, 3:33 PM

test/CodeGen/X86/fp-intrinsics.ll
2	Missing -check-prefix=CHECK
248	You need a separate check-label for the FMAless run line
test/Feature/fp-intrinsics.ll
244	Should also test for the other FP types

andrew.w.kaylor added inline comments. Aug 14 2017, 10:20 AM

docs/LangRef.rst
13035	I'm not sure it's clear what the comment "Note that the rounding happens only once here" means in this context. The rounding mode argument provides information to the optimizer and does not have any functional effect. I hope that this is straightforward enough with the other intrinsics that the terse comments there were sufficient. In the case of the constrained fma intrinsic, it is worth mentioning that any actions the optimizer performs on the intrinsic must be consistent with the rounding behavior of an fma instruction. For instance, the optimizer cannot perform constant folding where a rounded multiply is performed followed by a rounded add -- the rounding must be atomic. Perhaps that is what you intended to say here. If so, I believe a more verbose statement is needed.
13037	I think it would be a good idea to discuss here the circumstances under which this intrinsic can be formed. Specifically, what is the relationship between rounding mode control and the fp-contract setting? If strict rounding behavior is required within a scope, but fusing is enabled globally within the compilation unit, does the rounding requirement override the fp-contract setting? I think it should. Also, what are the expected exception semantics? If a scope is governed by strict exception behavior, how will the FP status flags be handled if a multiply and an add are fused? I believe what is required is that if either operation would have set an FP status flag then the fused operation must also set that flag, and no flag should be set by the fused operation that would not have been set by either of the two operations separately.
lib/Target/X86/X86ISelDAGToDAG.cpp
2015	Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA.
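
The single-rounding point raised in the LangRef.rst comments above can be checked numerically. The following is a minimal sketch, assuming IEEE-754 doubles and the default round-to-nearest mode; `mulThenAdd` and `fusedMulAdd` are hypothetical helper names, and only `std::fma` is a real library call.

```cpp
#include <cmath>

// Multiply, round to double, then add and round again (two roundings).
// The volatile store keeps the compiler from contracting this into an FMA.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused multiply-add: a * b + c is computed as if with infinite precision
// and rounded to double exactly once.
double fusedMulAdd(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27` and `c = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0` as a double. Under the stated assumptions `mulThenAdd` therefore returns `0.0`, while `fusedMulAdd` returns `-2^-54`. Constant folding a constrained fma as a rounded multiply followed by a rounded add would produce the first value instead of the second.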

Address code reviews.

wdng added inline comments. Aug 15 2017, 3:42 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	No, it doesn't. It looks like X86 doesn't handle ISD::FMA automatically unless the -mattr=+fma option is given. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection.

andrew.w.kaylor added inline comments. Aug 15 2017, 3:58 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	I still don't understand. What happens when -mattr=+fma is used? The CodeGen/X86/fma.ll test uses that option. This case should work in the same way.

b-sumner added inline comments. Aug 15 2017, 4:15 PM

docs/LangRef.rst
13023	How about "...returns the result of a fused-multiply-add operation on its operands."?
13039	How about "The result produced is the product of the first two operands added to the third operand computed with infinite precision, and then rounded to the target precision."

wdng added inline comments. Aug 16 2017, 9:07 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	I think I made a mistake when describing the problem in my earlier comment. Let me rephrase and explain it here.

Without -mattr=+fma, an FMA libcall will be generated. With -mattr=+fma, we expect the corresponding FMA instruction to be generated.

In fma.ll, none of the fma tests are constrained FP operations. During the X86ISelLowering phase, the FMA node is lowered to X86ISD::FMADD, so there is no ISD::FMA at this phase: it has already been changed to X86ISD::FMADD before instruction selection starts. Please refer to the following dump.

(gdb) p CurDAG->dump()
SelectionDAG has 12 nodes:
  t0: ch = EntryToken
  t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0
  t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1
  t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2
  t12: f64 = X86ISD::FMADD t2, t4, t6
  t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12
  t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1

However, for the constrained fma, we use the mutateStrictFPToFP() function to mutate constrained_fma to the normal fma, namely ISD::FMA, before instruction selection starts. The X86 backend cannot recognize ISD::FMA, so we have to add code to convert ISD::FMA to X86ISD::FMADD during instruction selection.

Update LangRef.rst based on comments.

Update LangRef.rst: put more accurate descriptions into the constrained.fma semantic section.

b-sumner added inline comments. Aug 16 2017, 2:22 PM

docs/LangRef.rst
13040	Extra period

Remove extra period. Thanks!

Ping.

andrew.w.kaylor added inline comments. Aug 17 2017, 11:17 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else. Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP() and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA? My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

wdng added inline comments. Aug 18 2017, 11:55 AM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP() and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA?

--> Yes.

My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.

--> I once tried moving the mutateStrictFPToFP() call to the LegalizeDAG phase, as the following code shows. I found that it works, and that there is then no need to add code to the X86 backend instruction selector:

  switch (Action) {
  case TargetLowering::Legal:
    if (Node->isStrictFPOpcode())
      Node = DAG.mutateStrictFPToFP(Node);
    return;

So once those strict FP operators have been legalized as legal, we can directly mutate them to their corresponding normal FP operators. However, there is a problem: non-default FP exception behaviors (those of constrained FP operations) are target-specific, which means we have to leave it to each sub-target selector to handle them. So I would not suggest mutating those instructions there. What do you think?

Ping.

andrew.w.kaylor added inline comments. Aug 18 2017, 2:52 PM

lib/Target/X86/X86ISelDAGToDAG.cpp
2015	I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection. Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there? Also, have you considered how non-X86 architectures need to handle this case?

Do we want to give the target any chance to use FMSUB/FNMADD/FNMSUB if any of the arguments are negated?

In D36335#846018, @craig.topper wrote:

Do we want to give the target any chance to use FMSUB/FNMADD/FNMSUB if any of the arguments are negated?

That's exactly the kind of thing I was afraid of missing by not channeling this through the normal path that ISD::FMA takes.

I'm wondering if X86 really needs X86ISD::FMADD opcode at all. We definitely need FNMADD, FMSUB, and FNMSUB. But I don't think there's any real difference between X86ISD::FMADD and ISD::FMA.

include/llvm/CodeGen/ISDOpcodes.h
267	Please keep the blank line here.
lib/Target/X86/X86ISelDAGToDAG.cpp
2016	Add a comment that this is here because STRICT_FMA is turned into FMA after legalization and DAG combine.

craig.topper mentioned this in D36983: [X86] Remove X86ISD::FMADD in favor ISD::FMA. Aug 21 2017, 3:04 PM

Address code reviews. Thanks a lot!

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

In D36335#849535, @andrew.w.kaylor wrote:

Can you put this off until the patch that Craig submitted in D36983 either lands or gets rejected? If that change goes through, you should be able to remove your modifications to X86ISelDAGToDAG.cpp.

Sure, thanks!

Diffusion mentioned this in rL311568: [X86] Remove X86ISD::FMADD in favor ISD::FMA. Aug 23 2017, 9:29 AM

Patch update after the patch [X86] Remove X86ISD::FMADD in favor ISD::FMA has been upstreamed.

wdng added a reviewer: craig.topper. Aug 23 2017, 11:28 AM

Fixed a format issue.

andrew.w.kaylor added inline comments. Aug 23 2017, 11:49 AM

test/CodeGen/X86/fp-intrinsics.ll
245	These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors.
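
Whether a given pair of test constants multiplies without rounding can be checked with the usual FMA-based error trick. This is a minimal sketch, assuming IEEE-754 doubles with no overflow or underflow; `productIsExact` is a hypothetical helper, and the values 2.0 and 3.0 in the note below are illustrative, not the ones used in the patch.

```cpp
#include <cmath>

// Returns true when a * b is exactly representable as a double, meaning the
// multiplication discards nothing. fma(a, b, -p) yields the rounding error of
// p = a * b, so a zero result means no rounding occurred.
bool productIsExact(double a, double b) {
  volatile double p = a * b;  // force the product to be rounded to double
  return std::fma(a, b, -p) == 0.0;
}
```

Under these assumptions `productIsExact(2.0, 3.0)` is true, so such operands could be folded without any rounding, whereas `productIsExact(42.1, 42.1)` is false because 42.1 has no exact binary representation and its square needs more than 53 significant bits.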

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

In D36335#850574, @andrew.w.kaylor wrote:

Can you revert the white space changes in the places you aren't otherwise modifying? In general, you shouldn't make formatting changes outside of the parts of the file your patch is modifying. It complicates the version control blame process without adding a lot of benefit.

Also, your latest diffs seem not to have full file context (such as you get with the -x -U99999 switch with diff). This isn't important for the current review, but it is something to keep in mind going forward.

I really appreciate your work on this patch, and I hate to seem like I'm nit-picking a lot. I just want to make sure we do things correctly. Thanks!

Hi, Andrew, no problem at all. I will provide an updated full patch for this. Thanks a lot!

Address code reviews. Thanks!

LGTM

This revision is now accepted and ready to land. Aug 23 2017, 4:19 PM

Closed by commit rL311629: Add ‘llvm.experimental.constrained.fma‘ Intrinsic. (authored by wdng). Aug 23 2017, 9:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
docs/LangRef.rst	127 lines
include/llvm/CodeGen/ISDOpcodes.h	1 line
include/llvm/CodeGen/SelectionDAGNodes.h	3 lines
include/llvm/IR/IntrinsicInst.h	2 lines
include/llvm/IR/Intrinsics.td	7 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	8 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	6 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	13 lines
lib/IR/IntrinsicInst.cpp	20 lines
lib/IR/Verifier.cpp	6 lines
lib/Target/X86/X86ISelDAGToDAG.cpp	9 lines
test/CodeGen/X86/fp-intrinsics.ll	81 lines
test/Feature/fp-intrinsics.ll	16 lines

Diff 112215

docs/LangRef.rst

[... 4,602 lines not shown ...]
As a concrete example, the type descriptor graph for the following program

.. code-block:: c

    struct Inner {
      int i; // offset 0
      float f; // offset 4
    };

    struct Outer {
      float f; // offset 0
      double d; // offset 4
      struct Inner inner_a; // offset 12
    };

    void f(struct Outer* outer, struct Inner* inner, float* f, int* i, char* c) {
      outer->f = 0; // tag0: (OuterStructTy, FloatScalarTy, 0)
      outer->inner_a.i = 0; // tag1: (OuterStructTy, IntScalarTy, 12)
      outer->inner_a.f = 0.0; // tag2: (OuterStructTy, IntScalarTy, 16)
      *f = 0.0; // tag3: (FloatScalarTy, FloatScalarTy, 0)
    }

is (note that in C and C++, ``char`` can be used to access any arbitrary
[... 519 lines not shown ...]
.. code-block:: llvm

    !0 = !{!1, !2} ; a list of loop identifiers
    !1 = !{!1} ; an identifier for the inner loop
    !2 = !{!2} ; an identifier for the outer loop

'``invariant.group``' Metadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``invariant.group`` metadata may be attached to ``load``/``store`` instructions.
The existence of the ``invariant.group`` metadata on the instruction tells
the optimizer that every ``load`` and ``store`` to the same pointer operand
within the same invariant group can be assumed to load or store the same
value (but see the ``llvm.invariant.group.barrier`` intrinsic which affects
when two pointers are considered the same). Pointers returned by bitcast or
getelementptr with only zero indices are considered the same.

Examples:

.. code-block:: llvm

    @unknownPtr = external global i8
    ...
    %ptr = alloca i8
    store i8 42, i8* %ptr, !invariant.group !0
    call void @foo(i8* %ptr)

    %a = load i8, i8* %ptr, !invariant.group !0 ; Can assume that value under %ptr didn't change
    call void @foo(i8* %ptr)
    %b = load i8, i8* %ptr, !invariant.group !1 ; Can't assume anything, because group changed

    %newPtr = call i8* @getPointer(i8* %ptr)
    %c = load i8, i8* %newPtr, !invariant.group !0 ; Can't assume anything, because we only have information about %ptr

    %unknownValue = load i8, i8* @unknownPtr
    store i8 %unknownValue, i8* %ptr, !invariant.group !0 ; Can assume that %unknownValue == 42

    call void @foo(i8* %ptr)
    %newPtr2 = call i8* @llvm.invariant.group.barrier(i8* %ptr)
    %d = load i8, i8* %newPtr2, !invariant.group !0 ; Can't step through invariant.group.barrier to get value of %ptr

    ...
    declare void @foo(i8*)
    declare i8* @getPointer(i8*)
    declare i8* @llvm.invariant.group.barrier(i8*)

    !0 = !{!"magic ptr"}
    !1 = !{!"other ptr"}

The invariant.group metadata must be dropped when replacing one pointer by
another based on aliasing information. This is because invariant.group is tied
to the SSA value of the pointer operand.

.. code-block:: llvm

    %v = load i8, i8* %x, !invariant.group !0
    ; if %x mustalias %y then we can replace the above instruction with
    %v = load i8, i8* %y


'``type``' Metadata
^^^^^^^^^^^^^^^^^^^

[... 1,437 lines not shown ...]
""""""""""

This instruction returns the unsigned integer remainder of a division.
This instruction always performs an unsigned division to get the
remainder.

Note that unsigned integer remainder and signed integer remainder are
distinct operations; for signed integer remainder, use '``srem``'.

Taking the remainder of a division by zero is undefined behavior.
For vectors, if any element of the divisor is zero, the operation has
undefined behavior.

Example:
""""""""

.. code-block:: text

    <result> = urem i32 4, %var ; yields i32:result = 4 % %var
[... 35 lines not shown ...]
table of how this is implemented in various languages, please see
`Wikipedia: modulo
operation <http://en.wikipedia.org/wiki/Modulo_operation>`_.

Note that signed integer remainder and unsigned integer remainder are
distinct operations; for unsigned integer remainder, use '``urem``'.

Taking the remainder of a division by zero is undefined behavior.
For vectors, if any element of the divisor is zero, the operation has
undefined behavior.
Overflow also leads to undefined behavior; this is a rare case, but can
occur, for example, by taking the remainder of a 32-bit division of
-2147483648 by -1. (The remainder doesn't actually overflow, but this
rule lets srem be implemented using instructions that return both the
result of the division and the remainder.)

Example:
[... 856 lines not shown ...]
The optional ``!nontemporal`` metadata must reference a single metadata
name ``<index>`` corresponding to a metadata node with one ``i32`` entry of
value 1. The existence of the ``!nontemporal`` metadata on the instruction
tells the optimizer and code generator that this load is not expected to
be reused in the cache. The code generator may select special
instructions to save cache bandwidth, such as the ``MOVNT`` instruction on
x86.

The optional ``!invariant.group`` metadata must reference a
single metadata name ``<index>``. See ``invariant.group`` metadata.

Semantics:
""""""""""

The contents of memory are updated to contain ``<value>`` at the
location specified by the ``<pointer>`` operand. If ``<value>`` is
of scalar type then the number of bytes written does not exceed the
[... 89 lines not shown ...]

Arguments:
""""""""""

There are three arguments to the '``cmpxchg``' instruction: an address
to operate on, a value to compare to the value currently be at that
address, and a new value to place at that address if the compared values
are equal. The type of '<cmp>' must be an integer or pointer type whose
bit width is a power of two greater than or equal to eight and less
than or equal to a target-specific size limit. '<cmp>' and '<new>' must
have the same type, and the type of '<pointer>' must be a pointer to
that type. If the ``cmpxchg`` is marked as ``volatile``, then the
optimizer is not allowed to modify the number or order of execution of
this ``cmpxchg`` with other :ref:`volatile operations <volatile>`.

The success and failure :ref:`ordering <ordering>` arguments specify how this
``cmpxchg`` synchronizes with other atomic operations. Both ordering parameters
must be at least ``monotonic``, the ordering constraint on failure must be no
stronger than that on success, and the failure ordering cannot be either
``release`` or ``acq_rel``.
[... 1,271 lines not shown ...]
   - Option ``-tailcallopt`` is enabled, or
     ``llvm::GuaranteedTailCallOpt`` is ``true``.
   - `Platform-specific constraints are
     met. <CodeGenerator.html#tailcallopt>`_

#. The optional ``notail`` marker indicates that the optimizers should not add
   ``tail`` or ``musttail`` markers to the call. It is used to prevent tail
   call optimization from being performed on the call.

#. The optional ``fast-math flags`` marker indicates that the call has one or more
   :ref:`fast-math flags <fastmath>`, which are optimization hints to enable
   otherwise unsafe floating-point optimizations. Fast-math flags are only valid
   for calls that return a floating-point scalar or vector type.

#. The optional "cconv" marker indicates which :ref:`calling
   convention <callingconv>` the call should use. If none is
   specified, the call defaults to using C calling conventions. The
   calling convention of the call must match the calling convention of
[... 3,725 lines not shown ...]

::

      declare i8* @llvm.invariant.group.barrier(i8* <ptr>)

Overview:
"""""""""

The '``llvm.invariant.group.barrier``' intrinsic can be used when an invariant
established by invariant.group metadata no longer holds, to obtain a new pointer
value that does not carry the invariant information.


Arguments:
""""""""""

The ``llvm.invariant.group.barrier`` takes only one argument, which is
the pointer to the memory for which the ``invariant.group`` no longer holds.

Semantics:
""""""""""

Returns another pointer that aliases its argument but which is considered different
for the purposes of ``load``/``store`` ``invariant.group`` metadata.

Constrained Floating Point Intrinsics
-------------------------------------

These intrinsics are used to provide special handling of floating point
operations when specific rounding mode or floating point exception behavior is
required. By default, LLVM optimization passes assume that the rounding mode is
[... 61 lines not shown ...]
original code. For example, exceptions may be potentially hidden by constant
folding.

If the exception behavior argument is "fpexcept.strict" all transformations must
strictly preserve the floating point exception semantics of the original code.
Any FP exception that would have been raised by the original code must be raised
by the transformed code, and the transformed code must not raise any FP
exceptions that would not have been raised by the original code. This is the
exception behavior argument that will be used if the code being compiled reads
the FP exception status flags, but this mode can also be used with code that
unmasks FP exceptions.

The number and order of floating point exceptions is NOT guaranteed. For
example, a series of FP operations that each may raise exceptions may be
vectorized into a single instruction that raises each unique exception a single
time.


'``llvm.experimental.constrained.fadd``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare <type>
      @llvm.experimental.constrained.fadd(<type> <op1>, <type> <op2>,
                                          metadata <rounding mode>,
                                          metadata <exception behavior>)

Overview:
"""""""""

The '``llvm.experimental.constrained.fadd``' intrinsic returns the sum of its
Show All 20 Lines
'``llvm.experimental.constrained.fsub``' Intrinsic		'``llvm.experimental.constrained.fsub``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,		@llvm.experimental.constrained.fsub(<type> <op1>, <type> <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fsub``' intrinsic returns the difference		The '``llvm.experimental.constrained.fsub``' intrinsic returns the difference
Show All 20 Lines
'``llvm.experimental.constrained.fmul``' Intrinsic		'``llvm.experimental.constrained.fmul``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,		@llvm.experimental.constrained.fmul(<type> <op1>, <type> <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fmul``' intrinsic returns the product of		The '``llvm.experimental.constrained.fmul``' intrinsic returns the product of
Show All 20 Lines
'``llvm.experimental.constrained.fdiv``' Intrinsic		'``llvm.experimental.constrained.fdiv``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,		@llvm.experimental.constrained.fdiv(<type> <op1>, <type> <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.fdiv``' intrinsic returns the quotient of		The '``llvm.experimental.constrained.fdiv``' intrinsic returns the quotient of
Show All 20 Lines
'``llvm.experimental.constrained.frem``' Intrinsic		'``llvm.experimental.constrained.frem``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,		@llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder		The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder
Show All 12 Lines
the result of frem is never rounded, but the argument is included for		the result of frem is never rounded, but the argument is included for
consistency with the other constrained floating point intrinsics.		consistency with the other constrained floating point intrinsics.

Semantics:		Semantics:
""""""""""		""""""""""

The value produced is the floating point remainder from the division of the two		The value produced is the floating point remainder from the division of the two
value operands and has the same type as the operands. The remainder has the		value operands and has the same type as the operands. The remainder has the
same sign as the dividend.		same sign as the dividend.

		'``llvm.experimental.constrained.fma``' Intrinsic
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""

		::

		declare <type>
		@llvm.experimental.constrained.fma(<type> <op1>, <type> <op2>, <type> <op3>,
		metadata <rounding mode>,
		metadata <exception behavior>)

		Overview:
		"""""""""

		The '``llvm.experimental.constrained.fma``' intrinsic returns the result of a
		fused-multiply-add operation on its operands.
		b-sumner (inline comment, Not Done): How about "...returns the result of a fused-multiply-add operation on its operands."?

		Arguments:
		""""""""""

		The first three arguments to the '``llvm.experimental.constrained.fma``'
		intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
		<t_vector>` of floating point values. All arguments must have identical types.

		The fourth and fifth arguments specify the rounding mode and exception behavior
		as described above.

		Semantics:
		b-sumner (inline comment, Done): Too much cut and paste from frem
		andrew.w.kaylor (inline comment, Not Done): I'm not sure it's clear what the comment "Note that the rounding happens only once here" means in this context. The rounding mode argument provides information to the optimizer and does not have any functional effect. I hope that this is straightforward enough with the other intrinsics that the terse comments there were sufficient. In the case of the constrained fma intrinsic, it is worth mentioning that any actions the optimizer performs on the intrinsic must be consistent with the rounding behavior of an fma instruction. For instance, the optimizer cannot perform constant folding where a rounded multiply is performed followed by a rounded add -- the rounding must be atomic. Perhaps that is what you intended to say here. If so, I believe a more verbose statement is needed.
		""""""""""

		andrew.w.kaylor (inline comment, Not Done): I think it would be a good idea to discuss here the circumstances under which this intrinsic can be formed. Specifically, what is the relationship between rounding mode control and the fp-contract setting? If strict rounding behavior is required within a scope, but fusing is enabled globally within the compilation unit, does the rounding requirement override the fp-contract setting? I think it should. Also, what are the expected exception semantics? If a scope is governed by strict exception behavior, how will the FP status flags be handled if a multiply and an add are fused? I believe what is required is that if either operation would have set an FP status flag then the fused operation must also set that flag, and no flag should be set by the fused operation that would not have been set by either of the two operations separately.
		The result produced is the product of the first two operands added to the third
		operand computed with infinite precision, and then rounded to the target
		b-sumner (inline comment, Done): rounding only once
		b-sumner (inline comment, Done): How about "The result produced is the product of the first two operands added to the third operand computed with infinite precision, and then rounded to the target precision."
		precision.
		b-sumner (inline comment, Done): Extra period
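To make the "rounded only once" point from the Semantics text and the review comments concrete, here is a minimal C++ sketch (not part of this patch). It assumes the default round-to-nearest mode and IEEE 754 double precision, and it uses `std::fma` from `<cmath>`; the helper names `mulThenAdd` and `fusedMulAdd` are made up for this example. The `volatile` store forces the product to be rounded to `double` before the add, so the compiler cannot contract the two operations into a fused one.

```cpp
#include <cmath>

// Hypothetical helper: multiply, round, then add (two roundings).
double mulThenAdd(double a, double b, double c) {
  // The volatile store rounds the product to double and prevents the
  // compiler from fusing the multiply with the following add.
  volatile double product = a * b;
  return product + c;
}

// Hypothetical helper: a * b + c computed exactly and rounded once.
double fusedMulAdd(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. `mulThenAdd` therefore returns 0, while `fusedMulAdd` returns -2^-54. This is the difference an optimizer must preserve: folding a constrained fma as a rounded multiply followed by a rounded add would change the result.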

Constrained libm-equivalent Intrinsics		Constrained libm-equivalent Intrinsics
--------------------------------------		--------------------------------------

In addition to the basic floating point operations for which constrained		In addition to the basic floating point operations for which constrained
intrinsics are described above, there are constrained versions of various		intrinsics are described above, there are constrained versions of various
operations which provide equivalent behavior to a corresponding libm function.		operations which provide equivalent behavior to a corresponding libm function.
These intrinsics allow the precise behavior of these operations with respect to		These intrinsics allow the precise behavior of these operations with respect to
rounding mode and exception behavior to be controlled.		rounding mode and exception behavior to be controlled.

As with the basic constrained floating point intrinsics, the rounding mode		As with the basic constrained floating point intrinsics, the rounding mode
and exception behavior arguments only control the behavior of the optimizer.		and exception behavior arguments only control the behavior of the optimizer.
They do not change the runtime floating point environment.		They do not change the runtime floating point environment.
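Since the metadata arguments only inform the optimizer, the run-time rounding mode has to be set separately by the program. The following C++ sketch (not part of this patch) shows one way this is commonly done with the standard `<cfenv>` functions. The helper name `divideWithRounding` is made up for this example, and the sketch assumes a target that defines the `FE_UPWARD` and `FE_DOWNWARD` macros.

```cpp
#include <cfenv>

// Hypothetical helper: computes num / den at run time under the given
// rounding mode (one of the FE_* rounding macros), then restores the
// previously active mode.
double divideWithRounding(double num, double den, int mode) {
  // volatile keeps the division from being folded at compile time, where
  // the run-time rounding mode would not be visible.
  volatile double n = num;
  volatile double d = den;

  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double q = n / d;
  std::fesetround(saved);

  return q;
}
```

For an inexact quotient such as 1.0 / 3.0, the result under `FE_UPWARD` is strictly greater than the result under `FE_DOWNWARD`; for an exact quotient such as 1.0 / 4.0, both modes give the same value.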


'``llvm.experimental.constrained.sqrt``' Intrinsic		'``llvm.experimental.constrained.sqrt``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.sqrt(<type> <op1>,		@llvm.experimental.constrained.sqrt(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.sqrt``' intrinsic returns the square root		The '``llvm.experimental.constrained.sqrt``' intrinsic returns the square root
Show All 20 Lines
'``llvm.experimental.constrained.pow``' Intrinsic		'``llvm.experimental.constrained.pow``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,		@llvm.experimental.constrained.pow(<type> <op1>, <type> <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.pow``' intrinsic returns the first operand		The '``llvm.experimental.constrained.pow``' intrinsic returns the first operand
Show All 20 Lines
'``llvm.experimental.constrained.powi``' Intrinsic		'``llvm.experimental.constrained.powi``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,		@llvm.experimental.constrained.powi(<type> <op1>, i32 <op2>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand		The '``llvm.experimental.constrained.powi``' intrinsic returns the first operand
Show All 22 Lines
'``llvm.experimental.constrained.sin``' Intrinsic		'``llvm.experimental.constrained.sin``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.sin(<type> <op1>,		@llvm.experimental.constrained.sin(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.sin``' intrinsic returns the sine of the		The '``llvm.experimental.constrained.sin``' intrinsic returns the sine of the
Show All 19 Lines
'``llvm.experimental.constrained.cos``' Intrinsic		'``llvm.experimental.constrained.cos``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.cos(<type> <op1>,		@llvm.experimental.constrained.cos(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.cos``' intrinsic returns the cosine of the		The '``llvm.experimental.constrained.cos``' intrinsic returns the cosine of the
Show All 19 Lines
'``llvm.experimental.constrained.exp``' Intrinsic		'``llvm.experimental.constrained.exp``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.exp(<type> <op1>,		@llvm.experimental.constrained.exp(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.exp``' intrinsic computes the base-e		The '``llvm.experimental.constrained.exp``' intrinsic computes the base-e
Show All 18 Lines
'``llvm.experimental.constrained.exp2``' Intrinsic		'``llvm.experimental.constrained.exp2``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.exp2(<type> <op1>,		@llvm.experimental.constrained.exp2(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.exp2``' intrinsic computes the base-2		The '``llvm.experimental.constrained.exp2``' intrinsic computes the base-2
Show All 19 Lines
'``llvm.experimental.constrained.log``' Intrinsic		'``llvm.experimental.constrained.log``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.log(<type> <op1>,		@llvm.experimental.constrained.log(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.log``' intrinsic computes the base-e		The '``llvm.experimental.constrained.log``' intrinsic computes the base-e
Show All 19 Lines
'``llvm.experimental.constrained.log10``' Intrinsic		'``llvm.experimental.constrained.log10``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.log10(<type> <op1>,		@llvm.experimental.constrained.log10(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.log10``' intrinsic computes the base-10		The '``llvm.experimental.constrained.log10``' intrinsic computes the base-10
Show All 18 Lines
'``llvm.experimental.constrained.log2``' Intrinsic		'``llvm.experimental.constrained.log2``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.log2(<type> <op1>,		@llvm.experimental.constrained.log2(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.log2``' intrinsic computes the base-2		The '``llvm.experimental.constrained.log2``' intrinsic computes the base-2
Show All 18 Lines
'``llvm.experimental.constrained.rint``' Intrinsic		'``llvm.experimental.constrained.rint``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.rint(<type> <op1>,		@llvm.experimental.constrained.rint(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.rint``' intrinsic returns the first		The '``llvm.experimental.constrained.rint``' intrinsic returns the first
Show All 22 Lines
'``llvm.experimental.constrained.nearbyint``' Intrinsic		'``llvm.experimental.constrained.nearbyint``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare <type>		declare <type>
@llvm.experimental.constrained.nearbyint(<type> <op1>,		@llvm.experimental.constrained.nearbyint(<type> <op1>,
metadata <rounding mode>,		metadata <rounding mode>,
metadata <exception behavior>)		metadata <exception behavior>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first		The '``llvm.experimental.constrained.nearbyint``' intrinsic returns the first
▲ Show 20 Lines • Show All 723 Lines • ▼ Show 20 Lines

Semantics:		Semantics:
""""""""""		""""""""""

The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of		The '``llvm.memcpy.element.unordered.atomic.*``' intrinsic copies ``len`` bytes of
memory from the source location to the destination location. These locations are not		memory from the source location to the destination location. These locations are not
allowed to overlap. The memory copy is performed as a sequence of load/store operations		allowed to overlap. The memory copy is performed as a sequence of load/store operations
where each access is guaranteed to be a multiple of ``element_size`` bytes wide and		where each access is guaranteed to be a multiple of ``element_size`` bytes wide and
aligned at an ``element_size`` boundary.		aligned at an ``element_size`` boundary.

The order of the copy is unspecified. The same value may be read from the source		The order of the copy is unspecified. The same value may be read from the source
buffer many times, but only one write is issued to the destination buffer per		buffer many times, but only one write is issued to the destination buffer per
element. It is well defined to have concurrent reads and writes to both source and		element. It is well defined to have concurrent reads and writes to both source and
destination provided those reads and writes are unordered atomic when specified.		destination provided those reads and writes are unordered atomic when specified.

This intrinsic does not provide any additional ordering guarantees over those		This intrinsic does not provide any additional ordering guarantees over those
provided by a set of unordered loads from the source location and stores to the		provided by a set of unordered loads from the source location and stores to the
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines

Semantics:		Semantics:
""""""""""		""""""""""

The '``llvm.memmove.element.unordered.atomic.*``' intrinsic copies ``len`` bytes		The '``llvm.memmove.element.unordered.atomic.*``' intrinsic copies ``len`` bytes
of memory from the source location to the destination location. These locations		of memory from the source location to the destination location. These locations
are allowed to overlap. The memory copy is performed as a sequence of load/store		are allowed to overlap. The memory copy is performed as a sequence of load/store
operations where each access is guaranteed to be a multiple of ``element_size``		operations where each access is guaranteed to be a multiple of ``element_size``
bytes wide and aligned at an ``element_size`` boundary.		bytes wide and aligned at an ``element_size`` boundary.

The order of the copy is unspecified. The same value may be read from the source		The order of the copy is unspecified. The same value may be read from the source
buffer many times, but only one write is issued to the destination buffer per		buffer many times, but only one write is issued to the destination buffer per
element. It is well defined to have concurrent reads and writes to both source		element. It is well defined to have concurrent reads and writes to both source
and destination provided those reads and writes are unordered atomic when		and destination provided those reads and writes are unordered atomic when
specified.		specified.

This intrinsic does not provide any additional ordering guarantees over those		This intrinsic does not provide any additional ordering guarantees over those
▲ Show 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
the destination pointer is aligned to that boundary.		the destination pointer is aligned to that boundary.

Semantics:		Semantics:
""""""""""		""""""""""

The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of		The '``llvm.memset.element.unordered.atomic.*``' intrinsic sets the ``len`` bytes of
memory starting at the destination location to the given ``value``. The memory is		memory starting at the destination location to the given ``value``. The memory is
set with a sequence of store operations where each access is guaranteed to be a		set with a sequence of store operations where each access is guaranteed to be a
multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.		multiple of ``element_size`` bytes wide and aligned at an ``element_size`` boundary.

The order of the assignment is unspecified. Only one write is issued to the		The order of the assignment is unspecified. Only one write is issued to the
destination buffer per element. It is well defined to have concurrent reads and		destination buffer per element. It is well defined to have concurrent reads and
writes to the destination provided those reads and writes are unordered atomic		writes to the destination provided those reads and writes are unordered atomic
when specified.		when specified.

This intrinsic does not provide any additional ordering guarantees over those		This intrinsic does not provide any additional ordering guarantees over those
provided by a set of unordered stores to the destination.		provided by a set of unordered stores to the destination.
Show All 10 Lines

include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	enum NodeType {
/// Simple binary floating point operators.		/// Simple binary floating point operators.
FADD, FSUB, FMUL, FDIV, FREM,		FADD, FSUB, FMUL, FDIV, FREM,

/// Constrained versions of the binary floating point operators.		/// Constrained versions of the binary floating point operators.
/// These will be lowered to the simple operators before final selection.		/// These will be lowered to the simple operators before final selection.
/// They are used to limit optimizations while the DAG is being		/// They are used to limit optimizations while the DAG is being
/// optimized.		/// optimized.
STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,		STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,
		STRICT_FMA,

		craig.topper (inline comment, Done): Please keep the blank line here.
/// Constrained versions of libm-equivalent floating point intrinsics.		/// Constrained versions of libm-equivalent floating point intrinsics.
/// These will be lowered to the equivalent non-constrained pseudo-op		/// These will be lowered to the equivalent non-constrained pseudo-op
/// (or expanded to the equivalent library call) before final selection.		/// (or expanded to the equivalent library call) before final selection.
/// They are used to limit optimizations while the DAG is being optimized.		/// They are used to limit optimizations while the DAG is being optimized.
STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,		STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,
STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,		STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,
STRICT_FRINT, STRICT_FNEARBYINT,		STRICT_FRINT, STRICT_FNEARBYINT,

▲ Show 20 Lines • Show All 714 Lines • Show Last 20 Lines
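The comments in this header say that the constrained operators are lowered to the simple operators before final selection, and the review discussion points out that this lowering must carry all three value operands for `STRICT_FMA`. The following is a toy C++ model of that idea, not LLVM code: `ToyOpcode`, `ToyNode`, `numValueOperands`, `relaxedOpcode` and `lowerStrictNode` are hypothetical names, and the operand layout (chain first, then value operands) is an assumption made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class ToyOpcode { FSQRT, FADD, FMA, STRICT_FSQRT, STRICT_FADD, STRICT_FMA };

struct ToyNode {
  ToyOpcode Op;
  std::vector<int> Operands; // assumed layout for strict nodes: chain, then values
};

// Number of value operands for each strict opcode; 0 means "not strict".
std::size_t numValueOperands(ToyOpcode Op) {
  switch (Op) {
  case ToyOpcode::STRICT_FSQRT: return 1; // unary
  case ToyOpcode::STRICT_FADD:  return 2; // binary
  case ToyOpcode::STRICT_FMA:   return 3; // ternary
  default:                      return 0;
  }
}

// Maps a strict opcode to its unconstrained equivalent.
ToyOpcode relaxedOpcode(ToyOpcode Op) {
  switch (Op) {
  case ToyOpcode::STRICT_FSQRT: return ToyOpcode::FSQRT;
  case ToyOpcode::STRICT_FADD:  return ToyOpcode::FADD;
  case ToyOpcode::STRICT_FMA:   return ToyOpcode::FMA;
  default:                      return Op;
  }
}

// Rewrites a strict node into its relaxed form, dropping the chain operand
// and keeping exactly as many value operands as the opcode requires.
ToyNode lowerStrictNode(const ToyNode &N) {
  const std::size_t NumValues = numValueOperands(N.Op);
  if (NumValues == 0 || N.Operands.size() < NumValues + 1)
    return N; // not a strict node, or malformed: leave unchanged

  ToyNode Result;
  Result.Op = relaxedOpcode(N.Op);
  Result.Operands.assign(N.Operands.begin() + 1,
                         N.Operands.begin() + 1 + NumValues);
  return Result;
}
```

Driving the operand count from the opcode, rather than from a unary/binary flag, is one way to avoid the problem the reviewers describe, where a ternary node would silently lose its third operand.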

include/llvm/CodeGen/SelectionDAGNodes.h

Show First 20 Lines • Show All 617 Lines • ▼ Show 20 Lines	bool isMemIntrinsic() const {
return (NodeType == ISD::INTRINSIC_W_CHAIN ||		return (NodeType == ISD::INTRINSIC_W_CHAIN ||
NodeType == ISD::INTRINSIC_VOID) &&		NodeType == ISD::INTRINSIC_VOID) &&
SDNodeBits.IsMemIntrinsic;		SDNodeBits.IsMemIntrinsic;
}		}

/// Test if this node is a strict floating point pseudo-op.		/// Test if this node is a strict floating point pseudo-op.
bool isStrictFPOpcode() {		bool isStrictFPOpcode() {
switch (NodeType) {		switch (NodeType) {
default:		default:
return false;		return false;
case ISD::STRICT_FADD:		case ISD::STRICT_FADD:
case ISD::STRICT_FSUB:		case ISD::STRICT_FSUB:
case ISD::STRICT_FMUL:		case ISD::STRICT_FMUL:
case ISD::STRICT_FDIV:		case ISD::STRICT_FDIV:
case ISD::STRICT_FREM:		case ISD::STRICT_FREM:
		case ISD::STRICT_FMA:
case ISD::STRICT_FSQRT:		case ISD::STRICT_FSQRT:
case ISD::STRICT_FPOW:		case ISD::STRICT_FPOW:
case ISD::STRICT_FPOWI:		case ISD::STRICT_FPOWI:
case ISD::STRICT_FSIN:		case ISD::STRICT_FSIN:
case ISD::STRICT_FCOS:		case ISD::STRICT_FCOS:
case ISD::STRICT_FEXP:		case ISD::STRICT_FEXP:
case ISD::STRICT_FEXP2:		case ISD::STRICT_FEXP2:
case ISD::STRICT_FLOG:		case ISD::STRICT_FLOG:
▲ Show 20 Lines • Show All 1,689 Lines • Show Last 20 Lines

include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 161 Lines • ▼ Show 20 Lines	public:
enum ExceptionBehavior {		enum ExceptionBehavior {
ebInvalid,		ebInvalid,
ebIgnore,		ebIgnore,
ebMayTrap,		ebMayTrap,
ebStrict		ebStrict
};		};

bool isUnaryOp() const;		bool isUnaryOp() const;
		bool isTernaryOp() const;
RoundingMode getRoundingMode() const;		RoundingMode getRoundingMode() const;
ExceptionBehavior getExceptionBehavior() const;		ExceptionBehavior getExceptionBehavior() const;

// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
switch (I->getIntrinsicID()) {		switch (I->getIntrinsicID()) {
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:
▲ Show 20 Lines • Show All 546 Lines • Show Last 20 Lines

include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 484 Lines • ▼ Show 20 Lines	def int_experimental_constrained_fdiv : Intrinsic<[ llvm_anyfloat_ty ],
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;
def int_experimental_constrained_frem : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_frem : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

		def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
		[ LLVMMatchType<0>,
		LLVMMatchType<0>,
		LLVMMatchType<0>,
		llvm_metadata_ty,
		llvm_metadata_ty ]>;

// These intrinsics are sensitive to the rounding mode so we need constrained		// These intrinsics are sensitive to the rounding mode so we need constrained
// versions of each of them. When strict rounding and exception control are		// versions of each of them. When strict rounding and exception control are
// not required the non-constrained versions of these intrinsics should be		// not required the non-constrained versions of these intrinsics should be
// used.		// used.
def int_experimental_constrained_sqrt : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_sqrt : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;
▲ Show 20 Lines • Show All 455 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Show First 20 Lines • Show All 901 Lines • ▼ Show 20 Lines
static TargetLowering::LegalizeAction		static TargetLowering::LegalizeAction
getStrictFPOpcodeAction(const TargetLowering &TLI, unsigned Opcode, EVT VT) {		getStrictFPOpcodeAction(const TargetLowering &TLI, unsigned Opcode, EVT VT) {
unsigned EqOpc;		unsigned EqOpc;
switch (Opcode) {		switch (Opcode) {
default: llvm_unreachable("Unexpected FP pseudo-opcode");		default: llvm_unreachable("Unexpected FP pseudo-opcode");
case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;		case ISD::STRICT_FSQRT: EqOpc = ISD::FSQRT; break;
case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;		case ISD::STRICT_FPOW: EqOpc = ISD::FPOW; break;
case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;		case ISD::STRICT_FPOWI: EqOpc = ISD::FPOWI; break;
		case ISD::STRICT_FMA: EqOpc = ISD::FMA; break;
case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;		case ISD::STRICT_FSIN: EqOpc = ISD::FSIN; break;
case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;		case ISD::STRICT_FCOS: EqOpc = ISD::FCOS; break;
case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;		case ISD::STRICT_FEXP: EqOpc = ISD::FEXP; break;
case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;		case ISD::STRICT_FEXP2: EqOpc = ISD::FEXP2; break;
case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;		case ISD::STRICT_FLOG: EqOpc = ISD::FLOG; break;
case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;		case ISD::STRICT_FLOG10: EqOpc = ISD::FLOG10; break;
case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;		case ISD::STRICT_FLOG2: EqOpc = ISD::FLOG2; break;
case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;		case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
▲ Show 20 Lines • Show All 149 Lines • ▼ Show 20 Lines	if (Action == TargetLowering::Expand) {
NewVal = DAG.getNode(ISD::TRAP, SDLoc(Node), Node->getVTList(),		NewVal = DAG.getNode(ISD::TRAP, SDLoc(Node), Node->getVTList(),
Node->getOperand(0));		Node->getOperand(0));
ReplaceNode(Node, NewVal.getNode());		ReplaceNode(Node, NewVal.getNode());
LegalizeOp(NewVal.getNode());		LegalizeOp(NewVal.getNode());
return;		return;
}		}
break;		break;
case ISD::STRICT_FSQRT:		case ISD::STRICT_FSQRT:
		case ISD::STRICT_FMA:
case ISD::STRICT_FPOW:		case ISD::STRICT_FPOW:
case ISD::STRICT_FPOWI:		case ISD::STRICT_FPOWI:
case ISD::STRICT_FSIN:		case ISD::STRICT_FSIN:
case ISD::STRICT_FCOS:		case ISD::STRICT_FCOS:
case ISD::STRICT_FEXP:		case ISD::STRICT_FEXP:
case ISD::STRICT_FEXP2:		case ISD::STRICT_FEXP2:
case ISD::STRICT_FLOG:		case ISD::STRICT_FLOG:
case ISD::STRICT_FLOG10:		case ISD::STRICT_FLOG10:
▲ Show 20 Lines • Show All 152 Lines • ▼ Show 20 Lines	if (StoreSDNode *ST = dyn_cast<StoreSDNode>(User)) {
// Make sure that nothing else could have stored into the destination of		// Make sure that nothing else could have stored into the destination of
// this store.		// this store.
if (!ST->getChain().reachesChainWithoutSideEffects(DAG.getEntryNode()))		if (!ST->getChain().reachesChainWithoutSideEffects(DAG.getEntryNode()))
continue;		continue;

// If the index is dependent on the store we will introduce a cycle when		// If the index is dependent on the store we will introduce a cycle when
// creating the load (the load uses the index, and by replacing the chain		// creating the load (the load uses the index, and by replacing the chain
// we will make the index dependent on the load). Also, the store might be		// we will make the index dependent on the load). Also, the store might be
// dependent on the extractelement and introduce a cycle when creating		// dependent on the extractelement and introduce a cycle when creating
// the load.		// the load.
if (SDNode::hasPredecessorHelper(ST, Visited, Worklist) ||		if (SDNode::hasPredecessorHelper(ST, Visited, Worklist) ||
ST->hasPredecessor(Op.getNode()))		ST->hasPredecessor(Op.getNode()))
continue;		continue;

StackPtr = ST->getBasePtr();		StackPtr = ST->getBasePtr();
Ch = SDValue(ST, 0);		Ch = SDValue(ST, 0);
break;		break;
▲ Show 20 Lines • Show All 2,803 Lines • ▼ Show 20 Lines	case ISD::FREM:
Results.push_back(ExpandFPLibCall(Node, RTLIB::REM_F32, RTLIB::REM_F64,		Results.push_back(ExpandFPLibCall(Node, RTLIB::REM_F32, RTLIB::REM_F64,
RTLIB::REM_F80, RTLIB::REM_F128,		RTLIB::REM_F80, RTLIB::REM_F128,
RTLIB::REM_PPCF128));		RTLIB::REM_PPCF128));
break;		break;
case ISD::FMA:		case ISD::FMA:
Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,		Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,
RTLIB::FMA_F80, RTLIB::FMA_F128,		RTLIB::FMA_F80, RTLIB::FMA_F128,
RTLIB::FMA_PPCF128));		RTLIB::FMA_PPCF128));
		case ISD::STRICT_FMA:
		Results.push_back(ExpandFPLibCall(Node, RTLIB::FMA_F32, RTLIB::FMA_F64,
		RTLIB::FMA_F80, RTLIB::FMA_F128,
		RTLIB::FMA_PPCF128));
break;		break;
case ISD::FADD:		case ISD::FADD:
Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64,		Results.push_back(ExpandFPLibCall(Node, RTLIB::ADD_F32, RTLIB::ADD_F64,
RTLIB::ADD_F80, RTLIB::ADD_F128,		RTLIB::ADD_F80, RTLIB::ADD_F128,
RTLIB::ADD_PPCF128));		RTLIB::ADD_PPCF128));
break;		break;
case ISD::FMUL:		case ISD::FMUL:
Results.push_back(ExpandFPLibCall(Node, RTLIB::MUL_F32, RTLIB::MUL_F64,		Results.push_back(ExpandFPLibCall(Node, RTLIB::MUL_F32, RTLIB::MUL_F64,
▲ Show 20 Lines • Show All 577 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Show First 20 Lines • Show All 6,634 Lines • ▼ Show 20 Lines	if (IP)
CSEMap.InsertNode(N, IP); // Memoize the new node.		CSEMap.InsertNode(N, IP); // Memoize the new node.
return N;		return N;
}		}

SDNode* SelectionDAG::mutateStrictFPToFP(SDNode *Node) {		SDNode* SelectionDAG::mutateStrictFPToFP(SDNode *Node) {
unsigned OrigOpc = Node->getOpcode();		unsigned OrigOpc = Node->getOpcode();
unsigned NewOpc;		unsigned NewOpc;
bool IsUnary = false;		bool IsUnary = false;
		bool IsTernary = false;
switch (OrigOpc) {		switch (OrigOpc) {
default:		default:
llvm_unreachable("mutateStrictFPToFP called with unexpected opcode!");		llvm_unreachable("mutateStrictFPToFP called with unexpected opcode!");
case ISD::STRICT_FADD: NewOpc = ISD::FADD; break;		case ISD::STRICT_FADD: NewOpc = ISD::FADD; break;
case ISD::STRICT_FSUB: NewOpc = ISD::FSUB; break;		case ISD::STRICT_FSUB: NewOpc = ISD::FSUB; break;
case ISD::STRICT_FMUL: NewOpc = ISD::FMUL; break;		case ISD::STRICT_FMUL: NewOpc = ISD::FMUL; break;
case ISD::STRICT_FDIV: NewOpc = ISD::FDIV; break;		case ISD::STRICT_FDIV: NewOpc = ISD::FDIV; break;
case ISD::STRICT_FREM: NewOpc = ISD::FREM; break;		case ISD::STRICT_FREM: NewOpc = ISD::FREM; break;
		case ISD::STRICT_FMA: NewOpc = ISD::FMA; IsTernary = true; break;
		b-sumner: This is a ternary operation. Code below assumes unary or binary.
		wdng: In Intrinsics.td, fma is defined as a ternary operator. Here the code only mutates STRICT_FMA to FMA, and IsUnary is false by default. So we may not need to specify whether it is unary or binary here?
		b-sumner: Please take a look at lines 6676 - 6680 below. Do you not need to pass a 3 element list to MorphNodeTo for the FMA case?
		andrew.w.kaylor: You definitely need to add code to handle the third argument.
case ISD::STRICT_FSQRT: NewOpc = ISD::FSQRT; IsUnary = true; break;		case ISD::STRICT_FSQRT: NewOpc = ISD::FSQRT; IsUnary = true; break;
case ISD::STRICT_FPOW: NewOpc = ISD::FPOW; break;		case ISD::STRICT_FPOW: NewOpc = ISD::FPOW; break;
case ISD::STRICT_FPOWI: NewOpc = ISD::FPOWI; break;		case ISD::STRICT_FPOWI: NewOpc = ISD::FPOWI; break;
case ISD::STRICT_FSIN: NewOpc = ISD::FSIN; IsUnary = true; break;		case ISD::STRICT_FSIN: NewOpc = ISD::FSIN; IsUnary = true; break;
case ISD::STRICT_FCOS: NewOpc = ISD::FCOS; IsUnary = true; break;		case ISD::STRICT_FCOS: NewOpc = ISD::FCOS; IsUnary = true; break;
case ISD::STRICT_FEXP: NewOpc = ISD::FEXP; IsUnary = true; break;		case ISD::STRICT_FEXP: NewOpc = ISD::FEXP; IsUnary = true; break;
case ISD::STRICT_FEXP2: NewOpc = ISD::FEXP2; IsUnary = true; break;		case ISD::STRICT_FEXP2: NewOpc = ISD::FEXP2; IsUnary = true; break;
case ISD::STRICT_FLOG: NewOpc = ISD::FLOG; IsUnary = true; break;		case ISD::STRICT_FLOG: NewOpc = ISD::FLOG; IsUnary = true; break;
Show All 10 Lines	SDNode* SelectionDAG::mutateStrictFPToFP(SDNode *Node) {
SDValue InputChain = Node->getOperand(0);		SDValue InputChain = Node->getOperand(0);
SDValue OutputChain = SDValue(Node, 1);		SDValue OutputChain = SDValue(Node, 1);
ReplaceAllUsesOfValueWith(OutputChain, InputChain);		ReplaceAllUsesOfValueWith(OutputChain, InputChain);

SDVTList VTs = getVTList(Node->getOperand(1).getValueType());		SDVTList VTs = getVTList(Node->getOperand(1).getValueType());
SDNode *Res = nullptr;		SDNode *Res = nullptr;
if (IsUnary)		if (IsUnary)
Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });
		else if (IsTernary)
		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
		Node->getOperand(2),
		Node->getOperand(3)});
else		else
Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
Node->getOperand(2) });		Node->getOperand(2) });

// MorphNodeTo can operate in two ways: if an existing node with the		// MorphNodeTo can operate in two ways: if an existing node with the
// specified operands exists, it can just return it. Otherwise, it		// specified operands exists, it can just return it. Otherwise, it
// updates the node in place to have the requested operands.		// updates the node in place to have the requested operands.
if (Res == Node) {		if (Res == Node) {
▲ Show 20 Lines • Show All 1,334 Lines • Show Last 20 Lines
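
The operand handling that the reviewers asked for in mutateStrictFPToFP can be illustrated outside of LLVM. The following is only a minimal standalone sketch, not the code from this revision: `Operands`, `Arity`, and `dropChain` are hypothetical names, and operands are modeled as strings. It assumes, as in the hunk above, that operand 0 of a strict node is the chain and that the value operands follow it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an SDNode operand list. For a strict FP node,
// operand 0 is the input chain and operands 1..N are the FP value operands.
using Operands = std::vector<std::string>;

// Number of FP value operands carried by the node (illustrative names).
enum class Arity : std::size_t { Unary = 1, Binary = 2, Ternary = 3 };

// Build the operand list for the non-strict node: skip the chain and keep
// exactly as many value operands as the arity requires.
Operands dropChain(const Operands &StrictOps, Arity A) {
  const std::size_t N = static_cast<std::size_t>(A);
  assert(StrictOps.size() >= N + 1 && "expected chain plus N value operands");
  return Operands(StrictOps.begin() + 1, StrictOps.begin() + 1 + N);
}
```

With this model, a strict FMA node holding a chain and three values yields a three-element list, which is the case the original two-way unary/binary branch could not express.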

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 5,426 Lines • ▼ Show 20 Lines	setValue(&I, DAG.getNode(ISD::FMA, sdl,
getValue(I.getArgOperand(1)),		getValue(I.getArgOperand(1)),
getValue(I.getArgOperand(2))));		getValue(I.getArgOperand(2))));
return nullptr;		return nullptr;
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:
▲ Show 20 Lines • Show All 515 Lines • ▼ Show 20 Lines	case Intrinsic::experimental_constrained_fmul:
Opcode = ISD::STRICT_FMUL;		Opcode = ISD::STRICT_FMUL;
break;		break;
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
Opcode = ISD::STRICT_FDIV;		Opcode = ISD::STRICT_FDIV;
break;		break;
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
Opcode = ISD::STRICT_FREM;		Opcode = ISD::STRICT_FREM;
break;		break;
		case Intrinsic::experimental_constrained_fma:
		Opcode = ISD::STRICT_FMA;
		break;
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
Opcode = ISD::STRICT_FSQRT;		Opcode = ISD::STRICT_FSQRT;
break;		break;
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
Opcode = ISD::STRICT_FPOW;		Opcode = ISD::STRICT_FPOW;
break;		break;
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
Opcode = ISD::STRICT_FPOWI;		Opcode = ISD::STRICT_FPOWI;
Show All 29 Lines	void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue Chain = getRoot();		SDValue Chain = getRoot();
SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);		ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);
ValueVTs.push_back(MVT::Other); // Out chain		ValueVTs.push_back(MVT::Other); // Out chain

SDVTList VTs = DAG.getVTList(ValueVTs);		SDVTList VTs = DAG.getVTList(ValueVTs);
SDValue Result;		SDValue Result;
if (FPI.isUnaryOp())		if (FPI.isUnaryOp())
		andrew.w.kaylor: This code also needs to be updated to handle the case of three value operands.
Result = DAG.getNode(Opcode, sdl, VTs,		Result = DAG.getNode(Opcode, sdl, VTs,
{ Chain, getValue(FPI.getArgOperand(0)) });		{ Chain, getValue(FPI.getArgOperand(0)) });
		else if (FPI.isTernaryOp())
		Result = DAG.getNode(Opcode, sdl, VTs,
		{ Chain, getValue(FPI.getArgOperand(0)),
		getValue(FPI.getArgOperand(1)),
		getValue(FPI.getArgOperand(2)) });
else		else
Result = DAG.getNode(Opcode, sdl, VTs,		Result = DAG.getNode(Opcode, sdl, VTs,
{ Chain, getValue(FPI.getArgOperand(0)),		{ Chain, getValue(FPI.getArgOperand(0)),
getValue(FPI.getArgOperand(1)) });		getValue(FPI.getArgOperand(1)) });

assert(Result.getNode()->getNumValues() == 2);		assert(Result.getNode()->getNumValues() == 2);
SDValue OutChain = Result.getValue(1);		SDValue OutChain = Result.getValue(1);
DAG.setRoot(OutChain);		DAG.setRoot(OutChain);
SDValue FPResult = Result.getValue(0);		SDValue FPResult = Result.getValue(0);
setValue(&FPI, FPResult);		setValue(&FPI, FPResult);
▲ Show 20 Lines • Show All 3,815 Lines • Show Last 20 Lines

lib/IR/IntrinsicInst.cpp

//===-- InstrinsicInst.cpp - Intrinsic Instruction Wrappers ---------------===//		//===-- InstrinsicInst.cpp - Intrinsic Instruction Wrappers ---------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements methods that make it really easy to deal with intrinsic		// This file implements methods that make it really easy to deal with intrinsic
// functions.		// functions.
//		//
// All intrinsic function calls are instances of the call instruction, so these		// All intrinsic function calls are instances of the call instruction, so these
// are all subclasses of the CallInst class. Note that none of these classes		// are all subclasses of the CallInst class. Note that none of these classes
// has state or virtual methods, which is an important part of this gross/neat		// has state or virtual methods, which is an important part of this gross/neat
// hack working.		// hack working.
//		//
// In some cases, arguments to intrinsics need to be generic and are defined as		// In some cases, arguments to intrinsics need to be generic and are defined as
// type pointer to empty struct { }*. To access the real item of interest the		// type pointer to empty struct { }*. To access the real item of interest the
// cast instruction needs to be stripped away.		// cast instruction needs to be stripped away.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/ADT/StringSwitch.h"		#include "llvm/ADT/StringSwitch.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"		#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
▲ Show 20 Lines • Show All 64 Lines • ▼ Show 20 Lines	Value *InstrProfIncrementInst::getStep() const {
const Module *M = getModule();		const Module *M = getModule();
LLVMContext &Context = M->getContext();		LLVMContext &Context = M->getContext();
return ConstantInt::get(Type::getInt64Ty(Context), 1);		return ConstantInt::get(Type::getInt64Ty(Context), 1);
}		}

ConstrainedFPIntrinsic::RoundingMode		ConstrainedFPIntrinsic::RoundingMode
ConstrainedFPIntrinsic::getRoundingMode() const {		ConstrainedFPIntrinsic::getRoundingMode() const {
unsigned NumOperands = getNumArgOperands();		unsigned NumOperands = getNumArgOperands();
Metadata *MD =		Metadata *MD =
dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();		dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 2))->getMetadata();
if (!MD || !isa<MDString>(MD))		if (!MD || !isa<MDString>(MD))
return rmInvalid;		return rmInvalid;
StringRef RoundingArg = cast<MDString>(MD)->getString();		StringRef RoundingArg = cast<MDString>(MD)->getString();

// For dynamic rounding mode, we use round to nearest but we will set the		// For dynamic rounding mode, we use round to nearest but we will set the
// 'exact' SDNodeFlag so that the value will not be rounded.		// 'exact' SDNodeFlag so that the value will not be rounded.
return StringSwitch<RoundingMode>(RoundingArg)		return StringSwitch<RoundingMode>(RoundingArg)
.Case("round.dynamic", rmDynamic)		.Case("round.dynamic", rmDynamic)
.Case("round.tonearest", rmToNearest)		.Case("round.tonearest", rmToNearest)
.Case("round.downward", rmDownward)		.Case("round.downward", rmDownward)
.Case("round.upward", rmUpward)		.Case("round.upward", rmUpward)
.Case("round.towardzero", rmTowardZero)		.Case("round.towardzero", rmTowardZero)
.Default(rmInvalid);		.Default(rmInvalid);
}		}

ConstrainedFPIntrinsic::ExceptionBehavior		ConstrainedFPIntrinsic::ExceptionBehavior
ConstrainedFPIntrinsic::getExceptionBehavior() const {		ConstrainedFPIntrinsic::getExceptionBehavior() const {
unsigned NumOperands = getNumArgOperands();		unsigned NumOperands = getNumArgOperands();
Metadata *MD =		Metadata *MD =
dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();		dyn_cast<MetadataAsValue>(getArgOperand(NumOperands - 1))->getMetadata();
if (!MD || !isa<MDString>(MD))		if (!MD || !isa<MDString>(MD))
return ebInvalid;		return ebInvalid;
StringRef ExceptionArg = cast<MDString>(MD)->getString();		StringRef ExceptionArg = cast<MDString>(MD)->getString();
return StringSwitch<ExceptionBehavior>(ExceptionArg)		return StringSwitch<ExceptionBehavior>(ExceptionArg)
.Case("fpexcept.ignore", ebIgnore)		.Case("fpexcept.ignore", ebIgnore)
.Case("fpexcept.maytrap", ebMayTrap)		.Case("fpexcept.maytrap", ebMayTrap)
.Case("fpexcept.strict", ebStrict)		.Case("fpexcept.strict", ebStrict)
.Default(ebInvalid);		.Default(ebInvalid);
}		}
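
The two accessors above rely on the rounding mode and exception behavior being the last two call arguments, whatever the number of value operands, which is why they keep working once a three-operand intrinsic is added. A minimal standalone sketch of that lookup follows. It uses plain `std::string` instead of LLVM's `StringSwitch` and `MDString`, and the enum and function names are hypothetical; only the metadata strings are taken from the code above.

```cpp
#include <cassert>
#include <string>

enum class RoundingMode { Invalid, Dynamic, ToNearest, Downward, Upward, TowardZero };
enum class ExceptionBehavior { Invalid, Ignore, MayTrap, Strict };

// The metadata arguments are always the last two call arguments:
// rounding mode at NumArgs - 2, exception behavior at NumArgs - 1.
unsigned roundingArgIndex(unsigned NumArgs) {
  assert(NumArgs >= 3 && "need at least one value operand plus two metadata");
  return NumArgs - 2;
}

unsigned exceptionArgIndex(unsigned NumArgs) {
  assert(NumArgs >= 3 && "need at least one value operand plus two metadata");
  return NumArgs - 1;
}

// Map the metadata string to an enumerator; unknown strings are invalid.
RoundingMode parseRoundingMode(const std::string &Arg) {
  if (Arg == "round.dynamic")    return RoundingMode::Dynamic;
  if (Arg == "round.tonearest")  return RoundingMode::ToNearest;
  if (Arg == "round.downward")   return RoundingMode::Downward;
  if (Arg == "round.upward")     return RoundingMode::Upward;
  if (Arg == "round.towardzero") return RoundingMode::TowardZero;
  return RoundingMode::Invalid;
}

ExceptionBehavior parseExceptionBehavior(const std::string &Arg) {
  if (Arg == "fpexcept.ignore")  return ExceptionBehavior::Ignore;
  if (Arg == "fpexcept.maytrap") return ExceptionBehavior::MayTrap;
  if (Arg == "fpexcept.strict")  return ExceptionBehavior::Strict;
  return ExceptionBehavior::Invalid;
}
```

For a constrained fma call with five arguments, this model places the rounding mode at index 3 and the exception behavior at index 4.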

bool ConstrainedFPIntrinsic::isUnaryOp() const {		bool ConstrainedFPIntrinsic::isUnaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
default:		default:
return false;		return false;
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:
case Intrinsic::experimental_constrained_log10:		case Intrinsic::experimental_constrained_log10:
case Intrinsic::experimental_constrained_log2:		case Intrinsic::experimental_constrained_log2:
case Intrinsic::experimental_constrained_rint:		case Intrinsic::experimental_constrained_rint:
case Intrinsic::experimental_constrained_nearbyint:		case Intrinsic::experimental_constrained_nearbyint:
return true;		return true;
}		}
}		}

		bool ConstrainedFPIntrinsic::isTernaryOp() const {
		switch (getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::experimental_constrained_fma:
		return true;
		}
		}

lib/IR/Verifier.cpp

Show First 20 Lines • Show All 3,963 Lines • ▼ Show 20 Lines	Assert(isa<ConstantInt>(CS.getArgOperand(1)),
"constant int",		"constant int",
CS);		CS);
break;		break;
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
		case Intrinsic::experimental_constrained_fma:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:
case Intrinsic::experimental_constrained_log10:		case Intrinsic::experimental_constrained_log10:
case Intrinsic::experimental_constrained_log2:		case Intrinsic::experimental_constrained_log2:
case Intrinsic::experimental_constrained_rint:		case Intrinsic::experimental_constrained_rint:
case Intrinsic::experimental_constrained_nearbyint:		case Intrinsic::experimental_constrained_nearbyint:
visitConstrainedFPIntrinsic(		visitConstrainedFPIntrinsic(
		andrew.w.kaylor: The implementation of this function assumes only 1 or 2 value operands. It will need to be updated.
cast<ConstrainedFPIntrinsic>(*CS.getInstruction()));		cast<ConstrainedFPIntrinsic>(*CS.getInstruction()));
break;		break;
case Intrinsic::dbg_declare: // llvm.dbg.declare		case Intrinsic::dbg_declare: // llvm.dbg.declare
Assert(isa<MetadataAsValue>(CS.getArgOperand(0)),		Assert(isa<MetadataAsValue>(CS.getArgOperand(0)),
"invalid llvm.dbg.declare intrinsic call 1", CS);		"invalid llvm.dbg.declare intrinsic call 1", CS);
visitDbgIntrinsic("declare", cast<DbgDeclareInst>(*CS.getInstruction()));		visitDbgIntrinsic("declare", cast<DbgDeclareInst>(*CS.getInstruction()));
break;		break;
case Intrinsic::dbg_value: // llvm.dbg.value		case Intrinsic::dbg_value: // llvm.dbg.value
▲ Show 20 Lines • Show All 431 Lines • ▼ Show 20 Lines	static DISubprogram *getSubprogram(Metadata *LocalScope) {

// Just return null; broken scope chains are checked elsewhere.		// Just return null; broken scope chains are checked elsewhere.
assert(!isa<DILocalScope>(LocalScope) && "Unknown type of local scope");		assert(!isa<DILocalScope>(LocalScope) && "Unknown type of local scope");
return nullptr;		return nullptr;
}		}

void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {		void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {
unsigned NumOperands = FPI.getNumArgOperands();		unsigned NumOperands = FPI.getNumArgOperands();
Assert(((NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)),		Assert(((NumOperands == 5 && FPI.isTernaryOp()) ||
		(NumOperands == 3 && FPI.isUnaryOp()) || (NumOperands == 4)),
"invalid arguments for constrained FP intrinsic", &FPI);		"invalid arguments for constrained FP intrinsic", &FPI);
Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)),		Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-1)),
"invalid exception behavior argument", &FPI);		"invalid exception behavior argument", &FPI);
Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-2)),		Assert(isa<MetadataAsValue>(FPI.getArgOperand(NumOperands-2)),
"invalid rounding mode argument", &FPI);		"invalid rounding mode argument", &FPI);
Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid,		Assert(FPI.getRoundingMode() != ConstrainedFPIntrinsic::rmInvalid,
"invalid rounding mode argument", &FPI);		"invalid rounding mode argument", &FPI);
Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid,		Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid,
"invalid exception behavior argument", &FPI);		"invalid exception behavior argument", &FPI);
▲ Show 20 Lines • Show All 574 Lines • Show Last 20 Lines
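
The operand-count rule in the updated assertion amounts to "value operands plus two metadata arguments". The sketch below states that rule as a standalone function. It is an illustration under assumptions, not the verifier code: `Arity`, `expectedArgCount`, and `hasValidArgCount` are hypothetical names, and the check is deliberately stricter than the assertion in the diff, which accepts any four-argument call without consulting the arity.

```cpp
#include <cassert>

// Illustrative arity classes for constrained FP intrinsics.
enum class Arity { Unary, Binary, Ternary };

// Every constrained FP intrinsic carries two trailing metadata arguments:
// the rounding mode and the exception behavior.
constexpr unsigned NumMetadataArgs = 2;

// Total call arguments expected for a given arity.
unsigned expectedArgCount(Arity A) {
  switch (A) {
  case Arity::Unary:   return 1 + NumMetadataArgs;
  case Arity::Binary:  return 2 + NumMetadataArgs;
  case Arity::Ternary: return 3 + NumMetadataArgs;
  }
  assert(false && "unknown arity");
  return 0;
}

// True when the call has exactly the argument count its arity requires.
bool hasValidArgCount(Arity A, unsigned NumOperands) {
  return NumOperands == expectedArgCount(A);
}
```

Under this rule a ternary intrinsic such as the constrained fma must have five arguments, matching the `NumOperands == 5 && FPI.isTernaryOp()` clause added above.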

lib/Target/X86/X86ISelDAGToDAG.cpp

Show First 20 Lines • Show All 2,006 Lines • ▼ Show 20 Lines	void X86DAGToDAGISel::Select(SDNode *Node) {
if (Node->isMachineOpcode()) {		if (Node->isMachineOpcode()) {
DEBUG(dbgs() << "== "; Node->dump(CurDAG); dbgs() << '\n');		DEBUG(dbgs() << "== "; Node->dump(CurDAG); dbgs() << '\n');
Node->setNodeId(-1);		Node->setNodeId(-1);
return; // Already selected.		return; // Already selected.
}		}

switch (Opcode) {		switch (Opcode) {
default: break;		default: break;
		case ISD::FMA: {
		andrew.w.kaylor: Can you explain why this was necessary? I would have expected there to have been handling already in place for ISD::FMA.
		wdng: No, there isn't. It looks like X86 doesn't handle ISD::FMA automatically unless the -mattr=+fma option is given. Without this, CodeGen/X86/fp-intrinsics.ll will fail in instruction selection.
		andrew.w.kaylor: I still don't understand. What happens when -mattr=+fma is used? The CodeGen/X86/fma.ll test uses that option. This case should work in the same way.
		wdng: I think I made a mistake when describing the problem in my earlier comment. Let me rephrase and explain it here.
		Without -mattr=+fma, an FMA libcall will be generated.
		With -mattr=+fma, we expect the corresponding FMA instruction to be generated.
		In fma.ll, none of the fma tests are constrained FP operations. During the X86ISelLowering phase, the FMA node is lowered to X86ISD::FMADD, so there is no ISD::FMA at this phase: it has already been changed to X86ISD::FMADD before instruction selection starts. Please refer to the following dump.
		(gdb) p CurDAG->dump()
		SelectionDAG has 12 nodes:
		t0: ch = EntryToken
		t2: f64,ch = CopyFromReg t0, Register:f64 %vreg0
		t4: f64,ch = CopyFromReg t0, Register:f64 %vreg1
		t6: f64,ch = CopyFromReg t0, Register:f64 %vreg2
		t12: f64 = X86ISD::FMADD t2, t4, t6
		t10: ch,glue = CopyToReg t0, Register:f64 %XMM0, t12
		t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:f64 %XMM0, t10:1
		However, for the constrained fma, we use the mutateStrictFPToFP() function to mutate constrained_fma to a normal fma, namely ISD::FMA, before instruction selection starts. The X86 backend cannot recognize ISD::FMA, so we have to add code to convert ISD::FMA to X86ISD::FMADD during instruction selection.
		andrew.w.kaylor: I'm still not sure I understand this, but it sounds to me like this should be happening somewhere else. Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA?
		My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.
		wdng:
		> Are you saying that if -mattr=+fma is not used the ISD::STRICT_FMA will be expanded to a libcall before we reach mutateStrictFPToFP and so this code will never be reached in that case? And if so, are you further saying that when -mattr=+fma is used we will reach this code only after mutateStrictFPToFP() has converted ISD::STRICT_FMA to ISD::FMA?
		Yes.
		> My concern is that this is adding a generic (not constrained-specific) handler to handle the constrained case. I would much rather figure out a way to get ISD::STRICT_FMA to follow the existing path.
		I once tried moving the mutateStrictFPToFP() call to the LegalizeDAG phase, as the following code shows. I found that it works, and there is no need to add code to the X86 backend instruction selector:
		switch (Action) {
		case TargetLowering::Legal:
		if (Node->isStrictFPOpcode())
		Node = DAG.mutateStrictFPToFP(Node);
		return;
		So once those strict FP operators have been legalized to Legal, we can directly mutate them to their corresponding normal FP operators. However, the problem is that non-default FP (constrained FP operation) exception behaviors are target-specific, which means we have to leave it to each sub-target's selector to handle them. So I would not suggest mutating those instructions elsewhere. What do you think?
		andrew.w.kaylor: I do think the mutate needs to be done as late as possible. I'm not even entirely certain that we won't need to figure out a way to communicate the FP constraints beyond instruction selection.
		Would it be possible to have the mutateStrictFPToFP call (in its current location) call a target-specific hook to get a target-specific mutated node, so that we could convert directly to X86ISD::FMADD there?
		Also, have you considered how non-X86 architectures need to handle this case?
		//STRICT_FMA is turned into FMA after legalization and DAG combine.
		craig.topper: Add a comment that this is here because STRICT_FMA is turned into FMA after legalization and DAG combine.
		SDValue ISDFMA = CurDAG->getNode(X86ISD::FMADD, SDLoc(Node),
		b-sumner: Did you run clang-format?
		Node->getValueType(0), Node->getOperand(0),
		Node->getOperand(1), Node->getOperand(2));
		ReplaceUses(SDValue(Node, 0), ISDFMA);
		SelectCode(ISDFMA.getNode());
		return;
		}
case ISD::BRIND: {		case ISD::BRIND: {
if (Subtarget->isTargetNaCl())		if (Subtarget->isTargetNaCl())
// NaCl has its own pass where jmp %r32 are converted to jmp %r64. We		// NaCl has its own pass where jmp %r32 are converted to jmp %r64. We
// leave the instruction alone.		// leave the instruction alone.
break;		break;
if (Subtarget->isTarget64BitILP32()) {		if (Subtarget->isTarget64BitILP32()) {
// Converts a 32-bit register to a 64-bit, zero-extended version of		// Converts a 32-bit register to a 64-bit, zero-extended version of
// it. This is needed because x86-64 can do many things, but jmp %r32		// it. This is needed because x86-64 can do many things, but jmp %r32

test/CodeGen/X86/fp-intrinsics.ll

	; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck %s			; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck --check-prefix=COMMON --check-prefix=NO-FMA --check-prefix=FMACALL64 --check-prefix=FMACALL32 %s
				; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+fma < %s | FileCheck -check-prefix=COMMON --check-prefix=HAS-FMA --check-prefix=FMA64 --check-prefix=FMA32 %s
				arsenm (Done): Missing -check-prefix=CHECK

	; Verify that constants aren't folded to inexact results when the rounding mode			; Verify that constants aren't folded to inexact results when the rounding mode
	; is unknown.			; is unknown.
	;			;
	; double f1() {			; double f1() {
	; // Because 0.1 cannot be represented exactly, this shouldn't be folded.			; // Because 0.1 cannot be represented exactly, this shouldn't be folded.
	; return 1.0/10.0;			; return 1.0/10.0;
	; }			; }
	;			;
	; CHECK-LABEL: f1			; CHECK-LABEL: f1
	; CHECK: divsd			; COMMON: divsd
	define double @f1() {			define double @f1() {
	entry:			entry:
	%div = call double @llvm.experimental.constrained.fdiv.f64(			%div = call double @llvm.experimental.constrained.fdiv.f64(
	double 1.000000e+00,			double 1.000000e+00,
	double 1.000000e+01,			double 1.000000e+01,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %div			ret double %div
	}			}

	; Verify that 'a - 0' isn't simplified to 'a' when the rounding mode is unknown.			; Verify that 'a - 0' isn't simplified to 'a' when the rounding mode is unknown.
	;			;
	; double f2(double a) {			; double f2(double a) {
	; // Because the result of '0 - 0' is negative zero if rounding mode is			; // Because the result of '0 - 0' is negative zero if rounding mode is
	; // downward, this shouldn't be simplified.			; // downward, this shouldn't be simplified.
	; return a - 0;			; return a - 0;
	; }			; }
	;			;
	; CHECK-LABEL: f2			; CHECK-LABEL: f2
	; CHECK: subsd			; COMMON: subsd
	define double @f2(double %a) {			define double @f2(double %a) {
	entry:			entry:
	%div = call double @llvm.experimental.constrained.fsub.f64(			%div = call double @llvm.experimental.constrained.fsub.f64(
	double %a,			double %a,
	double 0.000000e+00,			double 0.000000e+00,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %div			ret double %div
	}			}

	; Verify that '-((-a)*b)' isn't simplified to 'a*b' when the rounding mode is			; Verify that '-((-a)*b)' isn't simplified to 'a*b' when the rounding mode is
	; unknown.			; unknown.
	;			;
	; double f3(double a, double b) {			; double f3(double a, double b) {
	; // Because the intermediate value involved in this calculation may require			; // Because the intermediate value involved in this calculation may require
	; // rounding, this shouldn't be simplified.			; // rounding, this shouldn't be simplified.
	; return -((-a)*b);			; return -((-a)*b);
	; }			; }
	;			;
	; CHECK-LABEL: f3:			; CHECK-LABEL: f3:
	; CHECK: subsd			; COMMON: subsd
	; CHECK: mulsd			; COMMON: mulsd
	; CHECK: subsd			; COMMON: subsd
	define double @f3(double %a, double %b) {			define double @f3(double %a, double %b) {
	entry:			entry:
	%sub = call double @llvm.experimental.constrained.fsub.f64(			%sub = call double @llvm.experimental.constrained.fsub.f64(
	double -0.000000e+00, double %a,			double -0.000000e+00, double %a,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	%mul = call double @llvm.experimental.constrained.fmul.f64(			%mul = call double @llvm.experimental.constrained.fmul.f64(
	double %sub, double %b,			double %sub, double %b,
	Show All 12 Lines
	;			;
	; double f4(int n, double a) {			; double f4(int n, double a) {
	; // Because a + 1 may overflow, this should not be simplified.			; // Because a + 1 may overflow, this should not be simplified.
	; if (n > 0)			; if (n > 0)
	; return a + 1.0;			; return a + 1.0;
	; return a;			; return a;
	; }			; }
	;			;
	;			;
	; CHECK-LABEL: f4:			; CHECK-LABEL: f4:
	; CHECK: testl			; COMMON: testl
	; CHECK: jle			; COMMON: jle
	; CHECK: addsd			; COMMON: addsd
	define double @f4(i32 %n, double %a) {			define double @f4(i32 %n, double %a) {
	entry:			entry:
	%cmp = icmp sgt i32 %n, 0			%cmp = icmp sgt i32 %n, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	if.then:			if.then:
	%add = call double @llvm.experimental.constrained.fadd.f64(			%add = call double @llvm.experimental.constrained.fadd.f64(
	double 1.000000e+00, double %a,			double 1.000000e+00, double %a,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	br label %if.end			br label %if.end

	if.end:			if.end:
	%a.0 = phi double [%add, %if.then], [ %a, %entry ]			%a.0 = phi double [%add, %if.then], [ %a, %entry ]
	ret double %a.0			ret double %a.0
	}			}

	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.			; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f5			; CHECK-LABEL: f5
	; CHECK: sqrtsd			; COMMON: sqrtsd
	define double @f5() {			define double @f5() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,			%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that pow(42.1, 3.0) isn't simplified when the rounding mode is unknown.			; Verify that pow(42.1, 3.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f6			; CHECK-LABEL: f6
	; CHECK: pow			; COMMON: pow
	define double @f6() {			define double @f6() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.pow.f64(double 42.1,			%result = call double @llvm.experimental.constrained.pow.f64(double 42.1,
	double 3.0,			double 3.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that powi(42.1, 3) isn't simplified when the rounding mode is unknown.			; Verify that powi(42.1, 3) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f7			; CHECK-LABEL: f7
	; CHECK: powi			; COMMON: powi
	define double @f7() {			define double @f7() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.powi.f64(double 42.1,			%result = call double @llvm.experimental.constrained.powi.f64(double 42.1,
	i32 3,			i32 3,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that sin(42.0) isn't simplified when the rounding mode is unknown.			; Verify that sin(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f8			; CHECK-LABEL: f8
	; CHECK: sin			; COMMON: sin
	define double @f8() {			define double @f8() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.sin.f64(double 42.0,			%result = call double @llvm.experimental.constrained.sin.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that cos(42.0) isn't simplified when the rounding mode is unknown.			; Verify that cos(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f9			; CHECK-LABEL: f9
	; CHECK: cos			; COMMON: cos
	define double @f9() {			define double @f9() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.cos.f64(double 42.0,			%result = call double @llvm.experimental.constrained.cos.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.			; Verify that exp(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f10			; CHECK-LABEL: f10
	; CHECK: exp			; COMMON: exp
	define double @f10() {			define double @f10() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.exp.f64(double 42.0,			%result = call double @llvm.experimental.constrained.exp.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that exp2(42.1) isn't simplified when the rounding mode is unknown.			; Verify that exp2(42.1) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f11			; CHECK-LABEL: f11
	; CHECK: exp2			; COMMON: exp2
	define double @f11() {			define double @f11() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,			%result = call double @llvm.experimental.constrained.exp2.f64(double 42.1,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that log(42.0) isn't simplified when the rounding mode is unknown.			; Verify that log(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f12			; CHECK-LABEL: f12
	; CHECK: log			; COMMON: log
	define double @f12() {			define double @f12() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.log.f64(double 42.0,			%result = call double @llvm.experimental.constrained.log.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that log10(42.0) isn't simplified when the rounding mode is unknown.			; Verify that log10(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f13			; CHECK-LABEL: f13
	; CHECK: log10			; COMMON: log10
	define double @f13() {			define double @f13() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.log10.f64(double 42.0,			%result = call double @llvm.experimental.constrained.log10.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that log2(42.0) isn't simplified when the rounding mode is unknown.			; Verify that log2(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f14			; CHECK-LABEL: f14
	; CHECK: log2			; COMMON: log2
	define double @f14() {			define double @f14() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.log2.f64(double 42.0,			%result = call double @llvm.experimental.constrained.log2.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that rint(42.1) isn't simplified when the rounding mode is unknown.			; Verify that rint(42.1) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f15			; CHECK-LABEL: f15
	; CHECK: rint			; NO-FMA: rint
				; HAS-FMA: vroundsd
	define double @f15() {			define double @f15() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.rint.f64(double 42.1,			%result = call double @llvm.experimental.constrained.rint.f64(double 42.1,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

	; Verify that nearbyint(42.1) isn't simplified when the rounding mode is			; Verify that nearbyint(42.1) isn't simplified when the rounding mode is
	; unknown.			; unknown.
	; CHECK-LABEL: f16			; CHECK-LABEL: f16
	; CHECK: nearbyint			; NO-FMA: nearbyint
				; HAS-FMA: vroundsd
	define double @f16() {			define double @f16() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.nearbyint.f64(			%result = call double @llvm.experimental.constrained.nearbyint.f64(
	double 42.1,			double 42.1,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

				; Verify that fma(1.0) isn't simplified when the rounding mode is
				andrew.w.kaylor (Not Done): These values could be constant folded without rounding, so even though this test case works now it's testing something that we don't necessarily want to be true. At some point, we're going to want to teach optimizations to recognize these intrinsics and fold cases like this. That's why I was using 42.1 in the other tests. It's just an arbitrary value that introduces rounding errors.
				; unknown.
				; CHECK-LABEL: f17
				; FMACALL32: jmp fmaf # TAILCALL
				arsenm (Done): You need a separate check-label for the FMAless run line
				; FMA32: vfmadd213ss
				define float @f17() {
				entry:
				%result = call float @llvm.experimental.constrained.fma.f32(
				float 1.000000e+00,
				float 2.000000e+00,
				float 3.000000e+00,
				metadata !"round.dynamic",
				metadata !"fpexcept.strict")
				ret float %result
				}

				; Verify that fma(42.1) isn't simplified when the rounding mode is
				; unknown.
				; CHECK-LABEL: f18
				; FMACALL64: jmp fma # TAILCALL
				; FMA64: vfmadd213sd
				define double @f18() {
				entry:
				%result = call double @llvm.experimental.constrained.fma.f64(
				double 42.1,
				double 42.1,
				double 42.1,
				metadata !"round.dynamic",
				metadata !"fpexcept.strict")
				ret double %result
				}

	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"			@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"
	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)			declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)
	declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
				declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
				declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
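
The inline comment on f17 above (operands 1.0, 2.0, 3.0 can be folded without rounding, while 42.1 cannot) can be checked outside of LLVM. The following is a minimal standalone C++ sketch, not part of the patch; the helper names `fmaWithRounding`, `dependsOnRounding`, and `checkReviewExamples` are hypothetical, and it assumes the host C library's `fma` honors the dynamic rounding mode. It uses single precision for both cases, whereas f18 uses double.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Computes fma(a, b, c) in single precision with the given rounding mode
// active. The volatile locals keep the compiler from folding the operation
// at compile time, so the result reflects the dynamic rounding mode.
static float fmaWithRounding(int mode, float a, float b, float c) {
  volatile float va = a, vb = b, vc = c;
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile float result = std::fma(va, vb, vc);
  std::fesetround(saved);
  return result;
}

// Returns true if the result changes with the rounding direction, which
// happens exactly when a * b + c is not representable as a float.
static bool dependsOnRounding(float a, float b, float c) {
  return fmaWithRounding(FE_UPWARD, a, b, c) !=
         fmaWithRounding(FE_DOWNWARD, a, b, c);
}

// Hypothetical driver using the operand values from f17 and f18.
static void checkReviewExamples() {
  // 1 * 2 + 3 is exactly 5, so every rounding mode gives the same answer
  // and a constant folder could legitimately fold it.
  assert(!dependsOnRounding(1.0f, 2.0f, 3.0f));
  // 42.1 is not exactly representable and the fused result needs rounding,
  // so folding it under an unknown rounding mode would be wrong.
  assert(dependsOnRounding(42.1f, 42.1f, 42.1f));
}
```

Calling `checkReviewExamples()` from a `main` function exercises both cases. Only the second operand set gives a test that would still be meaningful if optimizations later learn to fold constrained intrinsics whose results are exact.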

test/Feature/fp-intrinsics.ll

	;			;
	; double f4(int n, double a) {			; double f4(int n, double a) {
	; // Because a + 1 may overflow, this should not be simplified.			; // Because a + 1 may overflow, this should not be simplified.
	; if (n > 0)			; if (n > 0)
	; return a + 1.0;			; return a + 1.0;
	; return a;			; return a;
	; }			; }
	;			;
	;			;
	; CHECK-LABEL: @f4			; CHECK-LABEL: @f4
	; CHECK-NOT: select			; CHECK-NOT: select
	; CHECK: br i1 %cmp			; CHECK: br i1 %cmp
	define double @f4(i32 %n, double %a) {			define double @f4(i32 %n, double %a) {
	entry:			entry:
	%cmp = icmp sgt i32 %n, 0			%cmp = icmp sgt i32 %n, 0
	br i1 %cmp, label %if.then, label %if.end			br i1 %cmp, label %if.then, label %if.end

	if.then:			if.then:
	%add = call double @llvm.experimental.constrained.fadd.f64(			%add = call double @llvm.experimental.constrained.fadd.f64(
	double 1.000000e+00, double %a,			double 1.000000e+00, double %a,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	br label %if.end			br label %if.end

	if.end:			if.end:
	%a.0 = phi double [%add, %if.then], [ %a, %entry ]			%a.0 = phi double [%add, %if.then], [ %a, %entry ]
	ret double %a.0			ret double %a.0
	}			}


	; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.			; Verify that sqrt(42.0) isn't simplified when the rounding mode is unknown.
	; CHECK-LABEL: f5			; CHECK-LABEL: f5
	; CHECK: call double @llvm.experimental.constrained.sqrt			; CHECK: call double @llvm.experimental.constrained.sqrt
	define double @f5() {			define double @f5() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,			%result = call double @llvm.experimental.constrained.sqrt.f64(double 42.0,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	▲ Show 20 Lines • Show All 120 Lines • ▼ Show 20 Lines
	entry:			entry:
	%result = call double @llvm.experimental.constrained.nearbyint.f64(			%result = call double @llvm.experimental.constrained.nearbyint.f64(
	double 42.1,			double 42.1,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

				; Verify that fma(42.1) isn't simplified when the rounding mode is
				; unknown.
				; CHECK-LABEL: f17
				; CHECK: call double @llvm.experimental.constrained.fma
				andrew.w.kaylor (Done): If you checked the arguments here it should reveal the problems in the code. There's also a test at llvm/tests/CodeGen/X86/fp-intrinsics.ll that carries the constrained FP intrinsics all the way through code generation. Can you add a case there for this intrinsic?
				define double @f17() {
				entry:
				%result = call double @llvm.experimental.constrained.fma.f64(double 42.1, double 42.1, double 42.1,
				metadata !"round.dynamic",
				metadata !"fpexcept.strict")
				ret double %result
				}

				arsenm (Done): Should also test for the other FP types
	@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"			@llvm.fp.env = thread_local global i8 zeroinitializer, section "llvm.metadata"
	declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.sqrt.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)			declare double @llvm.experimental.constrained.pow.f64(double, double, metadata, metadata)
	declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)			declare double @llvm.experimental.constrained.powi.f64(double, i32, metadata, metadata)
	declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.sin.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.cos.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
				declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)

Revision Contents

Diff 112215

docs/LangRef.rst

include/llvm/CodeGen/ISDOpcodes.h

include/llvm/CodeGen/SelectionDAGNodes.h

include/llvm/IR/IntrinsicInst.h

include/llvm/IR/Intrinsics.td

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

lib/IR/IntrinsicInst.cpp

lib/IR/Verifier.cpp

lib/Target/X86/X86ISelDAGToDAG.cpp

test/CodeGen/X86/fp-intrinsics.ll

test/Feature/fp-intrinsics.ll
