This is an archive of the discontinued LLVM Phabricator instance.

The title of this review is misleading. It should at least mention FPEnv, constrained intrinsics, or strict fp or something. Right now it sounds like FP_CONTRACT isn't supported at all.

Can we split most of the X86 changes into a separate patch? Most of it can be tested with fneg and constrained.fma.

craig.topper added inline comments. Jan 15 2020, 7:46 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7099	Can you make the SDValue Result an argument of this and only capture 'this'. I don't like depending on reassigning Result.

Address review comments.

Harbormaster completed remote builds in B44120: Diff 238414. Jan 15 2020, 9:10 PM

craig.topper added inline comments. Jan 15 2020, 9:22 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7096	Why is Result a reference? It's not modified is it? Don't use auto for parameter types. llvm coding style prefers auto to only be used when the type is easily assumed by someone reading the code.
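The two style points raised on this lambda (pass the value in as an argument instead of capturing and reassigning it, and spell out the parameter type instead of using auto) can be illustrated with a minimal, self-contained sketch. The class and member names below are hypothetical stand-ins and are not taken from SelectionDAGBuilder.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a builder class; this is not LLVM code.
class Builder {
public:
  // Records every value in Results and returns the total number recorded.
  std::size_t recordAll(const std::vector<int> &Results) {
    // The lambda captures only 'this'. The value it works on is passed
    // explicitly, by value, with a spelled-out type instead of 'auto'.
    auto Push = [this](int Result) { Recorded.push_back(Result); };
    for (int R : Results)
      Push(R);
    assert(Recorded.size() >= Results.size());
    return Recorded.size();
  }

  const std::vector<int> &recorded() const { return Recorded; }

private:
  std::vector<int> Recorded;
};
```

Because the lambda no longer depends on a captured local being reassigned, a reader can see what it consumes from the call site alone.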

Address review comment.

Harbormaster completed remote builds in B44122: Diff 238417. Jan 15 2020, 10:05 PM

craig.topper added a reviewer: rjmccall. Jan 15 2020, 10:07 PM

pengfei retitled this revision from "Add pragma FP_CONTRACT support." to "[FPEnv] Add pragma FP_CONTRACT support under strict FP." Jan 15 2020, 10:08 PM

pengfei edited the summary of this revision.

andrew.w.kaylor added inline comments. Jan 16 2020, 12:04 PM

clang/lib/CodeGen/CGExprScalar.cpp
3385	You shouldn't just assume that MulOp is a constrained intrinsic. Cast to ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively assert that MulOp is a constrained intrinsic. I think that should always be true.
3431	I don't think we should ever create non-constrained FMul instructions if Builder is in FP constrained mode, but you should assert that somewhere above. Maybe move this block above line 3409 and add: assert(LHSBinOp->getOpcode() != llvm::Instruction::FMul && RHSBinOp->getOpcode() != llvm::Instruction::FMul);
clang/test/CodeGen/constrained-math-builtins.c
160	I'd like to see a test that verifies the calls generated in the function and specifically a test that verifies that the constrained fneg is generated if needed.
llvm/docs/LangRef.rst
16174	s/specifie/specify s/the exception behavior/the rounding mode and exception behavior
16184	missing metadata arguments
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1517	I don't think that matters. The cost calculation here is a conservative estimate based on the cost if we are unable to generate an FMA instruction. So a constrained fmuladd that can't be lowered to FMA will be lowered the same way a constrained mul followed by a constrained add would be.
llvm/include/llvm/CodeGen/ISDOpcodes.h
355 ↗	(On Diff #238417)	Something is wrong with this comment. I'm not sure what it's trying to say but the grammar is wrong. After looking through the rest of the code, I think I understand what's going on. I think we need a verbose comment to explain it. Here's my suggestion: "FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is required for correct macro expansion and default handling (which is never reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD opcode is used to allow SelectionDAGBuilder::visitConstrainedFPIntrinsic to determine (based on TargetOptions and target cost information) whether the constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and FADD operations." Having thought through that, however, it strikes me as a lot of overhead. Can we just add special handling for the constrained.fmuladd intrinsic and make the decision then to create either a STRICT_FMA node or separate STRICT_FMUL and STRICT_FADD? The idea that ISD::FMULADD is going to exist as a defined opcode but we never intend to add any support for handling it is particularly bad.
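The alternative proposed in this comment, deciding up front between a single STRICT_FMA node and a STRICT_FMUL/STRICT_FADD pair, can be sketched as a toy model. Everything below is a hypothetical illustration: the enum, the struct, and the function are not LLVM APIs, and the assumption that the decision depends on a fusion option plus a target cost answer is taken only from the wording of the comment above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not LLVM types.
enum class FusionMode { Strict, Standard, Fast };

struct TargetInfo {
  bool HasFMA;              // target has a fused multiply-add instruction
  bool FMAFasterThanMulAdd; // target reports FMA as the cheaper choice
};

// Returns the names of the strict nodes a constrained fmuladd would become
// in this toy model: one fused node, or a separate multiply and add.
std::vector<std::string> lowerConstrainedFMulAdd(FusionMode Mode,
                                                 const TargetInfo &TI) {
  bool Fuse =
      Mode != FusionMode::Strict && TI.HasFMA && TI.FMAFasterThanMulAdd;
  std::vector<std::string> Nodes;
  if (Fuse) {
    Nodes.push_back("STRICT_FMA");
  } else {
    Nodes.push_back("STRICT_FMUL");
    Nodes.push_back("STRICT_FADD");
  }
  assert(!Nodes.empty());
  return Nodes;
}
```

In this model no intermediate opcode is needed, since the choice is made at the point where the intrinsic is visited.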

craig.topper mentioned this in D72871: [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION, and macro FUNCTION likewise. NFCI. Jan 16 2020, 1:13 PM

cameron.mcinally added a subscriber: cameron.mcinally. Jan 16 2020, 1:48 PM

cameron.mcinally added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3444	I don't think it's safe to fuse a FMUL and FADD if the intermediate rounding isn't exactly the same as those individual operations. FMULADD doesn't guarantee that, does it?

cameron.mcinally added inline comments. Jan 16 2020, 1:50 PM

clang/lib/CodeGen/CGExprScalar.cpp
3444	To be clear, we could miss very-edge-case overflow/underflow exceptions.

cameron.mcinally added inline comments. Jan 16 2020, 1:53 PM

clang/lib/CodeGen/CGExprScalar.cpp
3444	Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized away. Sorry for the noise.
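The concern in the three comments above is that a fused multiply-add rounds once while a separate multiply and add round twice, so the two forms can produce different values. A minimal sketch of that difference, assuming IEEE-754 double precision and the default round-to-nearest mode; the function names and sample inputs are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round, then add: the product is rounded to double before the add.
// 'volatile' keeps the compiler from contracting this into a fused operation.
double separateMulAdd(double A, double B, double C) {
  volatile double Product = A * B;
  return Product + C;
}

// Single rounding at the end, via the C library's fma.
double fusedMulAdd(double A, double B, double C) { return std::fma(A, B, C); }

// A = 1 + 2^-27, so A*A = 1 + 2^-26 + 2^-54 exactly, which needs more
// significand bits than a double holds.
double sampleA() { return 1.0 + std::ldexp(1.0, -27); }

// C = -(1 + 2^-26), the negated rounded product.
double sampleC() { return -(1.0 + std::ldexp(1.0, -26)); }

// Checks the worked example: the separate form loses the 2^-54 term when the
// product is rounded, the fused form keeps it.
void checkExample() {
  assert(separateMulAdd(sampleA(), sampleA(), sampleC()) == 0.0);
  assert(fusedMulAdd(sampleA(), sampleA(), sampleC()) == std::ldexp(1.0, -54));
}
```

An fmuladd-style operation is allowed to behave like either function, which is why it only fits code where the user has opted into contraction.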

andrew.w.kaylor added inline comments. Jan 16 2020, 2:15 PM

clang/lib/CodeGen/CGExprScalar.cpp
3444	We've talked about this before but I don't think we ever documented a decision as to whether we want to allow constrained intrinsics and fast math flags to be combined. This patch moves that decision into clang's decision to generate this intrinsic or not. I think it definitely makes sense in the case of fp contraction, because even if a user cares about value safety they might want FMA, which is theoretically more accurate than the separate operations even though it produces a different value. This is consistent with gcc (which produces FMA under "-ffp-contract=fast -fno-fast-math") and icc (which produced FMA under "-fp-model strict -fma"). For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics.

Address review comments.

Harbormaster completed remote builds in B44270: Diff 238769. Jan 17 2020, 6:59 AM

pengfei marked an inline comment as done. Jan 17 2020, 7:01 AM

pengfei added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3385	I prefer to reuse the operands from the fmul intrinsic here: (1) fmuladd always has the same exception behavior and rounding mode as the fmul; (2) getRoundingMode/getExceptionBehavior just return an enum value, so we would need more code to turn them into a Value.
3431	Added an assertion at line 3380. We only need to check once there.
llvm/test/TableGen/GlobalISelEmitter-input-discard.td
18 ↗	(On Diff #238769)	It's strange that this number is affected. I haven't found the cause.

Remove unnecessary comment.

Harbormaster completed remote builds in B44271: Diff 238770. Jan 17 2020, 7:09 AM

pengfei added a parent revision: D72871: [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION, and macro FUNCTION likewise. NFCI. Jan 17 2020, 7:10 AM

cameron.mcinally added inline comments. Jan 17 2020, 2:36 PM

clang/lib/CodeGen/CGExprScalar.cpp
3444	You had me until: "For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics." To be clear, we'd need them for the `fast` case, but I don't see a lot of value for the `strict` case. We definitely want reassoc/recip/etc for the `optimized but trap-safe` case, so that's enough to require FMF flags on constrained intrinsics alone. We should probably break this conversation out into an llvm-dev thread...

Remember that the design is that constrained intrinsics must be used whenever *any* code in the function is constrained. It is not unreasonable that part of the function might be constrained and the rest subject to fast-math; it'd be a shame if the intrinsics couldn't even express that.

andrew.w.kaylor added inline comments. Jan 17 2020, 3:24 PM

clang/lib/CodeGen/CGExprScalar.cpp
3444	I agree about starting an llvm-dev thread. I'll send something out unless you've already done so by the time I finish typing it.

kpn added a subscriber: kpn. Jan 21 2020, 11:54 AM

craig.topper added inline comments. Jan 23 2020, 10:06 PM

clang/lib/CodeGen/CGExprScalar.cpp
3385	Doesn't this need to be CreateConstrainedFPCall so that the strictfp attribute is added? That will take care of adding the metadata operands too.

kpn added inline comments. Jan 24 2020, 4:56 AM

clang/lib/CodeGen/CGExprScalar.cpp
3385	Is this code tested? I ran into a bug yesterday where CreateCall was used with a constrained intrinsic and the Instruction class blew up because the function signature was wrong. I wasn't passing in the metadata arguments. So, yes, it should be, and it might make sense for the patch to have test coverage that catches any other cases of this.

craig.topper added inline comments. Jan 24 2020, 8:40 AM

clang/lib/CodeGen/CGExprScalar.cpp
3385	This code is copying the metadata arguments from the fmul intrinsic, MulOp. That's the getOperand(2) and getOperand(3).

kpn added inline comments. Jan 24 2020, 9:14 AM

clang/lib/CodeGen/CGExprScalar.cpp
3385	Ah, yes, thanks. Your comment about the attribute is still valid, though. And, yes, using CreateConstrainedFPCall() is the easiest way to fix the attribute.

Address review comment.

pengfei marked an inline comment as done. Jan 26 2020, 9:19 PM

pengfei added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3385	Yes, it's the best choice. Thanks!

Harbormaster completed remote builds in B44964: Diff 240474. Jan 26 2020, 9:19 PM

LGTM

This revision is now accepted and ready to land. Jan 27 2020, 10:31 PM

Closed by commit rG3239b5034ee9: [FPEnv] Add pragma FP_CONTRACT support under strict FP. (authored by Wang, Pengfei <pengfei.wang@intel.com>). Jan 28 2020, 4:50 AM

This revision was automatically updated to reflect the committed changes.

jhenderson added a subscriber: jhenderson. Jan 28 2020, 5:41 AM

jhenderson added inline comments.

llvm/docs/LangRef.rst
16145	This underline isn't long enough and is breaking the sphinx build bot. Please fix.

pengfei marked an inline comment as done. Jan 28 2020, 6:03 AM

pengfei added inline comments.

llvm/docs/LangRef.rst
16145	Thanks! I'll fix it soon.

Allen added a subscriber: Allen. Feb 27 2023, 5:57 PM

Allen added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3386	Sorry, I'm not familiar with the optimization of the clang front end. I'd like to ask, is this optimization supposed to assume that all the backends have instructions like Fmuladd?

Herald added a project: Restricted Project. Feb 27 2023, 5:57 PM

pengfei added inline comments. Feb 28 2023, 1:31 AM

clang/lib/CodeGen/CGExprScalar.cpp
3386	No, it is a flexible intrinsic, which allows backends to choose their best approach. It can be interpreted as either mul + add or fma. It indicates that the user doesn't care about the difference between them.

Revision Contents

Path	Size
clang/lib/CodeGen/CGExprScalar.cpp	36 lines
clang/test/CodeGen/constrained-math-builtins.c	12 lines
llvm/docs/LangRef.rst	63 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	9 lines
llvm/include/llvm/IR/ConstrainedOps.def	4 lines
llvm/include/llvm/IR/Intrinsics.td	7 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	72 lines
llvm/test/CodeGen/X86/fp-intrinsics-fma.ll	128 lines

Diff 240840

clang/lib/CodeGen/CGExprScalar.cpp

[3,359 lines not shown]
return CGF.EmitCheckedInBoundsGEP(pointer, index, isSigned, isSubtraction,
op.E->getExprLoc(), "add.ptr");		op.E->getExprLoc(), "add.ptr");
}		}

// Construct an fmuladd intrinsic to represent a fused mul-add of MulOp and		// Construct an fmuladd intrinsic to represent a fused mul-add of MulOp and
// Addend. Use negMul and negAdd to negate the first operand of the Mul or		// Addend. Use negMul and negAdd to negate the first operand of the Mul or
// the add operand respectively. This allows fmuladd to represent a*b-c, or		// the add operand respectively. This allows fmuladd to represent a*b-c, or
// c-a*b. Patterns in LLVM should catch the negated forms and translate them to		// c-a*b. Patterns in LLVM should catch the negated forms and translate them to
// efficient operations.		// efficient operations.
static Value* buildFMulAdd(llvm::BinaryOperator *MulOp, Value *Addend,		static Value* buildFMulAdd(llvm::Instruction *MulOp, Value *Addend,
const CodeGenFunction &CGF, CGBuilderTy &Builder,		const CodeGenFunction &CGF, CGBuilderTy &Builder,
bool negMul, bool negAdd) {		bool negMul, bool negAdd) {
assert(!(negMul && negAdd) && "Only one of negMul and negAdd should be set.");		assert(!(negMul && negAdd) && "Only one of negMul and negAdd should be set.");

Value *MulOp0 = MulOp->getOperand(0);		Value *MulOp0 = MulOp->getOperand(0);
Value *MulOp1 = MulOp->getOperand(1);		Value *MulOp1 = MulOp->getOperand(1);
if (negMul)		if (negMul)
MulOp0 = Builder.CreateFNeg(MulOp0, "neg");		MulOp0 = Builder.CreateFNeg(MulOp0, "neg");
if (negAdd)		if (negAdd)
Addend = Builder.CreateFNeg(Addend, "neg");		Addend = Builder.CreateFNeg(Addend, "neg");

Value *FMulAdd = Builder.CreateCall(		Value *FMulAdd = nullptr;
		if (Builder.getIsFPConstrained()) {
		assert(isa<llvm::ConstrainedFPIntrinsic>(MulOp) &&
		"Only constrained operation should be created when Builder is in FP "
		"constrained mode");
		FMulAdd = Builder.CreateConstrainedFPCall(
		CGF.CGM.getIntrinsic(llvm::Intrinsic::experimental_constrained_fmuladd,
		Addend->getType()),
		{MulOp0, MulOp1, Addend});
		} else {
		FMulAdd = Builder.CreateCall(
CGF.CGM.getIntrinsic(llvm::Intrinsic::fmuladd, Addend->getType()),		CGF.CGM.getIntrinsic(llvm::Intrinsic::fmuladd, Addend->getType()),
{MulOp0, MulOp1, Addend});		{MulOp0, MulOp1, Addend});
		}
MulOp->eraseFromParent();		MulOp->eraseFromParent();

return FMulAdd;		return FMulAdd;
}		}

// Check whether it would be legal to emit an fmuladd intrinsic call to		// Check whether it would be legal to emit an fmuladd intrinsic call to
// represent op and if so, build the fmuladd.		// represent op and if so, build the fmuladd.
//		//
// Checks that (a) the operation is fusable, and (b) -ffp-contract=on.		// Checks that (a) the operation is fusable, and (b) -ffp-contract=on.
// Does NOT check the type of the operation - it's assumed that this function		// Does NOT check the type of the operation - it's assumed that this function
// will be called from contexts where it's known that the type is contractable.		// will be called from contexts where it's known that the type is contractable.
[18 lines not shown]
if (LHSBinOp->getOpcode() == llvm::Instruction::FMul &&
return buildFMulAdd(LHSBinOp, op.RHS, CGF, Builder, false, isSub);		return buildFMulAdd(LHSBinOp, op.RHS, CGF, Builder, false, isSub);
}		}
if (auto *RHSBinOp = dyn_cast<llvm::BinaryOperator>(op.RHS)) {		if (auto *RHSBinOp = dyn_cast<llvm::BinaryOperator>(op.RHS)) {
if (RHSBinOp->getOpcode() == llvm::Instruction::FMul &&		if (RHSBinOp->getOpcode() == llvm::Instruction::FMul &&
RHSBinOp->use_empty())		RHSBinOp->use_empty())
return buildFMulAdd(RHSBinOp, op.LHS, CGF, Builder, isSub, false);		return buildFMulAdd(RHSBinOp, op.LHS, CGF, Builder, isSub, false);
}		}

		if (auto *LHSBinOp = dyn_cast<llvm::CallBase>(op.LHS)) {
		if (LHSBinOp->getIntrinsicID() ==
		llvm::Intrinsic::experimental_constrained_fmul &&
		LHSBinOp->use_empty())
		return buildFMulAdd(LHSBinOp, op.RHS, CGF, Builder, false, isSub);
		}
		if (auto *RHSBinOp = dyn_cast<llvm::CallBase>(op.RHS)) {
		if (RHSBinOp->getIntrinsicID() ==
		llvm::Intrinsic::experimental_constrained_fmul &&
		RHSBinOp->use_empty())
		return buildFMulAdd(RHSBinOp, op.LHS, CGF, Builder, isSub, false);
		}

return nullptr;		return nullptr;
}		}

Value *ScalarExprEmitter::EmitAdd(const BinOpInfo &op) {		Value *ScalarExprEmitter::EmitAdd(const BinOpInfo &op) {
if (op.LHS->getType()->isPointerTy() ||		if (op.LHS->getType()->isPointerTy() ||
op.RHS->getType()->isPointerTy())		op.RHS->getType()->isPointerTy())
return emitPointerArithmetic(CGF, op, CodeGenFunction::NotSubtraction);		return emitPointerArithmetic(CGF, op, CodeGenFunction::NotSubtraction);

if (op.Ty->isSignedIntegerOrEnumerationType()) {		if (op.Ty->isSignedIntegerOrEnumerationType()) {
[1,441 lines not shown]

clang/test/CodeGen/constrained-math-builtins.c

[142 lines not shown]
// CHECK: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80, metadata, metadata)

__builtin_trunc(f); __builtin_truncf(f); __builtin_truncl(f);		__builtin_trunc(f); __builtin_truncf(f); __builtin_truncl(f);

// CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)		// CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
// CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)		// CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
// CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)		// CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
};		};

		#pragma STDC FP_CONTRACT ON
		void bar(float f) {
		f * f + f;
		(double)f * f - f;
		(long double)-f * f + f;

		// CHECK: call float @llvm.experimental.constrained.fmuladd.f32
		// CHECK: fneg
		// CHECK: call double @llvm.experimental.constrained.fmuladd.f64
		// CHECK: fneg
		// CHECK: call x86_fp80 @llvm.experimental.constrained.fmuladd.f80
		};

llvm/docs/LangRef.rst


[16,135 lines not shown]
	- "``uno``": yields ``true`` if either operand is a NAN.			- "``uno``": yields ``true`` if either operand is a NAN.

	The quiet comparison operation performed by			The quiet comparison operation performed by
	'``llvm.experimental.constrained.fcmp``' will only raise an exception			'``llvm.experimental.constrained.fcmp``' will only raise an exception
	if either operand is a SNAN. The signaling comparison operation			if either operand is a SNAN. The signaling comparison operation
	performed by '``llvm.experimental.constrained.fcmps``' will raise an			performed by '``llvm.experimental.constrained.fcmps``' will raise an
	exception if either operand is a NAN (QNAN or SNAN).			exception if either operand is a NAN (QNAN or SNAN).

				'``llvm.experimental.constrained.fmuladd``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				::

				declare <type>
				@llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>,
				<type> <op3>,
				metadata <rounding mode>,
				metadata <exception behavior>)

				Overview:
				"""""""""

				The '``llvm.experimental.constrained.fmuladd``' intrinsic represents
				multiply-add expressions that can be fused if the code generator determines
				that (a) the target instruction set has support for a fused operation,
				and (b) that the fused operation is more efficient than the equivalent,
				separate pair of mul and add instructions.

				Arguments:
				""""""""""

				The first three arguments to the '``llvm.experimental.constrained.fmuladd``'
				intrinsic must be floating-point or vector of floating-point values.
				All three arguments must have identical types.

				The fourth and fifth arguments specifiy the rounding mode and exception behavior
				as described above.

				Semantics:
				""""""""""

				The expression:

				::

				%0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c,
				metadata <rounding mode>,
				metadata <exception behavior>)

				is equivalent to the expression:

				::

				%0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b,
				metadata <rounding mode>,
				metadata <exception behavior>)
				%1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c,
				metadata <rounding mode>,
				metadata <exception behavior>)

				except that it is unspecified whether rounding will be performed between the
				multiplication and addition steps. Fusion is not guaranteed, even if the target
				platform supports it.
				If a fused multiply-add is required, the corresponding
				:ref:`llvm.experimental.constrained.fma <int_fma>` intrinsic function should be
				used instead.
				This never sets errno, just as '``llvm.experimental.constrained.fma.*``'.

	Constrained libm-equivalent Intrinsics			Constrained libm-equivalent Intrinsics
	--------------------------------------			--------------------------------------

	In addition to the basic floating-point operations for which constrained			In addition to the basic floating-point operations for which constrained
	intrinsics are described above, there are constrained versions of various			intrinsics are described above, there are constrained versions of various
	operations which provide equivalent behavior to a corresponding libm function.			operations which provide equivalent behavior to a corresponding libm function.
	These intrinsics allow the precise behavior of these operations with respect to			These intrinsics allow the precise behavior of these operations with respect to
	rounding mode and exception behavior to be controlled.			rounding mode and exception behavior to be controlled.
[2,433 lines not shown]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[1,282 lines not shown]
case Intrinsic::pow:
ISDs.push_back(ISD::FPOW);		ISDs.push_back(ISD::FPOW);
break;		break;
case Intrinsic::fma:		case Intrinsic::fma:
ISDs.push_back(ISD::FMA);		ISDs.push_back(ISD::FMA);
break;		break;
case Intrinsic::fmuladd:		case Intrinsic::fmuladd:
ISDs.push_back(ISD::FMA);		ISDs.push_back(ISD::FMA);
break;		break;
		case Intrinsic::experimental_constrained_fmuladd:
		ISDs.push_back(ISD::STRICT_FMA);
		break;
// FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free.		// FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free.
case Intrinsic::lifetime_start:		case Intrinsic::lifetime_start:
case Intrinsic::lifetime_end:		case Intrinsic::lifetime_end:
case Intrinsic::sideeffect:		case Intrinsic::sideeffect:
return 0;		return 0;
case Intrinsic::masked_store:		case Intrinsic::masked_store:
return ConcreteTTI->getMaskedMemoryOpCost(Instruction::Store, Tys[0], 0,		return ConcreteTTI->getMaskedMemoryOpCost(Instruction::Store, Tys[0], 0,
0);		0);
[207 lines not shown]
unsigned getIntrinsicInstrCost(
if (MinCustomCostI != CustomCost.end())		if (MinCustomCostI != CustomCost.end())
return *MinCustomCostI;		return *MinCustomCostI;

// If we can't lower fmuladd into an FMA estimate the cost as a floating		// If we can't lower fmuladd into an FMA estimate the cost as a floating
// point mul followed by an add.		// point mul followed by an add.
if (IID == Intrinsic::fmuladd)		if (IID == Intrinsic::fmuladd)
return ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +		return ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +
ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);		ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);
		if (IID == Intrinsic::experimental_constrained_fmuladd)
		return ConcreteTTI->getIntrinsicCost(
		Intrinsic::experimental_constrained_fmul, RetTy, Tys,
		nullptr) +
		ConcreteTTI->getIntrinsicCost(
		Intrinsic::experimental_constrained_fadd, RetTy, Tys, nullptr);

// Else, assume that we need to scalarize this intrinsic. For math builtins		// Else, assume that we need to scalarize this intrinsic. For math builtins
// this will emit a costly libcall, adding call overhead and spills. Make it		// this will emit a costly libcall, adding call overhead and spills. Make it
// very expensive.		// very expensive.
if (RetTy->isVectorTy()) {		if (RetTy->isVectorTy()) {
unsigned ScalarizationCost =		unsigned ScalarizationCost =
((ScalarizationCostPassed != std::numeric_limits<unsigned>::max())		((ScalarizationCostPassed != std::numeric_limits<unsigned>::max())
? ScalarizationCostPassed		? ScalarizationCostPassed
[228 lines not shown]

llvm/include/llvm/IR/ConstrainedOps.def

[89 lines not shown]
	DAG_FUNCTION(pow, 2, 1, experimental_constrained_pow, FPOW)			DAG_FUNCTION(pow, 2, 1, experimental_constrained_pow, FPOW)
	DAG_FUNCTION(powi, 2, 1, experimental_constrained_powi, FPOWI)			DAG_FUNCTION(powi, 2, 1, experimental_constrained_powi, FPOWI)
	DAG_FUNCTION(rint, 1, 1, experimental_constrained_rint, FRINT)			DAG_FUNCTION(rint, 1, 1, experimental_constrained_rint, FRINT)
	DAG_FUNCTION(round, 1, 0, experimental_constrained_round, FROUND)			DAG_FUNCTION(round, 1, 0, experimental_constrained_round, FROUND)
	DAG_FUNCTION(sin, 1, 1, experimental_constrained_sin, FSIN)			DAG_FUNCTION(sin, 1, 1, experimental_constrained_sin, FSIN)
	DAG_FUNCTION(sqrt, 1, 1, experimental_constrained_sqrt, FSQRT)			DAG_FUNCTION(sqrt, 1, 1, experimental_constrained_sqrt, FSQRT)
	DAG_FUNCTION(trunc, 1, 0, experimental_constrained_trunc, FTRUNC)			DAG_FUNCTION(trunc, 1, 0, experimental_constrained_trunc, FTRUNC)

				// This is the definition of the fmuladd intrinsic function, which is
				// converted into constrained FMA or FMUL + FADD intrinsics.
				FUNCTION(fmuladd, 3, 1, experimental_constrained_fmuladd)

	#undef INSTRUCTION			#undef INSTRUCTION
	#undef FUNCTION			#undef FUNCTION
	#undef CMP_INSTRUCTION			#undef CMP_INSTRUCTION
	#undef DAG_INSTRUCTION			#undef DAG_INSTRUCTION
	#undef DAG_FUNCTION			#undef DAG_FUNCTION
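ConstrainedOps.def is consumed by defining one of these macros and then including the file, as SelectionDAGBuilder.cpp does with DAG_INSTRUCTION. The same X-macro technique can be sketched in a self-contained form. The list macro `TOY_CONSTRAINED_OPS` and the names `ToyOp`, `numArgs`, and `strictNodeName` are hypothetical; the three entries only mimic the shape of the real table.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical operation list standing in for a .def file. Each entry is
// (name, number of FP arguments, DAG node suffix).
#define TOY_CONSTRAINED_OPS(X) \
  X(sqrt, 1, FSQRT) \
  X(pow, 2, FPOW) \
  X(fma, 3, FMA)

// First expansion: one enumerator per entry.
enum class ToyOp {
#define X(NAME, NARG, DAGN) NAME,
  TOY_CONSTRAINED_OPS(X)
#undef X
};

// Second expansion: a switch that returns the argument count.
inline unsigned numArgs(ToyOp Op) {
  switch (Op) {
#define X(NAME, NARG, DAGN) case ToyOp::NAME: return NARG;
  TOY_CONSTRAINED_OPS(X)
#undef X
  }
  return 0;
}

// Third expansion: a switch that builds the "STRICT_" node name by
// stringizing the suffix, similar in spirit to ISD::STRICT_##DAGN.
inline const char *strictNodeName(ToyOp Op) {
  switch (Op) {
#define X(NAME, NARG, DAGN) case ToyOp::NAME: return "STRICT_" #DAGN;
  TOY_CONSTRAINED_OPS(X)
#undef X
  }
  return "";
}
```

Keeping the table in one place means a new entry such as fmuladd only has to be added once, and every expansion site picks it up.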

llvm/include/llvm/IR/Intrinsics.td

let IntrProperties = [IntrInaccessibleMemOnly, IntrWillReturn] in {

def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

		def int_experimental_constrained_fmuladd : Intrinsic<[ llvm_anyfloat_ty ],
		[ LLVMMatchType<0>,
		LLVMMatchType<0>,
		LLVMMatchType<0>,
		llvm_metadata_ty,
		llvm_metadata_ty ]>;

def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],		def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],		def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;


llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

if (FPI.isUnaryOp()) {
[...]
Opers.push_back(getValue(FPI.getArgOperand(0)));		Opers.push_back(getValue(FPI.getArgOperand(0)));
Opers.push_back(getValue(FPI.getArgOperand(1)));		Opers.push_back(getValue(FPI.getArgOperand(1)));
Opers.push_back(getValue(FPI.getArgOperand(2)));		Opers.push_back(getValue(FPI.getArgOperand(2)));
} else {		} else {
Opers.push_back(getValue(FPI.getArgOperand(0)));		Opers.push_back(getValue(FPI.getArgOperand(0)));
Opers.push_back(getValue(FPI.getArgOperand(1)));		Opers.push_back(getValue(FPI.getArgOperand(1)));
}		}

		auto pushOutChain = [this](SDValue Result, fp::ExceptionBehavior EB) {
		assert(Result.getNode()->getNumValues() == 2);

		// Push node to the appropriate list so that future instructions can be
		// chained up correctly.
		SDValue OutChain = Result.getValue(1);
		switch (EB) {
		case fp::ExceptionBehavior::ebIgnore:
		// The only reason why ebIgnore nodes still need to be chained is that
		// they might depend on the current rounding mode, and therefore must
		// not be moved across instructions that may change that mode.
		LLVM_FALLTHROUGH;
		case fp::ExceptionBehavior::ebMayTrap:
		// These must not be moved across calls or instructions that may change
		// floating-point exception masks.
		PendingConstrainedFP.push_back(OutChain);
		break;
		case fp::ExceptionBehavior::ebStrict:
		// These must not be moved across calls or instructions that may change
		// floating-point exception masks or read floating-point exception flags.
		// In addition, they cannot be optimized out even if unused.
		PendingConstrainedFPStrict.push_back(OutChain);
		break;
		}
		};

		SDVTList VTs = DAG.getVTList(ValueVTs);
		fp::ExceptionBehavior EB = FPI.getExceptionBehavior().getValue();

unsigned Opcode;		unsigned Opcode;
switch (FPI.getIntrinsicID()) {		switch (FPI.getIntrinsicID()) {
default: llvm_unreachable("Impossible intrinsic"); // Can't reach here.		default: llvm_unreachable("Impossible intrinsic"); // Can't reach here.
#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \		#define DAG_INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC, DAGN) \
case Intrinsic::INTRINSIC: \		case Intrinsic::INTRINSIC: \
Opcode = ISD::STRICT_##DAGN; \		Opcode = ISD::STRICT_##DAGN; \
break;		break;
#include "llvm/IR/ConstrainedOps.def"		#include "llvm/IR/ConstrainedOps.def"
		case Intrinsic::experimental_constrained_fmuladd: {
		Opcode = ISD::STRICT_FMA;
		// Break fmuladd into fmul and fadd.
		if (TM.Options.AllowFPOpFusion == FPOpFusion::Strict ||
		!TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(),
		ValueVTs[0])) {
		Opers.pop_back();
		SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers);
		pushOutChain(Mul, EB);
		Opcode = ISD::STRICT_FADD;
		Opers.clear();
		Opers.push_back(Mul.getValue(1));
		Opers.push_back(Mul.getValue(0));
		Opers.push_back(getValue(FPI.getArgOperand(2)));
		}
		break;
		}
}		}

// A few strict DAG nodes carry additional operands that are not		// A few strict DAG nodes carry additional operands that are not
// set up by the default code above.		// set up by the default code above.
switch (Opcode) {		switch (Opcode) {
default: break;		default: break;
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
Opers.push_back(		Opers.push_back(
DAG.getTargetConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));		DAG.getTargetConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout())));
break;		break;
case ISD::STRICT_FSETCC:		case ISD::STRICT_FSETCC:
case ISD::STRICT_FSETCCS: {		case ISD::STRICT_FSETCCS: {
auto *FPCmp = dyn_cast<ConstrainedFPCmpIntrinsic>(&FPI);		auto *FPCmp = dyn_cast<ConstrainedFPCmpIntrinsic>(&FPI);
Opers.push_back(DAG.getCondCode(getFCmpCondCode(FPCmp->getPredicate())));		Opers.push_back(DAG.getCondCode(getFCmpCondCode(FPCmp->getPredicate())));
break;		break;
}		}
}		}

SDVTList VTs = DAG.getVTList(ValueVTs);
SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers);		SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers);
		craig.topper: Why is Result a reference? It's not modified, is it? Don't use auto for parameter types. The LLVM coding style prefers that auto be used only when the type is obvious to someone reading the code.
		pushOutChain(Result, EB);
assert(Result.getNode()->getNumValues() == 2);

// Push node to the appropriate list so that future instructions can be
// chained up correctly.
SDValue OutChain = Result.getValue(1);
switch (FPI.getExceptionBehavior().getValue()) {
case fp::ExceptionBehavior::ebIgnore:
// The only reason why ebIgnore nodes still need to be chained is that
// they might depend on the current rounding mode, and therefore must
// not be moved across instructions that may change that mode.
LLVM_FALLTHROUGH;
case fp::ExceptionBehavior::ebMayTrap:
// These must not be moved across calls or instructions that may change
// floating-point exception masks.
PendingConstrainedFP.push_back(OutChain);
break;
case fp::ExceptionBehavior::ebStrict:
// These must not be moved across calls or instructions that may change
// floating-point exception masks or read floating-point exception flags.
// In addition, they cannot be optimized out even if unused.
PendingConstrainedFPStrict.push_back(OutChain);
break;
}

SDValue FPResult = Result.getValue(0);		SDValue FPResult = Result.getValue(0);
		craig.topper: Can you make the SDValue Result an argument of this and only capture 'this'? I don't like depending on reassigning Result.
setValue(&FPI, FPResult);		setValue(&FPI, FPResult);
}		}

std::pair<SDValue, SDValue>		std::pair<SDValue, SDValue>
SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,		SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,
const BasicBlock *EHPadBB) {		const BasicBlock *EHPadBB) {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
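The fmuladd case in the hunk above makes two decisions: whether to emit a single STRICT_FMA or split into STRICT_FMUL plus STRICT_FADD, and which pending list each output chain goes on. A rough standalone model of that control flow follows. Everything here (`ToyExcept`, `ToyLowering`, `lowerFMulAdd`, the boolean parameters) is hypothetical; it records node names and counts chains instead of building a SelectionDAG.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the three exception behaviors used in the patch.
enum class ToyExcept { Ignore, MayTrap, Strict };

struct ToyLowering {
  std::vector<std::string> Nodes; // strict nodes emitted, in order
  unsigned PendingFP = 0;         // chains queued for ignore / maytrap
  unsigned PendingFPStrict = 0;   // chains queued for strict
};

// Models pushOutChain: ignore and maytrap share one list, strict gets its own.
inline void pushOutChain(ToyLowering &L, ToyExcept EB) {
  if (EB == ToyExcept::Strict)
    ++L.PendingFPStrict;
  else
    ++L.PendingFP;
}

// Models the fmuladd case: split when fusion is disallowed or when the
// target reports that FMA is not faster than a separate mul and add.
inline ToyLowering lowerFMulAdd(bool FusionIsStrict, bool FMAIsFaster,
                                ToyExcept EB) {
  ToyLowering L;
  if (FusionIsStrict || !FMAIsFaster) {
    L.Nodes.push_back("STRICT_FMUL");
    pushOutChain(L, EB); // the multiply's chain is queued on its own
    L.Nodes.push_back("STRICT_FADD");
  } else {
    L.Nodes.push_back("STRICT_FMA");
  }
  pushOutChain(L, EB); // chain of the final node (FADD or FMA)
  return L;
}
```

In the split case two chains are queued, one per node, which is why the patch turns the chaining logic into a lambda that can be called for the intermediate multiply as well as for the final result.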

llvm/test/CodeGen/X86/fp-intrinsics-fma.ll

entry:
%4 = fneg double %2		%4 = fneg double %2
%5 = call double @llvm.experimental.constrained.fma.f64(double %3, double %1, double %4,		%5 = call double @llvm.experimental.constrained.fma.f64(double %3, double %1, double %4,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
%result = fneg double %5		%result = fneg double %5
ret double %result		ret double %result
}		}

		; Verify constrained fmul and fadd aren't fused.
		define float @f11(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f11:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f11:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0
		; FMA-NEXT: vaddss %xmm2, %xmm0, %xmm0
		; FMA-NEXT: retq
		;
		; FMA4-LABEL: f11:
		; FMA4: # %bb.0: # %entry
		; FMA4-NEXT: vmulss %xmm1, %xmm0, %xmm0
		; FMA4-NEXT: vaddss %xmm2, %xmm0, %xmm0
		; FMA4-NEXT: retq
		entry:
		%3 = call float @llvm.experimental.constrained.fmul.f32(float %0, float %1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%4 = call float @llvm.experimental.constrained.fadd.f32(float %3, float %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %4
		}

		; Verify constrained fmul and fadd aren't fused.
		define double @f12(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f12:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f12:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmulsd %xmm1, %xmm0, %xmm0
		; FMA-NEXT: vaddsd %xmm2, %xmm0, %xmm0
		; FMA-NEXT: retq
		;
		; FMA4-LABEL: f12:
		; FMA4: # %bb.0: # %entry
		; FMA4-NEXT: vmulsd %xmm1, %xmm0, %xmm0
		; FMA4-NEXT: vaddsd %xmm2, %xmm0, %xmm0
		; FMA4-NEXT: retq
		entry:
		%3 = call double @llvm.experimental.constrained.fmul.f64(double %0, double %1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%4 = call double @llvm.experimental.constrained.fadd.f64(double %3, double %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %4
		}

		; Verify that fmuladd(3.5) isn't simplified when the rounding mode is
		; unknown.
		define float @f15() #0 {
		; NOFMA-LABEL: f15:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
		; NOFMA-NEXT: movaps %xmm1, %xmm0
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm1, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f15:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
		; FMA-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
		; FMA-NEXT: retq
		;
		; FMA4-LABEL: f15:
		; FMA4: # %bb.0: # %entry
		; FMA4-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
		; FMA4-NEXT: vfmaddss %xmm0, %xmm0, %xmm0, %xmm0
		; FMA4-NEXT: retq
		entry:
		%result = call float @llvm.experimental.constrained.fmuladd.f32(
		float 3.5,
		float 3.5,
		float 3.5,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %result
		}

		; Verify that fmuladd(42.1) isn't simplified when the rounding mode is
		; unknown.
		define double @f16() #0 {
		; NOFMA-LABEL: f16:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
		; NOFMA-NEXT: movapd %xmm1, %xmm0
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm1, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f16:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
		; FMA-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
		; FMA-NEXT: retq
		;
		; FMA4-LABEL: f16:
		; FMA4: # %bb.0: # %entry
		; FMA4-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
		; FMA4-NEXT: vfmaddsd %xmm0, %xmm0, %xmm0, %xmm0
		; FMA4-NEXT: retq
		entry:
		%result = call double @llvm.experimental.constrained.fmuladd.f64(
		double 42.1,
		double 42.1,
		double 42.1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}

; Verify that fma(3.5) isn't simplified when the rounding mode is		; Verify that fma(3.5) isn't simplified when the rounding mode is
; unknown.		; unknown.
define float @f17() #0 {		define float @f17() #0 {
; NOFMA-LABEL: f17:		; NOFMA-LABEL: f17:
; NOFMA: # %bb.0: # %entry		; NOFMA: # %bb.0: # %entry
; NOFMA-NEXT: pushq %rax		; NOFMA-NEXT: pushq %rax
; NOFMA-NEXT: .cfi_def_cfa_offset 16		; NOFMA-NEXT: .cfi_def_cfa_offset 16
; NOFMA-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; NOFMA-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
[...]
%5 = call <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double> %3, <2 x double> %1, <2 x double> %4,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict") #0		metadata !"fpexcept.strict") #0
%result = fneg <2 x double> %5		%result = fneg <2 x double> %5
ret <2 x double> %result		ret <2 x double> %result
}		}

attributes #0 = { strictfp }		attributes #0 = { strictfp }

		declare float @llvm.experimental.constrained.fmul.f32(float, float, metadata, metadata)
		declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
		declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)		declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)		declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
declare <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float>, <4 x float>, <4 x float>, metadata, metadata)		declare <4 x float> @llvm.experimental.constrained.fma.v4f32(<4 x float>, <4 x float>, <4 x float>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double>, <2 x double>, <2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.fma.v2f64(<2 x double>, <2 x double>, <2 x double>, metadata, metadata)
		declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double, metadata, metadata)
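The tests above check that separate constrained fmul and fadd calls stay unfused while fmuladd may become an FMA. The reason this distinction is observable can be shown numerically: a fused operation rounds once, a multiply followed by an add rounds twice. The sketch below is an illustration in plain C++, not part of the patch; the helper names are hypothetical, and it assumes IEEE-754 double precision with the default round-to-nearest mode.

```cpp
#include <cassert>
#include <cmath>

// Computes A*A + C with the product rounded to double before the add.
// The volatile store keeps the compiler from contracting the two operations.
inline double mulThenAdd(double A, double C) {
  volatile double Product = A * A; // first rounding happens here
  return Product + C;              // second rounding happens here
}

// Computes A*A + C with a single rounding, as an FMA instruction would.
inline double fusedMulAdd(double A, double C) {
  return std::fma(A, A, C);
}
```

With A = 1 + 2^-27 and C = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54. Rounding the product to double drops the 2^-54 term, so the unfused form yields 0, while the fused form keeps it and yields 2^-54. Results that differ in this way are why contraction has to be an explicit choice under strict floating point.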

[FPEnv] Add pragma FP_CONTRACT support under strict FP. (Closed, Public)

Diff 240840

clang/lib/CodeGen/CGExprScalar.cpp

clang/test/CodeGen/constrained-math-builtins.c

llvm/docs/LangRef.rst

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/include/llvm/IR/ConstrainedOps.def

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
