This is an archive of the discontinued LLVM Phabricator instance.

The title of this review is misleading. It should at least mention FPEnv, constrained intrinsics, or strict fp or something. Right now it sounds like FP_CONTRACT isn't supported at all.

Can we split most of the X86 changes into a separate patch? Most of it can be tested with fneg and constrained.fma.

craig.topper added inline comments.Jan 15 2020, 7:46 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7064	Can you make the SDValue Result an argument of this and only capture 'this'? I don't like depending on reassigning Result.

Address review comments.

Harbormaster completed remote builds in B44120: Diff 238414.Jan 15 2020, 9:10 PM

craig.topper added inline comments.Jan 15 2020, 9:22 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7061	Why is Result a reference? It's not modified, is it? Don't use auto for parameter types. The LLVM coding style prefers auto to be used only when the type is easily assumed by someone reading the code.
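
A minimal, self-contained sketch of the pattern the two comments above ask for: the lambda captures only `this` and takes the value it works on as an explicitly typed, by-value parameter, instead of capturing a local by reference and relying on that local being reassigned. `Node`, `Builder`, `PendingChains`, and `pushOutChain` are stand-in names chosen for illustration; this is not the SelectionDAGBuilder code from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for SDValue: a result value plus an output chain id.
struct Node {
  int Value;
  int Chain;
};

// Hypothetical stand-in for the builder class that owns the pending chains.
class Builder {
public:
  std::vector<int> PendingChains;

  int visit(int A, int B) {
    // Capture only 'this'. The node is an explicitly typed, by-value
    // parameter, so the lambda never depends on a captured local being
    // reassigned between calls.
    auto pushOutChain = [this](Node Result) {
      PendingChains.push_back(Result.Chain);
    };

    Node Mul{A * B, 1};
    pushOutChain(Mul);
    Node Add{Mul.Value + A, 2};
    pushOutChain(Add);
    return Add.Value;
  }
};
```

Calling `visit` on a `Builder` records one chain id per node, in the order the nodes were created, without the lambda ever reading a mutable local of the enclosing function.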

Address review comment.

Harbormaster completed remote builds in B44122: Diff 238417.Jan 15 2020, 10:05 PM

craig.topper added a reviewer: rjmccall.Jan 15 2020, 10:07 PM

pengfei retitled this revision from Add pragma FP_CONTRACT support. to [FPEnv] Add pragma FP_CONTRACT support under strict FP..Jan 15 2020, 10:08 PM

pengfei edited the summary of this revision.

andrew.w.kaylor added inline comments.Jan 16 2020, 12:04 PM

clang/lib/CodeGen/CGExprScalar.cpp
3381	You shouldn't just assume that MulOp is a constrained intrinsic. Cast to ConstrainedFPIntrinsic and use ConstrainedFPIntrinsic::getRoundingMode() and ConstrainedFPIntrinsic::getExceptionBehavior(). The cast will effectively assert that MulOp is a constrained intrinsic. I think that should always be true.
3423	I don't think we should ever create non-constrained FMul instructions if Builder is in FP constrained mode, but you should assert that somewhere above. Maybe move this block above line 3409 and add: assert(LHSBinOp->getOpcode() != llvm::Instruction::FMul && RHSBinOp->getOpcode() != llvm::Instruction::FMul);
clang/test/CodeGen/constrained-math-builtins.c
160	I'd like to see a test that verifies the calls generated in the function and specifically a test that verifies that the constrained fneg is generated if needed.
llvm/docs/LangRef.rst
16094	s/specifie/specify s/the exception behavior/the rounding mode and exception behavior
16104	missing metadata arguments
llvm/include/llvm/CodeGen/BasicTTIImpl.h
1515	I don't think that matters. The cost calculation here is a conservative estimate based on the cost if we are unable to generate an FMA instruction. So a constrained fmuladd that can't be lowered to FMA will be lowered the same way a constrained mul followed by a constrained add would be.
llvm/include/llvm/CodeGen/ISDOpcodes.h
355	Something is wrong with this comment. I'm not sure what it's trying to say but the grammar is wrong. After looking through the rest of the code, I think I understand what's going on. I think we need a verbose comment to explain it. Here's my suggestion: "FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is required for correct macro expansion and default handling (which is never reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD opcode is used to allow SelectionDAGBuilder::visitConstrainedFPIntrinsic to determine (based on TargetOptions and target cost information) whether the constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and FADD operations." Having thought through that, however, it strikes me as a lot of overhead. Can we just add special handling for the constrained.fmuladd intrinsic and make the decision then to create either a STRICT_FMA node or separate STRICT_FMUL and STRICT_FADD? The idea that ISD::FMULADD is going to exist as a defined opcode but we never intend to add any support for handling it is particularly bad.
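
The alternative raised at the end of the ISDOpcodes.h comment (decide while handling the intrinsic whether to emit one fused node or a separate multiply and add) can be modeled in a few lines. This is only a sketch under stated assumptions: `FusionMode`, `FMAIsFaster`, and `lowerConstrainedFMulAdd` are hypothetical names standing in for the target options and cost query the comment mentions, and the opcode names are plain strings rather than LLVM enumerators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the fusion setting; the real compiler option is not
// reproduced here.
enum class FusionMode { Strict, Standard, Fast };

// Return the names of the strict opcodes a constrained fmuladd would be
// lowered to. FMAIsFaster stands in for whatever target cost query the
// backend uses.
std::vector<std::string> lowerConstrainedFMulAdd(FusionMode Mode,
                                                 bool FMAIsFaster) {
  // Fuse only when fusion is permitted and the target reports that a fused
  // multiply-add is faster than the separate operations.
  if (Mode != FusionMode::Strict && FMAIsFaster)
    return {"STRICT_FMA"};
  // Otherwise emit the multiply first, then an add that consumes its result.
  return {"STRICT_FMUL", "STRICT_FADD"};
}
```

With this shape no intermediate FMULADD opcode is needed: the choice is made once, at the point where the intrinsic is visited, and only opcodes that already have lowering support are ever created.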

craig.topper mentioned this in D72871: [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION, and macro FUNCTION likewise. NFCI..Jan 16 2020, 1:13 PM

cameron.mcinally added a subscriber: cameron.mcinally.Jan 16 2020, 1:48 PM

cameron.mcinally added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3436	I don't think it's safe to fuse a FMUL and FADD if the intermediate rounding isn't exactly the same as those individual operations. FMULADD doesn't guarantee that, does it?

cameron.mcinally added inline comments.Jan 16 2020, 1:50 PM

clang/lib/CodeGen/CGExprScalar.cpp
3436	To be clear, we could miss very-edge-case overflow/underflow exceptions.

cameron.mcinally added inline comments.Jan 16 2020, 1:53 PM

clang/lib/CodeGen/CGExprScalar.cpp
3436	Ah, but I see C/C++ FP_CONTRACT allows the exceptions to be optimized away. Sorry for the noise.

andrew.w.kaylor added inline comments.Jan 16 2020, 2:15 PM

clang/lib/CodeGen/CGExprScalar.cpp
3436	We've talked about this before but I don't think we ever documented a decision as to whether we want to allow constrained intrinsics and fast math flags to be combined. This patch moves that decision into clang's decision to generate this intrinsic or not. I think it definitely makes sense in the case of fp contraction, because even if a user cares about value safety they might want FMA, which is theoretically more accurate than the separate operations even though it produces a different value. This is consistent with gcc (which produces FMA under "-ffp-contract=fast -fno-fast-math") and icc (which produces FMA under "-fp-model strict -fma"). For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics.
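
The point that a fused multiply-add can be more accurate and still produce a different value than the separate operations can be checked numerically. The sketch below assumes IEEE 754 binary64 `double` and the default round-to-nearest mode; `mulThenAdd` and `fusedMulAdd` are illustrative helper names, not part of the patch.

```cpp
#include <cassert>
#include <cmath>

// Compute a*b + c with the product rounded to double before the addition.
// The volatile store keeps the compiler from contracting the expression.
double mulThenAdd(double A, double B, double C) {
  volatile double Product = A * B;
  return Product + C;
}

// Compute a*b + c with a single rounding at the end.
double fusedMulAdd(double A, double B, double C) {
  return std::fma(A, B, C);
}
```

Take a = 1 + 2^-27 and c = -(1 + 2^-26). The exact product a*a is 1 + 2^-26 + 2^-54; rounding it to double drops the 2^-54 term, so the separate form returns 0, while the fused form returns 2^-54, which is the exact answer.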

Address review comments.

Harbormaster completed remote builds in B44270: Diff 238769.Jan 17 2020, 6:59 AM

pengfei marked an inline comment as done.Jan 17 2020, 7:01 AM

pengfei added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3381	I prefer to reuse the operands from the fmul intrinsic here: 1) fmuladd always has the same exception behavior and rounding mode as the fmul; 2) getRoundingMode/getExceptionBehavior just return an enum value, so we would need more code to turn them into Value operands.
3423	Added the assertion at line 3380. We only need to check once there.
llvm/test/TableGen/GlobalISelEmitter-input-discard.td
18 ↗	(On Diff #238769)	It's strange the number is affected. I haven't found any cause.

Remove unnecessary comment.

Harbormaster completed remote builds in B44271: Diff 238770.Jan 17 2020, 7:09 AM

pengfei added a parent revision: D72871: [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION, and macro FUNCTION likewise. NFCI..Jan 17 2020, 7:10 AM

cameron.mcinally added inline comments.Jan 17 2020, 2:36 PM

clang/lib/CodeGen/CGExprScalar.cpp
3436	You had me until: "For the record, I also think it makes sense to use nnan, ninf, and nsz with constrained intrinsics." To be clear, we'd need them for the `fast` case, but I don't see a lot of value for the `strict` case. We definitely want reassoc/recip/etc for the `optimized but trap-safe` case, so that's enough to require FMF flags on constrained intrinsics alone. We should probably break this conversation out into an llvm-dev thread...

Remember that the design is that constrained intrinsics must be used whenever *any* code in the function is constrained. It is not unreasonable that part of the function might be constrained and the rest subject to fast-math; it'd be a shame if the intrinsics couldn't even express that.

andrew.w.kaylor added inline comments.Jan 17 2020, 3:24 PM

clang/lib/CodeGen/CGExprScalar.cpp
3436	I agree about starting an llvm-dev thread. I'll send something out unless you've already done so by the time I finish typing it.

kpn added a subscriber: kpn.Jan 21 2020, 11:54 AM

craig.topper added inline comments.Jan 23 2020, 10:06 PM

clang/lib/CodeGen/CGExprScalar.cpp
3381	Doesn't this need to be CreateConstrainedFPCall so that the strictfp attribute is added? That will take care of adding the metadata operands too.

kpn added inline comments.Jan 24 2020, 4:56 AM

clang/lib/CodeGen/CGExprScalar.cpp
3381	Is this code tested? I ran into a bug yesterday where CreateCall was used with a constrained intrinsic and the Instruction class blew up because the function signature was wrong. I wasn't passing in the metadata arguments. So, yes, it should be, and it might make sense for the patch to have test coverage that catches any other cases of this.

craig.topper added inline comments.Jan 24 2020, 8:40 AM

clang/lib/CodeGen/CGExprScalar.cpp
3381	This code is copying the metadata arguments from the fmul intrinsic, MulOp. That's the getOperand(2) and getOperand(3).

kpn added inline comments.Jan 24 2020, 9:14 AM

clang/lib/CodeGen/CGExprScalar.cpp
3381	Ah, yes, thanks. Your comment about the attribute is still valid, though. And, yes, using CreateConstrainedFPCall() is the easiest way to fix the attribute.

Address review comment.

pengfei marked an inline comment as done.Jan 26 2020, 9:19 PM

pengfei added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3381	Yes, it's the best choice. Thanks!

Harbormaster completed remote builds in B44964: Diff 240474.Jan 26 2020, 9:19 PM

LGTM

This revision is now accepted and ready to land.Jan 27 2020, 10:31 PM

Closed by commit rG3239b5034ee9: [FPEnv] Add pragma FP_CONTRACT support under strict FP. (authored by Wang, Pengfei <pengfei.wang@intel.com>).Jan 28 2020, 4:50 AM

This revision was automatically updated to reflect the committed changes.

jhenderson added a subscriber: jhenderson.Jan 28 2020, 5:41 AM

jhenderson added inline comments.

llvm/docs/LangRef.rst
16065	This underline isn't long enough and is breaking the sphinx build bot. Please fix.

pengfei marked an inline comment as done.Jan 28 2020, 6:03 AM

pengfei added inline comments.

llvm/docs/LangRef.rst
16065	Thanks! I'll fix it soon.

Allen added a subscriber: Allen.Feb 27 2023, 5:57 PM

Allen added inline comments.

clang/lib/CodeGen/CGExprScalar.cpp
3382	Sorry, I'm not familiar with the optimization of the clang front end. I'd like to ask, is this optimization supposed to assume that all the backends have instructions like Fmuladd?

Herald added a project: Restricted Project.Feb 27 2023, 5:57 PM

pengfei added inline comments.Feb 28 2023, 1:31 AM

clang/lib/CodeGen/CGExprScalar.cpp
3382	No, it is a flexible intrinsic, which allows backends to choose their best approach. It can be interpreted either as mul + add or as fma. It indicates that the user doesn't care about the difference between them.
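
A small sketch of what "the user doesn't care about the difference" means at the source level: under `FP_CONTRACT ON` the expression `a * b + c` may legitimately evaluate to either the fused or the separately rounded result, so a caller can only rely on getting one of the two. `contractible` and `isPermittedResult` are illustrative names; the pragma is guarded because, as an assumption made here, only Clang is relied on to accept it in this form.

```cpp
#include <cassert>
#include <cmath>

// Guarded so that compilers which do not accept the pragma simply skip it.
#if defined(__clang__)
#pragma STDC FP_CONTRACT ON
#endif

// The compiler may evaluate this as one fused operation or as a rounded
// multiply followed by an add.
double contractible(double A, double B, double C) {
  return A * B + C;
}

// True if Result matches either permitted evaluation of a*b + c.
bool isPermittedResult(double A, double B, double C, double Result) {
  volatile double Product = A * B; // force a rounded product
  double Separate = Product + C;
  double Fused = std::fma(A, B, C);
  return Result == Separate || Result == Fused;
}
```

Code that needs the fused result specifically should call `std::fma` directly rather than depend on contraction, which mirrors the split between the fmuladd and fma intrinsics discussed in this review.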

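Revision Contents

Path	Size
clang/lib/CodeGen/CGExprScalar.cpp	34 lines
clang/test/CodeGen/constrained-math-builtins.c	10 lines
llvm/docs/LangRef.rst	57 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	10 lines
llvm/include/llvm/CodeGen/ISDOpcodes.h	4 lines
llvm/include/llvm/IR/ConstrainedOps.def	4 lines
llvm/include/llvm/IR/Intrinsics.td	7 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	72 lines
llvm/lib/Target/X86/ (file name not preserved)	6 lines
llvm/lib/Target/X86/ (file name not preserved)	78 lines
llvm/lib/Target/X86/ (file name not preserved)	36 lines
llvm/lib/Target/X86/ (file name not preserved)	66 lines
llvm/lib/Target/X86/X86InstrFragmentsSIMD.td	12 lines
llvm/test/CodeGen/X86/fp-intrinsics-fma.ll	342 lines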

Diff 238408

clang/lib/CodeGen/CGExprScalar.cpp

[3,355 lines not shown; context: return CGF.EmitCheckedInBoundsGEP(pointer, index, isSigned, isSubtraction,]
                                     op.E->getExprLoc(), "add.ptr");
 }

 // Construct an fmuladd intrinsic to represent a fused mul-add of MulOp and
 // Addend. Use negMul and negAdd to negate the first operand of the Mul or
 // the add operand respectively. This allows fmuladd to represent a*b-c, or
 // c-a*b. Patterns in LLVM should catch the negated forms and translate them to
 // efficient operations.
-static Value* buildFMulAdd(llvm::BinaryOperator *MulOp, Value* Addend,
+static Value* buildFMulAdd(llvm::Instruction *MulOp, Value* Addend,
                            const CodeGenFunction &CGF, CGBuilderTy &Builder,
                            bool negMul, bool negAdd) {
   assert(!(negMul && negAdd) && "Only one of negMul and negAdd should be set.");

   Value *MulOp0 = MulOp->getOperand(0);
   Value *MulOp1 = MulOp->getOperand(1);
   if (negMul)
     MulOp0 = Builder.CreateFNeg(MulOp0, "neg");
   if (negAdd)
     Addend = Builder.CreateFNeg(Addend, "neg");

-  Value *FMulAdd = Builder.CreateCall(
+  Value *FMulAdd = nullptr;
+  if (Builder.getIsFPConstrained())
+    FMulAdd = Builder.CreateCall(
+        CGF.CGM.getIntrinsic(llvm::Intrinsic::experimental_constrained_fmuladd,
+                             Addend->getType()),
+        {MulOp0, MulOp1, Addend, MulOp->getOperand(2), MulOp->getOperand(3)});
+  else
+    FMulAdd = Builder.CreateCall(
       CGF.CGM.getIntrinsic(llvm::Intrinsic::fmuladd, Addend->getType()),
       {MulOp0, MulOp1, Addend});
   MulOp->eraseFromParent();

   return FMulAdd;
 }

 // Check whether it would be legal to emit an fmuladd intrinsic call to
 // represent op and if so, build the fmuladd.
 //
 // Checks that (a) the operation is fusable, and (b) -ffp-contract=on.
 // Does NOT check the type of the operation - it's assumed that this function
 // will be called from contexts where it's known that the type is contractable.
[18 lines not shown; context: if (LHSBinOp->getOpcode() == llvm::Instruction::FMul &&]
       return buildFMulAdd(LHSBinOp, op.RHS, CGF, Builder, false, isSub);
   }
   if (auto *RHSBinOp = dyn_cast<llvm::BinaryOperator>(op.RHS)) {
     if (RHSBinOp->getOpcode() == llvm::Instruction::FMul &&
         RHSBinOp->use_empty())
       return buildFMulAdd(RHSBinOp, op.LHS, CGF, Builder, isSub, false);
   }

+  if (Builder.getIsFPConstrained()) {
+    if (auto *LHSBinOp = dyn_cast<llvm::CallBase>(op.LHS)) {
+      if (LHSBinOp->getIntrinsicID() ==
+              llvm::Intrinsic::experimental_constrained_fmul &&
+          LHSBinOp->use_empty())
+        return buildFMulAdd(LHSBinOp, op.RHS, CGF, Builder, false, isSub);
+    }
+    if (auto *RHSBinOp = dyn_cast<llvm::CallBase>(op.RHS)) {
+      if (RHSBinOp->getIntrinsicID() ==
+              llvm::Intrinsic::experimental_constrained_fmul &&
+          RHSBinOp->use_empty())
+        return buildFMulAdd(RHSBinOp, op.LHS, CGF, Builder, isSub, false);
+    }
+  }
+
   return nullptr;
 }

 Value *ScalarExprEmitter::EmitAdd(const BinOpInfo &op) {
   if (op.LHS->getType()->isPointerTy() ||
       op.RHS->getType()->isPointerTy())
     return emitPointerArithmetic(CGF, op, CodeGenFunction::NotSubtraction);

[1,442 lines not shown]

clang/test/CodeGen/constrained-math-builtins.c

[142 lines not shown; context: // CHECK: declare x86_fp80 @llvm.experimental.constrained.sqrt.f80(x86_fp80, metadata, metadata)]

   __builtin_trunc(f); __builtin_truncf(f); __builtin_truncl(f);

 // CHECK: declare double @llvm.experimental.constrained.trunc.f64(double, metadata)
 // CHECK: declare float @llvm.experimental.constrained.trunc.f32(float, metadata)
 // CHECK: declare x86_fp80 @llvm.experimental.constrained.trunc.f80(x86_fp80, metadata)
 };
+
+#pragma STDC FP_CONTRACT ON
+void bar(float f) {
+  f * f + f;
+  (double)f * f + f;
+  (long double)f * f + f;
+
+// CHECK: declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)
+// CHECK: declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double, metadata, metadata)
+// CHECK: declare x86_fp80 @llvm.experimental.constrained.fmuladd.f80(x86_fp80, x86_fp80, x86_fp80, metadata, metadata)
+};

llvm/docs/LangRef.rst


[16,055 lines not shown]
 - "``uno``": yields ``true`` if either operand is a NAN.

 The quiet comparison operation performed by
 '``llvm.experimental.constrained.fcmp``' will only raise an exception
 if either operand is a SNAN. The signaling comparison operation
 performed by '``llvm.experimental.constrained.fcmps``' will raise an
 exception if either operand is a NAN (QNAN or SNAN).

+'``llvm.experimental.constrained.fmuladd``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare <type>
+      @llvm.experimental.constrained.fmuladd(<type> <op1>, <type> <op2>,
+                                             <type> <op3>,
+                                             metadata <rounding mode>,
+                                             metadata <exception behavior>)
+
+Overview:
+"""""""""
+
+The '``llvm.experimental.constrained.fmuladd``' intrinsic represents
+multiply-add expressions that can be fused if the code generator determines
+that (a) the target instruction set has support for a fused operation,
+and (b) that the fused operation is more efficient than the equivalent,
+separate pair of mul and add instructions.
+
+Arguments:
+""""""""""
+
+The first three arguments to the '``llvm.experimental.constrained.fmuladd``'
+intrinsic must be floating-point or vector of floating-point values.
+All three arguments must have identical types.
+
+The fourth and fifth arguments specifie the exception behavior as described
+above.
+
+Semantics:
+""""""""""
+
+The expression:
+
+::
+
+      %0 = call float @llvm.experimental.constrained.fmuladd.f32(%a, %b, %c)
+
+is equivalent to the expression:
+
+::
+
+      %0 = call float @llvm.experimental.constrained.fmul.f32(%a, %b)
+      %1 = call float @llvm.experimental.constrained.fadd.f32(%0, %c)
+
+except that it is unspecified whether rounding will be performed between the
+multiplication and addition steps. Fusion is not guaranteed, even if the target
+platform supports it.
+If a fused multiply-add is required, the corresponding
+:ref:`llvm.experimental.constrained.fma <int_fma>` intrinsic function should be
+used instead.
+This never sets errno, just as '``llvm.experimental.constrained.fma.*``'.
+
 Constrained libm-equivalent Intrinsics
 --------------------------------------

 In addition to the basic floating-point operations for which constrained
 intrinsics are described above, there are constrained versions of various
 operations which provide equivalent behavior to a corresponding libm function.
 These intrinsics allow the precise behavior of these operations with respect to
 rounding mode and exception behavior to be controlled.
[2,405 lines not shown]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[1,280 lines not shown; context: case Intrinsic::pow:]
       ISDs.push_back(ISD::FPOW);
       break;
     case Intrinsic::fma:
       ISDs.push_back(ISD::FMA);
       break;
     case Intrinsic::fmuladd:
       ISDs.push_back(ISD::FMA);
       break;
+    case Intrinsic::experimental_constrained_fmuladd:
+      ISDs.push_back(ISD::STRICT_FMA);
+      break;
     // FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free.
     case Intrinsic::lifetime_start:
     case Intrinsic::lifetime_end:
     case Intrinsic::sideeffect:
       return 0;
     case Intrinsic::masked_store:
       return ConcreteTTI->getMaskedMemoryOpCost(Instruction::Store, Tys[0], 0,
                                                 0);
[207 lines not shown; context: unsigned getIntrinsicInstrCost(]
     if (MinCustomCostI != CustomCost.end())
       return *MinCustomCostI;

     // If we can't lower fmuladd into an FMA estimate the cost as a floating
     // point mul followed by an add.
     if (IID == Intrinsic::fmuladd)
       return ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +
              ConcreteTTI->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);
+    // FIXME: Is constrained intrinsic' cost equal to it's no strict one?
+    if (IID == Intrinsic::experimental_constrained_fmuladd)
+      return ConcreteTTI->getIntrinsicCost(
+                 Intrinsic::experimental_constrained_fmul, RetTy, Tys,
+                 nullptr) +
+             ConcreteTTI->getIntrinsicCost(
+                 Intrinsic::experimental_constrained_fadd, RetTy, Tys, nullptr);

     // Else, assume that we need to scalarize this intrinsic. For math builtins
     // this will emit a costly libcall, adding call overhead and spills. Make it
     // very expensive.
     if (RetTy->isVectorTy()) {
       unsigned ScalarizationCost =
           ((ScalarizationCostPassed != std::numeric_limits<unsigned>::max())
                ? ScalarizationCostPassed
[228 lines not shown]

llvm/include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 346 Lines • ▼ Show 20 Lines	enum NodeType {
STRICT_FP_EXTEND,		STRICT_FP_EXTEND,

/// STRICT_FSETCC/STRICT_FSETCCS - Constrained versions of SETCC, used		/// STRICT_FSETCC/STRICT_FSETCCS - Constrained versions of SETCC, used
/// for floating-point operands only. STRICT_FSETCC performs a quiet		/// for floating-point operands only. STRICT_FSETCC performs a quiet
/// comparison operation, while STRICT_FSETCCS performs a signaling		/// comparison operation, while STRICT_FSETCCS performs a signaling
/// comparison operation.		/// comparison operation.
STRICT_FSETCC, STRICT_FSETCCS,		STRICT_FSETCC, STRICT_FSETCCS,

		/// FMULADD/STRICT_FMULADD - A intermediate node, made functions handle
		andrew.w.kaylor: Something is wrong with this comment. I'm not sure what it's trying to say, but the grammar is wrong. After looking through the rest of the code, I think I understand what's going on. I think we need a verbose comment to explain it. Here's my suggestion:
		FMULADD/STRICT_FMULADD -- These are intermediate opcodes used to handle the constrained.fmuladd intrinsic. The FMULADD opcode only exists because it is required for correct macro expansion and default handling (which is never reached). There should never be a node with ISD::FMULADD. The STRICT_FMULADD opcode is used to allow SelectionDAGBuilder::visitConstrainedFPIntrinsic to determine (based on TargetOptions and target cost information) whether the constrained.fmuladd intrinsic should be lowered to FMA or separate FMUL and FADD operations.
		Having thought through that, however, it strikes me as a lot of overhead. Can we just add special handling for the constrained.fmuladd intrinsic and make the decision then to create either a STRICT_FMA node or separate STRICT_FMUL and STRICT_FADD? The idea that ISD::FMULADD is going to exist as a defined opcode but we never intend to add any support for handling it is particularly bad.
		/// constrained fmuladd the same as other constrained intrinsics.
		FMULADD, STRICT_FMULADD,

/// FMA - Perform a * b + c with no intermediate rounding step.		/// FMA - Perform a * b + c with no intermediate rounding step.
FMA,		FMA,

/// FMAD - Perform a * b + c, while getting the same result as the		/// FMAD - Perform a * b + c, while getting the same result as the
/// separately rounded operations.		/// separately rounded operations.
FMAD,		FMAD,

/// FCOPYSIGN(X, Y) - Return the value of X with the sign of Y. NOTE: This		/// FCOPYSIGN(X, Y) - Return the value of X with the sign of Y. NOTE: This
▲ Show 20 Lines • Show All 774 Lines • Show Last 20 Lines
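
The alternative raised in the inline comment above (decide at the point of handling constrained.fmuladd, with no intermediate opcode) could look roughly like the standalone sketch below. It is not LLVM code: `StrictOp`, `FusionMode` and `lowerConstrainedFMulAdd` are hypothetical stand-ins for the ISD::STRICT_* opcodes, TargetOptions::AllowFPOpFusion and the builder logic, and the `FMAIsFaster` flag stands in for the result of TLI.isFMAFasterThanFMulAndFAdd. The condition mirrors the one used in this diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for ISD::STRICT_FMA / STRICT_FMUL / STRICT_FADD.
enum class StrictOp { FMA, FMul, FAdd };

// Hypothetical stand-in for the FP op fusion setting in TargetOptions.
enum class FusionMode { Fast, Standard, Strict };

// Returns the sequence of strict nodes that a constrained.fmuladd would be
// lowered to. No intermediate "FMULADD" opcode is needed: the choice is made
// directly from the fusion mode and the target's cost answer.
std::vector<StrictOp> lowerConstrainedFMulAdd(FusionMode Mode,
                                              bool FMAIsFaster) {
  // Split into mul + add when fusion is disallowed or FMA is not profitable.
  if (Mode == FusionMode::Strict || !FMAIsFaster)
    return {StrictOp::FMul, StrictOp::FAdd};
  return {StrictOp::FMA};
}

// Small usage check for the sketch.
void checkLowering() {
  assert(lowerConstrainedFMulAdd(FusionMode::Fast, true).size() == 1);
  assert(lowerConstrainedFMulAdd(FusionMode::Strict, true).size() == 2);
}
```

With this shape only opcodes that nodes can actually carry are ever named, which is the concern the comment raises about a defined but unsupported ISD::FMULADD.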

llvm/include/llvm/IR/ConstrainedOps.def

	Show First 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
	FUNCTION(pow, 2, 1, experimental_constrained_pow, FPOW)			FUNCTION(pow, 2, 1, experimental_constrained_pow, FPOW)
	FUNCTION(powi, 2, 1, experimental_constrained_powi, FPOWI)			FUNCTION(powi, 2, 1, experimental_constrained_powi, FPOWI)
	FUNCTION(rint, 1, 1, experimental_constrained_rint, FRINT)			FUNCTION(rint, 1, 1, experimental_constrained_rint, FRINT)
	FUNCTION(round, 1, 0, experimental_constrained_round, FROUND)			FUNCTION(round, 1, 0, experimental_constrained_round, FROUND)
	FUNCTION(sin, 1, 1, experimental_constrained_sin, FSIN)			FUNCTION(sin, 1, 1, experimental_constrained_sin, FSIN)
	FUNCTION(sqrt, 1, 1, experimental_constrained_sqrt, FSQRT)			FUNCTION(sqrt, 1, 1, experimental_constrained_sqrt, FSQRT)
	FUNCTION(trunc, 1, 0, experimental_constrained_trunc, FTRUNC)			FUNCTION(trunc, 1, 0, experimental_constrained_trunc, FTRUNC)

				// This is definition for fmuladd intrinsic function, that is converted into
				// constrained FMA or FMUL + FADD intrinsics.
				FUNCTION(fmuladd, 3, 1, experimental_constrained_fmuladd, FMULADD)

	#undef INSTRUCTION			#undef INSTRUCTION
	#undef FUNCTION			#undef FUNCTION
	#undef CMP_INSTRUCTION			#undef CMP_INSTRUCTION

llvm/include/llvm/IR/Intrinsics.td

Show First 20 Lines • Show All 620 Lines • ▼ Show 20 Lines	let IntrProperties = [IntrInaccessibleMemOnly, IntrWillReturn] in {

def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

		def int_experimental_constrained_fmuladd : Intrinsic<[ llvm_anyfloat_ty ],
		[ LLVMMatchType<0>,
		LLVMMatchType<0>,
		LLVMMatchType<0>,
		llvm_metadata_ty,
		llvm_metadata_ty ]>;

def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],		def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],		def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

▲ Show 20 Lines • Show All 724 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 7,052 Lines • ▼ Show 20 Lines	#include "llvm/IR/ConstrainedOps.def"
case ISD::STRICT_FSETCC:		case ISD::STRICT_FSETCC:
case ISD::STRICT_FSETCCS: {		case ISD::STRICT_FSETCCS: {
auto *FPCmp = dyn_cast<ConstrainedFPCmpIntrinsic>(&FPI);		auto *FPCmp = dyn_cast<ConstrainedFPCmpIntrinsic>(&FPI);
Opers.push_back(DAG.getCondCode(getFCmpCondCode(FPCmp->getPredicate())));		Opers.push_back(DAG.getCondCode(getFCmpCondCode(FPCmp->getPredicate())));
break;		break;
}		}
}		}

SDVTList VTs = DAG.getVTList(ValueVTs);		SDVTList VTs = DAG.getVTList(ValueVTs);
		craig.topper: Why is Result a reference? It's not modified, is it? Don't use auto for parameter types. LLVM coding style prefers auto to only be used when the type is easily assumed by someone reading the code.
SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers);		SDValue Result;

		auto pushOutChain = [&]() {
		craig.topper: Can you make the SDValue Result an argument of this and only capture 'this'? I don't like depending on reassigning Result.
assert(Result.getNode()->getNumValues() == 2);		assert(Result.getNode()->getNumValues() == 2);

// Push node to the appropriate list so that future instructions can be		// Push node to the appropriate list so that future instructions can be
// chained up correctly.		// chained up correctly.
SDValue OutChain = Result.getValue(1);		SDValue OutChain = Result.getValue(1);
switch (FPI.getExceptionBehavior().getValue()) {		switch (FPI.getExceptionBehavior().getValue()) {
case fp::ExceptionBehavior::ebIgnore:		case fp::ExceptionBehavior::ebIgnore:
// The only reason why ebIgnore nodes still need to be chained is that		// The only reason why ebIgnore nodes still need to be chained is that
// they might depend on the current rounding mode, and therefore must		// they might depend on the current rounding mode, and therefore must
// not be moved across instruction that may change that mode.		// not be moved across instruction that may change that mode.
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case fp::ExceptionBehavior::ebMayTrap:		case fp::ExceptionBehavior::ebMayTrap:
// These must not be moved across calls or instructions that may change		// These must not be moved across calls or instructions that may change
// floating-point exception masks.		// floating-point exception masks.
PendingConstrainedFP.push_back(OutChain);		PendingConstrainedFP.push_back(OutChain);
break;		break;
case fp::ExceptionBehavior::ebStrict:		case fp::ExceptionBehavior::ebStrict:
// These must not be moved across calls or instructions that may change		// These must not be moved across calls or instructions that may change
// floating-point exception masks or read floating-point exception flags.		// floating-point exception masks or read floating-point exception flags.
// In addition, they cannot be optimized out even if unused.		// In addition, they cannot be optimized out even if unused.
PendingConstrainedFPStrict.push_back(OutChain);		PendingConstrainedFPStrict.push_back(OutChain);
break;		break;
}		}
		};

		if (Opcode == ISD::STRICT_FMULADD) {
		Opcode = ISD::STRICT_FMA;
		// Break fmuladd into fmul and fadd.
		if (TM.Options.AllowFPOpFusion == FPOpFusion::Strict ||
		!TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(),
		ValueVTs[0])) {
		Opers.pop_back();
		Result = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers);
		pushOutChain();
		Opcode = ISD::STRICT_FADD;
		Opers.clear();
		Opers.push_back(Result.getValue(1));
		Opers.push_back(Result.getValue(0));
		Opers.push_back(getValue(FPI.getArgOperand(2)));
		}
		}

		Result = DAG.getNode(Opcode, sdl, VTs, Opers);
		pushOutChain();

SDValue FPResult = Result.getValue(0);		SDValue FPResult = Result.getValue(0);
setValue(&FPI, FPResult);		setValue(&FPI, FPResult);
}		}

std::pair<SDValue, SDValue>		std::pair<SDValue, SDValue>
SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,		SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,
const BasicBlock *EHPadBB) {		const BasicBlock *EHPadBB) {
▲ Show 20 Lines • Show All 3,565 Lines • Show Last 20 Lines
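
The two inline comments on pushOutChain ask for a lambda that takes the result by value, spells out its parameter types, and captures only `this`. A minimal standalone sketch of that shape follows. It is not the SelectionDAGBuilder code: `ToyBuilder`, `ToyValue` and `ExceptionBehavior` are hypothetical stand-ins, and passing the exception behavior as a second argument is an assumption made here so that nothing besides `this` needs to be captured.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for fp::ExceptionBehavior.
enum class ExceptionBehavior { Ignore, MayTrap, Strict };

// Hypothetical stand-in for an SDValue; only its output chain is modelled.
struct ToyValue {
  int Chain;
};

class ToyBuilder {
public:
  std::vector<int> PendingConstrainedFP;
  std::vector<int> PendingConstrainedFPStrict;

  // Models visiting one constrained intrinsic. When SplitFMulAdd is true,
  // two nodes (mul, then add) are created and each chain is recorded.
  void visit(ExceptionBehavior EB, bool SplitFMulAdd) {
    // Explicit parameter types, by-value argument, and only 'this' captured,
    // so the lambda does not depend on a reassigned local variable.
    auto pushOutChain = [this](ToyValue Result, ExceptionBehavior Behavior) {
      assert(Result.Chain >= 0 && "expected a valid chain");
      switch (Behavior) {
      case ExceptionBehavior::Ignore:
      case ExceptionBehavior::MayTrap:
        PendingConstrainedFP.push_back(Result.Chain);
        break;
      case ExceptionBehavior::Strict:
        PendingConstrainedFPStrict.push_back(Result.Chain);
        break;
      }
    };

    if (SplitFMulAdd) {
      ToyValue Mul{NextChain++};
      pushOutChain(Mul, EB);
    }
    ToyValue Final{NextChain++};
    pushOutChain(Final, EB);
  }

private:
  int NextChain = 0;
};
```

Each call site names the value whose chain is being recorded, so the order of assignments to a shared `Result` no longer matters.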

llvm/lib/Target/X86/X86ISelLowering.h

Show First 20 Lines • Show All 462 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
// VNNI		// VNNI
VPDPBUSD,		VPDPBUSD,
VPDPBUSDS,		VPDPBUSDS,
VPDPWSSD,		VPDPWSSD,
VPDPWSSDS,		VPDPWSSDS,

// FMA nodes.		// FMA nodes.
// We use the target independent ISD::FMA for the non-inverted case.		// We use the target independent ISD::FMA for the non-inverted case.
FNMADD,		FNMADD, STRICT_FNMADD,
FMSUB,		FMSUB, STRICT_FMSUB,
FNMSUB,		FNMSUB, STRICT_FNMSUB,
FMADDSUB,		FMADDSUB,
FMSUBADD,		FMSUBADD,

// FMA with rounding mode.		// FMA with rounding mode.
FMADD_RND,		FMADD_RND,
FNMADD_RND,		FNMADD_RND,
FMSUB_RND,		FMSUB_RND,
FNMSUB_RND,		FNMSUB_RND,
▲ Show 20 Lines • Show All 1,238 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 1,994 Lines • ▼ Show 20 Lines
setTargetDAGCombine(ISD::SRL);		setTargetDAGCombine(ISD::SRL);
setTargetDAGCombine(ISD::OR);		setTargetDAGCombine(ISD::OR);
setTargetDAGCombine(ISD::AND);		setTargetDAGCombine(ISD::AND);
setTargetDAGCombine(ISD::ADD);		setTargetDAGCombine(ISD::ADD);
setTargetDAGCombine(ISD::FADD);		setTargetDAGCombine(ISD::FADD);
setTargetDAGCombine(ISD::FSUB);		setTargetDAGCombine(ISD::FSUB);
setTargetDAGCombine(ISD::FNEG);		setTargetDAGCombine(ISD::FNEG);
setTargetDAGCombine(ISD::FMA);		setTargetDAGCombine(ISD::FMA);
		setTargetDAGCombine(ISD::STRICT_FMA);
setTargetDAGCombine(ISD::FMINNUM);		setTargetDAGCombine(ISD::FMINNUM);
setTargetDAGCombine(ISD::FMAXNUM);		setTargetDAGCombine(ISD::FMAXNUM);
setTargetDAGCombine(ISD::SUB);		setTargetDAGCombine(ISD::SUB);
setTargetDAGCombine(ISD::LOAD);		setTargetDAGCombine(ISD::LOAD);
setTargetDAGCombine(ISD::MLOAD);		setTargetDAGCombine(ISD::MLOAD);
setTargetDAGCombine(ISD::STORE);		setTargetDAGCombine(ISD::STORE);
setTargetDAGCombine(ISD::MSTORE);		setTargetDAGCombine(ISD::MSTORE);
setTargetDAGCombine(ISD::TRUNCATE);		setTargetDAGCombine(ISD::TRUNCATE);
▲ Show 20 Lines • Show All 27,801 Lines • ▼ Show 20 Lines	const char *X86TargetLowering::getTargetNodeName(unsigned Opcode) const {
case X86ISD::VPMADDUBSW: return "X86ISD::VPMADDUBSW";		case X86ISD::VPMADDUBSW: return "X86ISD::VPMADDUBSW";
case X86ISD::VPMADDWD: return "X86ISD::VPMADDWD";		case X86ISD::VPMADDWD: return "X86ISD::VPMADDWD";
case X86ISD::VPSHA: return "X86ISD::VPSHA";		case X86ISD::VPSHA: return "X86ISD::VPSHA";
case X86ISD::VPSHL: return "X86ISD::VPSHL";		case X86ISD::VPSHL: return "X86ISD::VPSHL";
case X86ISD::VPCOM: return "X86ISD::VPCOM";		case X86ISD::VPCOM: return "X86ISD::VPCOM";
case X86ISD::VPCOMU: return "X86ISD::VPCOMU";		case X86ISD::VPCOMU: return "X86ISD::VPCOMU";
case X86ISD::VPERMIL2: return "X86ISD::VPERMIL2";		case X86ISD::VPERMIL2: return "X86ISD::VPERMIL2";
case X86ISD::FMSUB: return "X86ISD::FMSUB";		case X86ISD::FMSUB: return "X86ISD::FMSUB";
		case X86ISD::STRICT_FMSUB: return "X86ISD::STRICT_FMSUB";
case X86ISD::FNMADD: return "X86ISD::FNMADD";		case X86ISD::FNMADD: return "X86ISD::FNMADD";
		case X86ISD::STRICT_FNMADD: return "X86ISD::STRICT_FNMADD";
case X86ISD::FNMSUB: return "X86ISD::FNMSUB";		case X86ISD::FNMSUB: return "X86ISD::FNMSUB";
		case X86ISD::STRICT_FNMSUB: return "X86ISD::STRICT_FNMSUB";
case X86ISD::FMADDSUB: return "X86ISD::FMADDSUB";		case X86ISD::FMADDSUB: return "X86ISD::FMADDSUB";
case X86ISD::FMSUBADD: return "X86ISD::FMSUBADD";		case X86ISD::FMSUBADD: return "X86ISD::FMSUBADD";
case X86ISD::FMADD_RND: return "X86ISD::FMADD_RND";		case X86ISD::FMADD_RND: return "X86ISD::FMADD_RND";
case X86ISD::FNMADD_RND: return "X86ISD::FNMADD_RND";		case X86ISD::FNMADD_RND: return "X86ISD::FNMADD_RND";
case X86ISD::FMSUB_RND: return "X86ISD::FMSUB_RND";		case X86ISD::FMSUB_RND: return "X86ISD::FMSUB_RND";
case X86ISD::FNMSUB_RND: return "X86ISD::FNMSUB_RND";		case X86ISD::FNMSUB_RND: return "X86ISD::FNMSUB_RND";
case X86ISD::FMADDSUB_RND: return "X86ISD::FMADDSUB_RND";		case X86ISD::FMADDSUB_RND: return "X86ISD::FMADDSUB_RND";
case X86ISD::FMSUBADD_RND: return "X86ISD::FMSUBADD_RND";		case X86ISD::FMSUBADD_RND: return "X86ISD::FMSUBADD_RND";
▲ Show 20 Lines • Show All 12,679 Lines • ▼ Show 20 Lines	static SDValue isFNEG(SelectionDAG &DAG, SDNode *N, unsigned Depth = 0) {
return SDValue();		return SDValue();
}		}

static unsigned negateFMAOpcode(unsigned Opcode, bool NegMul, bool NegAcc,		static unsigned negateFMAOpcode(unsigned Opcode, bool NegMul, bool NegAcc,
bool NegRes) {		bool NegRes) {
if (NegMul) {		if (NegMul) {
switch (Opcode) {		switch (Opcode) {
default: llvm_unreachable("Unexpected opcode");		default: llvm_unreachable("Unexpected opcode");
case ISD::FMA: Opcode = X86ISD::FNMADD; break;		case ISD::FMA: Opcode = X86ISD::FNMADD; break;
		case ISD::STRICT_FMA: Opcode = X86ISD::STRICT_FNMADD; break;
case X86ISD::FMADD_RND: Opcode = X86ISD::FNMADD_RND; break;		case X86ISD::FMADD_RND: Opcode = X86ISD::FNMADD_RND; break;
case X86ISD::FMSUB: Opcode = X86ISD::FNMSUB; break;		case X86ISD::FMSUB: Opcode = X86ISD::FNMSUB; break;
		case X86ISD::STRICT_FMSUB: Opcode = X86ISD::STRICT_FNMSUB; break;
case X86ISD::FMSUB_RND: Opcode = X86ISD::FNMSUB_RND; break;		case X86ISD::FMSUB_RND: Opcode = X86ISD::FNMSUB_RND; break;
case X86ISD::FNMADD: Opcode = ISD::FMA; break;		case X86ISD::FNMADD: Opcode = ISD::FMA; break;
		case X86ISD::STRICT_FNMADD: Opcode = ISD::STRICT_FMA; break;
case X86ISD::FNMADD_RND: Opcode = X86ISD::FMADD_RND; break;		case X86ISD::FNMADD_RND: Opcode = X86ISD::FMADD_RND; break;
case X86ISD::FNMSUB: Opcode = X86ISD::FMSUB; break;		case X86ISD::FNMSUB: Opcode = X86ISD::FMSUB; break;
		case X86ISD::STRICT_FNMSUB: Opcode = X86ISD::STRICT_FMSUB; break;
case X86ISD::FNMSUB_RND: Opcode = X86ISD::FMSUB_RND; break;		case X86ISD::FNMSUB_RND: Opcode = X86ISD::FMSUB_RND; break;
}		}
}		}

if (NegAcc) {		if (NegAcc) {
switch (Opcode) {		switch (Opcode) {
default: llvm_unreachable("Unexpected opcode");		default: llvm_unreachable("Unexpected opcode");
case ISD::FMA: Opcode = X86ISD::FMSUB; break;		case ISD::FMA: Opcode = X86ISD::FMSUB; break;
		case ISD::STRICT_FMA: Opcode = X86ISD::STRICT_FMSUB; break;
case X86ISD::FMADD_RND: Opcode = X86ISD::FMSUB_RND; break;		case X86ISD::FMADD_RND: Opcode = X86ISD::FMSUB_RND; break;
case X86ISD::FMSUB: Opcode = ISD::FMA; break;		case X86ISD::FMSUB: Opcode = ISD::FMA; break;
		case X86ISD::STRICT_FMSUB: Opcode = ISD::STRICT_FMA; break;
case X86ISD::FMSUB_RND: Opcode = X86ISD::FMADD_RND; break;		case X86ISD::FMSUB_RND: Opcode = X86ISD::FMADD_RND; break;
case X86ISD::FNMADD: Opcode = X86ISD::FNMSUB; break;		case X86ISD::FNMADD: Opcode = X86ISD::FNMSUB; break;
		case X86ISD::STRICT_FNMADD: Opcode = X86ISD::STRICT_FNMSUB; break;
case X86ISD::FNMADD_RND: Opcode = X86ISD::FNMSUB_RND; break;		case X86ISD::FNMADD_RND: Opcode = X86ISD::FNMSUB_RND; break;
case X86ISD::FNMSUB: Opcode = X86ISD::FNMADD; break;		case X86ISD::FNMSUB: Opcode = X86ISD::FNMADD; break;
		case X86ISD::STRICT_FNMSUB: Opcode = X86ISD::STRICT_FNMADD; break;
case X86ISD::FNMSUB_RND: Opcode = X86ISD::FNMADD_RND; break;		case X86ISD::FNMSUB_RND: Opcode = X86ISD::FNMADD_RND; break;
case X86ISD::FMADDSUB: Opcode = X86ISD::FMSUBADD; break;		case X86ISD::FMADDSUB: Opcode = X86ISD::FMSUBADD; break;
case X86ISD::FMADDSUB_RND: Opcode = X86ISD::FMSUBADD_RND; break;		case X86ISD::FMADDSUB_RND: Opcode = X86ISD::FMSUBADD_RND; break;
case X86ISD::FMSUBADD: Opcode = X86ISD::FMADDSUB; break;		case X86ISD::FMSUBADD: Opcode = X86ISD::FMADDSUB; break;
case X86ISD::FMSUBADD_RND: Opcode = X86ISD::FMADDSUB_RND; break;		case X86ISD::FMSUBADD_RND: Opcode = X86ISD::FMADDSUB_RND; break;
}		}
}		}

if (NegRes) {		if (NegRes) {
switch (Opcode) {		switch (Opcode) {
		// For accuracy reason, we never combine fneg and fma under strict FP.
default: llvm_unreachable("Unexpected opcode");		default: llvm_unreachable("Unexpected opcode");
case ISD::FMA: Opcode = X86ISD::FNMSUB; break;		case ISD::FMA: Opcode = X86ISD::FNMSUB; break;
case X86ISD::FMADD_RND: Opcode = X86ISD::FNMSUB_RND; break;		case X86ISD::FMADD_RND: Opcode = X86ISD::FNMSUB_RND; break;
case X86ISD::FMSUB: Opcode = X86ISD::FNMADD; break;		case X86ISD::FMSUB: Opcode = X86ISD::FNMADD; break;
case X86ISD::FMSUB_RND: Opcode = X86ISD::FNMADD_RND; break;		case X86ISD::FMSUB_RND: Opcode = X86ISD::FNMADD_RND; break;
case X86ISD::FNMADD: Opcode = X86ISD::FMSUB; break;		case X86ISD::FNMADD: Opcode = X86ISD::FMSUB; break;
case X86ISD::FNMADD_RND: Opcode = X86ISD::FMSUB_RND; break;		case X86ISD::FNMADD_RND: Opcode = X86ISD::FMSUB_RND; break;
case X86ISD::FNMSUB: Opcode = ISD::FMA; break;		case X86ISD::FNMSUB: Opcode = ISD::FMA; break;
▲ Show 20 Lines • Show All 955 Lines • ▼ Show 20 Lines	static SDValue combineSext(SDNode *N, SelectionDAG &DAG,
return SDValue();		return SDValue();
}		}

static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,		static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
		bool IsStrict = N->isStrictFPOpcode();

// Let legalize expand this if it isn't a legal type yet.		// Let legalize expand this if it isn't a legal type yet.
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
if (!TLI.isTypeLegal(VT))		if (!TLI.isTypeLegal(VT))
return SDValue();		return SDValue();

EVT ScalarVT = VT.getScalarType();		EVT ScalarVT = VT.getScalarType();
if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())		if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
return SDValue();		return SDValue();

SDValue A = N->getOperand(0);		SDValue A = N->getOperand(IsStrict ? 1 : 0);
SDValue B = N->getOperand(1);		SDValue B = N->getOperand(IsStrict ? 2 : 1);
SDValue C = N->getOperand(2);		SDValue C = N->getOperand(IsStrict ? 3 : 2);

auto invertIfNegative = [&DAG, &TLI, &DCI](SDValue &V) {		auto invertIfNegative = [&DAG, &TLI, &DCI](SDValue &V) {
bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();		bool CodeSize = DAG.getMachineFunction().getFunction().hasOptSize();
bool LegalOperations = !DCI.isBeforeLegalizeOps();		bool LegalOperations = !DCI.isBeforeLegalizeOps();
if (TLI.isNegatibleForFree(V, DAG, LegalOperations, CodeSize) == 2) {		if (TLI.isNegatibleForFree(V, DAG, LegalOperations, CodeSize) == 2) {
V = TLI.getNegatedExpression(V, DAG, LegalOperations, CodeSize);		V = TLI.getNegatedExpression(V, DAG, LegalOperations, CodeSize);
return true;		return true;
}		}
Show All 21 Lines	static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,
bool NegC = invertIfNegative(C);		bool NegC = invertIfNegative(C);

if (!NegA && !NegB && !NegC)		if (!NegA && !NegB && !NegC)
return SDValue();		return SDValue();

unsigned NewOpcode =		unsigned NewOpcode =
negateFMAOpcode(N->getOpcode(), NegA != NegB, NegC, false);		negateFMAOpcode(N->getOpcode(), NegA != NegB, NegC, false);

		if (IsStrict) {
		assert(N->getNumOperands() == 4 && "Shouldn't be greater than 4");
		return DAG.getNode(NewOpcode, dl, {VT, MVT::Other},
		{N->getOperand(0), A, B, C});
		} else {
if (N->getNumOperands() == 4)		if (N->getNumOperands() == 4)
return DAG.getNode(NewOpcode, dl, VT, A, B, C, N->getOperand(3));		return DAG.getNode(NewOpcode, dl, VT, A, B, C, N->getOperand(3));
return DAG.getNode(NewOpcode, dl, VT, A, B, C);		return DAG.getNode(NewOpcode, dl, VT, A, B, C);
}		}
		}

// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)		// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)
// Combine FMSUBADD(A, B, FNEG(C)) -> FMADDSUB(A, B, C)		// Combine FMSUBADD(A, B, FNEG(C)) -> FMADDSUB(A, B, C)
static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,		static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI) {		TargetLowering::DAGCombinerInfo &DCI) {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
▲ Show 20 Lines • Show All 2,485 Lines • ▼ Show 20 Lines	SDValue X86TargetLowering::PerformDAGCombine(SDNode *N,
case X86ISD::VPERMILPI:		case X86ISD::VPERMILPI:
case X86ISD::VPERMILPV:		case X86ISD::VPERMILPV:
case X86ISD::VPERM2X128:		case X86ISD::VPERM2X128:
case X86ISD::SHUF128:		case X86ISD::SHUF128:
case X86ISD::VZEXT_MOVL:		case X86ISD::VZEXT_MOVL:
case ISD::VECTOR_SHUFFLE: return combineShuffle(N, DAG, DCI,Subtarget);		case ISD::VECTOR_SHUFFLE: return combineShuffle(N, DAG, DCI,Subtarget);
case X86ISD::FMADD_RND:		case X86ISD::FMADD_RND:
case X86ISD::FMSUB:		case X86ISD::FMSUB:
		case X86ISD::STRICT_FMSUB:
case X86ISD::FMSUB_RND:		case X86ISD::FMSUB_RND:
case X86ISD::FNMADD:		case X86ISD::FNMADD:
		case X86ISD::STRICT_FNMADD:
case X86ISD::FNMADD_RND:		case X86ISD::FNMADD_RND:
case X86ISD::FNMSUB:		case X86ISD::FNMSUB:
		case X86ISD::STRICT_FNMSUB:
case X86ISD::FNMSUB_RND:		case X86ISD::FNMSUB_RND:
case ISD::FMA: return combineFMA(N, DAG, DCI, Subtarget);		case ISD::FMA:
		case ISD::STRICT_FMA: return combineFMA(N, DAG, DCI, Subtarget);
case X86ISD::FMADDSUB_RND:		case X86ISD::FMADDSUB_RND:
case X86ISD::FMSUBADD_RND:		case X86ISD::FMSUBADD_RND:
case X86ISD::FMADDSUB:		case X86ISD::FMADDSUB:
case X86ISD::FMSUBADD: return combineFMADDSUB(N, DAG, DCI);		case X86ISD::FMSUBADD: return combineFMADDSUB(N, DAG, DCI);
case X86ISD::MOVMSK: return combineMOVMSK(N, DAG, DCI, Subtarget);		case X86ISD::MOVMSK: return combineMOVMSK(N, DAG, DCI, Subtarget);
case X86ISD::MGATHER:		case X86ISD::MGATHER:
case X86ISD::MSCATTER: return combineX86GatherScatter(N, DAG, DCI);		case X86ISD::MSCATTER: return combineX86GatherScatter(N, DAG, DCI);
case ISD::MGATHER:		case ISD::MGATHER:
▲ Show 20 Lines • Show All 1,223 Lines • Show Last 20 Lines
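
The strict cases added to negateFMAOpcode in this file follow the same pairing as the non-strict ones for NegMul and NegAcc, while NegRes is left without strict cases. The standalone sketch below restates just those strict mappings so they can be checked in isolation. `StrictFMAOp` and `negateStrictFMA` are hypothetical names, not LLVM identifiers; the table itself is copied from the diff above.

```cpp
#include <cassert>

// Hypothetical stand-ins for ISD::STRICT_FMA and the X86ISD::STRICT_* nodes.
enum class StrictFMAOp { FMA, FMSub, FNMAdd, FNMSub };

// Mirrors the strict cases of the NegMul and NegAcc switches in the diff.
// Negating the result (NegRes) is intentionally not modelled, since the
// patch does not add strict cases there.
StrictFMAOp negateStrictFMA(StrictFMAOp Opcode, bool NegMul, bool NegAcc) {
  if (NegMul) {
    switch (Opcode) {
    case StrictFMAOp::FMA:    Opcode = StrictFMAOp::FNMAdd; break;
    case StrictFMAOp::FMSub:  Opcode = StrictFMAOp::FNMSub; break;
    case StrictFMAOp::FNMAdd: Opcode = StrictFMAOp::FMA;    break;
    case StrictFMAOp::FNMSub: Opcode = StrictFMAOp::FMSub;  break;
    }
  }
  if (NegAcc) {
    switch (Opcode) {
    case StrictFMAOp::FMA:    Opcode = StrictFMAOp::FMSub;  break;
    case StrictFMAOp::FMSub:  Opcode = StrictFMAOp::FMA;    break;
    case StrictFMAOp::FNMAdd: Opcode = StrictFMAOp::FNMSub; break;
    case StrictFMAOp::FNMSub: Opcode = StrictFMAOp::FNMAdd; break;
    }
  }
  return Opcode;
}

// Applying the same negation twice should give back the original opcode.
void checkNegation() {
  StrictFMAOp Once = negateStrictFMA(StrictFMAOp::FMSub, true, true);
  assert(negateStrictFMA(Once, true, true) == StrictFMAOp::FMSub);
}
```

Because each switch is its own inverse, a mapping that was mistyped in one direction would show up as a failed round trip.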

llvm/lib/Target/X86/X86InstrAVX512.td

Show First 20 Lines • Show All 6,481 Lines • ▼ Show 20 Lines	multiclass avx512_fma3p_213_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
defm PS : avx512_fma3p_213_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_213_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f32_info, "PS">;		SchedWriteFMA, avx512vl_f32_info, "PS">;
defm PD : avx512_fma3p_213_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_213_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f64_info, "PD">,		SchedWriteFMA, avx512vl_f64_info, "PD">,
VEX_W;		VEX_W;
}		}

defm VFMADD213 : avx512_fma3p_213_f<0xA8, "vfmadd213", X86any_Fmadd, X86FmaddRnd>;		defm VFMADD213 : avx512_fma3p_213_f<0xA8, "vfmadd213", X86any_Fmadd, X86FmaddRnd>;
defm VFMSUB213 : avx512_fma3p_213_f<0xAA, "vfmsub213", X86Fmsub, X86FmsubRnd>;		defm VFMSUB213 : avx512_fma3p_213_f<0xAA, "vfmsub213", X86any_Fmsub, X86FmsubRnd>;
defm VFMADDSUB213 : avx512_fma3p_213_f<0xA6, "vfmaddsub213", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB213 : avx512_fma3p_213_f<0xA6, "vfmaddsub213", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD213 : avx512_fma3p_213_f<0xA7, "vfmsubadd213", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD213 : avx512_fma3p_213_f<0xA7, "vfmsubadd213", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD213 : avx512_fma3p_213_f<0xAC, "vfnmadd213", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD213 : avx512_fma3p_213_f<0xAC, "vfnmadd213", X86any_Fnmadd, X86FnmaddRnd>;
defm VFNMSUB213 : avx512_fma3p_213_f<0xAE, "vfnmsub213", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB213 : avx512_fma3p_213_f<0xAE, "vfnmsub213", X86any_Fnmsub, X86FnmsubRnd>;


multiclass avx512_fma3p_231_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_fma3p_231_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
X86FoldableSchedWrite sched,		X86FoldableSchedWrite sched,
X86VectorVTInfo _, string Suff> {		X86VectorVTInfo _, string Suff> {
let Constraints = "$src1 = $dst", ExeDomain = _.ExeDomain, hasSideEffects = 0,		let Constraints = "$src1 = $dst", ExeDomain = _.ExeDomain, hasSideEffects = 0,
Uses = [MXCSR], mayRaiseFPException = 1 in {		Uses = [MXCSR], mayRaiseFPException = 1 in {
defm r: AVX512_maskable_3src<opc, MRMSrcReg, _, (outs _.RC:$dst),		defm r: AVX512_maskable_3src<opc, MRMSrcReg, _, (outs _.RC:$dst),
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	multiclass avx512_fma3p_231_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
defm PS : avx512_fma3p_231_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_231_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f32_info, "PS">;		SchedWriteFMA, avx512vl_f32_info, "PS">;
defm PD : avx512_fma3p_231_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_231_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f64_info, "PD">,		SchedWriteFMA, avx512vl_f64_info, "PD">,
VEX_W;		VEX_W;
}		}

defm VFMADD231 : avx512_fma3p_231_f<0xB8, "vfmadd231", X86any_Fmadd, X86FmaddRnd>;		defm VFMADD231 : avx512_fma3p_231_f<0xB8, "vfmadd231", X86any_Fmadd, X86FmaddRnd>;
defm VFMSUB231 : avx512_fma3p_231_f<0xBA, "vfmsub231", X86Fmsub, X86FmsubRnd>;		defm VFMSUB231 : avx512_fma3p_231_f<0xBA, "vfmsub231", X86any_Fmsub, X86FmsubRnd>;
defm VFMADDSUB231 : avx512_fma3p_231_f<0xB6, "vfmaddsub231", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB231 : avx512_fma3p_231_f<0xB6, "vfmaddsub231", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD231 : avx512_fma3p_231_f<0xB7, "vfmsubadd231", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD231 : avx512_fma3p_231_f<0xB7, "vfmsubadd231", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD231 : avx512_fma3p_231_f<0xBC, "vfnmadd231", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD231 : avx512_fma3p_231_f<0xBC, "vfnmadd231", X86any_Fnmadd, X86FnmaddRnd>;
defm VFNMSUB231 : avx512_fma3p_231_f<0xBE, "vfnmsub231", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB231 : avx512_fma3p_231_f<0xBE, "vfnmsub231", X86any_Fnmsub, X86FnmsubRnd>;

multiclass avx512_fma3p_132_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_fma3p_132_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
X86FoldableSchedWrite sched,		X86FoldableSchedWrite sched,
X86VectorVTInfo _, string Suff> {		X86VectorVTInfo _, string Suff> {
let Constraints = "$src1 = $dst", ExeDomain = _.ExeDomain, hasSideEffects = 0,		let Constraints = "$src1 = $dst", ExeDomain = _.ExeDomain, hasSideEffects = 0,
Uses = [MXCSR], mayRaiseFPException = 1 in {		Uses = [MXCSR], mayRaiseFPException = 1 in {
defm r: AVX512_maskable_3src<opc, MRMSrcReg, _, (outs _.RC:$dst),		defm r: AVX512_maskable_3src<opc, MRMSrcReg, _, (outs _.RC:$dst),
(ins _.RC:$src2, _.RC:$src3),		(ins _.RC:$src2, _.RC:$src3),
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	multiclass avx512_fma3p_132_f<bits<8> opc, string OpcodeStr, SDNode OpNode,
defm PS : avx512_fma3p_132_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,		defm PS : avx512_fma3p_132_common<opc, OpcodeStr#"ps", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f32_info, "PS">;		SchedWriteFMA, avx512vl_f32_info, "PS">;
defm PD : avx512_fma3p_132_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,		defm PD : avx512_fma3p_132_common<opc, OpcodeStr#"pd", OpNode, OpNodeRnd,
SchedWriteFMA, avx512vl_f64_info, "PD">,		SchedWriteFMA, avx512vl_f64_info, "PD">,
VEX_W;		VEX_W;
}		}

defm VFMADD132 : avx512_fma3p_132_f<0x98, "vfmadd132", X86any_Fmadd, X86FmaddRnd>;		defm VFMADD132 : avx512_fma3p_132_f<0x98, "vfmadd132", X86any_Fmadd, X86FmaddRnd>;
defm VFMSUB132 : avx512_fma3p_132_f<0x9A, "vfmsub132", X86Fmsub, X86FmsubRnd>;		defm VFMSUB132 : avx512_fma3p_132_f<0x9A, "vfmsub132", X86any_Fmsub, X86FmsubRnd>;
defm VFMADDSUB132 : avx512_fma3p_132_f<0x96, "vfmaddsub132", X86Fmaddsub, X86FmaddsubRnd>;		defm VFMADDSUB132 : avx512_fma3p_132_f<0x96, "vfmaddsub132", X86Fmaddsub, X86FmaddsubRnd>;
defm VFMSUBADD132 : avx512_fma3p_132_f<0x97, "vfmsubadd132", X86Fmsubadd, X86FmsubaddRnd>;		defm VFMSUBADD132 : avx512_fma3p_132_f<0x97, "vfmsubadd132", X86Fmsubadd, X86FmsubaddRnd>;
defm VFNMADD132 : avx512_fma3p_132_f<0x9C, "vfnmadd132", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD132 : avx512_fma3p_132_f<0x9C, "vfnmadd132", X86any_Fnmadd, X86FnmaddRnd>;
defm VFNMSUB132 : avx512_fma3p_132_f<0x9E, "vfnmsub132", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB132 : avx512_fma3p_132_f<0x9E, "vfnmsub132", X86any_Fnmsub, X86FnmsubRnd>;

// Scalar FMA		// Scalar FMA
multiclass avx512_fma3s_common<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,		multiclass avx512_fma3s_common<bits<8> opc, string OpcodeStr, X86VectorVTInfo _,
dag RHS_r, dag RHS_m, dag RHS_b, bit MaskOnlyReg> {		dag RHS_r, dag RHS_m, dag RHS_b, bit MaskOnlyReg> {
let Constraints = "$src1 = $dst", hasSideEffects = 0 in {		let Constraints = "$src1 = $dst", hasSideEffects = 0 in {
defm r_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),		defm r_Int: AVX512_maskable_3src_scalar<opc, MRMSrcReg, _, (outs _.RC:$dst),
(ins _.RC:$src2, _.RC:$src3), OpcodeStr,		(ins _.RC:$src2, _.RC:$src3), OpcodeStr,
"$src3, $src2", "$src2, $src3", (null_frag), 1, 1>,		"$src3, $src2", "$src2, $src3", (null_frag), 1, 1>,
▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	defm NAME : avx512_fma3s_all<opc213, opc231, opc132, OpcodeStr, OpNode,
EVEX_CD8<32, CD8VT1>, VEX_LIG;		EVEX_CD8<32, CD8VT1>, VEX_LIG;
defm NAME : avx512_fma3s_all<opc213, opc231, opc132, OpcodeStr, OpNode,		defm NAME : avx512_fma3s_all<opc213, opc231, opc132, OpcodeStr, OpNode,
OpNodeRnd, f64x_info, "SD">,		OpNodeRnd, f64x_info, "SD">,
EVEX_CD8<64, CD8VT1>, VEX_LIG, VEX_W;		EVEX_CD8<64, CD8VT1>, VEX_LIG, VEX_W;
}		}
}		}

defm VFMADD : avx512_fma3s<0xA9, 0xB9, 0x99, "vfmadd", X86any_Fmadd, X86FmaddRnd>;		defm VFMADD : avx512_fma3s<0xA9, 0xB9, 0x99, "vfmadd", X86any_Fmadd, X86FmaddRnd>;
defm VFMSUB : avx512_fma3s<0xAB, 0xBB, 0x9B, "vfmsub", X86Fmsub, X86FmsubRnd>;		defm VFMSUB : avx512_fma3s<0xAB, 0xBB, 0x9B, "vfmsub", X86any_Fmsub, X86FmsubRnd>;
defm VFNMADD : avx512_fma3s<0xAD, 0xBD, 0x9D, "vfnmadd", X86Fnmadd, X86FnmaddRnd>;		defm VFNMADD : avx512_fma3s<0xAD, 0xBD, 0x9D, "vfnmadd", X86any_Fnmadd, X86FnmaddRnd>;
defm VFNMSUB : avx512_fma3s<0xAF, 0xBF, 0x9F, "vfnmsub", X86Fnmsub, X86FnmsubRnd>;		defm VFNMSUB : avx512_fma3s<0xAF, 0xBF, 0x9F, "vfnmsub", X86any_Fnmsub, X86FnmsubRnd>;

multiclass avx512_scalar_fma_patterns<SDNode Op, SDNode RndOp, string Prefix,		multiclass avx512_scalar_fma_patterns<SDNode Op, SDNode RndOp, string Prefix,
string Suffix, SDNode Move,		string Suffix, SDNode Move,
X86VectorVTInfo _, PatLeaf ZeroFP> {		X86VectorVTInfo _, PatLeaf ZeroFP> {
let Predicates = [HasAVX512] in {		let Predicates = [HasAVX512] in {
def : Pat<(_.VT (Move (_.VT VR128X:$src1), (_.VT (scalar_to_vector		def : Pat<(_.VT (Move (_.VT VR128X:$src1), (_.VT (scalar_to_vector
(Op _.FRC:$src2,		(Op _.FRC:$src2,
(_.EltVT (extractelt (_.VT VR128X:$src1), (iPTR 0))),		(_.EltVT (extractelt (_.VT VR128X:$src1), (iPTR 0))),
[189 lines elided]	def : Pat<(_.VT (Move (_.VT VR128X:$src1), (_.VT (scalar_to_vector
VR128X:$src1, VK1WM:$mask,		VR128X:$src1, VK1WM:$mask,
(_.VT (COPY_TO_REGCLASS _.FRC:$src2, VR128X)),		(_.VT (COPY_TO_REGCLASS _.FRC:$src2, VR128X)),
(_.VT (COPY_TO_REGCLASS _.FRC:$src3, VR128X)), AVX512RC:$rc)>;		(_.VT (COPY_TO_REGCLASS _.FRC:$src3, VR128X)), AVX512RC:$rc)>;
}		}
}		}

defm : avx512_scalar_fma_patterns<X86any_Fmadd, X86FmaddRnd, "VFMADD", "SS",		defm : avx512_scalar_fma_patterns<X86any_Fmadd, X86FmaddRnd, "VFMADD", "SS",
X86Movss, v4f32x_info, fp32imm0>;		X86Movss, v4f32x_info, fp32imm0>;
defm : avx512_scalar_fma_patterns<X86Fmsub, X86FmsubRnd, "VFMSUB", "SS",		defm : avx512_scalar_fma_patterns<X86any_Fmsub, X86FmsubRnd, "VFMSUB", "SS",
X86Movss, v4f32x_info, fp32imm0>;		X86Movss, v4f32x_info, fp32imm0>;
defm : avx512_scalar_fma_patterns<X86Fnmadd, X86FnmaddRnd, "VFNMADD", "SS",		defm : avx512_scalar_fma_patterns<X86any_Fnmadd, X86FnmaddRnd, "VFNMADD", "SS",
X86Movss, v4f32x_info, fp32imm0>;		X86Movss, v4f32x_info, fp32imm0>;
defm : avx512_scalar_fma_patterns<X86Fnmsub, X86FnmsubRnd, "VFNMSUB", "SS",		defm : avx512_scalar_fma_patterns<X86any_Fnmsub, X86FnmsubRnd, "VFNMSUB", "SS",
X86Movss, v4f32x_info, fp32imm0>;		X86Movss, v4f32x_info, fp32imm0>;

defm : avx512_scalar_fma_patterns<X86any_Fmadd, X86FmaddRnd, "VFMADD", "SD",		defm : avx512_scalar_fma_patterns<X86any_Fmadd, X86FmaddRnd, "VFMADD", "SD",
X86Movsd, v2f64x_info, fp64imm0>;		X86Movsd, v2f64x_info, fp64imm0>;
defm : avx512_scalar_fma_patterns<X86Fmsub, X86FmsubRnd, "VFMSUB", "SD",		defm : avx512_scalar_fma_patterns<X86any_Fmsub, X86FmsubRnd, "VFMSUB", "SD",
X86Movsd, v2f64x_info, fp64imm0>;		X86Movsd, v2f64x_info, fp64imm0>;
defm : avx512_scalar_fma_patterns<X86Fnmadd, X86FnmaddRnd, "VFNMADD", "SD",		defm : avx512_scalar_fma_patterns<X86any_Fnmadd, X86FnmaddRnd, "VFNMADD", "SD",
X86Movsd, v2f64x_info, fp64imm0>;		X86Movsd, v2f64x_info, fp64imm0>;
defm : avx512_scalar_fma_patterns<X86Fnmsub, X86FnmsubRnd, "VFNMSUB", "SD",		defm : avx512_scalar_fma_patterns<X86any_Fnmsub, X86FnmsubRnd, "VFNMSUB", "SD",
X86Movsd, v2f64x_info, fp64imm0>;		X86Movsd, v2f64x_info, fp64imm0>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// AVX-512 Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit IFMA		// AVX-512 Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit IFMA
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
let Constraints = "$src1 = $dst" in {		let Constraints = "$src1 = $dst" in {
multiclass avx512_pmadd52_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,		multiclass avx512_pmadd52_rm<bits<8> opc, string OpcodeStr, SDNode OpNode,
X86FoldableSchedWrite sched, X86VectorVTInfo _> {		X86FoldableSchedWrite sched, X86VectorVTInfo _> {
[5,333 lines elided]

llvm/lib/Target/X86/X86InstrFMA.td

[120 lines elided]
}		}

// Fused Multiply-Add		// Fused Multiply-Add
let ExeDomain = SSEPackedSingle in {		let ExeDomain = SSEPackedSingle in {
defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "ps", "PS",		defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "ps", "PS",
loadv4f32, loadv8f32, X86any_Fmadd, v4f32, v8f32,		loadv4f32, loadv8f32, X86any_Fmadd, v4f32, v8f32,
SchedWriteFMA>;		SchedWriteFMA>;
defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "ps", "PS",		defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "ps", "PS",
loadv4f32, loadv8f32, X86Fmsub, v4f32, v8f32,		loadv4f32, loadv8f32, X86any_Fmsub, v4f32, v8f32,
SchedWriteFMA>;		SchedWriteFMA>;
defm VFMADDSUB : fma3p_forms<0x96, 0xA6, 0xB6, "vfmaddsub", "ps", "PS",		defm VFMADDSUB : fma3p_forms<0x96, 0xA6, 0xB6, "vfmaddsub", "ps", "PS",
loadv4f32, loadv8f32, X86Fmaddsub, v4f32, v8f32,		loadv4f32, loadv8f32, X86Fmaddsub, v4f32, v8f32,
SchedWriteFMA>;		SchedWriteFMA>;
defm VFMSUBADD : fma3p_forms<0x97, 0xA7, 0xB7, "vfmsubadd", "ps", "PS",		defm VFMSUBADD : fma3p_forms<0x97, 0xA7, 0xB7, "vfmsubadd", "ps", "PS",
loadv4f32, loadv8f32, X86Fmsubadd, v4f32, v8f32,		loadv4f32, loadv8f32, X86Fmsubadd, v4f32, v8f32,
SchedWriteFMA>;		SchedWriteFMA>;
}		}

let ExeDomain = SSEPackedDouble in {		let ExeDomain = SSEPackedDouble in {
defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "pd", "PD",		defm VFMADD : fma3p_forms<0x98, 0xA8, 0xB8, "vfmadd", "pd", "PD",
loadv2f64, loadv4f64, X86any_Fmadd, v2f64,		loadv2f64, loadv4f64, X86any_Fmadd, v2f64,
v4f64, SchedWriteFMA>, VEX_W;		v4f64, SchedWriteFMA>, VEX_W;
defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "pd", "PD",		defm VFMSUB : fma3p_forms<0x9A, 0xAA, 0xBA, "vfmsub", "pd", "PD",
loadv2f64, loadv4f64, X86Fmsub, v2f64,		loadv2f64, loadv4f64, X86any_Fmsub, v2f64,
v4f64, SchedWriteFMA>, VEX_W;		v4f64, SchedWriteFMA>, VEX_W;
defm VFMADDSUB : fma3p_forms<0x96, 0xA6, 0xB6, "vfmaddsub", "pd", "PD",		defm VFMADDSUB : fma3p_forms<0x96, 0xA6, 0xB6, "vfmaddsub", "pd", "PD",
loadv2f64, loadv4f64, X86Fmaddsub,		loadv2f64, loadv4f64, X86Fmaddsub,
v2f64, v4f64, SchedWriteFMA>, VEX_W;		v2f64, v4f64, SchedWriteFMA>, VEX_W;
defm VFMSUBADD : fma3p_forms<0x97, 0xA7, 0xB7, "vfmsubadd", "pd", "PD",		defm VFMSUBADD : fma3p_forms<0x97, 0xA7, 0xB7, "vfmsubadd", "pd", "PD",
loadv2f64, loadv4f64, X86Fmsubadd,		loadv2f64, loadv4f64, X86Fmsubadd,
v2f64, v4f64, SchedWriteFMA>, VEX_W;		v2f64, v4f64, SchedWriteFMA>, VEX_W;
}		}

// Fused Negative Multiply-Add		// Fused Negative Multiply-Add
let ExeDomain = SSEPackedSingle in {		let ExeDomain = SSEPackedSingle in {
defm VFNMADD : fma3p_forms<0x9C, 0xAC, 0xBC, "vfnmadd", "ps", "PS", loadv4f32,		defm VFNMADD : fma3p_forms<0x9C, 0xAC, 0xBC, "vfnmadd", "ps", "PS", loadv4f32,
loadv8f32, X86Fnmadd, v4f32, v8f32, SchedWriteFMA>;		loadv8f32, X86any_Fnmadd, v4f32, v8f32, SchedWriteFMA>;
defm VFNMSUB : fma3p_forms<0x9E, 0xAE, 0xBE, "vfnmsub", "ps", "PS", loadv4f32,		defm VFNMSUB : fma3p_forms<0x9E, 0xAE, 0xBE, "vfnmsub", "ps", "PS", loadv4f32,
loadv8f32, X86Fnmsub, v4f32, v8f32, SchedWriteFMA>;		loadv8f32, X86any_Fnmsub, v4f32, v8f32, SchedWriteFMA>;
}		}
let ExeDomain = SSEPackedDouble in {		let ExeDomain = SSEPackedDouble in {
defm VFNMADD : fma3p_forms<0x9C, 0xAC, 0xBC, "vfnmadd", "pd", "PD", loadv2f64,		defm VFNMADD : fma3p_forms<0x9C, 0xAC, 0xBC, "vfnmadd", "pd", "PD", loadv2f64,
loadv4f64, X86Fnmadd, v2f64, v4f64, SchedWriteFMA>, VEX_W;		loadv4f64, X86any_Fnmadd, v2f64, v4f64, SchedWriteFMA>, VEX_W;
defm VFNMSUB : fma3p_forms<0x9E, 0xAE, 0xBE, "vfnmsub", "pd", "PD", loadv2f64,		defm VFNMSUB : fma3p_forms<0x9E, 0xAE, 0xBE, "vfnmsub", "pd", "PD", loadv2f64,
loadv4f64, X86Fnmsub, v2f64, v4f64, SchedWriteFMA>, VEX_W;		loadv4f64, X86any_Fnmsub, v2f64, v4f64, SchedWriteFMA>, VEX_W;
}		}

// All source register operands of FMA opcodes defined in fma3s_rm multiclass		// All source register operands of FMA opcodes defined in fma3s_rm multiclass
// can be commuted. In many cases such commute transformation requires an opcode		// can be commuted. In many cases such commute transformation requires an opcode
// adjustment, for example, commuting the operands 1 and 2 in FMA*132 form		// adjustment, for example, commuting the operands 1 and 2 in FMA*132 form
// would require an opcode change to FMA*231:		// would require an opcode change to FMA*231:
// FMA132 reg1, reg2, reg3; // reg1 * reg3 + reg2;		// FMA132 reg1, reg2, reg3; // reg1 * reg3 + reg2;
// -->		// -->
[142 lines elided]	multiclass fma3s<bits<8> opc132, bits<8> opc213, bits<8> opc231,
defm NAME : fma3s_forms<opc132, opc213, opc231, OpStr, "sd", "SD", OpNode,		defm NAME : fma3s_forms<opc132, opc213, opc231, OpStr, "sd", "SD", OpNode,
FR64, f64mem, sched>,		FR64, f64mem, sched>,
fma3s_int_forms<opc132, opc213, opc231, OpStr, "sd", "SD",		fma3s_int_forms<opc132, opc213, opc231, OpStr, "sd", "SD",
VR128, sdmem, sched>, VEX_W;		VR128, sdmem, sched>, VEX_W;
}		}

defm VFMADD : fma3s<0x99, 0xA9, 0xB9, "vfmadd", X86any_Fmadd,		defm VFMADD : fma3s<0x99, 0xA9, 0xB9, "vfmadd", X86any_Fmadd,
SchedWriteFMA.Scl>, VEX_LIG;		SchedWriteFMA.Scl>, VEX_LIG;
defm VFMSUB : fma3s<0x9B, 0xAB, 0xBB, "vfmsub", X86Fmsub,		defm VFMSUB : fma3s<0x9B, 0xAB, 0xBB, "vfmsub", X86any_Fmsub,
SchedWriteFMA.Scl>, VEX_LIG;		SchedWriteFMA.Scl>, VEX_LIG;

defm VFNMADD : fma3s<0x9D, 0xAD, 0xBD, "vfnmadd", X86Fnmadd,		defm VFNMADD : fma3s<0x9D, 0xAD, 0xBD, "vfnmadd", X86any_Fnmadd,
SchedWriteFMA.Scl>, VEX_LIG;		SchedWriteFMA.Scl>, VEX_LIG;
defm VFNMSUB : fma3s<0x9F, 0xAF, 0xBF, "vfnmsub", X86Fnmsub,		defm VFNMSUB : fma3s<0x9F, 0xAF, 0xBF, "vfnmsub", X86any_Fnmsub,
SchedWriteFMA.Scl>, VEX_LIG;		SchedWriteFMA.Scl>, VEX_LIG;

multiclass scalar_fma_patterns<SDNode Op, string Prefix, string Suffix,		multiclass scalar_fma_patterns<SDNode Op, string Prefix, string Suffix,
SDNode Move, ValueType VT, ValueType EltVT,		SDNode Move, ValueType VT, ValueType EltVT,
RegisterClass RC, PatFrag mem_frag> {		RegisterClass RC, PatFrag mem_frag> {
let Predicates = [HasFMA, NoAVX512] in {		let Predicates = [HasFMA, NoAVX512] in {
def : Pat<(VT (Move (VT VR128:$src1), (VT (scalar_to_vector		def : Pat<(VT (Move (VT VR128:$src1), (VT (scalar_to_vector
(Op RC:$src2,		(Op RC:$src2,
[30 lines elided]	def : Pat<(VT (Move (VT VR128:$src1), (VT (scalar_to_vector
(EltVT (extractelt (VT VR128:$src1), (iPTR 0)))))))),		(EltVT (extractelt (VT VR128:$src1), (iPTR 0)))))))),
(!cast<Instruction>(Prefix#"231"#Suffix#"m_Int")		(!cast<Instruction>(Prefix#"231"#Suffix#"m_Int")
VR128:$src1, (VT (COPY_TO_REGCLASS RC:$src2, VR128)),		VR128:$src1, (VT (COPY_TO_REGCLASS RC:$src2, VR128)),
addr:$src3)>;		addr:$src3)>;
}		}
}		}

defm : scalar_fma_patterns<X86any_Fmadd, "VFMADD", "SS", X86Movss, v4f32, f32, FR32, loadf32>;		defm : scalar_fma_patterns<X86any_Fmadd, "VFMADD", "SS", X86Movss, v4f32, f32, FR32, loadf32>;
defm : scalar_fma_patterns<X86Fmsub, "VFMSUB", "SS", X86Movss, v4f32, f32, FR32, loadf32>;		defm : scalar_fma_patterns<X86any_Fmsub, "VFMSUB", "SS", X86Movss, v4f32, f32, FR32, loadf32>;
defm : scalar_fma_patterns<X86Fnmadd, "VFNMADD", "SS", X86Movss, v4f32, f32, FR32, loadf32>;		defm : scalar_fma_patterns<X86any_Fnmadd, "VFNMADD", "SS", X86Movss, v4f32, f32, FR32, loadf32>;
defm : scalar_fma_patterns<X86Fnmsub, "VFNMSUB", "SS", X86Movss, v4f32, f32, FR32, loadf32>;		defm : scalar_fma_patterns<X86any_Fnmsub, "VFNMSUB", "SS", X86Movss, v4f32, f32, FR32, loadf32>;

defm : scalar_fma_patterns<X86any_Fmadd, "VFMADD", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;		defm : scalar_fma_patterns<X86any_Fmadd, "VFMADD", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;
defm : scalar_fma_patterns<X86Fmsub, "VFMSUB", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;		defm : scalar_fma_patterns<X86any_Fmsub, "VFMSUB", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;
defm : scalar_fma_patterns<X86Fnmadd, "VFNMADD", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;		defm : scalar_fma_patterns<X86any_Fnmadd, "VFNMADD", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;
defm : scalar_fma_patterns<X86Fnmsub, "VFNMSUB", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;		defm : scalar_fma_patterns<X86any_Fnmsub, "VFNMSUB", "SD", X86Movsd, v2f64, f64, FR64, loadf64>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// FMA4 - AMD 4 operand Fused Multiply-Add instructions		// FMA4 - AMD 4 operand Fused Multiply-Add instructions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

let Uses = [MXCSR], mayRaiseFPException = 1 in		let Uses = [MXCSR], mayRaiseFPException = 1 in
multiclass fma4s<bits<8> opc, string OpcodeStr, RegisterClass RC,		multiclass fma4s<bits<8> opc, string OpcodeStr, RegisterClass RC,
X86MemOperand x86memop, ValueType OpVT, SDNode OpNode,		X86MemOperand x86memop, ValueType OpVT, SDNode OpNode,
[145 lines elided]
}		}

let ExeDomain = SSEPackedSingle in {		let ExeDomain = SSEPackedSingle in {
// Scalar Instructions		// Scalar Instructions
defm VFMADDSS4 : fma4s<0x6A, "vfmaddss", FR32, f32mem, f32, X86any_Fmadd, loadf32,		defm VFMADDSS4 : fma4s<0x6A, "vfmaddss", FR32, f32mem, f32, X86any_Fmadd, loadf32,
SchedWriteFMA.Scl>,		SchedWriteFMA.Scl>,
fma4s_int<0x6A, "vfmaddss", ssmem, v4f32,		fma4s_int<0x6A, "vfmaddss", ssmem, v4f32,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFMSUBSS4 : fma4s<0x6E, "vfmsubss", FR32, f32mem, f32, X86Fmsub, loadf32,		defm VFMSUBSS4 : fma4s<0x6E, "vfmsubss", FR32, f32mem, f32, X86any_Fmsub, loadf32,
SchedWriteFMA.Scl>,		SchedWriteFMA.Scl>,
fma4s_int<0x6E, "vfmsubss", ssmem, v4f32,		fma4s_int<0x6E, "vfmsubss", ssmem, v4f32,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFNMADDSS4 : fma4s<0x7A, "vfnmaddss", FR32, f32mem, f32,		defm VFNMADDSS4 : fma4s<0x7A, "vfnmaddss", FR32, f32mem, f32,
X86Fnmadd, loadf32, SchedWriteFMA.Scl>,		X86any_Fnmadd, loadf32, SchedWriteFMA.Scl>,
fma4s_int<0x7A, "vfnmaddss", ssmem, v4f32,		fma4s_int<0x7A, "vfnmaddss", ssmem, v4f32,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFNMSUBSS4 : fma4s<0x7E, "vfnmsubss", FR32, f32mem, f32,		defm VFNMSUBSS4 : fma4s<0x7E, "vfnmsubss", FR32, f32mem, f32,
X86Fnmsub, loadf32, SchedWriteFMA.Scl>,		X86any_Fnmsub, loadf32, SchedWriteFMA.Scl>,
fma4s_int<0x7E, "vfnmsubss", ssmem, v4f32,		fma4s_int<0x7E, "vfnmsubss", ssmem, v4f32,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
// Packed Instructions		// Packed Instructions
defm VFMADDPS4 : fma4p<0x68, "vfmaddps", X86any_Fmadd, v4f32, v8f32,		defm VFMADDPS4 : fma4p<0x68, "vfmaddps", X86any_Fmadd, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
defm VFMSUBPS4 : fma4p<0x6C, "vfmsubps", X86Fmsub, v4f32, v8f32,		defm VFMSUBPS4 : fma4p<0x6C, "vfmsubps", X86any_Fmsub, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
defm VFNMADDPS4 : fma4p<0x78, "vfnmaddps", X86Fnmadd, v4f32, v8f32,		defm VFNMADDPS4 : fma4p<0x78, "vfnmaddps", X86any_Fnmadd, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
defm VFNMSUBPS4 : fma4p<0x7C, "vfnmsubps", X86Fnmsub, v4f32, v8f32,		defm VFNMSUBPS4 : fma4p<0x7C, "vfnmsubps", X86any_Fnmsub, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
defm VFMADDSUBPS4 : fma4p<0x5C, "vfmaddsubps", X86Fmaddsub, v4f32, v8f32,		defm VFMADDSUBPS4 : fma4p<0x5C, "vfmaddsubps", X86Fmaddsub, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
defm VFMSUBADDPS4 : fma4p<0x5E, "vfmsubaddps", X86Fmsubadd, v4f32, v8f32,		defm VFMSUBADDPS4 : fma4p<0x5E, "vfmsubaddps", X86Fmsubadd, v4f32, v8f32,
loadv4f32, loadv8f32, SchedWriteFMA>;		loadv4f32, loadv8f32, SchedWriteFMA>;
}		}

let ExeDomain = SSEPackedDouble in {		let ExeDomain = SSEPackedDouble in {
// Scalar Instructions		// Scalar Instructions
defm VFMADDSD4 : fma4s<0x6B, "vfmaddsd", FR64, f64mem, f64, X86any_Fmadd, loadf64,		defm VFMADDSD4 : fma4s<0x6B, "vfmaddsd", FR64, f64mem, f64, X86any_Fmadd, loadf64,
SchedWriteFMA.Scl>,		SchedWriteFMA.Scl>,
fma4s_int<0x6B, "vfmaddsd", sdmem, v2f64,		fma4s_int<0x6B, "vfmaddsd", sdmem, v2f64,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFMSUBSD4 : fma4s<0x6F, "vfmsubsd", FR64, f64mem, f64, X86Fmsub, loadf64,		defm VFMSUBSD4 : fma4s<0x6F, "vfmsubsd", FR64, f64mem, f64, X86any_Fmsub, loadf64,
SchedWriteFMA.Scl>,		SchedWriteFMA.Scl>,
fma4s_int<0x6F, "vfmsubsd", sdmem, v2f64,		fma4s_int<0x6F, "vfmsubsd", sdmem, v2f64,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFNMADDSD4 : fma4s<0x7B, "vfnmaddsd", FR64, f64mem, f64,		defm VFNMADDSD4 : fma4s<0x7B, "vfnmaddsd", FR64, f64mem, f64,
X86Fnmadd, loadf64, SchedWriteFMA.Scl>,		X86any_Fnmadd, loadf64, SchedWriteFMA.Scl>,
fma4s_int<0x7B, "vfnmaddsd", sdmem, v2f64,		fma4s_int<0x7B, "vfnmaddsd", sdmem, v2f64,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
defm VFNMSUBSD4 : fma4s<0x7F, "vfnmsubsd", FR64, f64mem, f64,		defm VFNMSUBSD4 : fma4s<0x7F, "vfnmsubsd", FR64, f64mem, f64,
X86Fnmsub, loadf64, SchedWriteFMA.Scl>,		X86any_Fnmsub, loadf64, SchedWriteFMA.Scl>,
fma4s_int<0x7F, "vfnmsubsd", sdmem, v2f64,		fma4s_int<0x7F, "vfnmsubsd", sdmem, v2f64,
SchedWriteFMA.Scl>;		SchedWriteFMA.Scl>;
// Packed Instructions		// Packed Instructions
defm VFMADDPD4 : fma4p<0x69, "vfmaddpd", X86any_Fmadd, v2f64, v4f64,		defm VFMADDPD4 : fma4p<0x69, "vfmaddpd", X86any_Fmadd, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
defm VFMSUBPD4 : fma4p<0x6D, "vfmsubpd", X86Fmsub, v2f64, v4f64,		defm VFMSUBPD4 : fma4p<0x6D, "vfmsubpd", X86any_Fmsub, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
defm VFNMADDPD4 : fma4p<0x79, "vfnmaddpd", X86Fnmadd, v2f64, v4f64,		defm VFNMADDPD4 : fma4p<0x79, "vfnmaddpd", X86any_Fnmadd, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
defm VFNMSUBPD4 : fma4p<0x7D, "vfnmsubpd", X86Fnmsub, v2f64, v4f64,		defm VFNMSUBPD4 : fma4p<0x7D, "vfnmsubpd", X86any_Fnmsub, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
defm VFMADDSUBPD4 : fma4p<0x5D, "vfmaddsubpd", X86Fmaddsub, v2f64, v4f64,		defm VFMADDSUBPD4 : fma4p<0x5D, "vfmaddsubpd", X86Fmaddsub, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
defm VFMSUBADDPD4 : fma4p<0x5F, "vfmsubaddpd", X86Fmsubadd, v2f64, v4f64,		defm VFMSUBADDPD4 : fma4p<0x5F, "vfmsubaddpd", X86Fmsubadd, v2f64, v4f64,
loadv2f64, loadv4f64, SchedWriteFMA>;		loadv2f64, loadv4f64, SchedWriteFMA>;
}		}

multiclass scalar_fma4_patterns<SDNode Op, string Name,		multiclass scalar_fma4_patterns<SDNode Op, string Name,
[19 lines elided]	def : Pat<(VT (X86vzmovl (VT (scalar_to_vector
RC:$src3))))),		RC:$src3))))),
(!cast<Instruction>(Name#"mr_Int")		(!cast<Instruction>(Name#"mr_Int")
(VT (COPY_TO_REGCLASS RC:$src1, VR128)), addr:$src2,		(VT (COPY_TO_REGCLASS RC:$src1, VR128)), addr:$src2,
(VT (COPY_TO_REGCLASS RC:$src3, VR128)))>;		(VT (COPY_TO_REGCLASS RC:$src3, VR128)))>;
}		}
}		}

defm : scalar_fma4_patterns<X86any_Fmadd, "VFMADDSS4", v4f32, f32, FR32, loadf32>;		defm : scalar_fma4_patterns<X86any_Fmadd, "VFMADDSS4", v4f32, f32, FR32, loadf32>;
defm : scalar_fma4_patterns<X86Fmsub, "VFMSUBSS4", v4f32, f32, FR32, loadf32>;		defm : scalar_fma4_patterns<X86any_Fmsub, "VFMSUBSS4", v4f32, f32, FR32, loadf32>;
defm : scalar_fma4_patterns<X86Fnmadd, "VFNMADDSS4", v4f32, f32, FR32, loadf32>;		defm : scalar_fma4_patterns<X86any_Fnmadd, "VFNMADDSS4", v4f32, f32, FR32, loadf32>;
defm : scalar_fma4_patterns<X86Fnmsub, "VFNMSUBSS4", v4f32, f32, FR32, loadf32>;		defm : scalar_fma4_patterns<X86any_Fnmsub, "VFNMSUBSS4", v4f32, f32, FR32, loadf32>;

defm : scalar_fma4_patterns<X86any_Fmadd, "VFMADDSD4", v2f64, f64, FR64, loadf64>;		defm : scalar_fma4_patterns<X86any_Fmadd, "VFMADDSD4", v2f64, f64, FR64, loadf64>;
defm : scalar_fma4_patterns<X86Fmsub, "VFMSUBSD4", v2f64, f64, FR64, loadf64>;		defm : scalar_fma4_patterns<X86any_Fmsub, "VFMSUBSD4", v2f64, f64, FR64, loadf64>;
defm : scalar_fma4_patterns<X86Fnmadd, "VFNMADDSD4", v2f64, f64, FR64, loadf64>;		defm : scalar_fma4_patterns<X86any_Fnmadd, "VFNMADDSD4", v2f64, f64, FR64, loadf64>;
defm : scalar_fma4_patterns<X86Fnmsub, "VFNMSUBSD4", v2f64, f64, FR64, loadf64>;		defm : scalar_fma4_patterns<X86any_Fnmsub, "VFNMSUBSD4", v2f64, f64, FR64, loadf64>;

llvm/lib/Target/X86/X86InstrFragmentsSIMD.td

	[529 lines elided]
	def X86fgetexpSAEs : SDNode<"X86ISD::FGETEXPS_SAE", SDTFPBinOp>;			def X86fgetexpSAEs : SDNode<"X86ISD::FGETEXPS_SAE", SDTFPBinOp>;

	def X86Fmadd : SDNode<"ISD::FMA", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fmadd : SDNode<"ISD::FMA", SDTFPTernaryOp, [SDNPCommutative]>;
	def X86strict_Fmadd : SDNode<"ISD::STRICT_FMA", SDTFPTernaryOp, [SDNPCommutative, SDNPHasChain]>;			def X86strict_Fmadd : SDNode<"ISD::STRICT_FMA", SDTFPTernaryOp, [SDNPCommutative, SDNPHasChain]>;
	def X86any_Fmadd : PatFrags<(ops node:$src1, node:$src2, node:$src3),			def X86any_Fmadd : PatFrags<(ops node:$src1, node:$src2, node:$src3),
	[(X86strict_Fmadd node:$src1, node:$src2, node:$src3),			[(X86strict_Fmadd node:$src1, node:$src2, node:$src3),
	(X86Fmadd node:$src1, node:$src2, node:$src3)]>;			(X86Fmadd node:$src1, node:$src2, node:$src3)]>;
	def X86Fnmadd : SDNode<"X86ISD::FNMADD", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fnmadd : SDNode<"X86ISD::FNMADD", SDTFPTernaryOp, [SDNPCommutative]>;
				def X86strict_Fnmadd : SDNode<"X86ISD::STRICT_FNMADD", SDTFPTernaryOp, [SDNPCommutative, SDNPHasChain]>;
				def X86any_Fnmadd : PatFrags<(ops node:$src1, node:$src2, node:$src3),
				[(X86strict_Fnmadd node:$src1, node:$src2, node:$src3),
				(X86Fnmadd node:$src1, node:$src2, node:$src3)]>;
	def X86Fmsub : SDNode<"X86ISD::FMSUB", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fmsub : SDNode<"X86ISD::FMSUB", SDTFPTernaryOp, [SDNPCommutative]>;
				def X86strict_Fmsub : SDNode<"X86ISD::STRICT_FMSUB", SDTFPTernaryOp, [SDNPCommutative, SDNPHasChain]>;
				def X86any_Fmsub : PatFrags<(ops node:$src1, node:$src2, node:$src3),
				[(X86strict_Fmsub node:$src1, node:$src2, node:$src3),
				(X86Fmsub node:$src1, node:$src2, node:$src3)]>;
	def X86Fnmsub : SDNode<"X86ISD::FNMSUB", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fnmsub : SDNode<"X86ISD::FNMSUB", SDTFPTernaryOp, [SDNPCommutative]>;
				def X86strict_Fnmsub : SDNode<"X86ISD::STRICT_FNMSUB", SDTFPTernaryOp, [SDNPCommutative, SDNPHasChain]>;
				def X86any_Fnmsub : PatFrags<(ops node:$src1, node:$src2, node:$src3),
				[(X86strict_Fnmsub node:$src1, node:$src2, node:$src3),
				(X86Fnmsub node:$src1, node:$src2, node:$src3)]>;
	def X86Fmaddsub : SDNode<"X86ISD::FMADDSUB", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fmaddsub : SDNode<"X86ISD::FMADDSUB", SDTFPTernaryOp, [SDNPCommutative]>;
	def X86Fmsubadd : SDNode<"X86ISD::FMSUBADD", SDTFPTernaryOp, [SDNPCommutative]>;			def X86Fmsubadd : SDNode<"X86ISD::FMSUBADD", SDTFPTernaryOp, [SDNPCommutative]>;

	def X86FmaddRnd : SDNode<"X86ISD::FMADD_RND", SDTFmaRound, [SDNPCommutative]>;			def X86FmaddRnd : SDNode<"X86ISD::FMADD_RND", SDTFmaRound, [SDNPCommutative]>;
	def X86FnmaddRnd : SDNode<"X86ISD::FNMADD_RND", SDTFmaRound, [SDNPCommutative]>;			def X86FnmaddRnd : SDNode<"X86ISD::FNMADD_RND", SDTFmaRound, [SDNPCommutative]>;
	def X86FmsubRnd : SDNode<"X86ISD::FMSUB_RND", SDTFmaRound, [SDNPCommutative]>;			def X86FmsubRnd : SDNode<"X86ISD::FMSUB_RND", SDTFmaRound, [SDNPCommutative]>;
	def X86FnmsubRnd : SDNode<"X86ISD::FNMSUB_RND", SDTFmaRound, [SDNPCommutative]>;			def X86FnmsubRnd : SDNode<"X86ISD::FNMSUB_RND", SDTFmaRound, [SDNPCommutative]>;
	def X86FmaddsubRnd : SDNode<"X86ISD::FMADDSUB_RND", SDTFmaRound, [SDNPCommutative]>;			def X86FmaddsubRnd : SDNode<"X86ISD::FMADDSUB_RND", SDTFmaRound, [SDNPCommutative]>;
	[697 lines elided]

llvm/test/CodeGen/X86/fp-intrinsics-fma.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck %s --check-prefixes=COMMON,NOFMA		; RUN: llc -O3 -mtriple=x86_64-pc-linux < %s | FileCheck %s --check-prefixes=COMMON,NOFMA
; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+fma < %s | FileCheck %s --check-prefixes=COMMON,FMA		; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+fma < %s | FileCheck %s --check-prefixes=COMMON,FMA,FMA-AVX1
; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+avx512f < %s | FileCheck %s --check-prefixes=COMMON,FMA		; RUN: llc -O3 -mtriple=x86_64-pc-linux -mattr=+avx512f < %s | FileCheck %s --check-prefixes=COMMON,FMA,FMA-AVX512

		define float @f1(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f1:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: xorps {{.*}}(%rip), %xmm0
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f1:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfnmadd213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) + xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg float %0
		%result = call float @llvm.experimental.constrained.fmuladd.f32(float %3, float %1, float %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %result
		}

		define double @f2(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f2:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: xorpd {{.*}}(%rip), %xmm0
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f2:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfnmadd213sd {{.*#+}} xmm0 = -(xmm1 * xmm0) + xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg double %0
		%result = call double @llvm.experimental.constrained.fmuladd.f64(double %3, double %1, double %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}

		define float @f3(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f3:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: xorps {{.*}}(%rip), %xmm2
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f3:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfmsub213ss {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg float %2
		%result = call float @llvm.experimental.constrained.fmuladd.f32(float %0, float %1, float %3,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %result
		}

		define double @f4(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f4:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: xorpd {{.*}}(%rip), %xmm2
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f4:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfmsub213sd {{.*#+}} xmm0 = (xmm1 * xmm0) - xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg double %2
		%result = call double @llvm.experimental.constrained.fmuladd.f64(double %0, double %1, double %3,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}

		define float @f5(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f5:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movaps {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
		; NOFMA-NEXT: xorps %xmm3, %xmm0
		; NOFMA-NEXT: xorps %xmm3, %xmm2
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f5:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfnmsub213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg float %0
		%4 = fneg float %2
		%result = call float @llvm.experimental.constrained.fmuladd.f32(float %3, float %1, float %4,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %result
		}

		define double @f6(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f6:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movapd {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0]
		; NOFMA-NEXT: xorpd %xmm3, %xmm0
		; NOFMA-NEXT: xorpd %xmm3, %xmm2
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f6:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfnmsub213sd {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
		; FMA-NEXT: retq
		entry:
		%3 = fneg double %0
		%4 = fneg double %2
		%result = call double @llvm.experimental.constrained.fmuladd.f64(double %3, double %1, double %4,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}

		define float @f7(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f7:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: xorps {{.*}}(%rip), %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-AVX1-LABEL: f7:
		; FMA-AVX1: # %bb.0: # %entry
		; FMA-AVX1-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
		; FMA-AVX1-NEXT: vxorps {{.*}}(%rip), %xmm0, %xmm0
		; FMA-AVX1-NEXT: retq
		;
		; FMA-AVX512-LABEL: f7:
		; FMA-AVX512: # %bb.0: # %entry
		; FMA-AVX512-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
		; FMA-AVX512-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
		; FMA-AVX512-NEXT: vxorps %xmm1, %xmm0, %xmm0
		; FMA-AVX512-NEXT: retq
		entry:
		%3 = call float @llvm.experimental.constrained.fmuladd.f32(float %0, float %1, float %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%result = fneg float %3
		ret float %result
		}

		define double @f8(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f8:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: xorpd {{.*}}(%rip), %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f8:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
		; FMA-NEXT: vxorpd {{.*}}(%rip), %xmm0, %xmm0
		; FMA-NEXT: retq
		entry:
		%3 = call double @llvm.experimental.constrained.fmuladd.f64(double %0, double %1, double %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%result = fneg double %3
		ret double %result
		}

		define float @f9(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f9:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movaps {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
		; NOFMA-NEXT: xorps %xmm3, %xmm0
		; NOFMA-NEXT: xorps %xmm3, %xmm2
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: xorps %xmm3, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-AVX1-LABEL: f9:
		; FMA-AVX1: # %bb.0: # %entry
		; FMA-AVX1-NEXT: vfnmsub213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
		; FMA-AVX1-NEXT: vxorps {{.*}}(%rip), %xmm0, %xmm0
		; FMA-AVX1-NEXT: retq
		;
		; FMA-AVX512-LABEL: f9:
		; FMA-AVX512: # %bb.0: # %entry
		; FMA-AVX512-NEXT: vfnmsub213ss {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
		; FMA-AVX512-NEXT: vbroadcastss {{.*#+}} xmm1 = [-0.0E+0,-0.0E+0,-0.0E+0,-0.0E+0]
		; FMA-AVX512-NEXT: vxorps %xmm1, %xmm0, %xmm0
		; FMA-AVX512-NEXT: retq
		entry:
		%3 = fneg float %0
		%4 = fneg float %2
		%5 = call float @llvm.experimental.constrained.fmuladd.f32(float %3, float %1, float %4,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%result = fneg float %5
		ret float %result
		}

		define double @f10(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f10:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movapd {{.*#+}} xmm3 = [-0.0E+0,-0.0E+0]
		; NOFMA-NEXT: xorpd %xmm3, %xmm0
		; NOFMA-NEXT: xorpd %xmm3, %xmm2
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: xorpd %xmm3, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f10:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vfnmsub213sd {{.*#+}} xmm0 = -(xmm1 * xmm0) - xmm2
		; FMA-NEXT: vxorpd {{.*}}(%rip), %xmm0, %xmm0
		; FMA-NEXT: retq
		entry:
		%3 = fneg double %0
		%4 = fneg double %2
		%5 = call double @llvm.experimental.constrained.fmuladd.f64(double %3, double %1, double %4,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%result = fneg double %5
		ret double %result
		}

		; Verify constrained fmul and fadd aren't fused.
		define float @f11(float %0, float %1, float %2) #0 {
		; NOFMA-LABEL: f11:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f11:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmulss %xmm1, %xmm0, %xmm0
		; FMA-NEXT: vaddss %xmm2, %xmm0, %xmm0
		; FMA-NEXT: retq
		entry:
		%3 = call float @llvm.experimental.constrained.fmul.f32(float %0, float %1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%4 = call float @llvm.experimental.constrained.fadd.f32(float %3, float %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %4
		}

		; Verify constrained fmul and fadd aren't fused.
		define double @f12(double %0, double %1, double %2) #0 {
		; NOFMA-LABEL: f12:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm2, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f12:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmulsd %xmm1, %xmm0, %xmm0
		; FMA-NEXT: vaddsd %xmm2, %xmm0, %xmm0
		; FMA-NEXT: retq
		entry:
		%3 = call double @llvm.experimental.constrained.fmul.f64(double %0, double %1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		%4 = call double @llvm.experimental.constrained.fadd.f64(double %3, double %2,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %4
		}
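The reason f11 and f12 must keep the multiply and add separate can be illustrated numerically: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two forms can return different values. The sketch below is only an illustration in Python, not part of the patch. The helper names `mul_then_add` and `fused_mul_add` are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware FMA instruction.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    # Two roundings: the product is rounded to double, then the sum is rounded.
    return a * b + c


def fused_mul_add(a, b, c):
    # One rounding: compute a*b+c exactly, then round to double once.
    # Fraction(float) is exact and float(Fraction) rounds to nearest.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so the exact product 1 - 2**-54 is not representable
# as a double and rounds to 1.0 before the addition.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

separate = mul_then_add(a, b, c)
fused = fused_mul_add(a, b, c)
print("separate:", separate)
print("fused:   ", fused)
```

With these inputs the separate form loses the low-order bits of the product while the emulated fused form keeps them, which is why fusing explicit constrained fmul and fadd calls would change observable results.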

		; Verify that fmuladd(3.5) isn't simplified when the rounding mode is
		; unknown.
		define float @f15() #0 {
		; NOFMA-LABEL: f15:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
		; NOFMA-NEXT: movaps %xmm1, %xmm0
		; NOFMA-NEXT: mulss %xmm1, %xmm0
		; NOFMA-NEXT: addss %xmm1, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f15:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
		; FMA-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
		; FMA-NEXT: retq
		entry:
		%result = call float @llvm.experimental.constrained.fmuladd.f32(
		float 3.5,
		float 3.5,
		float 3.5,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret float %result
		}

		; Verify that fmuladd(42.1) isn't simplified when the rounding mode is
		; unknown.
		define double @f16() #0 {
		; NOFMA-LABEL: f16:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
		; NOFMA-NEXT: movapd %xmm1, %xmm0
		; NOFMA-NEXT: mulsd %xmm1, %xmm0
		; NOFMA-NEXT: addsd %xmm1, %xmm0
		; NOFMA-NEXT: retq
		;
		; FMA-LABEL: f16:
		; FMA: # %bb.0: # %entry
		; FMA-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
		; FMA-NEXT: vfmadd213sd {{.*#+}} xmm0 = (xmm0 * xmm0) + xmm0
		; FMA-NEXT: retq
		entry:
		%result = call double @llvm.experimental.constrained.fmuladd.f64(
		double 42.1,
		double 42.1,
		double 42.1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}
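To see why f15 and f16 must not be folded to a constant under "round.dynamic", it helps to check whether the result depends on the rounding mode. The following Python sketch is an illustration under stated assumptions, not part of the patch. The helper `fmuladd_directed` is hypothetical, and it models only the fused, single-rounding form of x*x+x under round-toward-negative and round-toward-positive, using exact rational arithmetic.

```python
import math
from fractions import Fraction


def fmuladd_directed(x):
    """Return (round-down, round-up) results of the exact value x*x + x."""
    exact = Fraction(x) * Fraction(x) + Fraction(x)
    nearest = float(exact)  # correctly rounded to nearest
    if Fraction(nearest) == exact:
        # Exactly representable: every rounding mode gives the same value.
        return nearest, nearest
    if Fraction(nearest) < exact:
        return nearest, math.nextafter(nearest, math.inf)
    return math.nextafter(nearest, -math.inf), nearest


down, up = fmuladd_directed(42.1)
print("42.1: round down =", down, " round up =", up)

exact_down, exact_up = fmuladd_directed(3.5)
print("3.5:  round down =", exact_down, " round up =", exact_up)
```

For 42.1 the two directed results are adjacent but distinct doubles, so a compile-time constant would be wrong for at least one rounding mode. For 3.5 the value 15.75 is exact, so the value does not depend on the rounding mode, although the tests above still expect the operation to be kept in strict mode.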

		; Verify that fma(3.5) isn't simplified when the rounding mode is
		; unknown.
		define float @f17() #0 {
		; NOFMA-LABEL: f17:
		; NOFMA: # %bb.0: # %entry
		; NOFMA-NEXT: pushq %rax
		; NOFMA-NEXT: .cfi_def_cfa_offset 16
		...
		%result = call double @llvm.experimental.constrained.fma.f64(
		double 42.1,
		metadata !"round.dynamic",
		metadata !"fpexcept.strict") #0
		ret double %result
		}

		attributes #0 = { strictfp }

		declare float @llvm.experimental.constrained.fmul.f32(float, float, metadata, metadata)
		declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fmul.f64(double, double, metadata, metadata)
		declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)
		declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
		declare float @llvm.experimental.constrained.fmuladd.f32(float, float, float, metadata, metadata)
		declare double @llvm.experimental.constrained.fmuladd.f64(double, double, double, metadata, metadata)

[FPEnv] Add pragma FP_CONTRACT support under strict FP. (Closed, Public)

Revision Contents

Diff 238408

clang/lib/CodeGen/CGExprScalar.cpp

clang/test/CodeGen/constrained-math-builtins.c

llvm/docs/LangRef.rst

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/include/llvm/CodeGen/ISDOpcodes.h

llvm/include/llvm/IR/ConstrainedOps.def

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/Target/X86/X86ISelLowering.h

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/lib/Target/X86/X86InstrAVX512.td

llvm/lib/Target/X86/X86InstrFMA.td

llvm/lib/Target/X86/X86InstrFragmentsSIMD.td

llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
