
Details

Reviewers

pengfei
LuoYuanke
craig.topper
FreddyYe

Commits

rG76656ec8ec53: [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)

Summary

This patch adds support for transforming something like
_mm512_add_ph(acc, _mm512_fmadd_pch(a, b, _mm512_setzero_ph()))
to _mm512_fmadd_pch(a, b, acc).
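
To make the summary concrete, here is a minimal scalar sketch of the identity behind the combine. It models a single complex lane with `std::complex<float>` instead of the packed FP16 intrinsics, so `complex_fma`, `unfused`, and `fused` are hypothetical names chosen for illustration, and rounding on real hardware may differ from this model.

```cpp
#include <cassert>
#include <complex>

using cf = std::complex<float>;

// Scalar stand-in for one complex lane of a packed complex FMA: a * b + acc.
// (Hypothetical helper; the real instruction operates on packed FP16 pairs.)
cf complex_fma(cf a, cf b, cf acc) { return a * b + acc; }

// Before the combine: multiply with a zero addend, then a separate add.
cf unfused(cf acc, cf a, cf b) {
  return acc + complex_fma(a, b, cf(0.0f, 0.0f));
}

// After the combine: the accumulator is passed as the addend directly.
cf fused(cf acc, cf a, cf b) { return complex_fma(a, b, acc); }
```

With a = 1+2i, b = 3-i, and acc = 0.5+4i, both `unfused(acc, a, b)` and `fused(acc, a, b)` evaluate to 5.5+9i. The two forms can differ once signed zeros are involved, which is what the later discussion in this review is about.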

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LiuChen3 created this revision. · Sep 17 2021, 12:40 AM

Herald added subscribers: pengfei, hiraditya. · Sep 17 2021, 12:40 AM

LiuChen3 requested review of this revision. · Sep 17 2021, 12:40 AM

Herald added a project: Restricted Project. · Sep 17 2021, 12:40 AM

Herald added a subscriber: llvm-commits.

LiuChen3 added reviewers: pengfei, LuoYuanke, craig.topper. · Sep 17 2021, 12:41 AM

LiuChen3 added a reviewer: FreddyYe.

xbolva00 added a subscriber: xbolva00. · Sep 17 2021, 1:07 AM

xbolva00 added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
47481	Do we really need this output here? Simplify it a bit? Something like you wrote "Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)"?

pengfei added inline comments. · Sep 17 2021, 1:10 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47522	I think we can use `bool IsConj`, `SDValue MulOp0, MulOp1` instead of `CFmul`. Then you don't need to create a temp mul node.
47529–47530	Better to add parentheses.

Harbormaster completed remote builds in B124344: Diff 373157. · Sep 17 2021, 1:18 AM

LiuChen3 added inline comments. · Sep 17 2021, 1:54 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47481	Good idea.
47522	It seems we would create more temp nodes. Is that better?

pengfei added inline comments. · Sep 17 2021, 2:28 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47522	They are temp variables rather than nodes. And the compiler will likely optimize them away.

Address comments

Harbormaster completed remote builds in B124369: Diff 373190. · Sep 17 2021, 4:55 AM

pengfei added inline comments. · Sep 17 2021, 5:17 AM

llvm/lib/Target/X86/X86ISelLowering.cpp

47517–47518

Can these be

MulOp0 = Op0->getOperand(1);
MulOp1 = Op0->getOperand(2);

47522

I think we can then use

if ((Opcode == X86ISD::VFMULC || Opcode == X86ISD::VFCMULC)) {
  ...
  return true;
}
if ((Opcode == X86ISD::VFMADDC || Opcode == X86ISD::VFCMADDC) ... {
  ...
  return true;
}
return false;

47533–47534

I think we can remove the assert now.
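
The early-return shape suggested for line 47522 can be sketched outside of LLVM as follows. This is only an illustration under stated assumptions: `Opcode`, `Node`, `MulInfo`, and `getMulOperands` are hypothetical stand-ins rather than SelectionDAG types, and the operand positions follow the revision under review, where the addend is operand 0 and the multiplicands are operands 1 and 2.

```cpp
#include <cassert>

// Hypothetical stand-ins for the X86ISD opcodes named in the review.
enum class Opcode { VFMULC, VFCMULC, VFMADDC, VFCMADDC, Other };

// Hypothetical, simplified node: an opcode plus three operand ids.
struct Node {
  Opcode Op;
  int Operands[3];
  bool AddendIsZero; // stands in for the all-zeros check on the addend
};

// Plain variables (not nodes) that record what was matched.
struct MulInfo {
  bool IsConj = false;
  int MulOp0 = -1;
  int MulOp1 = -1;
};

// Each matching case fills Out and returns immediately; no else branches
// and no temporary multiply node are needed.
bool getMulOperands(const Node &N, MulInfo &Out) {
  if (N.Op == Opcode::VFMULC || N.Op == Opcode::VFCMULC) {
    Out.IsConj = N.Op == Opcode::VFCMULC;
    Out.MulOp0 = N.Operands[0];
    Out.MulOp1 = N.Operands[1];
    return true;
  }
  if ((N.Op == Opcode::VFMADDC || N.Op == Opcode::VFCMADDC) &&
      N.AddendIsZero) {
    Out.IsConj = N.Op == Opcode::VFCMADDC;
    Out.MulOp0 = N.Operands[1];
    Out.MulOp1 = N.Operands[2];
    return true;
  }
  return false;
}
```

Recording `IsConj`, `MulOp0`, and `MulOp1` directly is what makes the temporary `CFmul` node unnecessary: the caller can build the final FMA from the recorded operands.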

Address comments.

pengfei added inline comments. · Sep 17 2021, 7:00 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517	Why do we still need this?

pengfei added inline comments. · Sep 17 2021, 7:01 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47512	We don't need else after return. See the Lint comment.

Harbormaster completed remote builds in B124383: Diff 373206. · Sep 17 2021, 7:18 AM

LiuChen3 added inline comments. · Sep 17 2021, 7:19 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517	We need to transform FMA(A, B, 0) to MUL(A, B) first.

LiuChen3 added inline comments. · Sep 17 2021, 7:25 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47514–47517	My bad. I see what you mean.

Address comments.

Harbormaster completed remote builds in B124390: Diff 373216. · Sep 17 2021, 8:13 AM

LGTM with a nit.

llvm/lib/Target/X86/X86ISelLowering.cpp
47514	indent

This revision is now accepted and ready to land. · Sep 17 2021, 8:36 PM

In D109953#3007684, @pengfei wrote:

LGTM with a nit.

Can we make the title of this patch more obvious that we're talking about the f16 packed complex operations? FMA and FADD make it look generic.

Address comments.

LiuChen3 retitled this revision from "[X86] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)" to "[X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)". · Sep 17 2021, 11:01 PM

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

In D109953#3007745, @craig.topper wrote:

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

I agree, we may need to check -0.0 and 0.0 only when 'hasNoSignedZeros' is true?

We need to mention complex FMA in the title.

Harbormaster completed remote builds in B124521: Diff 373392. · Sep 17 2021, 11:49 PM

In D109953#3007745, @craig.topper wrote:

I'm not sure this transform is valid without no signed zeros. fma(x, y, -0.0) is equivalent to fmul.

Sorry, I'm not familiar with the difference between 0.0 and -0.0. Why is fma(x, y, 0.0) not equal to fmul?

There are special rules for adding or subtracting signed zero:

x + (+0) = x (for x different than 0)
x + (-0) = x (for x different than 0)
(-0) + (-0) = (-0) - (+0) = -0
(+0) + (+0) = (+0) - (-0) = +0
x - x = x + (-x) = +0 for any finite x, unless rounding toward negative infinity, in which case the result is -0 instead

If A * B is -0, then FMA(A, B, +0) would be (-0 + +0) or (-0 - (-0)) which should produce +0 by the last rule above with x = -0. This is different than FMUL(A,B) which we said was -0.
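
The difference described above can be observed with ordinary scalar doubles. The sketch below is an illustration, not the packed FP16 complex operations from the patch; it assumes the default round-to-nearest mode and a build without fast-math flags, and the helper names `is_negative_zero`, `mul_only`, and `fma_with_addend` are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// True when v is a zero with its sign bit set, that is, -0.0.
bool is_negative_zero(double v) { return v == 0.0 && std::signbit(v); }

// Plain product: the sign of a zero result follows the operand signs.
double mul_only(double x, double y) {
  volatile double vx = x, vy = y; // volatile discourages compile-time folding
  return vx * vy;
}

// Fused multiply-add with an explicit addend: x * y + addend.
double fma_with_addend(double x, double y, double addend) {
  volatile double vx = x, vy = y, va = addend;
  return std::fma(vx, vy, va);
}
```

Under those assumptions, `mul_only(-0.0, 1.0)` is -0.0, `fma_with_addend(-0.0, 1.0, +0.0)` is +0.0, and `fma_with_addend(-0.0, 1.0, -0.0)` is -0.0. Only the -0.0 addend reproduces the plain multiply, which is why a +0.0 addend needs the no-signed-zeros flag before it can be dropped.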

In D109953#3008056, @craig.topper wrote:

There are special rules for adding or subtracting signed zero:

x + (+0) = x (for x different than 0)

x + (-0) = x (for x different than 0)

(-0) + (-0) = (-0) - (+0) = -0

(+0) + (+0) = (+0) - (-0) = +0

x - x = x + (-x) = +0 for any finite x, unless rounding toward negative infinity, in which case the result is -0 instead

If A * B is -0, then FMA(A, B, +0) would be (-0 + +0) or (-0 - (-0)) which should produce +0 by the last rule above with x = -0. This is different than FMUL(A,B) which we said was -0.

Got it. Thanks a lot. :)

Distinguish 0.0 and -0.0 in FMA.

LiuChen3 added inline comments. · Sep 22 2021, 1:56 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47533	Maybe we can just check hasNoSignedZeros() and hasAllowContract() as pengfei said?
llvm/test/CodeGen/X86/avx512fp16-combine-vfmac-fadd.ll
197	Should we do this combine standalone?

pengfei added inline comments. · Sep 22 2021, 2:13 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47612–47621	This seems to have been changed unintentionally.
47629–47634	The indentation is wrong too. The same applies below.

Harbormaster completed remote builds in B125070: Diff 374158. · Sep 22 2021, 2:17 AM

The formatting in this file was changed incorrectly.

llvm/lib/Target/X86/X86ISelLowering.cpp
47533	Yeah, I prefer checking both at line 47582.
47582	Should this be AllowContract(Op0->getFlags()) && ((ISD::isBuildVectorAllZeros(Op0->getOperand(0).getNode()) && Op0->getFlags().hasNoSignedZeros()) || IsVectorAllNegativeZero(Op0->getOperand(0).getNode())) I.e., check `AllowContract` together with `IsVectorAllNegativeZero` as well.
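
The condition proposed for line 47582 reads more easily as a small predicate. This is a sketch of the reviewer's suggestion only, not necessarily what landed: `FmaFacts` and `canTreatAsMul` are hypothetical names, and each field stands in for the corresponding query in the comment (`AllowContract`, `hasNoSignedZeros()`, `ISD::isBuildVectorAllZeros`, `IsVectorAllNegativeZero`).

```cpp
#include <cassert>

// Hypothetical summary of the facts the combine would query on the FMA node.
struct FmaFacts {
  bool AllowContract;      // contraction is permitted
  bool NoSignedZeros;      // the nsz fast-math flag is set
  bool AddendIsAllZeros;   // addend is a vector of +0.0
  bool AddendIsAllNegZero; // addend is a vector of -0.0
};

// FMA(a, b, addend) may be treated as MUL(a, b) when contraction is allowed
// and either the addend is -0.0, or it is +0.0 and signed zeros are ignored.
bool canTreatAsMul(const FmaFacts &F) {
  return F.AllowContract &&
         ((F.AddendIsAllZeros && F.NoSignedZeros) || F.AddendIsAllNegZero);
}
```

Writing the inner disjunction with explicit parentheses keeps `AllowContract` as a requirement for both the +0.0 and the -0.0 case, which is the point of the comment.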

This revision now requires changes to proceed. · Sep 22 2021, 2:28 AM

LiuChen3 added inline comments. · Sep 22 2021, 6:33 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47582	AllowContract will check hasNoSignedZeros(). It seems that we can only do this combine when the fast-math flag is set, no matter whether the third operand is +0.0 or -0.0. Whether it is +0.0 or -0.0 affects the conversion of FMA(a, b, ±0.0) to FMUL(a, b).
47612–47621	Sorry about this. It looks like I accidentally changed something here.

Remove redundant option check.
Reformat.

Harbormaster completed remote builds in B125271: Diff 374432. · Sep 22 2021, 8:50 PM

pengfei added inline comments. · Sep 22 2021, 10:50 PM

llvm/test/CodeGen/X86/avx512fp16-combine-vfmac-fadd.ll
3	How about `CHECK,NO-SZ`
4	How about `CHECK,HAS-SZ`

Address comments.

LGTM.

This revision is now accepted and ready to land. · Sep 22 2021, 11:11 PM

Harbormaster completed remote builds in B125280: Diff 374445. · Sep 22 2021, 11:24 PM

This revision was landed with ongoing or failed builds. · Sep 23 2021, 12:37 AM

Closed by commit rG76656ec8ec53: [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) (authored by LiuChen3).

This revision was automatically updated to reflect the committed changes.

LiuChen3 added a commit: rG76656ec8ec53: [X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A).

Thanks for your review. The operand order changed in the final commit: the addend of the FMA builtin moved from the first operand to the third.

LiuChen3 mentioned this in D110606: [X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A). · Sep 27 2021, 6:42 PM

LiuChen3 mentioned this in rG57e8f840b6d3: [X86][FP16] Fix a bug when Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A). · Sep 27 2021, 8:39 PM

Diff 373157

llvm/lib/Target/X86/X86ISelLowering.cpp

(preceding lines not shown)
   if (combineConjugation(Res))
     return Res;
   std::swap(LHS, RHS);
   if (combineConjugation(Res))
     return Res;
   return Res;
 }

 // Try to combine the following nodes
 // t21: v16f32 = X86ISD::VFMULC/VFCMULC t7, t8
 // t15: v32f16 = bitcast t21
 // t16: v32f16 = fadd nnan ninf nsz arcp contract afn reassoc t15, t2
 // into X86ISD::VFMADDC/VFCMADDC if possible:
 // t22: v16f32 = bitcast t2
 // t23: v16f32 = nnan ninf nsz arcp contract afn reassoc
 // X86ISD::VFMADDC/VFCMADDC t7, t8, t22
 // t24: v32f16 = bitcast t23
+// And
+// t11: v16f32 = ConstantFP vector<0.00>
+// t4: v32f16,ch = CopyFromReg t0, Register:v32f16 %1
+// t7: v16f32 = bitcast t4
+// t6: v32f16,ch = CopyFromReg t0, Register:v32f16 %2
+// t8: v16f32 = bitcast t6
+// t21: v16f32 = X86ISD::VFCMADDC/VFMADDC
+// nnan ninf nsz arcp contract afn reassoc t11, t7, t8
+// t15: v32f16 = bitcast t21
+// t2: v32f16,ch = CopyFromReg t0, Register:v32f16 %0
+// t16: v32f16 = fadd t15, t2
+// into
+// t6: v32f16,ch = CopyFromReg t0, Register:v32f16 %2
+// t8: v16f32 = bitcast t6
+// t24: v16f32 = X86ISD::VFCMADDC/VFMADDC
+// nnan ninf nsz arcp contract afn reassoc t23, t7, t8
+// t25: v32f16 = bitcast t24
 static SDValue combineFaddCFmul(SDNode *N, SelectionDAG &DAG,
                                 const X86Subtarget &Subtarget) {
   auto AllowContract = [&DAG](SDNode *N) {
     return DAG.getTarget().Options.AllowFPOpFusion == FPOpFusion::Fast ||
            N->getFlags().hasAllowContract();
   };
   if (N->getOpcode() != ISD::FADD || !Subtarget.hasFP16() || !AllowContract(N))
     return SDValue();

   EVT VT = N->getValueType(0);
   if (VT != MVT::v8f16 && VT != MVT::v16f16 && VT != MVT::v32f16)
     return SDValue();

   SDValue LHS = N->getOperand(0);
   SDValue RHS = N->getOperand(1);
   SDValue CFmul, FAddOp1;
-  auto GetCFmulFrom = [&CFmul, &AllowContract](SDValue N) -> bool {
+  auto GetCFmulFrom = [&CFmul, &AllowContract, &DAG](SDValue N) -> bool {
     if (!N.hasOneUse() || N.getOpcode() != ISD::BITCAST)
       return false;
     SDValue Op0 = N.getOperand(0);
     unsigned Opcode = Op0.getOpcode();
-    if (Op0.hasOneUse() && AllowContract(Op0.getNode()) &&
-        (Opcode == X86ISD::VFMULC || Opcode == X86ISD::VFCMULC))
-      CFmul = Op0;
+    if (Op0.hasOneUse() && AllowContract(Op0.getNode())) {
+      if ((Opcode == X86ISD::VFMULC || Opcode == X86ISD::VFCMULC))
+        CFmul = Op0;
+      else if ((Opcode == X86ISD::VFMADDC || Opcode == X86ISD::VFCMADDC) &&
+               ISD::isBuildVectorAllZeros(Op0->getOperand(0).getNode())) {
+        CFmul = DAG.getNode(Opcode == X86ISD::VFMADDC ? X86ISD::VFMULC
+                                                      : X86ISD::VFCMULC,
+                            SDLoc(Op0), Op0.getSimpleValueType(),
+                            Op0->getOperand(1), Op0->getOperand(2));
+        DAG.ReplaceAllUsesOfValueWith(Op0, CFmul);
+      }
+    }
     return !!CFmul;
   };

   if (GetCFmulFrom(LHS))
     FAddOp1 = RHS;
   else if (GetCFmulFrom(RHS))
     FAddOp1 = LHS;
   else
(26 lines not shown)
 /// Attempt to pre-truncate inputs to arithmetic ops if it will simplify
 /// the codegen.
 /// e.g. TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) )
 /// TODO: This overlaps with the generic combiner's visitTRUNCATE. Remove
 /// anything that is guaranteed to be transformed by DAGCombiner.
 static SDValue combineTruncatedArithmetic(SDNode *N, SelectionDAG &DAG,
                                           const X86Subtarget &Subtarget,
                                           const SDLoc &DL) {
   assert(N->getOpcode() == ISD::TRUNCATE && "Wrong opcode");
   SDValue Src = N->getOperand(0);
   unsigned SrcOpcode = Src.getOpcode();
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();

   EVT VT = N->getValueType(0);
   EVT SrcVT = Src.getValueType();

   auto IsFreeTruncation = [VT](SDValue Op) {
(13 lines not shown)
     // we'll just send up with a truncate on both operands which will
     // get turned back into (truncate (binop)) causing an infinite loop.
     return ISD::isBuildVectorOfConstantSDNodes(Op.getNode());
   };

   auto TruncateArithmetic = [&](SDValue N0, SDValue N1) {
     SDValue Trunc0 = DAG.getNode(ISD::TRUNCATE, DL, VT, N0);
     SDValue Trunc1 = DAG.getNode(ISD::TRUNCATE, DL, VT, N1);
     return DAG.getNode(SrcOpcode, DL, VT, Trunc0, Trunc1);
   };

   // Don't combine if the operation has other uses.
   if (!Src.hasOneUse())
     return SDValue();

   // Only support vector truncation for now.
   // TODO: i64 scalar math would benefit as well.
   if (!VT.isVector())
     return SDValue();

   // In most cases its only worth pre-truncating if we're only facing the cost
   // of one truncation.
   // i.e. if one of the inputs will constant fold or the input is repeated.
   switch (SrcOpcode) {
   case ISD::MUL:
     // X86 is rubbish at scalar and vector i64 multiplies (until AVX512DQ) - its
     // better to truncate if we have the chance.
     if (SrcVT.getScalarType() == MVT::i64 &&
         TLI.isOperationLegal(SrcOpcode, VT) &&
         !TLI.isOperationLegal(SrcOpcode, SrcVT))
       return TruncateArithmetic(Src.getOperand(0), Src.getOperand(1));
     LLVM_FALLTHROUGH;
   case ISD::AND:
   case ISD::XOR:
   case ISD::OR:
   case ISD::ADD:
   case ISD::SUB: {
     SDValue Op0 = Src.getOperand(0);
     SDValue Op1 = Src.getOperand(1);
(remaining lines not shown)

llvm/test/CodeGen/X86/avx512fp16-combine-vfmac-fadd.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=avx512fp16 --fp-contract=fast --enable-unsafe-fp-math | FileCheck %s

				define dso_local <32 x half> @test1(<32 x half> %acc, <32 x half> %a, <32 x half> %b) {
				; CHECK-LABEL: test1:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfcmaddcph %zmm2, %zmm1, %zmm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <32 x half> %a to <16 x float>
				%1 = bitcast <32 x half> %b to <16 x float>
				%2 = tail call fast <16 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.512(<16 x float> zeroinitializer, <16 x float> %0, <16 x float> %1, i16 -1, i32 4)
				%3 = bitcast <16 x float> %2 to <32 x half>
				%add.i = fadd fast <32 x half> %3, %acc
				ret <32 x half> %add.i
				}

				define dso_local <32 x half> @test2(<32 x half> %acc, <32 x half> %a, <32 x half> %b) {
				; CHECK-LABEL: test2:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfmaddcph %zmm2, %zmm1, %zmm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <32 x half> %a to <16 x float>
				%1 = bitcast <32 x half> %b to <16 x float>
				%2 = tail call fast <16 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.512(<16 x float> zeroinitializer, <16 x float> %0, <16 x float> %1, i16 -1, i32 4)
				%3 = bitcast <16 x float> %2 to <32 x half>
				%add.i = fadd fast <32 x half> %3, %acc
				ret <32 x half> %add.i
				}

				define dso_local <16 x half> @test3(<16 x half> %acc, <16 x half> %a, <16 x half> %b) {
				; CHECK-LABEL: test3:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfcmaddcph %ymm2, %ymm1, %ymm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <16 x half> %a to <8 x float>
				%1 = bitcast <16 x half> %b to <8 x float>
				%2 = tail call fast <8 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.256(<8 x float> zeroinitializer, <8 x float> %0, <8 x float> %1, i8 -1)
				%3 = bitcast <8 x float> %2 to <16 x half>
				%add.i = fadd fast <16 x half> %3, %acc
				ret <16 x half> %add.i
				}

				define dso_local <16 x half> @test4(<16 x half> %acc, <16 x half> %a, <16 x half> %b) {
				; CHECK-LABEL: test4:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfmaddcph %ymm2, %ymm1, %ymm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <16 x half> %a to <8 x float>
				%1 = bitcast <16 x half> %b to <8 x float>
				%2 = tail call fast <8 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.256(<8 x float> zeroinitializer, <8 x float> %0, <8 x float> %1, i8 -1)
				%3 = bitcast <8 x float> %2 to <16 x half>
				%add.i = fadd fast <16 x half> %3, %acc
				ret <16 x half> %add.i
				}

				define dso_local <8 x half> @test5(<8 x half> %acc, <8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: test5:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfcmaddcph %xmm2, %xmm1, %xmm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <8 x half> %a to <4 x float>
				%1 = bitcast <8 x half> %b to <4 x float>
				%2 = tail call fast <4 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.128(<4 x float> zeroinitializer, <4 x float> %0, <4 x float> %1, i8 -1)
				%3 = bitcast <4 x float> %2 to <8 x half>
				%add.i = fadd fast <8 x half> %3, %acc
				ret <8 x half> %add.i
				}

				define dso_local <8 x half> @test6(<8 x half> %acc, <8 x half> %a, <8 x half> %b) {
				; CHECK-LABEL: test6:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: vfmaddcph %xmm2, %xmm1, %xmm0
				; CHECK-NEXT: retq
				entry:
				%0 = bitcast <8 x half> %a to <4 x float>
				%1 = bitcast <8 x half> %b to <4 x float>
				%2 = tail call fast <4 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.128(<4 x float> zeroinitializer, <4 x float> %0, <4 x float> %1, i8 -1)
				%3 = bitcast <4 x float> %2 to <8 x half>
				%add.i = fadd fast <8 x half> %3, %acc
				ret <8 x half> %add.i
				}

				declare <16 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.512(<16 x float>, <16 x float>, <16 x float>, i16, i32 immarg)
				declare <16 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.512(<16 x float>, <16 x float>, <16 x float>, i16, i32 immarg)
				declare <8 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.256(<8 x float>, <8 x float>, <8 x float>, i8)
				declare <8 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.256(<8 x float>, <8 x float>, <8 x float>, i8)
				declare <4 x float> @llvm.x86.avx512fp16.mask.vfcmadd.cph.128(<4 x float>, <4 x float>, <4 x float>, i8)
				declare <4 x float> @llvm.x86.avx512fp16.mask.vfmadd.cph.128(<4 x float>, <4 x float>, <4 x float>, i8)

This is an archive of the discontinued LLVM Phabricator instance.

[X86][FP16] Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A)
Closed · Public
