This is an archive of the discontinued LLVM Phabricator instance.


Differential D67434

[InstCombine] Limit FMul constant folding for fma simplifications.
ClosedPublic

Authored by fhahn on Sep 11 2019, 3:03 AM.

Details

Reviewers

spatel
lebedev.ri
reames
scanon

Commits

rL372899: [InstCombine] Limit FMul constant folding for fma simplifications.
rGf3ab99dcf8af: [InstCombine] Limit FMul constant folding for fma simplifications.

Summary

As @reames pointed out post-commit, rL371518 adds additional rounding
in some cases, when doing constant folding of the multiplication.
This breaks a guarantee llvm.fma makes and must be avoided.

This patch reapplies rL371518, but splits off the simplifications not
requiring rounding from SimplifyFMulInst as SimplifyFMAFMul.
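
To illustrate the extra rounding described above, here is a minimal C++ sketch (not part of the patch). It compares a fused multiply-add, which rounds once, against a multiply whose product is rounded to double before the add, which is what constant folding the multiply amounts to. The function names and input values are chosen purely for illustration; `std::fma` and `std::ldexp` are the standard `<cmath>` functions. The sketch assumes IEEE-754 doubles, the default rounding mode, and no fast-math flags.

```cpp
#include <cmath>

// Multiply, round the product to double, then add. This models constant
// folding the multiply before the addition. `volatile` keeps the compiler
// from contracting the two steps into a single fused operation.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b; // first rounding happens here
  return product + c;              // second rounding happens here
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
double fused(double a, double b, double c) { return std::fma(a, b, c); }

// Illustrative inputs: the exact square of 1 + 2^-27 is
// 1 + 2^-26 + 2^-54, which needs more significand bits than a double has.
double exampleA() { return 1.0 + std::ldexp(1.0, -27); }
double exampleC() { return -(1.0 + std::ldexp(1.0, -26)); }
```

With `a = exampleA()` and `c = exampleC()`, rounding the product drops the 2^-54 term, so `mulThenAdd(a, a, c)` returns 0 while `fused(a, a, c)` returns 2^-54. When either factor is 1.0 or 0.0 the product is exact and both paths agree, which is why those cases remain safe to simplify.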

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 38000
Build 37999: arc lint + arc unit

Event Timeline

fhahn created this revision. Sep 11 2019, 3:03 AM

Herald added a project: Restricted Project. Sep 11 2019, 3:03 AM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B38000: Diff 219679. Sep 11 2019, 3:03 AM

fhahn added a reviewer: scanon. Sep 11 2019, 3:04 AM

lebedev.ri added inline comments. Sep 11 2019, 3:05 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	I don't think this restriction should apply to `@llvm.fmuladd`, as per https://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

First, please revert the previous patch until this review concludes. We have an active miscompile in tree, and that should be addressed first.

Second, I'm not a fan of this approach. It feels very fragile. For example, what if someone adds a transform which exploits rounding but doesn't involve constants? (Say, we know some bits in one of the arguments...)

I would *strongly* suggest we restructure the code such that a common utility function is used, but with two cover functions which make the appropriate guarantees about rounding for the distinct cases.

llvm/test/Transforms/InstCombine/fma.ll
502–511	I don't see why not? The rounding is specified exactly as described.

This revision now requires changes to proceed. Sep 11 2019, 9:02 AM

lebedev.ri added inline comments. Sep 11 2019, 9:07 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	I think my comment got garbled. I'm saying the current patch is correct in the sense that if we are simplifying `fmuladd`, we don't have any rounding concerns: As per LangRef, `fmuladd` can be arbitrarily expanded to `fmul`+`fadd`, unlike `fma`.

reames added inline comments. Sep 11 2019, 9:12 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	From the langref: "is equivalent to the expression a * b + c, except that rounding will not be performed between the multiplication and addition steps if the code generator fuses the operations. " Once fused, this implies the rounding is defined. We can consider changing the LangRef - I think we maybe should - but per the actual wording, we must do the rounding as per an fma, not the component operations.

reames added inline comments. Sep 11 2019, 9:15 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	p.s. Looking at the implementation, particularly SelectionDAGBuilder, I think you're right about the intent. We just need to clarify the semantics in LangRef.

fhahn mentioned this in rL371634: Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma.. Sep 11 2019, 9:15 AM

fhahn mentioned this in rG51de22c8ee65: Revert [InstCombine] Use SimplifyFMulInst to simplify multiply in fma..

lebedev.ri added inline comments. Sep 11 2019, 9:18 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	I fail to read langref any other way already:

> The ‘llvm.fmuladd.*’ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.

"can be fused if"

> The expression: `%0 = call float @llvm.fmuladd.f32(%a, %b, %c)` is equivalent to the expression a * b + c, except that rounding will not be performed between the multiplication and addition steps if the code generator fuses the operations. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead.

"rounding will not be performed between the multiplication and addition steps if the code generator fuses the operations"

"Fusion is not guaranteed"

In D67434#1666337, @reames wrote:

> First, please revert the previous patch until this review concludes. We have an active miscompile in tree, and that should be addressed first.

Sure, done.

> Second, I'm not a fan of this approach. It feels very fragile. For example, what if someone adds a transform which exploits rounding but doesn't involve constants? (Say, we know some bits in one of the arguments...)
>
> I would *strongly* suggest we restructure the code such that a common utility function is used, but with two cover functions which make the appropriate guarantees about rounding for the distinct cases.

Will do, thanks.

reames added inline comments. Sep 11 2019, 9:42 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	The key bit of wording here is that the rounding is described following the decision to fold. I'm not disputing that converting " a * b + c" to a fmuladd is valid per this description. I'm disputing that converting back to an "a * b + c" sequence is technically disallowed by the wording.

Split off simplifications not requiring rounding from SimplifyFMulInst as SimplifyFMAFMul.
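
The split can be pictured with the following toy C++ sketch: one shared helper and two cover functions that differ only in whether rounding the product is allowed. This is not the LLVM implementation. `Operand`, `constantOp`, `valueOp`, `simplifyMulImpl`, `simplifyMul`, and `simplifyMulForFMA` are hypothetical names, and operands are modeled as plain doubles rather than IR values.

```cpp
// Toy stand-in for an IR operand (hypothetical, for illustration only).
struct Operand {
  bool isConstant; // true when `value` holds a known constant
  double value;    // meaningful only when isConstant is true
  int id;          // identifies a non-constant value
};

Operand constantOp(double v) { return Operand{true, v, 0}; }
Operand valueOp(int id) { return Operand{false, 0.0, id}; }

// Shared helper. Returns true and writes the simplified product to `out`.
static bool simplifyMulImpl(const Operand &a, const Operand &b,
                            bool allowRounding, Operand &out) {
  // 1.0 * x -> x and x * 1.0 -> x: the result is an existing value, so no
  // rounding is introduced.
  if (a.isConstant && a.value == 1.0) { out = b; return true; }
  if (b.isConstant && b.value == 1.0) { out = a; return true; }
  // Folding two constants rounds the product to double, so only do it when
  // the caller tolerates that extra rounding step.
  if (allowRounding && a.isConstant && b.isConstant) {
    out = constantOp(a.value * b.value);
    return true;
  }
  return false;
}

// Cover function for a plain multiply: rounding the product is expected.
bool simplifyMul(const Operand &a, const Operand &b, Operand &out) {
  return simplifyMulImpl(a, b, /*allowRounding=*/true, out);
}

// Cover function for the multiply inside an fma: never round the product.
bool simplifyMulForFMA(const Operand &a, const Operand &b, Operand &out) {
  return simplifyMulImpl(a, b, /*allowRounding=*/false, out);
}
```

Keeping the rounding decision in a single flag means a later transform added to the shared helper has to state explicitly whether it rounds, which addresses the fragility concern raised earlier in the review.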

fhahn marked an inline comment as done. Sep 11 2019, 9:57 AM

fhahn added inline comments.

llvm/test/Transforms/InstCombine/fma.ll
502–511	> The key bit of wording here is that the rounding is described following the decision to fold.

My understanding is that as long as it is a fmuladd, the code generator did not make a decision about fusing yet. IIUC there are 2 ways it can decide to fuse: 1) replace it with an fma intrinsic or 2) lower it to a FMA in the backend.

lebedev.ri added inline comments. Sep 11 2019, 10:04 AM

llvm/test/Transforms/InstCombine/fma.ll
502–511	I would agree with that if not for the

> Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead.

To me that sounds like: "use fmuladd if you want to request an FMA, or fma if you want to require an FMA (if unavailable, compilation will fail)".

LGTM, subject to incorporate all comments below before landing.

llvm/include/llvm/Analysis/InstructionSimplify.h
146 ↗	(On Diff #219737)	Suggested tweak: In contrast to SimplifyFMulInst, this function will not perform simplifications whose unrounded results differ when rounded to the argument type.
llvm/lib/Analysis/InstructionSimplify.cpp
4536 ↗	(On Diff #219737)	Please don't repeat the doc comment. Just in the header is fine.
4545 ↗	(On Diff #219737)	These two appear to be new transforms. Please separate the basic optimization which is mostly refactoring plus hooks (this review) and a following optimization change.
llvm/test/Transforms/InstCombine/fma.ll
502–511	Ok, let me rephrase how I'm approaching this. As a reviewer, I have shared a concern about a potential ambiguity in the LangRef which affects the legality of the discussed transform. Whether there is an agreement that the ambiguity exists or not, I expect a change made to clarify the semantics to remove the potential ambiguity. Please consider this a hard requirement before any change to exploit the discussed semantics as proposed in the comment that started this sub-thread. (This entire sub-thread is off topic for the actual review.)

This revision is now accepted and ready to land. Sep 11 2019, 10:18 AM

fhahn marked an inline comment as done. Sep 12 2019, 12:35 PM

fhahn added inline comments.

llvm/test/Transforms/InstCombine/fma.ll
502–511	Sounds good, I'll think about how to make it more explicit.

fhahn mentioned this in D67552: [LangRef] Clarify absence of rounding guarantees for fmuladd.. Sep 13 2019, 7:33 AM

fhahn marked 2 inline comments as done. Sep 13 2019, 7:37 AM

fhahn added inline comments.

llvm/lib/Analysis/InstructionSimplify.cpp
4545 ↗	(On Diff #219737)	Done, split off as D67553
llvm/test/Transforms/InstCombine/fma.ll
502–511	Split off as D67552

mcberg2017 added a subscriber: mcberg2017. Sep 17 2019, 1:52 PM

fhahn mentioned this in rL372892: [LangRef] Clarify absence of rounding guarantees for fmuladd.. Sep 25 2019, 9:09 AM

fhahn mentioned this in rG6b3749f6968f: [LangRef] Clarify absence of rounding guarantees for fmuladd..

Closed by commit rGf3ab99dcf8af: [InstCombine] Limit FMul constant folding for fma simplifications. (authored by fhahn). Sep 25 2019, 10:04 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp	28 lines
llvm/test/Transforms/InstCombine/fma.ll	64 lines

Diff 219679

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

   case Intrinsic::fma: {
     // fma fabs(x), fabs(x), z -> fma x, x, z
     if (match(Src0, m_FAbs(m_Value(X))) &&
         match(Src1, m_FAbs(m_Specific(X)))) {
       II->setArgOperand(0, X);
       II->setArgOperand(1, X);
       return II;
     }
 
+    // Try to simplify the underlying FMul to a value that does not require
+    // additional rounding. SimplifyFMulInst will either simplify to an existing
+    // value or constant fold the multiply. If simplified to an existing value,
+    // no additional rounding is required. Constant folding could introduce
+    // additional rounding. We only try to simplify multiplications with 2
+    // constants, if either is 1.0 or 0.0, which won't required rounding.
+    // Fmuladd intrinsics do not make any guarantees about rounding, so we can
+    // constant fold arbirary multiplies.
+    if (match(Src0, m_FPOne()) || match(Src1, m_FPOne()) ||
+        match(Src0, m_Zero()) || match(Src1, m_Zero()) ||
+        !isa<Constant>(II->getArgOperand(0)) ||
+        !isa<Constant>(II->getArgOperand(1)) || IID == Intrinsic::fmuladd)
     // Try to simplify the underlying FMul.
-    if (Value *V = SimplifyFMulInst(II->getArgOperand(0), II->getArgOperand(1),
-                                    II->getFastMathFlags(),
-                                    SQ.getWithInstruction(II))) {
+    if (Value *V = SimplifyFMulInst(
+            II->getArgOperand(0), II->getArgOperand(1),
+            II->getFastMathFlags(), SQ.getWithInstruction(II))) {
       auto *FAdd = BinaryOperator::CreateFAdd(V, II->getArgOperand(2));
       FAdd->copyFastMathFlags(II);
       return FAdd;
     }
 
     break;
   }
   case Intrinsic::fabs: {
     Value *Cond;
     Constant *LHS, *RHS;
     if (match(II->getArgOperand(0),
               m_Select(m_Value(Cond), m_Constant(LHS), m_Constant(RHS)))) {

llvm/test/Transforms/InstCombine/fma.ll

 ; CHECK-NEXT: ret <2 x double> [[RES]]
 ;
 entry:
   %sqrt = call fast <2 x double> @llvm.sqrt.v2f64(<2 x double> %a)
   %res = call fast <2 x double> @llvm.fma.v2f64(<2 x double> %sqrt, <2 x double> %sqrt, <2 x double> %b)
   ret <2 x double> %res
 }
 
+; We do not fold constant multiplies in FMAs, as they could require rounding, unless either constant is 0.0 or 1.0.
+define <2 x double> @fma_const_fmul(<2 x double> %b) {
+; CHECK-LABEL: @fma_const_fmul(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RES:%.*]] = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 0x4131233302898702, double 0x40C387800000D6C0>, <2 x double> <double 1.291820e-08, double 9.123000e-06>, <2 x double> [[B:%.*]])
+; CHECK-NEXT: ret <2 x double> [[RES]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> <double 0.0000000129182, double 0.000009123>, <2 x double> %b)
+  ret <2 x double> %res
+}
+
+define <2 x double> @fma_const_fmul_zero(<2 x double> %b) {
+; CHECK-LABEL: @fma_const_fmul_zero(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: ret <2 x double> [[B:%.*]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 0.0, double 0.0>, <2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> %b)
+  ret <2 x double> %res
+}
+
+define <2 x double> @fma_const_fmul_zero2(<2 x double> %b) {
+; CHECK-LABEL: @fma_const_fmul_zero2(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: ret <2 x double> [[B:%.*]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> <double 0.0, double 0.0>, <2 x double> %b)
+  ret <2 x double> %res
+}
+
+define <2 x double> @fma_const_fmul_one(<2 x double> %b) {
+; CHECK-LABEL: @fma_const_fmul_one(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RES:%.*]] = fadd nnan nsz <2 x double> [[B:%.*]], <double 0x4131233302898702, double 0x40C387800000D6C0>
+; CHECK-NEXT: ret <2 x double> [[RES]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 1.0, double 1.0>, <2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> %b)
+  ret <2 x double> %res
+}
+
+define <2 x double> @fma_const_fmul_one2(<2 x double> %b) {
+; CHECK-LABEL: @fma_const_fmul_one2(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RES:%.*]] = fadd nnan nsz <2 x double> [[B:%.*]], <double 0x4131233302898702, double 0x40C387800000D6C0>
+; CHECK-NEXT: ret <2 x double> [[RES]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fma.v2f64(<2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> <double 1.0, double 1.0>, <2 x double> %b)
+  ret <2 x double> %res
+}
+
+define <2 x double> @fmuladd_const_fmul(<2 x double> %b) {
+; CHECK-LABEL: @fmuladd_const_fmul(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[RES:%.*]] = fadd nnan nsz <2 x double> [[B:%.*]], <double 0x3F8DB6C076AD949B, double 0x3FB75A405B6E6D69>
+; CHECK-NEXT: ret <2 x double> [[RES]]
+;
+entry:
+  %res = call nnan nsz <2 x double> @llvm.fmuladd.v2f64(<2 x double> <double 1123123.0099110012314, double 9999.0000001>, <2 x double> <double 0.0000000129182, double 0.000009123>, <2 x double> %b)
+  ret <2 x double> %res
+}

 declare <2 x double> @llvm.fma.v2f64(<2 x double>, <2 x double>, <2 x double>)
 declare <2 x double> @llvm.sqrt.v2f64(<2 x double>)

 attributes #0 = { nounwind }
 attributes #1 = { nounwind readnone }
