This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] reassoc flag shouldn't enable contract
ClosedPublic

Authored by jsji on Jun 14 2021, 11:18 AM.

Details

Summary

According to IR LangRef, the FMF flag:

contract
Allow floating-point contraction (e.g. fusing a multiply followed by an
addition into a fused multiply-and-add).

reassoc
Allow reassociation transformations for floating-point instructions.
This may dramatically change results in floating-point.

My understanding is that these two flags shouldn't imply each other,
as we might have a SDNode that can be reassociated with others, but
not contractble.

eg: We may want following fmul/fad/fsub to freely reassoc, but don't
want fma being generated here.

%F = fmul reassoc double %A, %B         ; <double> [#uses=1]
%G = fmul reassoc double %C, %D         ; <double> [#uses=1]
%H = fadd reassoc double %F, %G         ; <double> [#uses=1]
%I = fsub reassoc double %H, %E         ; <double> [#uses=1]

Before https://reviews.llvm.org/D45710, reassoc flag actually
did not imply isContratable either.

static bool isContractable(SDNode *N) {

SDNodeFlags F = N->getFlags();
return F.hasAllowContract() || F.hasUnsafeAlgebra();

}

The current implementation also only check the flag in fadd node,
ignoring fmul node, this patch update that as well.

Diff Detail

Event Timeline

jsji created this revision.Jun 14 2021, 11:18 AM
jsji requested review of this revision.Jun 14 2021, 11:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2021, 11:18 AM
jsji added a reviewer: Restricted Project.Jun 14 2021, 11:24 AM
jsji edited the summary of this revision. (Show Details)Jun 14 2021, 11:41 AM
dmgreen added inline comments.Jun 14 2021, 11:43 AM
llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
3 ↗(On Diff #351939)

I can fix this.

jsji added inline comments.Jun 14 2021, 11:47 AM
llvm/test/CodeGen/Thumb2/mve-vldshuffle.ll
3 ↗(On Diff #351939)

Thanks!

jsji edited reviewers, added: dmgreen; removed: greened.Jun 14 2021, 11:49 AM

Changes in AMDGPU tests LGTM, thanks!

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
12874–12875

Nit: it seems this function is too trivial now and we can remove it?

13104–13105

Nit: move CanFuse definition two hundreds lines down to its uses?

RKSimon resigned from this revision.Jun 15 2021, 6:43 AM
jsji updated this revision to Diff 352132.Jun 15 2021, 7:22 AM

Address review comments.

qiucf added a comment.Jun 15 2021, 7:26 PM

Thanks for the patch. It looks reasonable since we've split effect of contract from reassoc (like D89527), and this one does the missing reverse.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13125

SDValue overloaded '->' operator.

llvm/test/CodeGen/PowerPC/fma-aggr-FMF.ll
26

extra space

jsji updated this revision to Diff 352329.Jun 15 2021, 8:33 PM

Address review comments.

jsji updated this revision to Diff 352838.Jun 17 2021, 2:02 PM

Rebase to pick up Dave's D104255.

qiucf accepted this revision as: qiucf.Jun 17 2021, 7:59 PM

LGTM if no further comments from others @spatel

This revision is now accepted and ready to land.Jun 17 2021, 7:59 PM

In code that I've looked at (mostly C compiled with -ffast-math), we always have contract when we have reassoc, so I don't see much practical difference.
Can you explain more how we could benefit from this change - in the example in the description, we would have 4 instructions rather than 3 if we use FMA - is that better?
We should duplicate (instead of edit) at least a few of the tests, so we can verify that the existing tests do not form FMA now. That could also be used to show an advantage from not forming FMA.

llvm/test/CodeGen/PowerPC/fmf-propagation.ll
107

This comment was not valid even before this patch.

jsji added a comment.Jun 18 2021, 8:15 AM

In code that I've looked at (mostly C compiled with -ffast-math), we always have contract when we have reassoc, so I don't see much practical difference.
Can you explain more how we could benefit from this change - in the example in the description, we would have 4 instructions rather than 3 if we use FMA - is that better?

Yes, you are right, the performance of having FMA should be better. However, we have quite some scenarios that users care about precision more than performance, they want to precise control of when FMA can be generated. So The major motivation of this is to ensure that we respect the IR semantics. For users that care about performance, we still can get them through default global option or emitting respect flag in IR.

We should duplicate (instead of edit) at least a few of the tests, so we can verify that the existing tests do not form FMA now. That could also be used to show an advantage from not forming FMA.

Sure, I can do that . Thanks!

In code that I've looked at (mostly C compiled with -ffast-math), we always have contract when we have reassoc, so I don't see much practical difference.
Can you explain more how we could benefit from this change - in the example in the description, we would have 4 instructions rather than 3 if we use FMA - is that better?

Yes, you are right, the performance of having FMA should be better. However, we have quite some scenarios that users care about precision more than performance, they want to precise control of when FMA can be generated. So The major motivation of this is to ensure that we respect the IR semantics. For users that care about performance, we still can get them through default global option or emitting respect flag in IR.

AFAIU, the use of FMA doesn't decline the precision (but may improve a bit, which is also unexpected in some circumstance). But reassoc flag does affect precision much. We had some discussion on D99675, contract is acceptable even we want to disable reassoc for the sake of precision. That said, contract has much less influence on precision than reassoc. So I don't see any necessity to use reassoc without contract.

jsji added a comment.EditedJun 18 2021, 8:24 PM

In code that I've looked at (mostly C compiled with -ffast-math), we always have contract when we have reassoc, so I don't see much practical difference.
Can you explain more how we could benefit from this change - in the example in the description, we would have 4 instructions rather than 3 if we use FMA - is that better?

Yes, you are right, the performance of having FMA should be better. However, we have quite some scenarios that users care about precision more than performance, they want to precise control of when FMA can be generated. So The major motivation of this is to ensure that we respect the IR semantics. For users that care about performance, we still can get them through default global option or emitting respect flag in IR.

AFAIU, the use of FMA doesn't decline the precision (but may improve a bit, which is also unexpected in some circumstance).

Yes, you are right. FMA on PowerPC improves precision, which is sometime unexpected to math library writers.

We had some discussion on D99675, contract is acceptable even we want to disable reassoc for the sake of precision.
That said, contract has much less influence on precision than reassoc. So I don't see any necessity to use reassoc without contract.

I agree in general.
However, there are some specific cases that IR producer might generate reassoc flag for general reassociation, not intending for contract,
and current implementation is causing confusions and failures.

As reassoc and contract are defined as two independent flags,
I think we should distinguish them in implementation, regardless of whether it is really meaningful to use reassoc without contract?

Or another choice is we can update the definition of these two flags, saying that reassoc always imply contract,
but this doesn't looks a clear definition to me.

jsji updated this revision to Diff 353165.Jun 18 2021, 9:05 PM

Address comments -- updated testcases.

spatel added inline comments.Jun 21 2021, 9:04 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13257

If we still allow the global/deprecated UnsafeFPMath option to enable FMA formation, we are keeping a side-channel way to override the FMF. That is, the global setting is effectively a replacement for "reassoc" in (many?) regression tests.

I think this is not changing the logic before this patch, so that's fine, but do you plan to fix that too?

llvm/test/CodeGen/PowerPC/fma-assoc.ll
323

Ideally, we would remove the RUN lines that have that setting and rely on IR-level FMF instead. This may require duplicating some tests and updating the FMF.

We are trying to rely on the IR and node-level alone if possible - see D99080 for discussion.

llvm/test/CodeGen/PowerPC/fma-precision.ll
2

Similar to the earlier inline comment (see D99080 for details). Can we avoid using the global flag by updating FMF instead?

jsji updated this revision to Diff 353393.Jun 21 2021, 9:17 AM

Address comments.

jsji added inline comments.Jun 21 2021, 9:26 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13257

Yes, I planned to fix this -- by allowing it for global flag as well , not only UnsafeFPMath.

13257

Oh, just to double confirm, we will still allow global options to override FMF (UnSafeFPMath, -ffast-math, etc), right? FMF is great, but sometimes global options will be more handy for front-end or IR producers. It may take time for IR producers to transit to FMF only?

llvm/test/CodeGen/PowerPC/fma-assoc.ll
323

Can we do this in another follow up patch? As you mentioned in D99080, I believe there are quite some tests needs updates.

spatel accepted this revision.Jun 21 2021, 11:14 AM

LGTM.
Since we are opting for a definition of reassoc that is not a superset of the other flags, it might be worth seeing if transforms related to arcp and afn are similarly affected.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13257

The goal of IR- and node-level FMF is to obsolete/deprecate the global options. But this has been the case for many years now, so I have no guess for how much longer it will take. :)

llvm/test/CodeGen/PowerPC/fma-assoc.ll
323

Sure - this is going to require manual inspection, adding tests, etc. I just wanted to point out that fixing the global option might not be worth the effort (if the code is behaving as expected using FMF).

jsji added a comment.Jun 21 2021, 1:52 PM

LGTM.
Since we are opting for a definition of reassoc that is not a superset of the other flags, it might be worth seeing if transforms related to arcp and afn are similarly affected.

Thanks, sure, will check.

This revision was automatically updated to reflect the committed changes.
mcberg2017 added inline comments.Jun 21 2021, 3:48 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258

Some of these patterns below have operands/operations which change order, they should be audited and checked reassoc FMF to enable. Perhaps a follow on change?

jsji added inline comments.Jun 21 2021, 4:34 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258

Good point, sure, will follow up.

spatel added inline comments.Jun 22 2021, 6:43 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258

Sorry, I missed this difference. The patch as-is is allowing miscompiles IIUC. For example, this test of the pattern:

; x*y + u*v - z
define float @f(float %x, float %y, float %z, float %u, float %v) {
  %xy = fmul contract float %x, %y
  %uv = fmul contract float %u, %v
  %xyuv = fadd contract float %xy, %uv
  %xyuvz = fsub contract float %xyuv, %z
  ret float %xyuvz
}

We are not allowed to fuse the trailing fsub into fma based on what we decided in D89527, but that's what happens now with:

$ llc -o - fma.ll -mtriple=powerpc64le -ppc-asm-full-reg-names
	xsmsubasp f3, f4, f5 ; u*v - z
	xsmaddasp f3, f1, f2 ; x*y + (u*v - z)
	fmr f1, f3
	blr
jsji added inline comments.Jun 22 2021, 6:47 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258

Yes, working on it.

jsji added inline comments.Jun 22 2021, 7:06 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258

The specific case ; x*y + u*v - z , the behavior is not changed by D104247. It has been like this for a long time. But yeah, I am looking into these patterns now.

jsji added inline comments.Jun 22 2021, 9:56 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13258