Page MenuHomePhabricator

Fast Math Flag mapping into SDNode
ClosedPublic

Authored by mcberg2017 on Apr 16 2018, 4:55 PM.

Details

Summary

Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Diff Detail

Event Timeline

mcberg2017 created this revision.Apr 16 2018, 4:55 PM
spatel added a subscriber: jbhateja.

Thanks for working on this. Adding some more potential reviewers...

If I'm seeing it correctly, there are several independent pieces to this patch, so please split it up as much as possible and have tests for each part.

As a possible first step, can you compare the FMF propagation from IR to SDAG to the proposal in D37686 (@jbhateja are you planning to proceed on that patch?)

This upload contains some additional tests as requested. I did some digging to find coverage for Aggressive fma. That is also in the test path now. As only the machine instructions code is separable and also small it seems better to leave it all in one place for context. I did look at the cross reference code in D37686. There is some similar coverage with the initial code and test changes and some that is outside those that coverage for different reasons.

arsenm added inline comments.Apr 18 2018, 1:47 AM
include/llvm/CodeGen/MachineInstr.h
76–98 ↗(On Diff #142879)

These should be a separate patch from the DAG changes, and include the necessary MIR printer/parser changes to handle them

106 ↗(On Diff #142879)

If this increases the size, can we decrease this to uint16?

110 ↗(On Diff #142879)

Does this change the overall size of the MachineInstr struct, or is there already enough padding to use?

lib/CodeGen/MachineInstr.cpp
536–539 ↗(On Diff #142879)

I thought we were moving away from an overly generic "fast" terminology. This can also go in the header

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
681

I think any of these DAG changes are a separate patch from the MI changes

spatel added inline comments.Apr 18 2018, 7:06 AM
include/llvm/CodeGen/SelectionDAGNodes.h
361–362

This is where *this patch* should start and end. Why do we still need an UnsafeAlgebra bit after adding AllowReassociation?

lib/CodeGen/MachineInstr.cpp
536–539 ↗(On Diff #142879)

That's correct. In IR, we're replacing uses of isFast() with the subset of FMF that are actually required for each transform.

There's no transform that actually requires every flag, so introducing this API for MI is inviting trouble. Having isFast() in the DAG as a bridge is fine, but the goal should be to refine the DAG transforms.

mcberg2017 added inline comments.Apr 18 2018, 10:41 AM
include/llvm/CodeGen/MachineInstr.h
110 ↗(On Diff #142879)

When I put up the new review with InstrEmitter.cpp and MachineInstr.h (as the cpp file will no longer have changes), I will answer the size question for Flags there.

mcberg2017 updated this revision to Diff 142968.EditedApr 18 2018, 11:13 AM
mcberg2017 marked 6 inline comments as done.

I have split off the MachineInstr based code for a separate review as requested, will provide a link once that is up.

There is a related review for the MachineInstr code in D45781.

mcberg2017 marked an inline comment as done.Apr 18 2018, 12:06 PM
mcberg2017 added inline comments.
include/llvm/CodeGen/MachineInstr.h
110 ↗(On Diff #142879)

The padding in theMIFlag enum is such that the new flags do not change the size of the class, however, the Flags field changes it by 1 byte in size.

mcberg2017 marked 2 inline comments as done.Apr 18 2018, 12:08 PM
spatel added inline comments.Apr 19 2018, 3:56 PM
include/llvm/CodeGen/SelectionDAGNodes.h
361–362

This is marked as 'Done', but UnsafeAlgebra is still here without comment?

mcberg2017 added inline comments.Apr 19 2018, 4:42 PM
include/llvm/CodeGen/SelectionDAGNodes.h
361–362

It still present as there has not yet been any cleanup for AMDGPU, looputils and vectorization with regard to hasUnsafeAlgebra which tests that use the value. Also Aarch64 still sets it but it is not used in get context. I would like to defer this issue to cleanup work after the fact if that is ok. I will add a comment on the field for now that it is still in use under the contexts above.

wristow added inline comments.Apr 20 2018, 2:54 AM
include/llvm/CodeGen/SelectionDAGNodes.h
361–362

From my POV, I think the adding of 'AllowReassociation' and 'ApproximateFuncs' goes hand-in-hand with the cleanup of the umbrella aspect of 'UnsafeAlgebra' (that is, with the removal of the 'UnsafeAlgebra' flag). I have a bias against the umbrella aspect of 'UnsafeAlgebra', so I tend to see that cleanup as a good first step (and I think that cleanup will be straightforward). But as long as there is a comment explaining the intention of cleaning it up, and as long as the deferring of the cleanup is intended to be for a very short time, that's OK with me.

spatel added inline comments.Apr 20 2018, 6:46 AM
include/llvm/CodeGen/SelectionDAGNodes.h
361–362

+1. Sorry if it wasn't clear when I said this is where the first patch should end. Warren's description is what I was trying to suggest. Unless I'm missing something, this is a simple first step that just involves some renaming (no functional changes), eg:

Flags.setUnsafeAlgebra(FMF.isFast());

I think you mention looputils and vectorization because a simple grep turns up 'unsafealgebra' there, but that's not based on this bit - those are IR synonyms that should be fixed independently to the updated vocabulary (and then fixed properly to the correct FMF requirements):

bool hasUnsafeAlgebra() {
  return InductionBinOp && !cast<FPMathOperator>(InductionBinOp)->isFast();

There should be a very small number of DAG changes to remove/rename/replace the DAG uses of getFlags().hasUnsafeAlgebra() -> getFlags.isFast().

So I'd prefer to do that as a preliminary cleanup because DAG FMF is already too messy.

mcberg2017 updated this revision to Diff 143402.EditedApr 20 2018, 3:43 PM

Please see the changes related to hasUnsafe... and setUnsafe... wrt the removal of UnsafeAlgebra from SDNodeFlags. I believe this was the requested change. All testing passes as before.

nhaehnle removed a subscriber: nhaehnle.Apr 23 2018, 4:06 AM

Stay tuned, more changes coming to both reviews. There will be new upload shortly...

When writing and testing the mir fmf test, I found I needed to augment interior fma flags as well and the InstrEmitter.cpp change is here now.

mcberg2017 updated this revision to Diff 144766.EditedMay 1 2018, 12:53 PM

Updated after testing with changes for AMDGPU and AArch64 with a test: All updated from D46322.
AMDGPU may need further testing on proprietary backends.

mcberg2017 updated this revision to Diff 144825.May 1 2018, 7:30 PM

I have greatly reduced the size and scope of this review, other reviews will follow with the initial information, after the fact. This patch is now independent as stand alone.

spatel added a comment.May 2 2018, 1:48 PM

I discussed this and D46322 with Michael offline, so I think we're not that far off now from a first step. I'll abandon the other patch.

But I'm still confused - why are the changes to SelectionDAGBuilder::visitTargetIntrinsic(), SelectionDAGBuilder::visitIntrinsicCall() and the DAGCombiner diffs to support those new propagations of FMF in this patch? I'd like to shrink this more, so we're just getting the FMF bits to line up.

include/llvm/CodeGen/SelectionDAGNodes.h
368

Typo: "ApproximateFuncs"

938–943

Flags.isFast() ?

mcberg2017 updated this revision to Diff 144973.May 2 2018, 7:32 PM

Reduced some more to only replacing unsafe in required context, and in those using the flags we have since we now have the ability while proffering SDNode flag context for all the fast sub flags and keeping isFast. We can make case by case adjustments to flag usage after the fact.

mcberg2017 added inline comments.May 2 2018, 7:38 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9549

FMA combining can happen without reasssocation, we should just keep this to contract.

10576

Since we have the flags lets use them.

10847

Since we have the flags, lets use them.

lib/Target/AMDGPU/SIISelLowering.cpp
5351

Question: unsafe implies arcp or it should, should reassocation imply that too or should we go without reassociation in this case?

mcberg2017 retitled this revision from More Fast Math Flag propagation to Fast Math Flag mapping into SDNode.May 2 2018, 9:27 PM

This is moving closer to being an NFC patch that purely cleans up the FMF bits, which makes sense to me. Ideally, it would be nice if only that clean-up was done in the initial patch, and then a later patch makes changes to DAGCombiner, etc. I think that's what Sanjay is saying. (But I think that because the semantics of the 'UnsafeAlgebra' flag (which is being removed and references being replaced by something else) aren't precisely defined, it may not be a guaranteed NFC patch -- but that's the goal.)

Along these same lines, the changes in the two SelectionDAG::getNode() methods in SelectionDAG.cpp seem like they should be deferred to a later patch.

mcberg2017 edited the summary of this revision. (Show Details)May 3 2018, 10:08 AM
spatel added inline comments.May 3 2018, 12:20 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9549

This is more restrictive than it should be. If something is reassociable, then doesn't that imply that it is also contractable?

10576

I don't think this is working as you intended. The FMA node probably doesn't have any FMF set on its node...because we're not propagating that from IR yet.

I think we should leave this as-is for the moment - just change setUnsafeAlgebra() to setAllowReassociation(). I've added tests that will hopefully demonstrate the functional difference of this change at:
rL331471
rL331476

lib/Target/AMDGPU/SIISelLowering.cpp
5351

Reciprocal transforms are a special case of reassociation, but not enabled by 'reassoc' alone. Ie, the reason 'arcp' is its own bit is because there are users that want to enable/disable this particular transform independently of other algebraic transforms.

So in this case, I think we want to just remove the hasUnsafeAlgebra() check because that (isFast) is implied by hasAllowReciprocal():
bool Unsafe = DAG.getTarget().Options.UnsafeFPMath || Flags.hasAllowReciprocal();

But @arsenm or someone with AMDGPU knowledge should confirm if that is the intent here. The sqrt transform might want to use 'afn' now that we're propagating it, but that might be an independent change too. It's a mess. :)

mcberg2017 updated this revision to Diff 145090.EditedMay 3 2018, 1:56 PM

With only binary ops reflecting flags, our hands are kind of tied, so this should suffice until we have flag propagation in place.

mcberg2017 marked 3 inline comments as done.May 3 2018, 1:58 PM
mcberg2017 updated this revision to Diff 145120.May 3 2018, 4:58 PM

Updated for a newly added test

spatel accepted this revision.May 4 2018, 7:18 AM

LGTM.

We're down to the minimal patch now, so the AMDGPU change is actually NFC - the 'unsafe' node flag on an fdiv could only be set when all IR FMF were set (aka isFast). That means 'AllowReciprocal' must also be set.
Let me cc a few more potential AMDGPU folks, so they're aware of this change: @nhaehnle @rampitec @FarhanaAleen @tstellar

This revision is now accepted and ready to land.May 4 2018, 7:18 AM

This review should have included the llvm-commits list as a subscriber.

rampitec added inline comments.May 4 2018, 10:05 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

Production of a fused fma opcode is not tied to reassociation. The question here is legality of intermediate result rounding.

mcberg2017 added inline comments.May 4 2018, 11:06 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

Stanislav, what change if any do you request then for this code?

rampitec added inline comments.May 4 2018, 11:10 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

I would say AllowContract can be added here, but reassociation is not a sufficient or required condition.

mcberg2017 added inline comments.May 4 2018, 11:13 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

Ok, will do.

spatel added inline comments.May 4 2018, 11:18 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

So this is an existing bug in the code? It shouldn't be checking hasUnsafeAlgebra (or Options.UnsafeFPMath). Can we add a test for that?

rampitec added inline comments.May 4 2018, 11:21 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

I do not think it is a bug, this is legal to do with unsafe algebra.
At the end of the day I think if you remove reassociation from the condition and add AllowContract check it should then boil down to the question if existing lit tests will pass.

mcberg2017 added inline comments.May 4 2018, 11:23 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

With the change to hasAllowContract, existing lit tests all pass.

rampitec added inline comments.May 4 2018, 11:41 AM
lib/Target/AMDGPU/SIISelLowering.cpp
6701–6702

Thanks. I think we are fine then.

Updated with change to AMD fma case.

This revision was automatically updated to reflect the committed changes.
nhaehnle removed a subscriber: nhaehnle.May 7 2018, 12:30 AM