Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.
Details
Diff Detail
Event Timeline
Thanks for working on this. Adding some more potential reviewers...
If I'm seeing it correctly, there are several independent pieces to this patch, so please split it up as much as possible and have tests for each part.
As a possible first step, can you compare the FMF propagation from IR to SDAG to the proposal in D37686 (@jbhateja are you planning to proceed on that patch?)
This upload contains some additional tests as requested. I did some digging to find coverage for Aggressive fma. That is also in the test path now. As only the machine instructions code is separable and also small it seems better to leave it all in one place for context. I did look at the cross reference code in D37686. There is some similar coverage with the initial code and test changes and some that is outside those that coverage for different reasons.
include/llvm/CodeGen/MachineInstr.h | ||
---|---|---|
76–98 ↗ | (On Diff #142879) | These should be a separate patch from the DAG changes, and include the necessary MIR printer/parser changes to handle them |
106 ↗ | (On Diff #142879) | If this increases the size, can we decrease this to uint16? |
110 ↗ | (On Diff #142879) | Does this change the overall size of the MachineInstr struct, or is there already enough padding to use? |
lib/CodeGen/MachineInstr.cpp | ||
536–539 ↗ | (On Diff #142879) | I thought we were moving away from an overly generic "fast" terminology. This can also go in the header |
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
680 | I think any of these DAG changes are a separate patch from the MI changes |
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
361–362 | This is where *this patch* should start and end. Why do we still need an UnsafeAlgebra bit after adding AllowReassociation? | |
lib/CodeGen/MachineInstr.cpp | ||
536–539 ↗ | (On Diff #142879) | That's correct. In IR, we're replacing uses of isFast() with the subset of FMF that are actually required for each transform. There's no transform that actually requires every flag, so introducing this API for MI is inviting trouble. Having isFast() in the DAG as a bridge is fine, but the goal should be to refine the DAG transforms. |
include/llvm/CodeGen/MachineInstr.h | ||
---|---|---|
110 ↗ | (On Diff #142879) | When I put up the new review with InstrEmitter.cpp and MachineInstr.h (as the cpp file will no longer have changes), I will answer the size question for Flags there. |
I have split off the MachineInstr based code for a separate review as requested, will provide a link once that is up.
include/llvm/CodeGen/MachineInstr.h | ||
---|---|---|
110 ↗ | (On Diff #142879) | The padding in theMIFlag enum is such that the new flags do not change the size of the class, however, the Flags field changes it by 1 byte in size. |
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
361–362 | This is marked as 'Done', but UnsafeAlgebra is still here without comment? |
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
361–362 | It still present as there has not yet been any cleanup for AMDGPU, looputils and vectorization with regard to hasUnsafeAlgebra which tests that use the value. Also Aarch64 still sets it but it is not used in get context. I would like to defer this issue to cleanup work after the fact if that is ok. I will add a comment on the field for now that it is still in use under the contexts above. |
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
361–362 | From my POV, I think the adding of 'AllowReassociation' and 'ApproximateFuncs' goes hand-in-hand with the cleanup of the umbrella aspect of 'UnsafeAlgebra' (that is, with the removal of the 'UnsafeAlgebra' flag). I have a bias against the umbrella aspect of 'UnsafeAlgebra', so I tend to see that cleanup as a good first step (and I think that cleanup will be straightforward). But as long as there is a comment explaining the intention of cleaning it up, and as long as the deferring of the cleanup is intended to be for a very short time, that's OK with me. |
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
361–362 | +1. Sorry if it wasn't clear when I said this is where the first patch should end. Warren's description is what I was trying to suggest. Unless I'm missing something, this is a simple first step that just involves some renaming (no functional changes), eg: Flags.setUnsafeAlgebra(FMF.isFast()); I think you mention looputils and vectorization because a simple grep turns up 'unsafealgebra' there, but that's not based on this bit - those are IR synonyms that should be fixed independently to the updated vocabulary (and then fixed properly to the correct FMF requirements): bool hasUnsafeAlgebra() { return InductionBinOp && !cast<FPMathOperator>(InductionBinOp)->isFast(); There should be a very small number of DAG changes to remove/rename/replace the DAG uses of getFlags().hasUnsafeAlgebra() -> getFlags.isFast(). So I'd prefer to do that as a preliminary cleanup because DAG FMF is already too messy. |
Please see the changes related to hasUnsafe... and setUnsafe... wrt the removal of UnsafeAlgebra from SDNodeFlags. I believe this was the requested change. All testing passes as before.
When writing and testing the mir fmf test, I found I needed to augment interior fma flags as well and the InstrEmitter.cpp change is here now.
Updated after testing with changes for AMDGPU and AArch64 with a test: All updated from D46322.
AMDGPU may need further testing on proprietary backends.
I have greatly reduced the size and scope of this review, other reviews will follow with the initial information, after the fact. This patch is now independent as stand alone.
I discussed this and D46322 with Michael offline, so I think we're not that far off now from a first step. I'll abandon the other patch.
But I'm still confused - why are the changes to SelectionDAGBuilder::visitTargetIntrinsic(), SelectionDAGBuilder::visitIntrinsicCall() and the DAGCombiner diffs to support those new propagations of FMF in this patch? I'd like to shrink this more, so we're just getting the FMF bits to line up.
include/llvm/CodeGen/SelectionDAGNodes.h | ||
---|---|---|
368 | Typo: "ApproximateFuncs" | |
938–943 | Flags.isFast() ? |
Reduced some more to only replacing unsafe in required context, and in those using the flags we have since we now have the ability while proffering SDNode flag context for all the fast sub flags and keeping isFast. We can make case by case adjustments to flag usage after the fact.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9476 | FMA combining can happen without reasssocation, we should just keep this to contract. | |
10547 | Since we have the flags lets use them. | |
10818 | Since we have the flags, lets use them. | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
5345 | Question: unsafe implies arcp or it should, should reassocation imply that too or should we go without reassociation in this case? |
This is moving closer to being an NFC patch that purely cleans up the FMF bits, which makes sense to me. Ideally, it would be nice if only that clean-up was done in the initial patch, and then a later patch makes changes to DAGCombiner, etc. I think that's what Sanjay is saying. (But I think that because the semantics of the 'UnsafeAlgebra' flag (which is being removed and references being replaced by something else) aren't precisely defined, it may not be a guaranteed NFC patch -- but that's the goal.)
Along these same lines, the changes in the two SelectionDAG::getNode() methods in SelectionDAG.cpp seem like they should be deferred to a later patch.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
9476 | This is more restrictive than it should be. If something is reassociable, then doesn't that imply that it is also contractable? | |
10547 | I don't think this is working as you intended. The FMA node probably doesn't have any FMF set on its node...because we're not propagating that from IR yet. I think we should leave this as-is for the moment - just change setUnsafeAlgebra() to setAllowReassociation(). I've added tests that will hopefully demonstrate the functional difference of this change at: | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
5345 | Reciprocal transforms are a special case of reassociation, but not enabled by 'reassoc' alone. Ie, the reason 'arcp' is its own bit is because there are users that want to enable/disable this particular transform independently of other algebraic transforms. So in this case, I think we want to just remove the hasUnsafeAlgebra() check because that (isFast) is implied by hasAllowReciprocal(): But @arsenm or someone with AMDGPU knowledge should confirm if that is the intent here. The sqrt transform might want to use 'afn' now that we're propagating it, but that might be an independent change too. It's a mess. :) |
With only binary ops reflecting flags, our hands are kind of tied, so this should suffice until we have flag propagation in place.
LGTM.
We're down to the minimal patch now, so the AMDGPU change is actually NFC - the 'unsafe' node flag on an fdiv could only be set when all IR FMF were set (aka isFast). That means 'AllowReciprocal' must also be set.
Let me cc a few more potential AMDGPU folks, so they're aware of this change: @nhaehnle @rampitec @FarhanaAleen @tstellar
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | Production of a fused fma opcode is not tied to reassociation. The question here is legality of intermediate result rounding. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | Stanislav, what change if any do you request then for this code? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | I would say AllowContract can be added here, but reassociation is not a sufficient or required condition. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | Ok, will do. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | So this is an existing bug in the code? It shouldn't be checking hasUnsafeAlgebra (or Options.UnsafeFPMath). Can we add a test for that? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | I do not think it is a bug, this is legal to do with unsafe algebra. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | With the change to hasAllowContract, existing lit tests all pass. |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
6670–6671 | Thanks. I think we are fine then. |
This is where *this patch* should start and end. Why do we still need an UnsafeAlgebra bit after adding AllowReassociation?