This is an archive of the discontinued LLVM Phabricator instance.

[FPEnv][SelectionDAG] Relax chain requirements
ClosedPublic

Authored by uweigand on Dec 2 2019, 11:01 AM.

Details

Summary

This patch fixes one remaining problem in matching strict FP operations in the SystemZ back-end: it is currently not possible to have the DAG matcher accept certain complex expressions involving strict FP operations; specifically this affects matching the MDEB(R) etc. family of multiply instructions that implicitly extend their input (32x32 -> 64 bit or 64x64 -> 128 bit).

The matcher currently fails because of two reasons:

  1. FP intrinsics are too rigidly sequenced due to the way their chains are constructed
  2. The chain output confuses a hasOneUse check in the matcher

As to 1), SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well.

This patch therefore changes the way chains for constrained intrisincs are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG optimizations.

As to 2), even with the above change, I still was unable to match more that two linked DAG nodes in a single pattern (e.g. load -> extend -> multiply). This is because the OPC_CheckFoldableChainNode check requires that each of the intermediate nodes ("extend" in the above example) only has a single use. This check fails for strict operations as the chain output of the extend indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't.

This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value.

With those two changes (and a few SystemZ backend updates), I am able to match all remaining versions of the extended-multiply instruction against constrained intrinsics. This also fixes the last remaining set of test cases that were mixing strict and nonstrict operations in a single function.

The OPC_CheckFoldableChainNode change might theoretically affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening.

Diff Detail

Event Timeline

uweigand created this revision.Dec 2 2019, 11:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 2 2019, 11:01 AM
craig.topper added inline comments.Dec 2 2019, 11:10 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
3178

Can we break this loop when NonChainUses is greater than 1? No need to keep scanning.

uweigand updated this revision to Diff 231894.Dec 3 2019, 6:07 AM
uweigand marked 2 inline comments as done.
uweigand added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
3178

Sure, makes sense.

This revision is now accepted and ready to land.Dec 5 2019, 3:49 PM
This revision was automatically updated to reflect the committed changes.
uweigand marked an inline comment as done.

Should be fixed by b3009ed -- sorry!

I think this doesn't work if a constrained intrinsic is created that doesn't have have any uses, but needs to trigger an exception. The chain output ends up in PendingLoads, but PendingLoads isn't flushed if there's no stores or other operations that need to be ordered. So the basic block ends without connecting the chain from the constrained intrinsic to anything. This makes the node subject to deletion.

pengfei added a subscriber: pengfei.Jan 2 2020, 7:12 PM

I think this doesn't work if a constrained intrinsic is created that doesn't have have any uses, but needs to trigger an exception. The chain output ends up in PendingLoads, but PendingLoads isn't flushed if there's no stores or other operations that need to be ordered. So the basic block ends without connecting the chain from the constrained intrinsic to anything. This makes the node subject to deletion.

I guess you're right -- the current behavior seems correct only for fpexcept.maytrap but not fpexcept.strict nodes. The latter need to be part of both getRoot and getControlRoot. I'm not sure whether simply adding the nodes to both PendingLoads and PendingExports will work though, since then we might get duplicate chain links ... we may need a seperate list for the FP nodes. I'll have a look.

I guess you're right -- the current behavior seems correct only for fpexcept.maytrap but not fpexcept.strict nodes. The latter need to be part of both getRoot and getControlRoot. I'm not sure whether simply adding the nodes to both PendingLoads and PendingExports will work though, since then we might get duplicate chain links ... we may need a seperate list for the FP nodes. I'll have a look.

See D72341.