This patch fixes one remaining problem in matching strict FP operations in the SystemZ back-end: it is currently not possible to have the DAG matcher accept certain complex expressions involving strict FP operations; specifically this affects matching the MDEB(R) etc. family of multiply instructions that implicitly extend their input (32x32 -> 64 bit or 64x64 -> 128 bit).
The matcher currently fails because of two reasons:
- FP intrinsics are too rigidly sequenced due to the way their chains are constructed
- The chain output confuses a hasOneUse check in the matcher
As to 1), SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats each constrained intrinsic like a global barrier (e.g. a function call) and fully serializes all pending chains. This is actually not required; it is allowed for constrained intrinsics to be reordered w.r.t one another or (nonvolatile) memory accesses. The MI-level scheduler already allows for that flexibility, so it makes sense to allow it at the DAG level as well.
This patch therefore changes the way chains for constrained intrisincs are created, and handles them basically like load operations are handled. This has the effect that constrained intrinsics are no longer serialized against one another or (nonvolatile) loads. They are still serialized against stores, but that seems hard to change with the current DAG chain setup, and it also doesn't seem to be a big problem preventing DAG optimizations.
As to 2), even with the above change, I still was unable to match more that two linked DAG nodes in a single pattern (e.g. load -> extend -> multiply). This is because the OPC_CheckFoldableChainNode check requires that each of the intermediate nodes ("extend" in the above example) only has a single use. This check fails for strict operations as the chain output of the extend indeed has another use. However, we don't really need to consider chains here at all, since they will all be rewritten anyway by UpdateChains later. Other parts of the matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no more than one use of any non-chain output value.
With those two changes (and a few SystemZ backend updates), I am able to match all remaining versions of the extended-multiply instruction against constrained intrinsics. This also fixes the last remaining set of test cases that were mixing strict and nonstrict operations in a single function.
The OPC_CheckFoldableChainNode change might theoretically affect code unrelated to strict FP nodes, but at least on SystemZ I could not find any single instance of that happening.
Can we break this loop when NonChainUses is greater than 1? No need to keep scanning.