This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Signal CanonicalizerPass failure due to non-convergence
AbandonedPublic

Authored by springerm on Dec 21 2022, 7:17 AM.

Details

Summary

This change also makes the CanonicalizerPass signal a pass failure if the greedy pattern rewrite did not converge. Previously, the LogicalResult of applyPatternsAndFoldGreedily was ignored.

Diff Detail

Event Timeline

springerm created this revision.Dec 21 2022, 7:17 AM
springerm requested review of this revision.Dec 21 2022, 7:17 AM

Note: Multiple tests (e.g., GPU/canonicalize.mlir) are currently failing due to non-convergence. These are not yet fixed in this change.

This change also makes the CanonicalizerPass signal a pass failure if the greedy pattern rewrite did not converge

I don't think this is OK. This will interrupt the pipeline for something that is not a compilation failure.

mehdi_amini requested changes to this revision.Dec 21 2022, 8:53 AM
This revision now requires changes to proceed.Dec 21 2022, 8:53 AM

This change also makes the CanonicalizerPass signal a pass failure if the greedy pattern rewrite did not converge

I don't think this is OK. This will interrupt the pipeline for something that is not a compilation failure.

Can you elaborate a bit? Canonicalization is a fixpoint iteration of patterns and a failure of applyPatternsAndFoldGreedily indicates that no fixpoint was reached. Isn’t that always a compiler bug? (e.g., bug in a pattern) Would it be better to assert success?

If we don’t check the error state, we can run into hard-to-debug situations where a pattern is registered as a canonicalization but silently not being applied.

This change also makes the CanonicalizerPass signal a pass failure if the greedy pattern rewrite did not converge

I don't think this is OK. This will interrupt the pipeline for something that is not a compilation failure.

Can you elaborate a bit? Canonicalization is a fixpoint iteration of patterns and a failure of applyPatternsAndFoldGreedily indicates that no fixpoint was reached. Isn’t that always a compiler bug? (e.g., bug in a pattern) Would it be better to assert success?

Canonicalization is a best-effort based fix-point algorithm. There is no guarantee it will always be reached I believe.
For example, we could fail to reach a fixed point because the input IR would trigger an edge case where some more iterations would be needed to reach fixed point. We'd be shipping a compiler where all of our tests are passing but some input program could trigger a compiler abort.

If we don’t check the error state, we can run into hard-to-debug situations where a pattern is registered as a canonicalization but silently not being applied.

Canonicalization shouldn't be a requirement for correctness I guess?

For example, we could fail to reach a fixed point because the input IR would trigger an edge case where some more iterations would be needed to reach fixed point. We'd be shipping a compiler where all of our tests are passing but some input program could trigger a compiler abort.

Note: Canonicalizations must always converge. (https://mlir.llvm.org/docs/Canonicalization/)

Repeated applications of patterns should converge. Unstable or cyclic rewrites will cause infinite loops in the canonicalizer.

Canonicalization is a best-effort based fix-point algorithm. There is no guarantee it will always be reached I believe.

This would mean that the IR is not guaranteed to be in canonical form after running the canonicalizer. Which raises the question: What's the purpose of the canonicalizer pass?

Option<"maxIterations", "max-iterations", "int64_t",
       /*default=*/"10",
       "Max. iterations between applying patterns / simplifying regions">,

This default value also looks suspicious to me. I suspect that this default value was just copied GreedyRewriteConfig and was not a deliberate decision.

For example, we could fail to reach a fixed point because the input IR would trigger an edge case where some more iterations would be needed to reach fixed point. We'd be shipping a compiler where all of our tests are passing but some input program could trigger a compiler abort.

Note: Canonicalizations must always converge. (https://mlir.llvm.org/docs/Canonicalization/)

Repeated applications of patterns should converge. Unstable or cyclic rewrites will cause infinite loops in the canonicalizer.

This comment is about the bug you hit: that is the inner loop could be infinite with badly written patterns. We were not defending against this.
That said we added an outer loop because convergence of the inner loop does not imply that we reached a fix point.

Canonicalization is a best-effort based fix-point algorithm. There is no guarantee it will always be reached I believe.

This would mean that the IR is not guaranteed to be in canonical form after running the canonicalizer. Which raises the question: What's the purpose of the canonicalizer pass?

I don't think see this as a problem: again "best effort", and while we won't reach fixed-point in edge cases and a good pipeline should be resilient to this, the intent is that we should hit the fix point quickly in the vast majority of cases.

If we step back: the whole point of canonicalization is to make analysis/transform simpler, if some pieces of IR aren't perfectly canonical then we won't generate perfect code but that should be "fine" (in that a user won't see a compiler crashing, just a missed optimization).

Option<"maxIterations", "max-iterations", "int64_t",
       /*default=*/"10",
       "Max. iterations between applying patterns / simplifying regions">,

This default value also looks suspicious to me. I suspect that this default value was just copied GreedyRewriteConfig and was not a deliberate decision.

Why is this suspicious? Note that this was not "just copied", the greedy rewriter was developed alongside canonicalization, this whole thing was added to prevent infinite loops in the canonicalized.

If we step back: the whole point of canonicalization is to make analysis/transform simpler, if some pieces of IR aren't perfectly canonical then we won't generate perfect code but that should be "fine" (in that a user won't see a compiler crashing, just a missed optimization).

This sounds like canonicalization is optional (but can lead to more efficient lowering) and every pipeline should work correctly without canonicalization.

In practice, we often use the canonicalizer to enable other transforms/passes. Some transforms/passes only apply to the "canonical form". This would be incorrect, as the IR could stop lowering at that point.

If we think this further: Should we have a debug flag to replace the canonicalizer with a "no operation"? This could be used as a fuzzer to build resilient pass pipelines.

Note: (This maybe a somewhat construed example.) We cannot even reliably write test cases for the canonicalization patterns at the moment. There is no guarantee that -canonicalize will canonicalize the IR in Dialect/Vector/canonicalize.mlir. It could just return the unmodified input IR and that would be correct.

If we think this further: Should we have a debug flag to replace the canonicalizer with a "no operation"? This could be used as a fuzzer to build resilient pass pipelines.

I would be a useful robustness test for pipelines indeed.

Note: (This maybe a somewhat construed example.) We cannot even reliably write test cases for the canonicalization patterns at the moment. There is no guarantee that -canonicalize will canonicalize the IR in Dialect/Vector/canonicalize.mlir. It could just return the unmodified input IR and that would be correct.

I'm not sure why? We control the IR input here (its size and content), it isn't arbitrary so we can clearly consider it a bug if we can't apply a limited set of patterns on a controlled small IR.

split revision

springerm retitled this revision from [mlir] Limit number of pattern rewrites in CanonicalizerPass to [mlir] Signal CanonicalizerPass failure due to non-convergence.Dec 21 2022, 10:45 PM
springerm edited the summary of this revision. (Show Details)

I wrote up the findings of our discussion in an RFC (https://discourse.llvm.org/t/rfc-canonicalizerpass-convergence-error-handling/67333), where we can decide whether to update the CanonicalizerPass or the MLIR docs.

springerm abandoned this revision.Dec 23 2022, 4:10 AM