This is an archive of the discontinued LLVM Phabricator instance.

Avoid double-deleting successfully folded materialization ops
AcceptedPublic

Authored by j2kun on Aug 16 2023, 12:02 PM.

Details

Summary

Fixes https://github.com/llvm/llvm-project/issues/64665. The included
test case reproduces the stack trace when the if guard is removed.

I would be open to any advice about a better way to fix this issue,
since the if guard seems a little bit hacky (it would be better not to
generate duplicated removal instructions in the first place). From what
I can tell, the call to legalizeWithFold is where this originates, and
it calls ConversionPatternRewriter::replaceOpWithNewOp which inserts an
additional instruction to remove the materialization op after the loop
over unresolvedMaterializations already removes it.

Diff Detail

Event Timeline

j2kun created this revision.Aug 16 2023, 12:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2023, 12:02 PM
j2kun requested review of this revision.Aug 16 2023, 12:02 PM

Not at all an export in the dialect conversion, so mostly looked at the test code and added some reviewers, but the approach looks sensible to me.

mlir/test/IR/test-dialect-conversion-folded-materialization.mlir
6–7

No need to write into a temporary-file, you can just pipe the result of mlir-opt into FileCheck

19–23

Can this be further minimized? Looking at the issue it seems to me the bug already occurs when having a single test.fall_asleep lead into a call.
Would something like:

mlir
func.func private @example(!test.dozing<i32>)
...
%0 = test.fall_asleep : () -> !test.dozing<i32>
func.call @example(%0) : (!test.dozing<i32>) -> ()
return

This would make the test.awaken and associated code redundant.

mlir/test/lib/Dialect/Test/TestPatterns.cpp
1715

Convention is to not use the mlir:: namespace anywhere in cpp files

1723–1726

Redundant optimization for a test case

1728–1729

This can just be rewriter.replaceOp(op, adaptor.getInput())

1766

Spell out the auto to Operation*

mlir/test/lib/Dialect/Test/TestTypeDefs.td
380–385

Don't need builders here.
You can also just use let assemblyFormat = "< struct(params) >"; for the assembly format.

j2kun updated this revision to Diff 550865.Aug 16 2023, 1:32 PM
j2kun marked 3 inline comments as done.

Make changes to zero9178's comments

j2kun marked an inline comment as not done.Aug 16 2023, 1:39 PM
j2kun added inline comments.
mlir/test/IR/test-dialect-conversion-folded-materialization.mlir
6–7

In the case that the test fails due to a stack trace in mlir-opt, the reported error is that the input given to FileCheck is an empty module. This way, lit reports the error produced by mlir-opt directly. Happy to change it if there are strong feelings to do so, though.

19–23

Having a single function argument does not reproduce the error, because it removes the need to add an unrealized_conversion_cast op for the second argument.

I described this in some detail in https://github.com/llvm/llvm-project/issues/64665#issue-1849041630 though the type/op names are different, specific to my out-of-tree project.

I was able to remove the awaken op and still reproduce the error.

mlir/test/lib/Dialect/Test/TestPatterns.cpp
1728–1729

Shockingly, switching these two lines to a single replaceOp causes the stack trace to no longer reproduce when I remove the if guard in DialectConversion.cpp. I.e., the following changes causes the test to pass

diff --git a/mlir/lib/Transforms/Utils/DialectConversion.cpp b/mlir/lib/Transforms/Utils/DialectConversion.cpp
index 905fdf326f61..67ba6c58ebb0 100644
--- a/mlir/lib/Transforms/Utils/DialectConversion.cpp
+++ b/mlir/lib/Transforms/Utils/DialectConversion.cpp
@@ -1059,8 +1059,8 @@ void ConversionPatternRewriterImpl::applyRewrites() {
     // In some cases, such as when a materialization is removed via folding,
     // the materialization op will already have a replacement, and we need
     // to avoid a double-drop.
-    if (replacements.count(mat.getOp()))
-      continue;
+    // if (replacements.count(mat.getOp()))
+    //   continue;
     mat.getOp()->dropAllUses();
     mat.getOp()->erase();
   }
diff --git a/mlir/test/lib/Dialect/Test/TestPatterns.cpp b/mlir/test/lib/Dialect/Test/TestPatterns.cpp
index 0596265add3d..607911484d9c 100644
--- a/mlir/test/lib/Dialect/Test/TestPatterns.cpp
+++ b/mlir/test/lib/Dialect/Test/TestPatterns.cpp
@@ -1720,8 +1720,10 @@ struct ConvertFallAsleep : public OpConversionPattern<FallAsleepOp> {
   LogicalResult
   matchAndRewrite(FallAsleepOp op, OpAdaptor adaptor,
                   ConversionPatternRewriter &rewriter) const override {
-    rewriter.replaceAllUsesWith(op.getResult(), adaptor.getInput());
-    rewriter.eraseOp(op);
+    // rewriter.replaceAllUsesWith(op.getResult(), adaptor.getInput());
+    // rewriter.eraseOp(op);
+
+    rewriter.replaceOp(op, adaptor.getInput());
     return success();
   }
 };

Now I'm even less sure my fix makes sense, but shouldn't these two options (replaceOp vs replaceAllUsesWith + eraseOp) be interchangeable?

rkayaith added inline comments.
mlir/test/lib/Dialect/Test/TestPatterns.cpp
1728–1729

only replaceOp ends up calling ConversionPatternRewriterImpl::notifyOpReplaced:

j2kun added inline comments.Aug 17 2023, 10:04 AM
mlir/test/lib/Dialect/Test/TestPatterns.cpp
1728–1729

I see, but I'm still unclear as to why that would result in this failure (double deleting a materialization op) instead of, say, failing to delete the dialect-specific converted op, nor what the right approach would be to fix it (override replaceAllUsesWith? not sure how it would modify the conversion's internal state...)

Did we get to the end of this issue?

j2kun added a comment.Sep 7 2023, 11:36 PM

Did we get to the end of this issue?

It depends on whether it's intended behavior for ConversionPatternRewriter::replaceAllUsesWith and ConversionPatternRewriter::eraseOp to be unusable in conversion passes. I imagine those methods should be overridden somehow, but I'm not sure what the right behavior is.

It depends on whether it's intended behavior for ConversionPatternRewriter::replaceAllUsesWith and ConversionPatternRewriter::eraseOp to be unusable in conversion passes

I'm confused, aren't these widely used? At least eraseOp is pretty standard here and I see quite a large number of hit in the conversion passes in-tree (replaceAllUsesWith maybe less common because replaceOpWithNewOp).

j2kun added a comment.Sep 9 2023, 10:01 AM

It depends on whether it's intended behavior for ConversionPatternRewriter::replaceAllUsesWith and ConversionPatternRewriter::eraseOp to be unusable in conversion passes

I'm confused, aren't these widely used? At least eraseOp is pretty standard here and I see quite a large number of hit in the conversion passes in-tree (replaceAllUsesWith maybe less common because replaceOpWithNewOp).

All I can say for sure is that when I use them together in this (very simple) conversion pass, I get a hard crash. It seems like maybe replaceAllUsesWith should be using a notification hook (as per rkayaith's comment, it doesn't call notifyOpReplaced), or else maybe the check to avoid a double-delete is OK, design wise? I'm not sure what the right fix is, but happy to implement it with some guidance.

springerm added a comment.EditedTue, Nov 14, 7:27 PM

Not an expert on dialect conversion. This is part of the internal state of the dialect conversion:

/// Ordered map of requested operation replacements.
llvm::MapVector<Operation *, OpReplacement> replacements;

replacements is populated by replaceOp (via notification) but not by replaceAllUsesWith + eraseOp. I don't why exactly replacements is needed. But the fact that two API constructs that are supposed to be equivalent (replaceOp and replaceAllUsesWith+eraseOp) are handled differently looks like a problem to me.

Maybe we should be tracking something like value-value replacements instead of op replacements? Presumably we have replacements so that the dialect conversion can roll back the changes (?)

Another thing that stood out to me: This is documentation of OpReplacement:

/// This class represents one requested operation replacement via 'replaceOp' or
/// 'eraseOp`.
struct OpReplacement {
...
};

I think we are not appending to replacements in case of eraseOp. Maybe that's the missing part?

edit: nvm, replaceOp is called from eraseOp...

Mogball accepted this revision.Tue, Nov 28, 11:09 AM
Mogball added a subscriber: Mogball.

The fix seems fine to me

mlir/test/lib/Dialect/Test/TestPatterns.cpp
1738–1739
1762

drop trivial braces

This revision is now accepted and ready to land.Tue, Nov 28, 11:09 AM