This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Provide a way to print ops in custom form on pass failure
ClosedPublic

Authored by bondhugula on Apr 15 2022, 9:13 PM.

Details

Summary

The generic form of the op is too verbose and in some cases not
readable. On pass failure, ops have been so far printed in generic form
to provide a (stronger) guarantee that the IR print succeeds. However,
in a large number of pass failure cases, the IR is still valid and
the custom printers for the ops will succeed. In fact, readability is
highly desirable post pass failure. This revision provides an option to
print ops in their custom/pretty-printed form on IR failure -- this
option is unsafe and there is no guarantee it will succeed. It's
disabled by default and can be turned on only if needed.

Diff Detail

Event Timeline

bondhugula created this revision.Apr 15 2022, 9:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2022, 9:13 PM
bondhugula requested review of this revision.Apr 15 2022, 9:13 PM
bondhugula retitled this revision from Provide a way to print ops in custom form on pass failure to [MLIR] Provide a way to print ops in custom form on pass failure.Apr 15 2022, 9:17 PM

The cost of getting custom format output from generic output dump is rerunning through opt tool, the cost of crash is rerunning program (which can take multiple hours and take a lot of compute resources). I'd much rather it be uniform and safe and avoid needing flag plumbed through.

bondhugula added a comment.EditedApr 16 2022, 9:37 AM

The cost of getting custom format output from generic output dump is rerunning through opt tool,

You may not be able to rerun through opt tool because you may hit a parse error. However this option will do the trick.

the cost of crash is rerunning program (which can take multiple hours and take a lot of compute resources). I'd much rather it be uniform and safe and avoid needing flag plumbed through.

You'd normally use this flag only after not having used this initially and typcially and having looked at the generic form. So this is strictly better AFAICS. (Use it or lose the readability win -- there are a lot of crash cases that don't take multiple hours and compute resources. A user can make the tradeoff.)

The cost of getting custom format output from generic output dump is rerunning through opt tool, the cost of crash is rerunning program (which can take multiple hours and take a lot of compute resources). I'd much rather it be uniform and safe and avoid needing flag plumbed through.

I have another take on this: this is printed after a pass failed but we don't know that the IR is invalid.
We were printing generically to be conservative, however our printer is now "safe" in that it'll run the verifier and print generically anyway (even with this new option added here...) if the verifier is failing.

So: why do we even need an option here? Why not just remove the OpPrintingFlags().printGenericOpForm() entirely from mlir/lib/Pass/IRPrinting.cpp and be done?

The cost of getting custom format output from generic output dump is rerunning through opt tool,

You may not be able to rerun through opt tool because you may hit a parse error. However this option will do the trick.

You shouldn't be able to get a parse error from the generic syntax: what am I missing here?

The cost of getting custom format output from generic output dump is rerunning through opt tool, the cost of crash is rerunning program (which can take multiple hours and take a lot of compute resources). I'd much rather it be uniform and safe and avoid needing flag plumbed through.

I have another take on this: this is printed after a pass failed but we don't know that the IR is invalid.
We were printing generically to be conservative, however our printer is now "safe" in that it'll run the verifier and print generically anyway (even with this new option added here...) if the verifier is failing.

But this isn't what is desired, i.e., a user may not want a generic print even if the verifier is failing in certain cases. Consider a 3-d loop nest one of whose load accesses is taking an i32 subscript instead of index: it's an invalid IR -- the IR print would be close to unreadable in generic form, but the custom form print will succeed and give you a compact/readable form; the generic form round-trip will fail with a verifier error and again give you a generic form (unless you disable the verifier).

The cost of getting custom format output from generic output dump is rerunning through opt tool,

You may not be able to rerun through opt tool because you may hit a parse error. However this option will do the trick.

You shouldn't be able to get a parse error from the generic syntax: what am I missing here?

Sorry, I meant a "verifier" error.

What is the concern here? I don't think that Uday is arguing this should be on-by-default. He's arguing that some client would find this to be useful, and they may not take hours of compile runs, and may not be working with passes that change operations in ways that prevent them from printing. Core operations like affine for loops are almost illegible when printed in their generic form.

Is someone arguing that adding this does harm to MLIR, or are you just saying that you personally wouldn't use it?

stellaraccident accepted this revision.Apr 16 2022, 10:03 PM

I'm with Chris/Uday on this: when debugging, I've made local changes to similar ends when trying to figure things out. I don't feel this does harm, and generally I'm supportive of no cost debugging flexibility.

This revision is now accepted and ready to land.Apr 16 2022, 10:03 PM
lattner accepted this revision.Apr 16 2022, 10:10 PM

I agree, and I'd go further: I think it is extremely reasonable for one to make the printing hook be resilient to malformed IR, particularly because you might want to ->dump() something in the middle of a transform etc. I would certainly fix my printers if they crashed in practice in the middle of a dump.

Can the ODS generated printers ever crash on malformed input?

mlir/lib/Pass/PassManagerOptions.cpp
57

Is custom-form the right description for this? Maybe mlir-print-custom-assembly-after-failure

I share some of the concerns about the dubious value of this flag, but if enough others think it's useful I won't block (given how limited in exposure this is). I will say, though, that we should really prefix the names of these options with pass- or something, they sound like the are applied
much more generally than they actually are.

I agree, and I'd go further: I think it is extremely reasonable for one to make the printing hook be resilient to malformed IR, particularly because you might want to ->dump() something in the middle of a transform etc. I would certainly fix my printers if they crashed in practice in the middle of a dump.

Can the ODS generated printers ever crash on malformed input?

Hardening custom printers requires a significant amount of duplicated verification, generally to the point of reimplementing the verifier, and is difficult to reason about the validity of (formats generally rely on various assumptions being held, especially for exotic formats). I have strongly preferred in the past
that custom formats should be able to assume a verified op for this specific reason, and this was why support was added for verifying an operation before printing custom form (with fallback to generic form on failure). dump should already use the latter behavior, meaning that it dynamically checks if that operation
is valid before choosing whether to use the custom or generic form (what Mehdi was mentioning above).

Do we have a form of "verify" that doesn't abort? That would really be ideal.

Do we have a form of "verify" that doesn't abort? That would really be ideal.

What do you mean by "abort" here? Can you describe what you are envisioning a bit more?

It would be nice for an op to be able to have a hook that indicates the custom printer will not abort, but does not indicate that all invariants are satisfied. As an example: a loop op could fail verification but pass the abort hook if it was missing a terminator or has mismatched result types.

I expect that complex ops could benefit from this extra level of verification.

mehdi_amini requested changes to this revision.Apr 17 2022, 12:10 AM

But this isn't what is desired, i.e., a user may not want a generic print even if the verifier is failing in certain cases.

Maybe: but is this what's implemented in this patch? As far as I can see you're not touching anything about the logic that is triggered on verifier failures?

Marking "requested changes" because I don't understand the patch right now.

What is the concern here? I don't think that Uday is arguing this should be on-by-default.

Uday doesn't: I am.

This revision now requires changes to proceed.Apr 17 2022, 12:10 AM

What is the concern here? I don't think that Uday is arguing this should be on-by-default.

Uday doesn't: I am.

To expand, my proposal is:

  1. Change the default for a pass failure (which what this patch is about) to print the custom form, that is remove the printGenericOpForm() from mlir/lib/Pass/IRPrinting.cpp) . This has nothing to do with invalid IR: printing the custom form will (by default) run the verifier and on failure fallback anyway to generic IR already right now.
  1. Consider adding a command line flag to enable the OpPrintingFlags().shouldAssumeVerified() when printing the IR to force printing custom even when the verifier fails. The tricky part will be to plumb it appropriately but it should be doable.
bondhugula marked an inline comment as done.

Update cmd line flag name and added test case.

What is the concern here? I don't think that Uday is arguing this should be on-by-default.

Uday doesn't: I am.

To expand, my proposal is:

  1. Change the default for a pass failure (which what this patch is about) to print the custom form, that is remove the printGenericOpForm() from mlir/lib/Pass/IRPrinting.cpp) . This has nothing to do with invalid IR: printing the custom form will (by default) run the verifier and on failure fallback anyway to generic IR already right now.

This doesn't address the goals of this revision. Did you see my previous response? (reproduced below)

"But this isn't what is desired, i.e., a user may not want a generic print even if the verifier is failing in certain cases. Consider a 3-d loop nest one of whose load accesses is taking an i32 subscript instead of index: it's an invalid IR -- the IR print would be close to unreadable in generic form, but the custom form print will succeed and give you a compact/readable form; the generic form round-trip will fail with a verifier error and again give you a generic form (unless you disable the verifier)."

  1. Consider adding a command line flag to enable the OpPrintingFlags().shouldAssumeVerified() when printing the IR to force printing custom even when the verifier fails. The tricky part will be to plumb it appropriately but it should be doable.

If you'd like something else in addition to what this revision does or subsume what it does, please feel free to create a revision -- I'll be happy to review as well.

At this point, I don't see a single valid concern. The only point Jacques had was to round trip it back through mlir-opt with the verifier disabled (in case of invalid IR). However, this is really unwieldy because both pass failure and print-ir-after-failure will emit to stderr and you'll have to find to filter out any number of such failure messages ... -print-ir-after-failure | .. filter pass failure error messages ... | mlir-opt -verify-each=0. What this revision gives you is the custom form when you need it (whether IR is valid or invalid) after a pass failure:

-mlir-print-ir-after-failure mlir-print-custom-assembly-after-failure

I share some of the concerns about the dubious value of this flag,

Could you say which ones? The ones I saw and responded to seemed off-topic and the alternatives suggested weren't really wieldy or addressing the readability issue. In particular, what would be your substitute for:

`... -mlir-print-ir-after-failure mlir-print-custom-assembly-after-failure`

to read the IR in the pretty-printed form after a pass failure where the IR could be invalid?
mlir/lib/Pass/PassManagerOptions.cpp
57

Thanks, this would be better. Updated.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 17 2022, 7:44 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Hey folks, I'm a bit dismayed by how this patch/discussion/revert happened. It seems to me that we are reaching for perfection on a debugging option that could have added utility/mileage towards an ultimate better state. For a lot of such things, the on the ground experience of using something for debugging in varied circumstances is how we find the best outcome. I personally find merit in points that both Chris and Mehdi made for the future but I don't see a need to resolve them or get there in one step.

In any case, merits/opinions aside, this was not a shining example of either the development process or the meta goal of having an open, pragmatic community. Not sure what to do about that but I feel it needs stating. I don't have much of a stake in this issue and if Mehdi/Uday would like to discuss it, I'd be open to facilitating.

mehdi_amini added a comment.EditedApr 17 2022, 12:27 PM

This doesn't address the goals of this revision. Did you see my previous response? (reproduced below)

"But this isn't what is desired, i.e., a user may not want a generic print even if the verifier is failing in certain cases. Consider a 3-d loop nest one of whose load accesses is taking an i32 subscript instead of index: it's an invalid IR -- the IR print would be close to unreadable in generic form, but the custom form print will succeed and give you a compact/readable form; the generic form round-trip will fail with a verifier error and again give you a generic form (unless you disable the verifier)."

I read this, I believe I understand what this is, but I still don't understand how this patch is achieving it, did you see my previous response? (reproduced below):

But this isn't what is desired, i.e., a user may not want a generic print even if the verifier is failing in certain cases.

Maybe: but is this what's implemented in this patch? As far as I can see you're not touching anything about the logic that is triggered on verifier failures?

Can you please as a test case where there is invalid IR produced by a pass?
The test you added with -test-pass-failure does not achieve this. It is likely that -test-pass-invalid-parent would be suitable for this.

Here I implemented part 1) of my proposal: https://reviews.llvm.org/D123915
I believe this subsumes already entirely this patch (I included the test you had in the current revision here): if I'm missing something please provide a test case that is handled by your patch that wouldn't be handled then.

  1. Consider adding a command line flag to enable the OpPrintingFlags().shouldAssumeVerified() when printing the IR to force printing custom even when the verifier fails. The tricky part will be to plumb it appropriately but it should be doable.

If you'd like something else in addition to what this revision does or subsume what it does, please feel free to create a revision -- I'll be happy to review as well.

It seems to me that this is the only way to do what you claim you're trying to achieve though.

At this point, I don't see a single valid concern.

I reverted your patch: please drive the discussion to consensus, I don't see the urgency to step over the current discussion without addressing the points above.

mehdi_amini added a comment.EditedApr 17 2022, 1:00 PM

Added a test case in https://reviews.llvm.org/D123916 to double-check my understanding and illustrate my point. I added a pass that creates an invalid op and has an option to trigger an optional call to "signalPassFailure()".
Below is the trace for all the possible configurations (the only constant is that the IR is invalid) with the current patch applied:

  • The pass signals pass failure, the option -mlir-print-custom-assembly-after-failure is not set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=true})' -mlir-print-ir-after-failure 
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass does not signal pass failure (so the verifier will fail instead), the option -mlir-print-custom-assembly-after-failure is not set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=false})' -mlir-print-ir-after-failure 
<unknown>:0: error: 'test.any_attr_of_i32_str' op requires attribute 'attr'
<unknown>:0: note: see current operation: "test.any_attr_of_i32_str"() : () -> ()
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass signals pass failure, the option -mlir-print-custom-assembly-after-failure is set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=true})' -mlir-print-ir-after-failure  -mlir-print-custom-assembly-after-failure
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass does not signal pass failure (so the verifier will fail instead), the option -mlir-print-custom-assembly-after-failure is set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=false})' -mlir-print-ir-after-failure -mlir-print-custom-assembly-after-failure
<unknown>:0: error: 'test.any_attr_of_i32_str' op requires attribute 'attr'
<unknown>:0: note: see current operation: "test.any_attr_of_i32_str"() : () -> ()
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()

We print generically in all cases: the current patch has no effect when there is invalid IR involved.

Do we have a form of "verify" that doesn't abort? That would really be ideal.

What do you mean by "abort" here? Can you describe what you are envisioning a bit more?

Oh, I forgot how this worked, we apparently already have this:

LogicalResult verify(Operation *op, bool verifyRecursively = true);

We should just make this automatic without a flag. Maybe that is what D123915 is all about, I'll take a look at it, nice!

Added a test case in https://reviews.llvm.org/D123916 to double-check my understanding and illustrate my point. I added a pass that creates an invalid op and has an option to trigger an optional call to "signalPassFailure()".
Below is the trace for all the possible configurations (the only constant is that the IR is invalid) with the current patch applied:

  • The pass signals pass failure, the option -mlir-print-custom-assembly-after-failure is not set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=true})' -mlir-print-ir-after-failure 
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass does not signal pass failure (so the verifier will fail instead), the option -mlir-print-custom-assembly-after-failure is not set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=false})' -mlir-print-ir-after-failure 
<unknown>:0: error: 'test.any_attr_of_i32_str' op requires attribute 'attr'
<unknown>:0: note: see current operation: "test.any_attr_of_i32_str"() : () -> ()
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass signals pass failure, the option -mlir-print-custom-assembly-after-failure is set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=true})' -mlir-print-ir-after-failure  -mlir-print-custom-assembly-after-failure
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()
  • The pass does not signal pass failure (so the verifier will fail instead), the option -mlir-print-custom-assembly-after-failure is set:
$ bin/mlir-opt ../mlir/test/Pass/invalid-ir.mlir -pass-pipeline='func.func(test-pass-create-invalid-ir{signal-pass-failure=false})' -mlir-print-ir-after-failure -mlir-print-custom-assembly-after-failure
<unknown>:0: error: 'test.any_attr_of_i32_str' op requires attribute 'attr'
<unknown>:0: note: see current operation: "test.any_attr_of_i32_str"() : () -> ()
// -----// IR Dump After (anonymous namespace)::TestInvalidIRPass Failed //----- //
"func.func"() ({
  "test.any_attr_of_i32_str"() : () -> ()
  "func.return"() : () -> ()
}) {function_type = () -> (), sym_name = "TestCreateInvalidCallInPass"} : () -> ()

We print generically in all cases: the current patch has no effect when there is invalid IR involved.

I hadn't considered the last two possibilities when there is invalid IR and the printer would anyway generate the generic form. I added this option while preserving the default, which was to print in generic form when the pass failed. Since this has been reverted, let's continue the discussion on https://reviews.llvm.org/D123915 on the best possible end state in terms of custom printing on failures. Thank you again for making a special effort to post in such detail.

Hey folks, I'm a bit dismayed by how this patch/discussion/revert happened. It seems to me that we are reaching for perfection on a debugging option that could have added utility/mileage towards an ultimate better state. For a lot of such things, the on the ground experience of using something for debugging in varied circumstances is how we find the best outcome. I personally find merit in points that both Chris and Mehdi made for the future but I don't see a need to resolve them or get there in one step.

In any case, merits/opinions aside, this was not a shining example of either the development process or the meta goal of having an open, pragmatic community. Not sure what to do about that but I feel it needs stating. I don't have much of a stake in this issue and if Mehdi/Uday would like to discuss it, I'd be open to facilitating.

I agree with Stella here in principle, but I don't see this instance as a major issue to warrant further discussion. A bit subjective but I find it inappropriate to revert a patch right away under such circumstances. The right course of action I feel is to have the revert stacked right below the other revision that is taking us to a strictly better state and commit both of those in one shot. The rationale is that this patch provided strictly better functionality while not changing the default, i.e., you could get a custom assembly form after pass failure when the IR was valid. Of course, a subsequent patch could do it in a better/shorter way.

The revert seemed totally warranted here: I believe the real break of policy here is that it was committed over a clearly stated objection without an attempt to resolve the discussion, this is just not OK.

The revert seemed totally warranted here: I believe the real break of policy here is that it was committed over a clearly stated objection without an attempt to resolve the discussion, this is just not OK.

I'm broadly interpreting https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy (esp. the lines: "As a community, we strongly value having the tip of tree in a good state while allowing rapid iterative development. As such, we tend to make much heavier use of reverts to keep the tree healthy than some other open source projects, and our norms are a bit different."). As I stated, this could be a bit subjective, but I find it inappropriate to revert here without landing the other revision that addresses this -- simply because we are worse off without this additional functionality and this is a small incremental positive step that doesn't change the default. Again, revert is eventually perhaps the right course, but an immediate one without an alternative is crossing the line in the wrong direction and is not ideal from a development process standpoint. Reviewers should also consider the fact that what they consider "clearly stated objection" could be viewed as grounds for further iterative improvement.

You probably should also review https://llvm.org/docs/CodeReview.html (which talks also about revert by the way).

You’re expected to “acknowledge All Reviewer Feedback”, and we favor “ erring on the side of the developer asking for more review prevents any lingering disagreement over code in the tree”.
A revert is warranted when someone raises post-commit a concern and that “ concern would have been significant enough to warrant a conversation during pre-commit review (including around the need for more design discussions)”, this isn’t your notion of “ we are worse off without this additional functionality”, quite the opposite.

But stepping back from this: the core value of the community is, I believe, in spending enough time and energy in building consensus. We can’t always reach an agreement but we still try hard, and hopefully get everyone to agree on some trade off that allows us to move forward.

We usually ask the author to revert (and they are expected to revert promptly, even just for more discussion, as you can see in the link above), but the policy is also written assuming the request comes post-commit…
I actually never saw a patch committed so fast with a clearly stated objection, and the discussion here somehow comforts me in reverting immediately: it seems like asking for a revert would have lead to an argument about whether “we are worse off without this additional functionality” or not.

Reviewers should also consider the fact that what they consider "clearly stated objection" could be viewed as grounds for further iterative improvement.

This is defendable, but this the patch author responsibility to convince the reviewer of such, again trying to build consensus through discussion.
There was nothing incremental in this patch: I mentioned in my pre-commit objection that a one line change to make it the default would subsume this patch.