This is an archive of the discontinued LLVM Phabricator instance.

Improve diagnostic when emitting operations with regions
ClosedPublic

Authored by bondhugula on Aug 25 2022, 4:33 AM.

Details

Summary

This has a broad impact on diagnostics that attach an operation. Ops
with one or more regions will now be printed on a new line. It was
confusing and hard to read with a trailing first line for ops
with regions.

Before:

<unknown>:0: note: see current operation: affine.for %arg3 = 0 to 8192 {
  affine.for %arg4 = 0 to 8192 step 512 {
    affine.for %arg5 = 0 to 8192 step 128 {

    ...

After:

<unknown>:0: note: see current operation:
affine.for %arg3 = 0 to 8192 {
  affine.for %arg4 = 0 to 8192 step 512 {
    affine.for %arg5 = 0 to 8192 step 128 {
      ...

Diff Detail

Event Timeline

bondhugula created this revision.Aug 25 2022, 4:33 AM
bondhugula requested review of this revision.Aug 25 2022, 4:33 AM
bondhugula added a comment.EditedAug 25 2022, 4:34 AM

I'm not 100% sure about this -- whether we should be consistent in the format and do this for all ops or for none (current approach). But this chnage is a readability improvement for ops with regions.

That seems fine with me, but I'm curious what others have to say about it?

The goal of this makes sense to me. That said, would it be possible to key this off of "whether the printed form of an op has \n's in it"?

I think that is the actual predicate that matters here. It is possible to have ops with regions that have custom printing hooks (e.g. if the region has an op to reduce and prints the op inline when it is simple and singular), and it is possible to have ops with embedded \n's in their printed form. Keying off "contains a \n" seems more stable/correct to me.

The goal of this makes sense to me. That said, would it be possible to key this off of "whether the printed form of an op has \n's in it"?

I think that is the actual predicate that matters here. It is possible to have ops with regions that have custom printing hooks (e.g. if the region has an op to reduce and prints the op inline when it is simple and singular), and it is possible to have ops with embedded \n's in their printed form. Keying off "contains a \n" seems more stable/correct to me.

+1 here. I would key off of the generated output, instead of assuming based on the number of regions.

bondhugula added a comment.EditedAug 25 2022, 5:52 PM

The goal of this makes sense to me. That said, would it be possible to key this off of "whether the printed form of an op has \n's in it"?

I think that is the actual predicate that matters here. It is possible to have ops with regions that have custom printing hooks (e.g. if the region has an op to reduce and prints the op inline when it is simple and singular), and it is possible to have ops with embedded \n's in their printed form. Keying off "contains a \n" seems more stable/correct to me.

+1 here. I would key off of the generated output, instead of assuming based on the number of regions.

Wouldn't checking the printed form for a new line (for a region holding op in particular) be undesirable (too much unnecessary work for too little)? Is it possible to do this cleanly?

I used regions as a proxy here -- that every op with a region will almost always have a new line and that every op without a region will almost always be printed on a single line. I haven't myself seen a single case deviating from this in the tree or in an open-source downstream project. In fact, printing on a single line for region-holding ops hurts readability (you are likely to mix the external attribute list with the region since braces are at play in both), and printing on multiple lines for a non-region op is similarly confusing.

The change is really addressing the overwhelmingly common case with near-zero overhead.

The goal of this makes sense to me. That said, would it be possible to key this off of "whether the printed form of an op has \n's in it"?

I think that is the actual predicate that matters here. It is possible to have ops with regions that have custom printing hooks (e.g. if the region has an op to reduce and prints the op inline when it is simple and singular), and it is possible to have ops with embedded \n's in their printed form. Keying off "contains a \n" seems more stable/correct to me.

+1 here. I would key off of the generated output, instead of assuming based on the number of regions.

Wouldn't checking the printed form (for a region holding op in particular) be undesirable (too much unnecessary work for too little)? Is it possible to do this cleanly?

I used regions as a proxy here -- that every op with a region will almost always have a new line and that every op without a region will almost always be printed on a single line. I haven't myself seen a single case deviating from this in the tree or in an open-source downstream project. In fact, printing on a single line for region-holding ops hurts readability (you are likely to mix the external attribute list with the region since braces are at play in both), and printing on multiple lines for a non-region op is similarly confusing.

The change is really addressing the overwhelmingly common case with near-zero overhead.

Not sure what you mean, just checking the number of regions has no indication. You'd have to also check that they aren't empty for that to be a reasonable heuristic (e.g. every "external" function has one region, and is printed on a single line).

Not sure what you mean, just checking the number of regions has no indication. You'd have to also check that they aren't empty for that to be a reasonable heuristic (e.g. every "external" function has one region, and is printed on a single line).

Thanks for spotting this. I should at least check for empty regions.

Check for non-empty regions; add test case for the single line case as well.

As mentioned above, this isn't enough and is just wrong. Some ops may print empty regions as {} on different lines.

Checking the printing looks trivial, just change Diagnostic::appendOp (which is already printing to a string before emitting the op) to add a \n when os.str() contains a \n. Likely two line patch.

It is slightly more than two lines because we don't have a std::string Operation::toString(OpPrintingFlags &flags); method:

std::string str;
llvm::raw_string_ostream os(str);
op.print(os, adjustPrintingFlags(flags, severity));

What about adding such toString method though?

We actually just added a free function helper in TensorFlow for all MLIR object that can be streamed to llvm::raw_ostream: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/mlir/utils/to_string.h#L28-L39
(it works with Type, Attr, Operation, ... but does not take OpPrintingFlags, maybe an overload would do?)

All that code already exists in Diagnostic::appendOp, we need incremental ~+2 more lines on what is in main.

Check for new line in the op's printed form.

It is slightly more than two lines because we don't have a std::string Operation::toString(OpPrintingFlags &flags); method:

std::string str;
llvm::raw_string_ostream os(str);
op.print(os, adjustPrintingFlags(flags, severity));

What about adding such toString method though?

We actually just added a free function helper in TensorFlow for all MLIR object that can be streamed to llvm::raw_ostream: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/mlir/utils/to_string.h#L28-L39
(it works with Type, Attr, Operation, ... but does not take OpPrintingFlags, maybe an overload would do?)

Thank you for the comments. It looks useful to have a toString anyway; I've gone ahead and added that while on this and made appendOp use it.

Checking the printing looks trivial, just change Diagnostic::appendOp (which is already printing to a string before emitting the op) to add a \n when os.str() contains a \n. Likely two line patch.

By putting the logic in appendOp, one won't know whether to print a trailing white space at the end of "see current operation: ". But I've gone ahead with this.

rriddle added inline comments.Aug 28 2022, 8:06 PM
mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

I'm not really a fan of this method, can you remove it from this diff? There is already something similar in https://github.com/llvm/llvm-project/blob/1a1c59f9958609438c13af73444061d85caa749c/mlir/include/mlir/Support/DebugStringHelper.h#L28 which could just be changed to invoke "print()" and have an optional set of additional arguments to that.

mlir/lib/IR/Diagnostics.cpp
150

Why the const? Can you remove that?

bondhugula marked an inline comment as done.

toString -> str

It is slightly more than two lines because we don't have a std::string Operation::toString(OpPrintingFlags &flags); method:

std::string str;
llvm::raw_string_ostream os(str);
op.print(os, adjustPrintingFlags(flags, severity));

What about adding such toString method though?

Should this be named str() to be consistent with Diagnostic::str() and other similar things?

mlir/lib/IR/Diagnostics.cpp
150

It's to indicate that its contents will not be changed - trivial here, though.

bondhugula updated this revision to Diff 457778.EditedSep 3 2022, 2:54 AM
bondhugula marked an inline comment as done.

Update check in DebugString.

mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

Updated this check.

rriddle added inline comments.Sep 3 2022, 3:14 AM
mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

I should be more clear, I'm not a fan of having a str method on operation.

rriddle added inline comments.Sep 3 2022, 3:15 AM
mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

For clarity, print methods are extremely common and I'd rather have something generic that wraps the boilerplate of calling print than to duplicate it for every potential class.

bondhugula added inline comments.Sep 7 2022, 2:21 AM
mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

I see, thanks. I went ahead and added that since @mehdi_amini felt we should have one. I can take this new method out of this revision. Having something generic would be useful (for Operation, Type, Attribute, etc.).

mehdi_amini added inline comments.Sep 7 2022, 4:43 AM
mlir/include/mlir/IR/Operation.h
672 ↗(On Diff #456231)

Right, it's a matter of taste, but we have a dump method that makes it "convenient" to call print with stderr, so having a toString() seemed like on the same scale of convenience :)

(that said I thought of this having forgotten that we already have the free function debugString available in mlir/include/mlir/Support/DebugStringHelper.h)

fwiw, the reason that LLVM conventionally has the ->dump() methods is for use in debuggers. Modern debuggers still aren't able to resolve call thing->print(llvm::errs()) super reliably.

I agree that it would be useful to have a "get as string" convenience next to these things generally. Please name it "getAsString()" though to follow the general naming conventions which prefer "get". LLVM is wildly inconsistent across subprojects, but MLIR is relatively good about following this convention.

Thanks for the reminder about debuggers, I had forgotten this motivation )

I agree that it would be useful to have a "get as string" convenience next to these things generally. Please name it "getAsString()" though to follow the general naming conventions which prefer "get". LLVM is wildly inconsistent across subprojects, but MLIR is relatively good about following this convention.

I'm not really in favor of adding a getAsString to every class with a print, especially given that in some contexts it's going to be confusing. I'm just imagining the confusion of Attr::getAsString, given we have a StringAttr and getAsString isn't going to give you the value of the StringAttr, it's going to print to a string. I'd rather name this more explicitly as printToString/printAsString/dumpAsString or something, but even then I'd be really sad if we have to carbon copy this same 4 lines of boilerplate all over the code base.

Yes, that's a fair point, I didn't think about StringAttr. This discussion is pretty far afield from the title on this PR - can we move it to its own issue?

bondhugula updated this revision to Diff 458639.Sep 7 2022, 9:38 PM

Drop the str() accessor.

Yes, that's a fair point, I didn't think about StringAttr. This discussion is pretty far afield from the title on this PR - can we move it to its own issue?

Done.

Right, it's a matter of taste, but we have a dump method that makes it "convenient" to call print with stderr, so having a toString() seemed like on the same scale of convenience :)

That's precisely how I had thought about it too.

rriddle accepted this revision.Sep 7 2022, 9:41 PM
This revision is now accepted and ready to land.Sep 7 2022, 9:41 PM
bondhugula updated this revision to Diff 458640.Sep 7 2022, 9:43 PM

Add missed change.

bondhugula updated this revision to Diff 458641.Sep 7 2022, 9:45 PM

NFC tweak.

Thank you for improving this Uday! (and also tolerating the long and winding side discussion) :-) :-)

lattner accepted this revision.Sep 7 2022, 10:42 PM
bondhugula updated this revision to Diff 458677.Sep 8 2022, 1:12 AM

Fix oversights; rebase. check-mlir passes.

This revision was landed with ongoing or failed builds.Sep 8 2022, 1:18 AM
This revision was automatically updated to reflect the committed changes.