This is an archive of the discontinued LLVM Phabricator instance.

[IR] Add a Location to BlockArgument.
ClosedPublic

Authored by lattner on May 15 2021, 10:18 PM.

Details

Summary

This adds the ability to specify a location when creating BlockArguments.
Notably, Value::getLoc() returns this location correctly, which makes diagnostics
more precise (e.g. the example in test-legalize-type-conversion.mlir).

This is currently optional to avoid breaking any existing code - if
absent, the BlockArgument defaults to using the location of its enclosing
operation (preserving existing behavior).

The bulk of this change is plumbing location tracking through the parser
and printer to make sure it can round trip (in -mlir-print-debuginfo
mode). This is complete for generic operations, but requires manual
adoption for custom ops.

I added support for function-like ops to round trip their argument
locations - they print correctly, but when parsing the locations are
dropped on the floor. I intend to fix this, but it will require more
invasive plumbing through "function_like_impl" stuff so I think it
best to split it out to its own patch.
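The defaulting behavior described above can be modeled with a minimal self-contained sketch. The names below (Block, addArgument, getArgLoc) are simplified stand-ins for illustration, not the actual MLIR classes or signatures:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for MLIR's Location / BlockArgument / Block.
struct Location { std::string desc; };

struct BlockArgument {
  std::optional<Location> loc; // absent => fall back to the enclosing op
};

struct Block {
  Location parentOpLoc; // location of the block's enclosing operation
  std::vector<BlockArgument> args;

  // The location is optional so that existing callers keep working.
  void addArgument(std::optional<Location> loc = std::nullopt) {
    args.push_back(BlockArgument{loc});
  }

  // Mirrors the described Value::getLoc() behavior: a missing argument
  // location defaults to the location of the enclosing operation.
  Location getArgLoc(unsigned i) const {
    return args[i].loc ? *args[i].loc : parentOpLoc;
  }
};
```

An argument created without a location reports its parent op's location, while one created with an explicit location reports its own, which is what makes the more precise diagnostics possible.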

Diff Detail

Event Timeline

lattner created this revision.May 15 2021, 10:18 PM
lattner requested review of this revision.May 15 2021, 10:18 PM
lattner updated this revision to Diff 345670.May 15 2021, 10:19 PM

[IR] Expose Block arguments through Builders.h

rriddle added inline comments.May 17 2021, 6:25 PM
mlir/lib/IR/Block.cpp
171–173

nit: Use llvm::zip here.

181

This comment seems a little weird. The only situation where there isn't a parent operation is if this block is unlinked.

182

nit: Spell out auto here.

mlir/lib/Parser/Parser.cpp
255

nit: Drop the llvm:: here.

rriddle added a comment.EditedMay 17 2021, 6:29 PM

Do you have an estimation of the memory usage growth here? It seems like a lot of existing situations would end up with unknown locations on every argument, which is an 8-byte memory hit per argument (and there have been cases in TensorFlow with hundreds of thousands of arguments) for an unused feature. For example, how often would this be used for non-entry blocks? Should we instead have a more sparse representation, e.g. a map in the parent block or something?

Do you plan to resolve the TODOs in this patch or a followup?

(Code in this patch looks fine)

mlir/lib/IR/FunctionImplementation.cpp
63–67

Can we just resolve this now? Or do you plan on doing this in a followup?

Do you have an estimation of the memory usage growth here? It seems like a lot of existing situations would end up with unknown locations on every argument, which is an 8-byte memory hit per argument (and there have been cases in TensorFlow with hundreds of thousands of arguments) for an unused feature. For example, how often would this be used for non-entry blocks? Should we instead have a more sparse representation, e.g. a map in the parent block or something?

I have the same feeling here: an array of locations (lazily initialized) on the block itself would be less costly when unused.
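The lazily initialized array idea can be sketched as follows. This is a hypothetical illustration of the sparse scheme, not MLIR code: the block pays no per-argument storage cost until someone actually sets a location.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Location { std::string desc; };

// Hypothetical sketch: the block keeps one side array of locations that is
// only materialized on the first explicit setArgLoc call; until then every
// argument implicitly reports the enclosing operation's location.
struct Block {
  Location parentOpLoc;
  unsigned numArgs = 0;
  std::vector<Location> argLocs; // empty until first setArgLoc call

  void setArgLoc(unsigned i, Location loc) {
    if (argLocs.empty())
      argLocs.assign(numArgs, parentOpLoc); // materialize lazily
    argLocs[i] = loc;
  }

  Location getArgLoc(unsigned i) const {
    return argLocs.empty() ? parentOpLoc : argLocs[i];
  }
};
```

Under this scheme a block with many arguments and no explicit locations stores zero extra location words, at the cost of an extra branch on lookup.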

Since the block argument is sort of at the same level as a defining op, having the Location on it would have been the "normal" thing. So I feel it's fine to first add this and then optimize it with indirection as suggested above later.

Side question: @rriddle did you mean "hundreds or thousands" or "hundreds of thousands" above?

Do you have an estimation of the memory usage growth here? It seems like a lot of existing situations would end up with unknown locations on every argument, which is an 8-byte memory hit per argument (and there have been cases in TensorFlow with hundreds of thousands of arguments) for an unused feature. For example, how often would this be used for non-entry blocks? Should we instead have a more sparse representation, e.g. a map in the parent block or something?

No, I don't. However, I expect this to be insignificant - there are many fewer arguments than ops, attributes, and types in the primary use cases that I'm aware of. It doesn't make sense to have an argument without a use of it, after all.

More generally, this patch is completely in line with MLIR's emphasis on good location tracking, and fills a gaping hole in that story. We need support for this on non-entry block arguments for parity with LLVM's tracking - this is the equivalent of being able to put a location on an llvm::PHINode. We also know that the lack of argument locations causes the known "error messages getting emitted at line zero" issues that Uday has pointed out.

This is something we always planned to do, we just never got around to it.

Also, keep in mind that all block arguments coming from a .mlir file have a location. While other frontends may not be specifying them much yet, I expect that to change progressively over time.

lattner marked 3 inline comments as done.May 18 2021, 8:57 AM
lattner added inline comments.
mlir/lib/IR/Block.cpp
181

Good point, I just pulled it forward without rethinking it. Fixed.

mlir/lib/IR/FunctionImplementation.cpp
63–67

I plan to do it as a follow-up, as mentioned in the commit message above. It will require some more invasive work on the function_like_impl stuff, and I'd rather hill climb than make a single huge patch.

lattner updated this revision to Diff 346187.May 18 2021, 9:02 AM
lattner marked an inline comment as done.

Incorporate several improvements River pointed out, thanks!

mehdi_amini accepted this revision.May 18 2021, 9:33 AM

Since the block argument is sort of at the same level as a defining op, having the Location on it would have been the "normal" thing. So I feel it's fine to first add this and then optimize it with indirection as suggested above later.

Another way to see the analogy: an operation defines multiple SSA values, but the operation holds a single location. We don't have a location for each of the SSA values individually; the location is owned by the entity (the Operation) that encapsulates the results.

More generally, this patch is completely in line with MLIR's emphasis on good location tracking, and fills a gaping hole in that story.

Yes, we agree that this is a needed feature, but that seems orthogonal to "how do we store it".
Also, the "emphasis on good location tracking" yielded APIs that force the location to be passed in, while this patch makes it optional. I imagine it is just too hard to require a location for block arguments at this point?

Also, keep in mind that all block arguments coming from a .mlir file have a location.

This does not seem like a given to me: custom parsers/printers don't have to provide one, for example. Would the block argument produced by the scf.for parser automatically get a location? The func op? Right now it isn't clear to me that most .mlir files would get locations on their block arguments if they don't have a CFG / aren't in generic form.

No, I don't. However, I expect this to be insignificant - there are many fewer arguments than ops, attributes, and types in the primary use cases that I'm aware of. It doesn't make sense to have an argument without a use of it, after all.

That's convincing to me, so LGTM on this patch.

This revision is now accepted and ready to land.May 18 2021, 9:33 AM

This does not seem like a given to me: custom parsers/printers don't have to provide one, for example. Would the block argument produced by the scf.for parser automatically get a location? The func op? Right now it isn't clear to me that most .mlir files would get locations on their block arguments if they don't have a CFG / aren't in generic form.

Yes, fair point. I meant block arguments inside the function. My next patch in this series will fix std.func and other FunctionLike ops, so we'll get good coverage for entry arguments as well.

Thank you both for the review!

This revision was landed with ongoing or failed builds.May 18 2021, 10:18 AM
This revision was automatically updated to reflect the committed changes.

Side question: @rriddle did you mean "hundreds or thousands" or "hundreds of thousands" above?

No, I really meant hundreds of thousands. Some are close to a million.

mlir/lib/IR/AsmPrinter.cpp
489

You need to visit the block argument locations, otherwise we don't initialize an alias for them. Take a look at void print(Operation *op) for an example. I assume we need to do that here, and when printing the block if printEntryBlockArgs is true.

Noting here that I am now seeing multiple timeouts for internal TensorFlow benchmarks after this change. Compile time seems to have more than quadrupled. Looking into it now, but may request a revert.

Feel free to revert if this is a culprit!

I can't imagine that, but if so that's clearly the right way to go, thanks!

Thanks. One of the benchmarks has about 750k arguments, so it may just be hitting a degenerate case during parsing/printing.

Whoa. Yes, I could see that exacerbating an N^2 algorithm somewhere. If you email me an mlir file that shows the issue I'd be happy to fix. Reverting is also totally fine of course.

Can confirm after this patch just parsing the benchmark file took ~4700 seconds (1 hour and 18 minutes), with pretty much all of that being spent in Lexer::getEncodedSourceLocation. I sent out a minimal fix in D102734, which brings that time down to 8 seconds.
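The usual shape of this kind of slowdown can be illustrated with a self-contained sketch (hypothetical, not the actual Lexer code): recomputing line/column by scanning the buffer from the start for every location query costs O(file size) per query, which turns ~750k queries into quadratic work, while caching the newline offsets once reduces each query to a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Naive approach: scan from the start of the buffer on every query.
// Each call is O(buffer size), so N queries cost O(N * buffer size).
std::pair<unsigned, unsigned> lineColNaive(const std::string &buf, size_t pos) {
  unsigned line = 1, col = 1;
  for (size_t i = 0; i < pos; ++i) {
    if (buf[i] == '\n') { ++line; col = 1; } else { ++col; }
  }
  return {line, col};
}

// Cached approach: record newline offsets once, then binary-search per query.
struct LineTable {
  std::vector<size_t> newlines; // offsets of every '\n' in the buffer
  explicit LineTable(const std::string &buf) {
    for (size_t i = 0; i < buf.size(); ++i)
      if (buf[i] == '\n') newlines.push_back(i);
  }
  std::pair<unsigned, unsigned> lineCol(size_t pos) const {
    auto it = std::lower_bound(newlines.begin(), newlines.end(), pos);
    unsigned line = unsigned(it - newlines.begin()) + 1;
    size_t lineStart = (it == newlines.begin()) ? 0 : *(it - 1) + 1;
    return {line, unsigned(pos - lineStart) + 1};
  }
};
```

Both functions agree on every position; the difference is purely the per-query cost, which is exactly the kind of fix that turns hours of parsing into seconds on pathological inputs.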

Thank you so much for working on this River!

I think we'll still need to revert though. Seems like there is a bug/mismatch between parsing/printing. A repro is:

func @foo(%arg0: tensor<2xi32> loc(#loc0)) {
  return
}
#loc0 = loc(unknown)

This is what gets printed, but it fails to parse back in:

/tmp/test.mlir:2:44: error: expected location instance
  func @foo(%arg0: tensor<2xi32> loc(#loc0)) {

I suspect there is some interaction between location aliases that are deferred (i.e. for the parent operation) and those that can't be deferred (currently, block arguments). This seems to happen when the argument inherits its parent's location.

rsmith added a subscriber: rsmith.May 18 2021, 7:26 PM

I think we'll still need to revert though.

Reverted for now in rG80d981eda69f1ada6d944ed89571456cad13b850.

Thank you for reverting this @rsmith. This is going to be very tricky to fix and may need some larger-scale refactoring.

This PR isn't really required by any of my current projects, it was a "let's make MLIR better in a way that has long annoyed me" sort of thing. I'll investigate when I have time this weekend or next week. Thanks!

I can also look into the alias issue if you'd like.

This patch + the minimal fix for the issue identified above was reapplied in https://reviews.llvm.org/D102991. I'll continue working on improved support for this, but I'd love some help with the existing bug: https://bugs.llvm.org/show_bug.cgi?id=50451