This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add partial lowering of shape.cstr_broadcastable.
ClosedPublic

Authored by tpopp on Oct 13 2020, 8:57 AM.

Details

Summary

Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.

Diff Detail

Event Timeline

tpopp created this revision.Oct 13 2020, 8:57 AM
tpopp requested review of this revision.Oct 13 2020, 8:57 AM
tpopp added a comment.Oct 13 2020, 8:59 AM

Please let me know what you think of this. I think lowering away the "broadcast" logic and keeping the "cstr" logic in ShapeToStd fits with the purpose of the pass, but that is debatable.

This is a very interesting direction! I was thinking of something similar. Let's discuss this at our shape meeting on Thursday.

If we land this, we should remove the similar code from LowerShapeConstraints.

We discussed this and I think agreed to support the lowering via RequireOp. We have to find one place for these patterns, though.

Do we want to refactor the patterns in LowerShapeConstraints?

silvas added a comment.EditedOct 19 2020, 8:09 PM

The result of our discussion was to do

(a) assume(cstr_broadcastable(x, y)) =>
(b) assume(cstr_requires(is_broadcastable(x,y), “cannot broadcast”) =>
(c) assume(cstr_requires(scf.for( … ), “cannot broadcast”))

I think that we can reformulate LowerShapeConstraints to be (a)->(b).

Then we have ShapeToStd implement is_broadcastable->scf.for ((b)->(c)). Not sure if there is too much value in doing (b)->(c) in the absence of other shape-to-std patterns (Thoughts?).

We won't necessarily need to have a pattern that lowers cstr_requires upstream, though I think that having a standalone pass that lowers it to std.assert would be ok to have but not required.

tpopp updated this revision to Diff 301659.Oct 29 2020, 9:40 AM

Rework to lower to IsBroadcastable which is further lowered.

tpopp updated this revision to Diff 301662.Oct 29 2020, 9:45 AM

Reformat each block of code in own namespace.

Harbormaster completed remote builds in B76936: Diff 301662.
silvas added inline comments.Oct 29 2020, 3:09 PM
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
22 ↗(On Diff #301662)

I think that this decomposition is useful independently. Can we have "decomposeConstraints" pass / populateDecomposeConstraintsPatterns or something?

silvas requested changes to this revision.Oct 29 2020, 3:12 PM
silvas added inline comments.
mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
401

This test is really unfortunate, since it is basically a test of the pattern you are introducing in this patch together with the pattern in https://reviews.llvm.org/D90407
In a separate pass, we would avoid this, since we would just observe the cstr_broadcastable -> cstr_require(is_broadcastable, "msg") expansion.

We don't want to test these same 20+ lines of broadcast-to-std more times than we absolutely have to :)

This revision now requires changes to proceed.Oct 29 2020, 3:12 PM
tpopp added inline comments.Oct 30 2020, 3:07 AM
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
22 ↗(On Diff #301662)

What use are you thinking of? My original logic for lowering cstr_broadcastable in this pass was that it fits the intention of this pass in removing special Shape computations. Creating a separate pass would then mean that ShapeToStandard is not actually replacing all Shape computations and that users would have to know to first do a separate lowering. This also gets annoying because we are making more code bloat by having to define/call more passes (with a required order) and more populate functions.

DecomposeConstraints()
ShapeToStandard()
ConstraintsToAsserts()

Also, having a separate pass for decomposing constraints runs the risk of users decomposing early and losing some of the reasoning benefits in these cstr_ operations. If this doesn't belong here, to be safe, I would rather only have the lowering in ConvertShapeConstraints (although note we'd have to rerun ShapeToStandard after that).

If you have some concrete example, I could be convinced. I'm just worried that a composability mindset might, in this case, cause both ShapeToStandard and Cstr*Ops to be left in a state different from their intended purposes.

mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
401

See my comment above. I think we need a good reason to split these in separate passes. A separate pass would require more than 20 lines split across ~4 files which would then be more lines than this test case.

I do agree with the repeated test being unfortunate. I'm hoping to discuss the possibility of having a configuration so that operations aren't recursively legalized. This would be useful for tests where it's nice to have multiple patterns compose for full lowerings, but we also would like to test individual patterns.

silvas added inline comments.Oct 30 2020, 12:05 PM
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
22 ↗(On Diff #301662)

The assumption of having is_broadcastable in the first place is that it is useful to have an intermediate state that only has cstr_require and not other cstr_* ops. So it seems useful to provide that pass in isolation.

As a simple example, CSE should be able to easily deduplicate is_broadcastable, but it cannot easily deduplicate the result of expanding is_broadcastable. Similarly, cstr_broadcastable cannot be easily deduplicated because it potentially aborts the program, which makes the reasoning more complicated for deduping it. So a user with lots of broadcasting in their program will likely want to run CSE after DecomposeConstraints() but before ShapeToStandard().

Also, we described use cases like cstr_require (is_broadcastable(a, b) || is_broadcastable(b, c), "msg"). We want to dedupe is_broadcastable calls between this predicate and cstr_broadcastable(a, b), which will happen naturally after decomposing constraints.

Because this is just a bunch of patterns, it's easy for a user to call two separate populate* functions in their own pass that combines the two (if they are concerned about compile time performance). However, it's much harder for them to come upstream and separate it into two populate* functions (or better, passes so they don't have to roll their own downstream and setup legality and such) in order to get the CSE benefit I just described.

herhut accepted this revision.Nov 2 2020, 8:34 AM
herhut added inline comments.
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
22 ↗(On Diff #301662)

Similarly, cstr_broadcastable cannot be easily deduplicated because it potentially aborts the program, which makes the reasoning more complicated for deduping it.

cstr_broadcastable can and should be CSE'd. If we have two identical constraints and one is dominated by the other, then the dominated one will never abort. cstr_ operations have interesting semantics in that sense (and how we express those semantics is a different story).

The hope was to start with simple constraints and stay with them as far as possible to simplify analysis for now. Having a constraint of the form cstr_require (is_broadcastable(a, b) || is_broadcastable(b, c), "msg") makes analysis hard and I have a hard time imagining an example where such constraints would arise. If you have one, that would be really helpful.

I get the example of CSE where a program has both is_broadcastable and cstr_broadcastable and partial lowering is helpful. Again, I do not see a good example, though. What is very common is to have cstr_broadcastable and broadcast and we might want to look into optimizations there, if it becomes a performance issue down the road.

My instinct would be to keep it in one pass until we see a use case. That gives us one less knob to control. The decision is easy to change later and less knobs might be a good thing for now.

mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
401

I'm hoping to discuss the possibility of having a configuration so that operations aren't recursively legalized.

That would indeed be fairly helpful when testing lowerings that have multiple steps. The tests for scf to standard could benefit from this, as well.

silvas added inline comments.Nov 2 2020, 12:35 PM
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
22 ↗(On Diff #301662)

cstr_broadcastable can and should be CSE'd. If we have two identical constraints and one is dominated by the other, then the dominated one > will never abort. cstr_ operations have interesting semantics in that sense (and how we express those semantics is a different story).

Yes, but that reasoning is more difficult. I don't think anyone has immediate plans to define the properties needed and extend CSE to do this. We would need something like "idempotent side effect".

And "somebody wants to CSE broadcasts today without proposing a new IR property" seems like a use case that we don't need to wait for IMO.

Given that we can trivially lower it to is_broadcastable and get CSE there (which we want anyways and seems to basically "come for free"), I don't see the advantage of wanting to do any reasoning on the cstr_broadcastable ops. As common as they might be, canonicalizing them immediately to a different form seems valuable (do you see any downside?). E.g. "<= const_integer" comparisons are very common, but LLVM turns all x <= const into x < const+1.

Anyway, feel free to submit this. We should circle back to this aspect of our intended design in our next shape meeting.

silvas accepted this revision.Nov 2 2020, 12:35 PM
This revision is now accepted and ready to land.Nov 2 2020, 12:35 PM
This revision was landed with ongoing or failed builds.Nov 3 2020, 12:57 AM
This revision was automatically updated to reflect the committed changes.