This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Factor type reconciliation out of Standard-to-LLVM conversion
ClosedPublic

Authored by ftynse on Sep 9 2021, 7:06 AM.

Details

Summary

Conversion to the LLVM dialect is being refactored to be more progressive and
is now performed as a series of independent passes converting different
dialects. These passes may produce unrealized_conversion_cast operations that
represent pending conversions between built-in and LLVM dialect types.
Historically, a more monolithic Standard-to-LLVM conversion pass did not need
these casts as all operations were converted in one shot. Previous refactorings
have led to the requirement of running the Standard-to-LLVM conversion pass to
clean up unrealized_conversion_casts even though the IR had no standard
operations in it. The pass must have been also run the last among all to-LLVM
passes, in contradiction with the partial conversion logic. Additionally, the
way it was set up could produce invalid operations by removing casts between
LLVM and built-in types even when the consumer did not accept the uncasted
type, or could lead to cryptic conversion errors (recursive application of the
rewrite pattern on unrealized_conversion_cast as a means to indicate failure
to eliminate casts).

In fact, the need to eliminate A->B->A unrealized_conversion_casts is not
specific to to-LLVM conversions and can be factored out into a separate type
reconciliation pass, which is achieved in this commit. While the cast operation
itself has a folder pattern, it is insufficient in most conversion passes as
the folder only applies to the second cast. Without complex legality setup in
the conversion target, the conversion infra will either consider the cast
operations valid and not fold them (a separate canonicalization would be
necessary to trigger the folding), or consider the first cast invalid upon
generation and stop with error. The pattern provided by the reconciliation pass
applies to the first cast operation instead. Furthermore, having a separate
pass makes it clear when unrealized_conversion_casts could not have been
eliminated since it is the only reason why this pass can fail.

Diff Detail

Event Timeline

ftynse created this revision.Sep 9 2021, 7:06 AM
ftynse requested review of this revision.Sep 9 2021, 7:06 AM
Herald added a project: Restricted Project. · View Herald Transcript
nicolasvasilache accepted this revision.Sep 9 2021, 7:28 AM

Thanks!

IMO this is a welcome improvement to "finalize" getting rid of these pesky conversion casts across pass granularities; seems everyone is hitting those issues these days.

This revision is now accepted and ready to land.Sep 9 2021, 7:28 AM
bondhugula added inline comments.
mlir/include/mlir/Conversion/Passes.td
374

If I am understanding this explanation right, why doesn't a folding hook on unrealized_conversion_cast get rid of these when the second pass runs?

ftynse updated this revision to Diff 371596.Sep 9 2021, 7:48 AM

Fix a runaway python test

This revision was landed with ongoing or failed builds.Sep 9 2021, 7:51 AM
This revision was automatically updated to reflect the committed changes.
ftynse marked an inline comment as done.Sep 9 2021, 9:02 AM
ftynse added inline comments.
mlir/include/mlir/Conversion/Passes.td
374

I envision more complex rules here, e.g., A->B->C->A cast chains, just splitting the refactoring part in a separate diff. Also, there is a bunch of passes that do populateXToYPatterns and won't work with the folder alone, they need a pattern that is exercised by this pass.

rriddle added inline comments.Sep 9 2021, 9:49 AM
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
55

The name here needs to be updated.

bondhugula added inline comments.Sep 9 2021, 6:56 PM
mlir/include/mlir/Conversion/Passes.td
374

But a method to populate this pattern can just be exposed and added to -convert-std-to-llvm instead of adding a -reconcile-unrealized-casts to virtually every codegen pipeline? Honestly, this would appear like an awkwardly named pass, sort of internal detail, being exposed outside.

bondhugula added inline comments.Sep 9 2021, 7:30 PM
mlir/include/mlir/Conversion/Passes.td
374

But you could always fold A -> B -> C to a A -> C. Wouldn't that suffice? Either this or the explanation here needs an update. You are only looking for cyclic chains? The folder runs every iteration on all ops every iteration of the rewrite driver and on every op on the worklist every iteration until the worklist is empty that iteration, as you already know.

mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
1

Spell fix: Reconcile

37–40

Trivial braces.

43

Both "live" and "unrealized conversion" can be ambiguous to read here; "leftover unrealized conversion cast"?

bondhugula added inline comments.Sep 9 2021, 7:56 PM
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
23

I think the traversal order isn't important or needed here for soundness: having a pattern like this in any greedy rewrite driver-driven pass will do.

35–49

This looks like it could be handled by a local folding hook. You can just pull the operands of your operands' unique defining op which would be another unrealized-conversion-cast? You wouldn't need to walk any use lists. If this makes sense, you ultimately don't need this rewrite pattern, nor this whole pass itself and the need to add such a pass to any other pass or pass pipeline. Folding + dead code elimination in the greedy rewrite driver will naturally take care of this. In general, it's also more efficient having things in the folding hook since the folding hook is called earlier -- before any canonicalization patterns on the op itself are called.

36

Better to have a code comment here.

mehdi_amini added inline comments.Sep 9 2021, 8:01 PM
mlir/include/mlir/Conversion/Passes.td
374

I'm also wondering what kind of complex cases make it so that we couldn't fold it? It seems best if the "reconciliation" of cast happens eagerly as much as possible in general.

ftynse marked 11 inline comments as done.Sep 10 2021, 5:10 AM

Thanks for additional reviews, changes applied in 61bc6aa5a723438198ba8526b0469ff11b47c489.

mlir/include/mlir/Conversion/Passes.td
374

But a method to populate this pattern can just be exposed and added to -convert-std-to-llvm.

"convert-std-to-llvm" will be removed/refactored along with "std", we shouldn't add new functionality to either of those. Also, you may notice that the new library does _not_ depend on std, so it doesn't incur a spurious dependency on somebody who wants to run it beyond std-to-llvm.

virtually every codegen pipeline

I had to change 3 pass pipelines in this change, 2 of them in testing tools. If you are referring to RUN lines in integration tests, I would be interested to figure out a better way for those than repeating the pass list every time. This is orthogonal to the current change though.

Honestly, this would appear like an awkwardly named pass, sort of internal detail, being exposed outside.

I find having to run "-convert-std-to-llvm" to remove _builtin_ cast operations injected by the dialect conversion (implementation detail of specific passes) converting _builtin_ types to LLVM dialect types in absence of any standard operation, and having to do so after all passes, more awkward than a pass that clearly indicates what it does.

374

But you could always fold A -> B -> C to a A -> C. Wouldn't that suffice? Either this or the explanation here needs an update. You are only looking for cyclic chains?

We could, but we don't. This is also not in scope of this patch, which does the same as folder (A->B->A) but in different order.

The folder runs every iteration on all ops every iteration of the rewrite driver and on every op on the worklist every iteration until the worklist is empty that iteration, as you already know.

As I do actually know since I wrote the first implementation of the dialect conversion infrastructure and reviewed most of the further changes (around the lines we will be looking at 1, 2), the process is slightly more involved and is not entirely worklist-based unlike the greedy driver that is _not_ used by these conversions. The worklist is only computed for the original state of the IR. When a pattern is applied, the infrastructure keeps track of operations created by the pattern and attempts to legalize them _immediately_ after the pattern and _fails_ if they are not legal - https://github.com/llvm/llvm-project/blob/788e7b3b8c285d63a4ef354608240e1376179c79/mlir/lib/Transforms/Utils/DialectConversion.cpp#L1930.

I'm also wondering what kind of complex cases make it so that we couldn't fold it? It seems best if the "reconciliation" of cast happens eagerly as much as possible in general.

The one I have in mind is !pseudo_c_ir.intptr_t -> index -> i64 on one hand and i64 -> !pseudo_c_ir.intptr_t on another hand. Part of the input IR is lowered to a dialect that operates on index and another parent is lowered to the LLVM dialect directly. I haven't had time to work on this case in the pet project of mine, so I cannot say for certain if this will be necessary. My preference would be to reconcile eagerly though!

mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
23

It is important for passes that are not based on the greedy driver. The majority of conversion passes use a different driver.

35–49

As a matter of fact, there is and always has been a folder for the unrealized_conversion_cast. Unfortunately, it is not sufficient as the commit message explains. Conversion passes use the dialect conversion driver rather than the greedy driver. It also calls the folder, but on the _wrong op_. The folder can _only_ be expressed on the second op of the pair because it is not allowed to change other ops than the one being folded. The dialect conversion driver must finalize the first op before proceeding to the second. I haven't benchmarked this but I expect it to be only marginally less efficient because, in most cases, the pattern will either bail out early or find exactly one user that is a backwards cast.

I would allow myself a remark which might come out as snarky or otherwise mean, apologies for that: I would have found the discussion more constructive if we all assumed that folks doing large refactorings, especially in the area where they are de-facto code owners, understand what they are doing sufficiently well and have spent enough effort trying to find the least complex way of changing things.

43

"leftover" doesn't sound right here, this message should explain why the match failed, "leftover" sounds more like final result. Changed to "live unrealized conversion cast". "Live" means used by another operation and already appears in dialect conversion terminology.

55

Good catch, my regexp-fu needs improvement...

jpienaar added inline comments.
mlir/include/mlir/Conversion/Passes.td
374

I would be interested to figure out a better way for those than repeating the pass list every time

To avoid repeating pass pipeline, https://mlir.llvm.org/docs/PassManagement/#pass-pipeline-registration and that would have been useful here (that's at textual level/tool level) with a potential matching C++ "addStdToLLVMPasses" or some such (C API side don't know if we have support for this so this would still have caused the breakage it did).

ftynse marked 6 inline comments as done.Sep 10 2021, 5:39 AM
ftynse added inline comments.Sep 10 2021, 5:43 AM
mlir/include/mlir/Conversion/Passes.td
374

To avoid repeating pass pipeline, https://mlir.llvm.org/docs/PassManagement/#pass-pipeline-registration and that would have been useful here (that's at textual level/tool level) with a potential matching C++ "addStdToLLVMPasses" or some such (C API side don't know if we have support for this so this would still have caused the breakage it did).

I thought about this too and there are two questions:

  • Should we put these registrations in mlir-opt or in a separate testing tool?
  • What passes to include? E.g., there are multiple ways to go from Linalg to LLVM and they are exercised by tests, sometimes in the same file.

I also thought about some lit substitutions that live next to the set of files they apply to, so we just have // RUN: mlir-opt %current-pass-pipeline %s in all tests and only need to change current-pass-pipeline once per directory.

bondhugula added inline comments.Sep 10 2021, 8:46 PM
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
35–49

Folding is always considered a good thing to do as early as possible by design. This is why the greedy rewrite driver has always applied the folding hook for every op in its iteration irrespective of which patterns are in the list or for which ops. If the dialect conversion infra isn't running the folding hook on all ops, that would be inconsistent with the general thinking and one should consider improving that infra and have the folding hook run so that this naturally happens as part of the rewrite driver. I feel this should be explored first instead of making this change to adding an extra cleanup pass for what should have been handled inside of -convert-std-to-llvm.

Post-commit review is completely fine and authors/maintainers thinking so is obviously fine per the LLVM policy. However, I don't think a change like this one without exploring the right direction first is out of line. I would request a revert here.

bondhugula added inline comments.Sep 10 2021, 9:54 PM
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
35–49

I would have found the discussion more constructive if we all assumed that folks
doing large refactorings, especially in the area where they are de-facto code owners,
understand what they are doing sufficiently well and have spent enough effort trying
to find the least complex way of changing things

This comment and understanding is pretty irrelevant here because it does not and should not prevent other folks from questioning the approach post-review. This is all constructive and purely technical feedback/questions (here and on the discourse from everyone the way I see it) and de-facto code owners should learn to understand and recognize that the questions do not mean "they are not doing sufficiently well or have not spent enough effort trying to find the least complex way of changing things". LLVM's policy already leans in several cases towards post-commit review and revert if needed based on that same understanding as well as in the interest of allowing more rapid development.

ftynse added inline comments.Sep 11 2021, 3:54 AM
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp
35–49

Folding is always considered a good thing to do as early as possible by design. This is why the greedy rewrite driver has always applied the folding hook for every op in its iteration irrespective of which patterns are in the list or for which ops. If the dialect conversion infra isn't running the folding hook on all ops, that would be inconsistent with the general thinking and one should consider improving that infra and have the folding hook run so that this naturally happens as part of the rewrite driver. I feel this should be explored first instead of making this change to adding an extra cleanup pass for what should have been handled inside of -convert-std-to-llvm.

Well, changes to how the dialect conversion operates is not in scope of this patch, so I fail to see how reverting it would help with anything.

There is another way to reason about this change in context of dialect layering if we drop the assumption that unrealized_conversion_cast is special. It is just an op in the "builtin" dialect. All -convert-X-to-llvm passes are, in fact, -convert-X-to-llvm-and-builtin because they produce operations from llvm and builtin dialects. We need to further lower those builtin operations to the LLVM dialect. This is can be done by an appropriate pass, similarly to any other conversion. This pass is just, perhaps misleadingly, called -reconcile-unrealized-casts whereas it should have been called -convert-builtin-to-llvm. Ironically, this was my initial take on this as River noticed.

Therefore, I propose the following:

  • create a -convert-builtin-to-llvm that removes the cast chains specifically between built-in and LLVM dialect types, and leaves the other casts alone if they are present;
  • turn -reconcile-conversion-casts into a test pass for the pattern + folding and use that as testbed to further improve the conversion infrastructure if necessary.

Would that satisfy everyone?