The reconciliation pass has been improved to support chains of casts, so the reconciliation is no longer limited to pairs of unrealized casts.
This looks broken in multiple ways and unnecessarily complex.
I can see several approaches to this:
- modify this to find a DAG of casts starting from the "top", such that all "bottom" casts have the same result types as the "top" has operand types and intermediate casts have no users other than casts that belong to the DAG, then drop the entire DAG; note that some of the bifurcation tests will have several such DAGs where the user of the "bottom" is another cast.
- add a separate pattern that iteratively propagates operands through casts: {A->B, B->C, C->A} becomes {A->B, A->C, C->A} that can be removed by DCE + the current pattern, or even {A->B, A->C, A->A} that can be removed by DCE + folding away the cast to itself.
The second approach looks significantly less complex.
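To illustrate the second approach, here is a minimal sketch (my own illustration, not code from the patch; the pattern name is hypothetical and it only handles single-input casts):

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical sketch of the operand-propagation idea: if a cast's input is
// itself produced by an unrealized cast, rebuild the cast so it starts from
// the inner cast's input. {A->B, B->C} becomes {A->B, A->C}; DCE then removes
// the now-dead A->B, and a cast whose result types equal its input types can
// be folded away.
struct PropagateThroughCasts
    : public OpRewritePattern<UnrealizedConversionCastOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(UnrealizedConversionCastOp op,
                                PatternRewriter &rewriter) const override {
    // Keep the sketch simple: only handle 1:1 casts.
    if (op.getInputs().size() != 1)
      return failure();
    auto inputCast =
        op.getInputs().front().getDefiningOp<UnrealizedConversionCastOp>();
    if (!inputCast || inputCast.getInputs().size() != 1 ||
        inputCast.getOutputs().size() != 1)
      return failure();
    // Recreate the cast with the same result types but the earlier operand.
    rewriter.replaceOpWithNewOp<UnrealizedConversionCastOp>(
        op, op.getOutputs().getTypes(), inputCast.getInputs());
    return success();
  }
};
```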
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp |
---|---
38–39 | Nit: llvm::append_range is slightly less verbose and slightly more efficient (it will reserve space).
40–41 | Is this even necessary? Can't we just rely on the infra to delete dead ops?
44 | No need to prefix with mlir:: inside the MLIR codebase.
49–50 | There is no guarantee that the cast has only one input.
53–56 | So this will just indiscriminately erase intermediate casts even when we don't know yet if the entire chain can be erased. And if the chain is not erased, we will have just dropped the casts on the floor without necessarily changing their users, leaving the IR in a very broken state with dangling pointers. While passes are allowed to leave the IR in an invalid state on failure, I would argue that invalid means "doesn't pass the verifier", not "has corrupted memory". Furthermore, the pattern is used outside this specific pass in the wild.
57 | Nit: this is usually called a cycle rather than a loop.
67 | Do not evaluate .size() on every iteration: https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop
67 | Please add braces to non-trivial loops: https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements
69 | This will also ignore unrealized casts that are not part of the chain. So if you have a cast in the middle of a chain that is used by two other casts, you will still consider the entire chain dead. I suppose the correct check would be that the cast has only one user, and this user is contained in ops. Otherwise, if you really want to check whether the entire DAG of casts is dead, you likely need to do so by following use-def chains and checking each op.
77 | Prefer SmallVector: https://llvm.org/docs/CodingStandards.html#c-standard-library
87 | Please expand auto unless the deduced type is obvious from context, e.g., there is a cast on the RHS.
92 | Prefer iteration to recursion when traversing IR, specifically use-def chains: https://mlir.llvm.org/getting_started/DeveloperGuide/
102–103 | I can't fully follow this live/dead reasoning and am not convinced it is at all necessary. What would happen if we always added the root here regardless of it being live? Is it a problem if we clean up a chain even if we know it's dead?
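For reference, a small illustrative helper (hypothetical names, not lines from the patch) showing the append_range and loop-bound nits from the table above:

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Hypothetical helper: gather the inputs of a set of casts.
static SmallVector<Value>
gatherCastInputs(ArrayRef<UnrealizedConversionCastOp> casts) {
  SmallVector<Value> inputs;
  // Evaluate the container size once instead of on every iteration.
  for (size_t i = 0, e = casts.size(); i < e; ++i) {
    UnrealizedConversionCastOp castOp = casts[i];
    // llvm::append_range reserves space and is less verbose than pushing
    // back every value in a manual loop.
    llvm::append_range(inputs, castOp.getInputs());
  }
  return inputs;
}
```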
This is the current approach.
> - add a separate pattern that iteratively propagates operands through casts: {A->B, B->C, C->A} becomes {A->B, A->C, C->A} that can be removed by DCE + the current pattern, or even {A->B, A->C, A->A} that can be removed by DCE + folding away the cast to itself.
Please tell me if I'm wrong, but since this is a conversion pass the following would happen: the operations are not really modified until the end of the conversion, so the values at the beginning of the chain would not be propagated down to the end, but only to the next cast. For example, {A -> B, B -> C, C -> D, D -> A} would become {A -> B, A -> C, B -> D, C -> A}.
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp |
---|---
40–41 | Sure, we can assume that the CSE pass is applied before the cast reconciliation.
49–50 | The same approach is followed in the UnrealizedCastOp folder though.
53–56 | Maybe I've misunderstood how the pattern rewriter works, but erasing the intra-chain nodes should not "drop anything on the floor". They would actually be erased only if the conversion succeeds, and this means that the entire chain would have been processed.
69 | I don't get this. If a cast in the middle of a chain is used by other casts, then the chain is still possibly valid. |-> D -> E leads to two chains: A -> B -> C -> A and A -> B -> D -> E. The first one is valid, while the second one would make the conversion fail.
102–103 | This was for the corner case {A -> A}, but it can be removed if we rely on folding.
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp |
---|---
69 | Sorry, bad formatting. The D path was meant to start from B: A -> B -> C -> A, with a branch B -> D -> E.
I can't follow the code to recover this. Could you please refactor it in a way that makes the concept visible in the code and erases all the ops simultaneously?
>> - add a separate pattern that iteratively propagates operands through casts: {A->B, B->C, C->A} becomes {A->B, A->C, C->A} that can be removed by DCE + the current pattern, or even {A->B, A->C, A->A} that can be removed by DCE + folding away the cast to itself.
> Please tell me if I'm wrong, but since this is a conversion pass the following would happen: the operations are not really modified until the end of the conversion, so the values at the beginning of the chain would not be propagated down to the end, but only to the next cast. For example, {A -> B, B -> C, C -> D, D -> A} would become {A -> B, A -> C, B -> D, C -> A}.
I incorrectly assumed that the pass was using a regular rewrite driver because the pattern is _not_ a conversion pattern. With the dialect conversion infrastructure, your interpretation is indeed correct. The pass doesn't have to use it, but it may be preferable to keep the pattern compatible for users that may be relying on it.
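For example (a hypothetical usage sketch; `populateCastReconciliationPatterns` is only a placeholder for however the patterns are exposed), the same pattern could be run under the greedy rewrite driver, which erases ops immediately rather than at the end of a conversion:

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Hypothetical usage: run cast-reconciliation patterns with the greedy
// rewrite driver instead of the dialect conversion driver. The greedy driver
// deletes operations eagerly, so a pattern that relies on deletions being
// postponed would misbehave here.
static LogicalResult runUnderGreedyDriver(ModuleOp module) {
  RewritePatternSet patterns(module.getContext());
  populateCastReconciliationPatterns(patterns); // placeholder, see lead-in
  return applyPatternsAndFoldGreedily(module.getOperation(),
                                      std::move(patterns));
}
```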
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp |
---|---
49–50 | Then the folder has a bug.
53–56 | That assumes the dialect conversion driver (the pass does use it), but this is a plain rewrite pattern which may be used with other drivers that do not necessarily postpone deletion.
69 | Maybe the entire "chain" concept is not the right word/abstraction here.
They are erased, but not simultaneously. I've been thinking about this for a few days, but I'm having a hard time figuring out how to do it. In case of "bifurcations", the casts that are shared between two or more chains would be erased multiple times. Using the conversion driver makes it possible to delay the erasure, and thus to perform it exactly once on all the casts that are not the roots (see code).
If your pattern is set up to first collect (aka match) all cast ops to erase in, e.g., a DenseSet, and then just go over it and erase them, there shouldn't be a double erase. Note that I am expecting the DenseSet to contain a DAG of casts, whereas the code currently works on chains, where multiple _overlapping_ chains form the DAG, hence the issue with double erasing. Chains aren't the right model IMO.
I am really worried about the current approach relying on the knowledge of how the specific rewriter works internally since there is absolutely no guarantee it will keep working that way. Such a change is unlikely to happen immediately, but when it does, it will be extremely hard to debug the breakage here.
Wouldn't the rewrite pattern just become an erase of the matched casts?
And wouldn't this require the DenseSet to be populated (and passed to the pattern) before the driver is executed? If someone would like to reuse the rewrite pattern to eliminate the casts, they would also have to copy the logic that first discovers the DAGs.
I understand your concerns and I agree with them, but as I said, I don't see how to implement it in a different, self-contained way.
The pattern would also discover the DAG and populate the DenseSet; that would be its "match" part, and the "rewrite" part would erase everything and replace the sink nodes of the DAG with the operands of its root. Each cast belongs to at most one such DAG under the single-operand cast assumption (which we should check, bailing out when it doesn't hold), so we should not be able to accidentally find a cast through use-def chains after it has been erased.
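To make this concrete, here is a minimal sketch of such a match/rewrite split (my own illustration, not the patch; the pattern name and traversal details are assumptions, and it bails out on multi-operand casts as suggested):

```cpp
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"

using namespace mlir;

// Hypothetical sketch of the suggested design. Every cast is assumed to have
// exactly one input and one output, so each cast has a unique defining cast
// and the collected "DAG" is a tree rooted at `root`.
struct ReconcileCastDag : public OpRewritePattern<UnrealizedConversionCastOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(UnrealizedConversionCastOp root,
                                PatternRewriter &rewriter) const override {
    // "Match": walk the use-def chains iteratively, collecting every cast
    // reachable from the root. A cast is an exit node if it produces the
    // types the root consumes; a non-cast user anywhere else makes the match
    // fail before anything has been modified.
    llvm::SetVector<Operation *> dag;
    llvm::SmallPtrSet<Operation *, 4> exits;
    SmallVector<UnrealizedConversionCastOp> worklist;
    worklist.push_back(root);
    while (!worklist.empty()) {
      UnrealizedConversionCastOp current = worklist.pop_back_val();
      if (current.getInputs().size() != 1 || current.getOutputs().size() != 1)
        return failure();
      if (!dag.insert(current.getOperation()))
        continue;
      if (current.getOutputs().getTypes() == root.getInputs().getTypes()) {
        exits.insert(current.getOperation());
        continue;
      }
      for (Operation *user : current->getUsers()) {
        auto userCast = dyn_cast<UnrealizedConversionCastOp>(user);
        if (!userCast)
          return failure();
        worklist.push_back(userCast);
      }
    }

    // "Rewrite": erase the whole set at once. Exit casts are replaced by the
    // root's operand; everything else is simply erased. Visiting the set in
    // reverse discovery order removes users before their defining casts, so
    // no op is erased while it still has uses.
    for (Operation *op : llvm::reverse(dag.getArrayRef())) {
      if (exits.contains(op))
        rewriter.replaceOp(op, root.getInputs());
      else
        rewriter.eraseOp(op);
    }
    return success();
  }
};
```

Since the match phase only reads the IR, a failed match leaves nothing dangling, and the reverse-order erasure does not depend on the driver postponing deletions.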
I have implemented your suggestions. It turned out to be simpler than I expected: I didn't know that, in the case of dialect conversion, the driver skips the operations that have been marked as erased (hence my previous doubts). Thanks for the ideas and the small dive into the different driver options (I knew the conversion driver was not the only one, but I never had the chance to reason about the others and about sharing patterns between them).
One note though: I have opted to keep the dead-cast handling, for two reasons: the first (admittedly minor) is its simplicity (just a users.empty() check); the second, more important, is that a CSE pass would lead to other IR changes that the user may not want to perform (for whatever reason). If you feel that we should completely avoid this, please let me know.
P.S.: sorry for the double patch, I forgot to specify the base commit for the diff.
I hoped it would be simpler. Indeed, the driver doesn't consider operations marked for erasure. Otherwise, patterns would need to know in which order the driver walks over the IR and we don't want patterns to know about the driver internals.
> One note though: I have opted to keep the dead-cast handling, for two reasons: the first (admittedly minor) is its simplicity (just a users.empty() check); the second, more important, is that a CSE pass would lead to other IR changes that the user may not want to perform (for whatever reason). If you feel that we should completely avoid this, please let me know.
This is fine.
Please address the two remaining comments and this should be good to go.
mlir/lib/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.cpp |
---|---
49 | You can use SmallVector instead; it has convenient push_back/pop_back_val. std::stack is a wrapper around std::deque, which is less efficient than std::vector, which itself is less efficient than SmallVector.
68 | This is checking the cycle condition inside the lambda for all users, but the condition itself doesn't need the user, so it may end up being checked repeatedly. Could you factor it out of the lambda? It may be possible to do a single sweep over the users checking that (a) they are all UnrealizedConversionCastOp and (b) they cast back to the previous type; then the surrounding conditional will become if (b || current.getResultTypes() == op.getInputs().getTypes()) and the isSink below will be simplified to users.empty() || a.
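A rough sketch of that single sweep (my own illustration; `current` and the flag names are assumptions about the surrounding code, and "casts back to the previous type" follows my reading of the comment):

```cpp
#include "mlir/IR/BuiltinOps.h"

#include <utility>

using namespace mlir;

// Hypothetical helper: one sweep over the users of `current`, computing
// (a) all users are unrealized casts, and (b) some user casts back to the
// types `current` was cast from. The two flags can then feed the surrounding
// conditional and the exit-node check as described in the inline comment.
static std::pair<bool, bool> sweepUsers(UnrealizedConversionCastOp current) {
  bool allUsersAreCasts = true;   // (a)
  bool someUserCastsBack = false; // (b)
  for (Operation *user : current->getUsers()) {
    auto userCast = dyn_cast<UnrealizedConversionCastOp>(user);
    if (!userCast) {
      allUsersAreCasts = false;
      break;
    }
    if (userCast.getOutputs().getTypes() == current.getInputs().getTypes())
      someUserCastsBack = true;
  }
  return {allUsersAreCasts, someUserCastsBack};
}
```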
Patch updated.
To make things clearer, I've renamed the "sink" nodes to "exit" nodes.
Also, the iterations over the users for the DAG traversal have been merged, so that only one sweep is needed.
For the sake of completeness, I've also made the match fail when there is a mismatch between the input and output arguments of the casts (before, such a cast was detected as live but could still be accepted if the types formed a cycle, a behavior that can lead to wrong results). To be honest I can't imagine a situation where a cast takes multiple operands, but the corner case is covered for future-proofing. In my opinion the unrealized cast doesn't even have a reason to take more than one operand, but that is for another patch :)