# [MLIR] Add a utility to sort the operands of commutative ops

Needs Review · Public

Authored by srishti-pm on May 1 2022, 10:31 PM.

# Details

Reviewers
 nicolasvasilache mehdi_amini rriddle aartbik Chia-hungDuan bondhugula vinayaka-polymage stephenneuendorffer shauheen antiagainst arpith-jacob mgester liufengdb Mogball
Summary

Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative.

The function is intended to be used inside passes to simplify the matching of commutative operations. After this pattern is applied, the commutative operands appear in a deterministic order, so matching large DAGs becomes much simpler: it requires far fewer checks to be written in a user's pattern-matching function.

The operand of any op is produced by a set of ops and block arguments. Each of these ops and block arguments is called an "ancestor" of this operand.

The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, in breadth-first order.

Now, the "AncestorKey" associated with:

1. A block argument is {type: BLOCK_ARGUMENT, opName: ""}.
2. A non-constant-like op, for example, arith.addi, is {type: NON_CONSTANT_OP, opName: "arith.addi"}.
3. A constant-like op, for example, arith.constant, is {type: CONSTANT_OP, opName: ""}.

So, if an operand, say A, was produced as follows:

```
<block argument>   <block argument>
        \             /
         \           /
        arith.subi       arith.constant
              \             /
               \           /
                arith.addi
                     |
                 returns A
```

Then, the block arguments and operations present in the backward slice of A, in breadth-first order, are:
arith.addi, arith.subi, arith.constant, <block argument>, and <block argument>.

Thus, the "key" associated with operand A is:
{{type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: CONSTANT_OP, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}}.

Now, if "keyA" is the key associated with operand A and "keyB" is the key associated with operand B, then:

1. "keyA" == "keyB" iff both keys (each of which is a list) have the same size and every pair of corresponding elements is equal.
2. "keyA" < "keyB" iff:
   2.1. In the first unequal pair of corresponding elements, "keyA"'s element is the smaller, or
   2.2. Every pair of corresponding elements is equal and "keyA" is the shorter list.

Now, "AncestorKey1" is considered smaller than "AncestorKey2" iff:

1. The type of "AncestorKey1" is BLOCK_ARGUMENT and that of "AncestorKey2" isn't,
2. The type of "AncestorKey1" is NON_CONSTANT_OP and that of "AncestorKey2" is CONSTANT_OP, or
3. Both have the same type and the opName of "AncestorKey1" is alphabetically smaller than that of "AncestorKey2".

Some examples of such a sorting:

Assume that the sorting is being applied to foo.commutative, which is a commutative op.

Example 1:

%1 = foo.const 0
%2 = foo.mul <block argument>, <block argument>
%3 = foo.commutative %1, %2

Here,

1. The key associated with %1 is {{CONSTANT_OP, ""}}, and,
2. The key associated with %2 is {{NON_CONSTANT_OP, "foo.mul"}, {BLOCK_ARGUMENT, ""}, {BLOCK_ARGUMENT, ""}}.

Thus, the key of %2 < key of %1, and so, the sorted foo.commutative will look like:

%3 = foo.commutative %2, %1

Note that in this example, it wasn't necessary to compute the entire backward slice of operand %2 to decide that it has the smaller key. Just the comparison of {{CONSTANT_OP, ""}} and {{NON_CONSTANT_OP, "foo.mul"}} is enough to determine this. Thus, the sorting utility does not store the entire backward slice of an operand at once. It visits the slice in a breadth-first fashion, one ancestor at a time, appending to the operand's "key", and stops as soon as the relative ordering of that operand can be determined.

So, here,

1. The key computed for %1 is {{CONSTANT_OP, ""}}, and,
2. The key computed for %2 is {{NON_CONSTANT_OP, "foo.mul"}} (instead of {{NON_CONSTANT_OP, "foo.mul"}, {BLOCK_ARGUMENT, ""}, {BLOCK_ARGUMENT, ""}}).

Example 2:

%1 = foo.const 0
%2 = foo.mul <block argument>, <block argument>
%3 = foo.mul %2, %1
%4 = foo.add %2, %1
%5 = foo.commutative %1, %2, %3, %4

Here,

1. The key associated with %1 is {{CONSTANT_OP, ""}},
2. The key associated with %2 is {{NON_CONSTANT_OP, "foo.mul"}, {BLOCK_ARGUMENT, ""}, {BLOCK_ARGUMENT, ""}},
3. The key associated with %3 is {{NON_CONSTANT_OP, "foo.mul"}, {NON_CONSTANT_OP, "foo.mul"}, {CONSTANT_OP, ""}, {BLOCK_ARGUMENT, ""}, {BLOCK_ARGUMENT, ""}}, and,
4. The key associated with %4 is {{NON_CONSTANT_OP, "foo.add"}, {NON_CONSTANT_OP, "foo.mul"}, {CONSTANT_OP, ""}, {BLOCK_ARGUMENT, ""}, {BLOCK_ARGUMENT, ""}}.

And,

1. The key computed for %1 is {{CONSTANT_OP, ""}},
2. The key computed for %2 is {{NON_CONSTANT_OP, "foo.mul"}, {BLOCK_ARGUMENT, ""}},
3. The key computed for %3 is {{NON_CONSTANT_OP, "foo.mul"}, {NON_CONSTANT_OP, "foo.mul"}}, and,
4. The key computed for %4 is {{NON_CONSTANT_OP, "foo.add"}}.

Note that the BFS's for operands %1 and %4 stop after one iteration while those for operands %2 and %3 stop after the second iteration.

And, the sorted foo.commutative will look like:

%5 = foo.commutative %4, %2, %3, %1

Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>

# Diff Detail

### Event Timeline


My "why?" question is about canonicalization: could this be a canonicalization and if so why / why not? This is an important thing to answer before looking into the discussion below actually:

I think, yes, it can be a canonicalization. I don't see why not.

So far you explained "what is canonicalization and why are we canonicalizing", the same rationale applies to "push constant to the right" that we do already in canonicalization, and this is exactly why I asked before whether we could do what you're doing as a canonicalization.

You are right. And the answer is yes, we could do this as a canonicalization. In fact, it is a canonicalization because we are converting various forms of an IR to a specific canonicalized form. It just hasn't been added to any op's canonicalizer here.

So: I understand that the producers should be sorted for a pattern to apply, but our disconnect earlier is that my usual approach to canonicalization is to process an entire block/region, and as such we don't work on slices but on operations in order until a fixed point is reached. I'm a bit concerned about the efficiency of your approach, because when integrated in a framework that actually visits the entire block, you would recompute subsets of the same slice over and over and re-attempt to sort everything multiple times:

I completely agree with your concern. I had the same concern myself (refer to discussion starting from here: https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/12?u=srishtisrivastava). I think we can find a way to call this utility in the canonicalizers of all the commutative ops and remove the recursion (which won't be needed anymore, given the canonicalization happens from top to bottom). What are your views on this?

That was the sense of my question about canonicalization indeed :)

Every time the algorithm considers a commutative op (that is, every such op), it would recurse and try to re-sort the current slice, processing the same ops over and over.

Yes, exactly.

Why? Again your description of the sort is as follow:

I think I need to update my commit summary and revision summary. I had thought that this was an intuitive way of explaining the utility. But, I understand that it is quite misleading.

I need to look at the algorithm in more detail, but I'm not a fan of using a string key. Concatenating strings to make compound keys is not very efficient and potentially brittle. Can you assign unique IDs and use an array of IDs instead?

mlir/include/mlir/Transforms/CommutativityUtils.h
25

Add documentation? Similar to what you have in the patch description.

Mogball added a comment. Edited May 10 2022, 7:59 AM

On the matter of whether this should be a canonicalization, my concern with this is that if an operation has its own preferred ordering of operands that conflicts with the sort, then this will cause canonicalization to loop infinitely.

It's not actually the canonicalizer pass that moves constants to the right hand side. It's the folder. And it probably shouldn't be the folder that does this. So I'm open to making this part of canonicalization IF the sorted operand order produced by this utility is the canonical order of operands for commutative operations, so that conflicts are not possible.

(1) the operands defined by non-constant-like ops come first, followed by (2) block arguments, and these are followed by (3) the operands defined by constant-like ops.

I would have thought block-arguments would come first as we don't know their values, while non-constant-like ops could be folded at some point and then become constant-like. Meaning, they seem closer to constant than block arguments.

+1 to Mehdi's question about just stable sorting based on 4 criteria (3 buckets + ordering within (1)), and then we should be able to avoid all the string mangling too, as Jeff asked about.

+1 on all of the other comments, especially related to the use of strings.

On the matter of whether this should be a canonicalization, my concern with this is that if an operation has its own preferred ordering of operands that conflicts with the sort, then this will cause canonicalization to loop infinitely.

It's not actually the canonicalizer pass that moves constants to the right hand side. It's the folder. And it probably shouldn't be the folder that does this. So I'm open to making this part of canonicalization IF the sorted operand order produced by this utility is the canonical order of operands for commutative operations, so that conflicts are not possible.

We can decide whatever we want the canonical ordering of operands to be for the Commutative trait. We don't have to leave things up to operations if it doesn't make sense.

Improving the documentation of the functionality of this utility to make it more clear.

srishti-pm edited the summary of this revision. (Show Details)May 10 2022, 2:20 PM
srishti-pm edited the summary of this revision. (Show Details)May 10 2022, 2:24 PM

Fixing a minor typo.

srishti-pm marked an inline comment as done.May 10 2022, 2:34 PM

Thanks for improving the doc! Are you moving this to be used in canonicalization next?
I think a first good step would be to make it a pattern and test it with a pass that applies it in the greedy rewriter. I would land this first and then try to enable this in the canonicalizer.

Also, have you thought already about how to get rid of string manipulation?

Thanks for improving the doc! Are you moving this to be used in canonicalization next?
I think a first good step would be to make it a pattern and test it with a pass that applies it in the greedy rewriter. I would land this first and then try to enable this in the canonicalizer.

Thanks! I hope it is unambiguous and clear now! Yes, I will do the "moving this to be used in canonicalization" now. Thanks for the suggestion. The plan sounds good.

Also, have you thought already about how to get rid of string manipulation?

srishti-pm added a comment. Edited May 10 2022, 4:19 PM

1. Regarding string manipulation:

I need to look at the algorithm in more detail, but I'm not a fan of using a string key. Concatenating strings to make compound keys is not very efficient and potentially brittle. Can you assign unique IDs and use an array of IDs instead?

Sure, I'm currently brainstorming to come up with a way to do this. But, @Mogball, do you remember our discussion of including the attributes, etc., also in the unique ID/key (https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/13)? We had decided that this was something that can be added if and when required but probably won't be included in the first revision of this utility. As in, the first revision will contain a TODO comment describing this enhancement. Will we still want to do this when we are using an array of unique IDs? I wish to understand this so that I know whether I have to think of an ID that can easily be extended to include the attributes, etc. of an op.

+1 to Mehdi's question about just stable sorting based on 4 criteria (3 buckets + ordering within (1)), and then we should be able to avoid all the string mangling too, as Jeff asked about.

Actually, that question of Mehdi is now obsolete (I think). It was based on my incorrect documentation of the "sorting rule". I have now corrected the documentation of the rule (in the revision summary, the commit summary, and the code comments). The rule was finalized in a recent RFC (refer to comments starting from here: https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/19). Does this sound okay?

2. Regarding reversing non-constant-like op and block argument order:

I would have thought block-arguments would come first as we don't know their values, while non-constant-like ops could be folded at some point and then become constant-like. Meaning, they seem closer to constant than block arguments.

Sure. This sounds much better actually. I think at some point during the implementation, I had also thought of the same :)
I'll do this. Thanks for the suggestion.

I'm open to iterating in tree. Landing this utility first and then try adding it as a canonicalization SGTM.

The string could be replaced with an array of tuples (yuck) for now. An enum (constant, block arg, everything else) plus the OperationName.

Attributes need to be handled somehow, possibly by adding the op's dictionary attribute and using a deep compare.

@Mogball @mehdi_amini @jpienaar, sorry there haven't been any updates from my side here for the past 10 or so days. I had been busy in some other tasks. I have started working on this again.

srishti-pm edited the summary of this revision. (Show Details)Sun, Jun 12, 10:29 PM

I haven't thought too hard about the algorithm itself yet. I'm in the camp of "let's move forward if it works". I have mostly trivial comments.

clang/docs/tools/clang-formatted-files.txt
8451 ↗(On Diff #436274)

I don't think this file needs to be modified.

mlir/include/mlir/Transforms/CommutativityUtils.h
25

Why do all of these need to be exposed publicly? I think this file should only contain SortCommutativeOperands.

35

using BlockArgumentOrOpKey = std::pair<BlockArgumentOrOpType, StringRef>;

The default operator< for std::pair should work for you.

141

ArrayRef<OperandBFS *>

162

Pass these both by ArrayRef

175

ArrayRef<OperandBFS *>

187

ArrayRef<OperandBFS *>

195

SmallVectorImpl

202

ArrayRef<OperandBFS *>

212

Please move the body of matchAndRewrite into a source file. It only needs Operation *. And then all the structs/enums/utility functions in the header can be moved there as well.

227

memory leak?

243

sortedOperands(numOperands, nullptr);

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
18

unused?

52

The doc of a public function shouldn't be repeated above the implementation.

mlir/include/mlir/Transforms/CommutativityUtils.h
212

This could stand to be a static_assert

258

Could you not change getIndicesOfUnassigned... to populate two lists of operands and pass these to assignSortedPositionTo, instead of using a set to track the indices? You could put the operand index inside OperandBFS to keep track.

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
25

This function doesn't seem like it pays for itself -- llvm::any_of?

93

These could all be ArrayRefs since you aren't modifying them

168

drop mlir::

202

This function doesn't seem like it pays for itself.

srishti-pm edited the summary of this revision. (Show Details)Sun, Jun 12, 11:13 PM
srishti-pm edited the summary of this revision. (Show Details)Sun, Jun 12, 11:15 PM
srishti-pm edited the summary of this revision. (Show Details)Sun, Jun 12, 11:17 PM
srishti-pm edited the summary of this revision. (Show Details)Sun, Jun 12, 11:24 PM

Minor changes.

mlir/include/mlir/Transforms/CommutativityUtils.h
212

Can we make this not a template? This will be a code bloat otherwise.

333

There seems to me to be far too much code in the public head: I can't isolate the public API here, if this is just about a pattern then the usual way is to have a "populate" method.

srishti-pm marked 20 inline comments as done.

srishti-pm edited the summary of this revision. (Show Details)Wed, Jun 15, 11:04 AM

Increasing pattern benefit + minor typo correction.

Right now I'm not yet understanding all of the algorithm (haven't spent enough time on it), but I'm mostly concerned by the runtime cost of this normalization.

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
372
378–380

Right now I'm not yet understanding all of the algorithm (haven't spent enough time on it), but I'm mostly concerned by the runtime cost of this normalization.

I understand your concern. I'll go through my implementation once and optimize things that need it.

In principle, I think the algorithm is fine. I'm pretty sure you can rewrite bits of it to get rid of the map/sets. I'm only concerned about handling attributes. (e.g. cmpi slt vs cmpi sgt)

srishti-pm marked 6 inline comments as done.

@mehdi_amini, I have made sure that the algorithm is good in terms of both time and space complexity.

@Mogball, "handling attributes (e.g. cmpi slt vs cmpi sgt)" doesn't seem hard to me. I think this algorithm can be extended with ease to take attributes into account. But I don't think that it should be a part of this revision (I believe you agree) because it seems like an incremental change that should be added only when the need arises. A new user should first get accustomed to this utility (and how it sorts operands with different backward slices), and then the utility can be extended to differentiate between backward slices containing ops with the same name but different attribute values.

mlir/include/mlir/Transforms/CommutativityUtils.h
35

I have added a constructor to BlockArgumentOrOpKey (now renamed to AncestorKey), and thus I think this comment is obsolete now. Hope that this is fine. Adding a constructor made the code look cleaner.

227

Fixed this by adding a struct called "Ancestor" which refers to either an op or a block argument.

258

I think that doing this might not be a good idea. It would increase the space complexity unnecessarily (OperandBFS is a big structure) and not help much with the time complexity, because the sets of indices are expected to be small. The number of indices is at most the total number of operands, and each element in these sets occupies very little space (the size of an unsigned).

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
272

Fixed this by adding a struct called "Ancestor" which refers to either an op or a block argument.

srishti-pm edited the summary of this revision. (Show Details)Tue, Jun 28, 5:28 AM
srishti-pm marked 2 inline comments as done.

Fixed memory leak.