This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add a utility to sort the operands of commutative ops
ClosedPublic

Authored by srishti-pm on May 1 2022, 10:31 PM.

Details

Summary

Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative. This sorting is stable.

The function is intended to be used inside passes to simplify the matching of commutative operations. After the above-mentioned pattern is applied, the commutative operands of an op occur in a deterministic order, so matching large DAGs becomes much simpler, i.e., it requires far fewer checks to be written by a user in their pattern matching function.

The operand of any op is produced by a set of ops and block arguments; each of these ops and block arguments is called an "ancestor" of this operand.

The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, visited in breadth-first order.

Now, the "AncestorKey" associated with:

  1. A block argument is {type: BLOCK_ARGUMENT, opName: ""}.
  2. A non-constant-like op, for example, arith.addi, is {type: NON_CONSTANT_OP, opName: "arith.addi"}.
  3. A constant-like op, for example, arith.constant, is {type: CONSTANT_OP, opName: "arith.constant"}.

So, if an operand, say A, was produced as follows:

`<block argument>`  `<block argument>`
             \          /
              \        /
              `arith.subi`           `arith.constant`
                         \            /
                         `arith.addi`
                                |
                           returns `A`

Then the block arguments and operations present in the backward slice of A, in breadth-first order, are:
arith.addi, arith.subi, arith.constant, <block argument>, and <block argument>.

Thus, the "key" associated with operand A is:

{
 {type: NON_CONSTANT_OP, opName: "arith.addi"},
 {type: NON_CONSTANT_OP, opName: "arith.subi"},
 {type: CONSTANT_OP, opName: "arith.constant"},
 {type: BLOCK_ARGUMENT, opName: ""},
 {type: BLOCK_ARGUMENT, opName: ""}
}

Now, if "keyA" is the key associated with operand A and "keyB" is the key associated with operand B, then:
"keyA" < "keyB" iff:

  1. At the first unequal pair of corresponding AncestorKeys, the AncestorKey in operand A's key is smaller, or,
  2. Every pair of corresponding AncestorKeys is equal and operand A's key is shorter.

AncestorKeys of type BLOCK_ARGUMENT are considered the smallest, those of type CONSTANT_OP, the largest, and NON_CONSTANT_OP types come in between. Within the types NON_CONSTANT_OP and CONSTANT_OP, the smaller ones are the ones with smaller op names (lexicographically).


Some examples of such a sorting:

Assume that the sorting is being applied to foo.commutative, which is a commutative op.

Example 1:

%1 = foo.const 0
%2 = foo.mul <block argument>, <block argument>
%3 = foo.commutative %1, %2

Here,

  1. The key associated with %1 is:
{
 {CONSTANT_OP, "foo.const"}
}
  2. The key associated with %2 is:
{
 {NON_CONSTANT_OP, "foo.mul"},
 {BLOCK_ARGUMENT, ""},
 {BLOCK_ARGUMENT, ""}
}

The key of %2 < the key of %1
Thus, the sorted foo.commutative is:

%3 = foo.commutative %2, %1

Example 2:

%1 = foo.const 0
%2 = foo.mul <block argument>, <block argument>
%3 = foo.mul %2, %1
%4 = foo.add %2, %1
%5 = foo.commutative %1, %2, %3, %4

Here,

  1. The key associated with %1 is:
{
 {CONSTANT_OP, "foo.const"}
}
  2. The key associated with %2 is:
{
 {NON_CONSTANT_OP, "foo.mul"},
 {BLOCK_ARGUMENT, ""},
 {BLOCK_ARGUMENT, ""}
}
  3. The key associated with %3 is:
{
 {NON_CONSTANT_OP, "foo.mul"},
 {NON_CONSTANT_OP, "foo.mul"},
 {CONSTANT_OP, "foo.const"},
 {BLOCK_ARGUMENT, ""},
 {BLOCK_ARGUMENT, ""}
}
  4. The key associated with %4 is:
{
 {NON_CONSTANT_OP, "foo.add"},
 {NON_CONSTANT_OP, "foo.mul"},
 {CONSTANT_OP, "foo.const"},
 {BLOCK_ARGUMENT, ""},
 {BLOCK_ARGUMENT, ""}
}

Thus, the sorted foo.commutative is:

%5 = foo.commutative %4, %2, %3, %1

Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden.

Fixing a comment typo and enhancing the commit summary even further.

Seems like this should be added to canonicalization? The "push constants to the right hand side" is there already.

I also don't understand the complexity of the implementation, I may need an example to understand why you're recursively operating on the producer ops for the operands.
From the revision description: "(1) the operands defined by non-constant-like ops come first, followed by (2) block arguments, and these are followed by (3) the operands defined by constant-like ops", which all seems like a fairly local check: a stable sort on the operands would be deterministic and local to a single operation.

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
236

The string conversion seems unnecessary to me?

272

Is this a leak?

Seems like this should be added to canonicalization? The "push constants to the right hand side" is there already.

I think this was not added to canonicalization because we wanted it to be an independent utility that can be used if needed and not be used if not needed. Yes, the "push constants to the right hand side" is there already and that's actually the reason why this utility also pushes the constants to the right. I didn't want this utility to do something else and clash with the "push constants to the right hand side" canonicalization if and when both were being used together. So, when we decided what our sorting order will be, we made sure that this order kept constants to the right. For more context, you could refer to this RFC's discussion, starting from this comment: https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/12?u=srishtisrivastava.

But, note this:
Right now, the code shows that I am sorting the constants alphabetically (as in, theoretically, we can say that arith.constant comes before tf.Const). I will remove this "alphabetical sorting" among the constants. It isn't consistent with the existing "pushing of constants" and moreover adds unnecessary computation too. I'll ensure that the sorting is stable and pushes all the constants to the right.

I also don't understand the complexity of the implementation, I may need an example to understand why you're recursively operating on the producer ops for the operands.
From the revision description: "(1) the operands defined by non-constant-like ops come first, followed by (2) block arguments, and these are followed by (3) the operands defined by constant-like ops", which all seems like a fairly local check: a stable sort on the operands would be deterministic and local to a single operation.

I do this because, firstly, in the description, if you look below this paragraph, you will see the following:
"And, if two operands come from the same op, the function backtracks and
looks even further to sort them. This backtracking is done over the
backward slice of the operand, in a breadth-first traversal."

So, in essence, we look at the entire origin of each operand, in a breadth-first fashion, to decide the ordering of the operands. Secondly, we need to sort the producers of the operands so that the sorting is deterministic (if the producers are not sorted, we don't know whether we are doing the right sorting, even if we look at the entire backward slice). This is because, when the producers are not sorted, the breadth-first traversal becomes non-deterministic to a user who is writing a pattern matching function (say, matchAndRewrite()).

srishti-pm updated this revision to Diff 428240.May 9 2022, 4:51 PM

Within constant-like ops, removed the requirement for them being sorted alphabetically. Basically, all constants will be treated as equals by the sorting algorithm and it will not distinguish between, say, arith.constant and tf.Const. This is because multiple canonicalizations exist in various dialects that push the constants to the right but do not make any distinction among constants. So, since we want this utility to not clash with those canonicalizations, this is being done.

Seems like this should be added to canonicalization? The "push constants to the right hand side" is there already.

I think this was not added to canonicalization because we wanted it to be an independent utility that can be used if needed and not be used if not needed.

You're telling me "what" while I'm actually more interested in the "why" here?

I also don't understand the complexity of the implementation, I may need an example to understand why you're recursively operating on the producer ops for the operands.
From the revision description: "(1) the operands defined by non-constant-like ops come first, followed by (2) block arguments, and these are followed by (3) the operands defined by constant-like ops", which all seems like a fairly local check: a stable sort on the operands would be deterministic and local to a single operation.

I do this because, firstly, in the description, if you look below this paragraph, you will see the following:
"And, if two operands come from the same op, the function backtracks and
looks even further to sort them. This backtracking is done over the
backward slice of the operand, in a breadth-first traversal."

Same as before: this does not tell me why, can you provide an example where this matters?

srishti-pm added a comment.EditedMay 9 2022, 6:19 PM

You're telling me "what" while I'm actually more interested in the "why" here?

I'm not sure what your question is, with a "why". Let me think about this a bit. I'll get back to you.

Same as before: this does not tell me why, can you provide an example where this matters?

Sure. This is a bit lengthy; I'm really sorry for that!

So, let's start with some basic understanding here. Let's say I am writing a matchAndRewrite() function where I take the following INPUT and convert it to the following OUTPUT:

INPUT:
a = div b, c
d = sub e, f
g = add d, a
h = const 0
i = mul h, g

OUTPUT:
i = some_op b, c, e, f

Now, when I'm writing a C++ code to match and rewrite:

If I only sort the i = mul h, g op, I get my canonicalized input as follows:

CANONICALIZED INPUT #1:

a = div b, c
d = sub e, f
g = add d, a
h = const 0
i = mul g, h

So, I'm basically sorting i = mul h, g to i = mul g, h using the utility and then writing the if-else statements to match CANONICALIZED INPUT #1.
So, the pseudocode would be:

if mul.operand[0].defOp != add OR mul.operand[1].defOp != const 0
  return failure

if add.operand[0].defOp != sub
  if add.operand[0].defOp != div OR add.operand[1].defOp != sub
    return failure
  else
    get the values of b, c, e, and f
else if add.operand[1].defOp == div
  get the values of b, c, e, and f
else
  return failure

rewrite<some_op>(mul, b, c, e, f)

But, if I had sorted the producers as well, my canonicalized input would be:
CANONICALIZED INPUT #2:

a = div b, c
d = sub e, f
g = add a, d
h = const 0
i = mul g, h

and thus my code will reduce to:

if mul.operand[0].defOp != add OR mul.operand[1].defOp != const 0
  return failure

if add.operand[0].defOp != div OR add.operand[1].defOp != sub
  return failure

get the values of b, c, e, and f

rewrite<some_op>(mul, b, c, e, f)

So, in essence, we can see that the effort of an end user writing a C++ pattern is reduced if I sort the producers as well. But one may argue that I could have sorted the add op after seeing it, and then my if-else statements would reduce. So, the above illustration doesn't explain why we sort the producers.

The real reason for sorting the producers is that, if this is not done, the sorting and this entire utility become virtually useless. A deterministic sorting of an op requires its producers to be sorted. Our sorting algorithm is based on a breadth-first traversal of backward slices. On the same level of the DAG, the traversal looks at operands from first to last; that is how the breadth-first traversal is defined here. Now, if this traversal is non-deterministic, the whole point of sorting collapses. Maybe this is best explained with the example below.

If I have this IR:
d = div b, c
s = sub e, f
x = xor k, l
g = add s, d
h = add d, x
i = add g, h

Then, i = add g, h will be sorted to i = add g, h (no change).

But, when I have the below IR (which is functionally the same as the above IR):
d = div b, c
s = sub e, f
x = xor k, l
g = add d, s
h = add d, x
i = add g, h

Then, i = add g, h will be sorted to i = add h, g.

So, we have two functionally identical IRs being sorted differently. This is clearly not useful: the sorting depends on what the input IR is, so even after sorting, functionally identical IRs can still look different. The pattern matcher (a human) then still has to write an excessive number of if-else statements to match the input, which is exactly what this sorting was supposed to avoid. This is as good as not having done any sorting at all!

Is the motivation clear now?

srishti-pm updated this revision to Diff 428300.EditedMay 9 2022, 11:51 PM

Made the sorting strictly stable.

srishti-pm edited the summary of this revision. (Show Details)May 9 2022, 11:54 PM

You're telling me "what" while I'm actually more interested in the "why" here?

I'm not sure what your question is, with a "why". Let me think about this a bit. I'll get back to you.

My "why?" question is about canonicalization: could this be a canonicalization and if so why / why not? This is an important thing to answer before looking into the discussion below actually:

Same as before: this does not tell me why, can you provide an example where this matters?

Sure. This is a bit lengthy. I'm really sorry for that !

No worry, thanks for being thorough.

...
So, in essence, we can see that the effort of an end user writing a C++ pattern is reduced if I sort the producers as well.

So far you explained "what is canonicalization and why are we canonicalizing", the same rationale applies to "push constant to the right" that we do already in canonicalization, and this is exactly why I asked before whether we could do what you're doing as a canonicalization.

The real reason for sorting the producers is that, if such a thing is not done, the sorting and this entire utility will be virtually useless.

So: I understand that the producers should be sorted for a pattern to apply, but our disconnect earlier is that my usual approach to canonicalization is to process an entire block/region, and as such we don't work on slices but on operations in order until fix-point. I'm a bit concerned about the efficiency of your approach, because when integrated in a framework that actually visits the entire block, you would recompute subsets of the same slice over and over and re-attempt to sort everything multiple times:

d = add b, c
s = add d, f
g = add s, d
h = add d, x
i = add g, h

Every time the algorithm considers a commutative op (that is, every op in this example), it would recurse and try to re-sort the current slice, processing the same ops over and over.

A deterministic sorting of an op requires its producers to be sorted.

Why this is the case isn't clear to me based on the sorting criteria you provided, but in any case a local sort with fixed-point iteration on an isolated region should converge (hopefully).

Our sorting algorithm is based on a breadth-first traversal of backward slices. On the same level of the DAG, the traversal looks at operands from first to last; that is how the breadth-first traversal is defined here. Now, if this traversal is non-deterministic, the whole point of sorting collapses. Maybe this is best explained with the example below.

If I have this IR:
d = div b, c
s = sub e, f
x = xor k, l
g = add s, d
h = add d, x
i = add g, h

Then, i = add g, h will be sorted to i = add g, h (no change).

But, when I have the below IR (which is functionally the same as the above IR):
d = div b, c
s = sub e, f
x = xor k, l
g = add d, s
h = add d, x
i = add g, h

Then, i = add g, h will be sorted to i = add h, g.

Why? Again, your description of the sort is as follows:

(1) the operands defined by non-constant-like ops come first, followed by
(2) block arguments, and these are followed by
(3) the operands defined by constant-like ops.
In addition to this, within the category (1), the order of operands is alphabetical w.r.t. the dialect name and op name.

In your example:

g = add d, s
h = add d, x
i = add g, h

The operands fit category (1), I believe, and so they should be sorted "alphabetically w.r.t. the dialect name and op name", so a stable sort would never re-order them in any way.

My "why?" question is about canonicalization: could this be a canonicalization and if so why / why not? This is an important thing to answer before looking into the discussion below actually:

I think, yes, it can be a canonicalization. I don't see why not.

So far you explained "what is canonicalization and why are we canonicalizing", the same rationale applies to "push constant to the right" that we do already in canonicalization, and this is exactly why I asked before whether we could do what you're doing as a canonicalization.

You are right. And the answer is yes, we could do this as a canonicalization. In fact, it is a canonicalization because we are converting various forms of an IR to a specific canonicalized form. It just hasn't been added to any op's canonicalizer here.

So: I understand that the producers should be sorted for a pattern to apply, but our disconnect earlier is that my usual approach to see canonicalization is to process an entire block/region, and as such we don't work on slices but on operation in order until fix-point. I'm a bit concerned about efficiency of your approach, because when integrated in a framework that actually visit the entire block you would recompute subset of the same slice over and over and re-attempt to sort everything multiple times:

I completely agree with your concern. I had the same concern myself (refer to discussion starting from here: https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/12?u=srishtisrivastava). I think we can find a way to call this utility in the canonicalizers of all the commutative ops and remove the recursion (which won't be needed anymore, given the canonicalization happens from top to bottom). What are your views on this?

That was the sense of my question about canonicalization indeed :)

Every time the algorithm considers a commutative op (that is, every op in this example), it would recurse and try to re-sort the current slice, processing the same ops over and over.

Yes, exactly.

Why? Again, your description of the sort is as follows:

I think I need to update my commit summary and revision summary. I had thought that this was an intuitive way of explaining the utility. But, I understand that it is quite misleading.

I need to look at the algorithm in more detail, but I'm not a fan of using a string key. Concatenating strings to make compound keys is not very efficient and potentially brittle. Can you assign unique IDs and use an array of IDs instead?

mlir/include/mlir/Transforms/CommutativityUtils.h
25

Add documentation? Similar to what you have in the patch description.

Mogball added a comment.EditedMay 10 2022, 7:59 AM

On the matter of whether this should be a canonicalization, my concern with this is that if an operation has its own preferred ordering of operands that conflicts with the sort, then this will cause canonicalization to loop infinitely.

It's not actually the canonicalizer pass that moves constants to the right hand side; it's the folder. And it probably shouldn't be the folder that does this. So I'm open to making this part of canonicalization IF the sorted operand order produced by this utility is the canonical order of operands for commutative operations, so that conflicts are not possible.

(1) the operands defined by non-constant-like ops come first, followed by (2) block arguments, and these are followed by (3) the operands defined by constant-like ops.

I would have thought block-arguments would come first as we don't know their values, while non-constant-like ops could be folded at some point and then become constant-like. Meaning, they seem closer to constant than block arguments.

+1 to Mehdi's question about just stable sorting based on 4 criteria (3 buckets + ordering within (1)), and then we should be able to avoid all the string mangling too, as Jeff asked about.

+1 on all of the other comments, especially related to the use of strings.

On the matter of whether this should be a canonicalization, my concern with this is that if an operation has its own preferred ordering of operands that conflicts with the sort, then this will cause canonicalization to loop infinitely.

It's not actually the canonicalizer pass that moves constants to the right hand side; it's the folder. And it probably shouldn't be the folder that does this. So I'm open to making this part of canonicalization IF the sorted operand order produced by this utility is the canonical order of operands for commutative operations, so that conflicts are not possible.

We can decide whatever we want the canonical ordering of operands to be for the Commutative trait. We don't have to leave things up to operations if it doesn't make sense.

Improving the documentation of the functionality of this utility to make it more clear.

srishti-pm edited the summary of this revision. (Show Details)May 10 2022, 2:20 PM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm edited the summary of this revision. (Show Details)May 10 2022, 2:24 PM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm edited the summary of this revision. (Show Details)

Fixing a minor typo.

srishti-pm marked an inline comment as done.May 10 2022, 2:34 PM

Thanks for improving the doc! Are you moving this to be used in canonicalization next?
I think a first good step would be to make it a pattern and test it with a pass that applies it in the greedy rewriter. I would land this first and then try to enable this in the canonicalizer.

Also, have you thought already about how to get rid of string manipulation?

Thanks for improving the doc! Are you moving this to be used in canonicalization next?
I think a first good step would be to make it a pattern and test it with a pass that applies it in the greedy rewriter. I would land this first and then try to enable this in the canonicalizer.

Thanks! I hope it is unambiguous and clear now! Yes, I will do the "moving this to be used in canonicalization" now. Thanks for the suggestion. The plan sounds good.

Also, have you thought already about how to get rid of string manipulation?

No, I haven't. I'm still thinking about this. Meanwhile, I'll also address the other comments given here.

srishti-pm added a comment.EditedMay 10 2022, 4:19 PM

Replying to the various comments given in this revision:

1. Regarding string manipulation:

I need to look at the algorithm in more detail, but I'm not a fan of using a string key. Concatenating strings to make compound keys is not very efficient and potentially brittle. Can you assign unique IDs and use an array of IDs instead?

Sure, I'm currently brainstorming to come up with a way to do this. But, @Mogball, do you remember our discussion of including the attributes, etc. also in the unique ID/key (https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/13)? We had decided that this was something that can be added if and when required but probably won't be included in the first revision of this utility. As in, the first revision will contain a TODO comment describing this enhancement. Will we still want to do this when we are using an array of unique IDs? I wish to understand this so that I know whether I have to think of an ID which can be easily extended to include the attributes, etc. of an op.

+1 to Mehdi's question about just stable sorting based on based on 4 criteria (3 buckets + ordering within (1)) and then we should be able to avoid all the string mangling too as Jeff asked about.

Actually, that question of Mehdi is now obsolete (I think). It was based on my incorrect documentation of the "sorting rule". I have now corrected the documentation of the rule (in the revision summary, the commit summary, and the code comments). The rule was finalized in a recent RFC (refer to comments starting from here: https://discourse.llvm.org/t/mlir-pdl-extending-pdl-pdlinterp-bytecode-to-enable-commutative-matching/60798/19). Does this sound okay?

2. Regarding reversing non-constant-like op and block argument order:

I would have thought block-arguments would come first as we don't know their values, while non-constant-like ops could be folded at some point and then become constant-like. Meaning, they seem closer to constant than block arguments.

Sure. This sounds much better actually. I think at some point during the implementation, I had also thought of the same :)
I'll do this. Thanks for the suggestion.

I'm open to iterating in tree. Landing this utility first and then try adding it as a canonicalization SGTM.

The string could be replaced with an array of tuples (yuck) for now. An enum (constant, block arg, everything else) plus the OperationName.

Attributes need to be handled somehow, possibly by adding the op's dictionary attribute and using a deep compare.

@Mogball @mehdi_amini @jpienaar, sorry there haven't been any updates from my side here for the past 10 or so days. I had been busy in some other tasks. I have started working on this again.

Addressed all the comments.

srishti-pm edited the summary of this revision. (Show Details)Jun 12 2022, 10:29 PM

I haven't thought too hard about the algorithm itself yet. I'm in the camp of "let's move forward if it works". I have mostly trivial comments.

clang/docs/tools/clang-formatted-files.txt
8452

I don't think this file needs to be modified.

mlir/include/mlir/Transforms/CommutativityUtils.h
25

Why do all of these need to be exposed publicly? I think this file should only contain SortCommutativeOperands.

35

using BlockArgumentOrOpKey = std::pair<BlockArgumentOrOpType, StringRef>

The default operator< for std::pair should work for you.

141

ArrayRef<OperandBFS *>

162

Pass these both by ArrayRef

175

ArrayRef<OperandBFS *>

187

ArrayRef<OperandBFS *>

195

SmallVectorImpl

202

ArrayRef<OperandBFS *>

212

Please move the body of matchAndRewrite into a source file. It only needs Operation *. And then all the structs/enums/utility functions in the header can be moved there as well.

227

memory leak?

243

sortedOperands(numOperands, nullptr);

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
18

unused?

52

The doc of a public function shouldn't be repeated above the implementation.

Mogball added inline comments.Jun 12 2022, 10:57 PM
mlir/include/mlir/Transforms/CommutativityUtils.h
212

This could stand to be a static_assert

258

Could you not change getIndicesOfUnassigned... to populate two lists of operands and pass these to assignSortedPositionTo instead of using a set to track the indices? You could put the operand index inside OperandBFS to keep track.

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
25

This function doesn't seem like it pays for itself -- llvm::any_of?

93

These could all be ArrayRefs since you aren't modifying them

168

drop mlir::

202

This function doesn't seem like it pays for itself.

srishti-pm edited the summary of this revision. (Show Details)Jun 12 2022, 11:13 PM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm edited the summary of this revision. (Show Details)Jun 12 2022, 11:15 PM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm edited the summary of this revision. (Show Details)Jun 12 2022, 11:17 PM
srishti-pm edited the summary of this revision. (Show Details)Jun 12 2022, 11:24 PM

Minor changes.

mehdi_amini added inline comments.Jun 14 2022, 1:18 PM
mlir/include/mlir/Transforms/CommutativityUtils.h
212

Can we make this not a template? This will be a code bloat otherwise.

333

There seems to me to be far too much code in the public head: I can't isolate the public API here, if this is just about a pattern then the usual way is to have a "populate" method.

srishti-pm marked 20 inline comments as done.

Addressing comments.

Addressed most of the comments. A few remaining.

srishti-pm edited the summary of this revision. (Show Details)Jun 15 2022, 11:04 AM

Increasing pattern benefit + minor typo correction.

Right now I'm not yet understanding all of the algorithm (haven't spent enough time on it), but I'm mostly concerned by the runtime cost of this normalization.


Right now I'm not yet understanding all of the algorithm (haven't spent enough time on it), but I'm mostly concerned by the runtime cost of this normalization.

I understand your concern. I'll go through my implementation once and optimize things that need it.

In principle, I think the algorithm is fine. I'm pretty sure you can rewrite bits of it to get rid of the map/sets. I'm only concerned about handling attributes. (e.g. cmpi slt vs cmpi sgt)

srishti-pm marked 6 inline comments as done.

Addressed the final comments.

@mehdi_amini, I have made sure that the algorithm is good in terms of both time and space complexity.

@Mogball, "handling attributes (e.g. cmpi slt vs cmpi sgt)" doesn't seem hard to me. I think this algorithm can be extended with ease to take attributes into account. But, I don't think that it should be a part of this revision (I believe you agree) because it seems like an incremental change which should be added only when the need arises. A new user should first get accustomed to this utility (and how it sorts operands with different backward slices), and then the utility can be extended to differentiate between backward slices containing ops with the same name but different attribute values.

mlir/include/mlir/Transforms/CommutativityUtils.h
35

I have added a constructor to BlockArgumentOrOpKey (now renamed to AncestorKey), and thus I think this comment is obsolete now. I hope that this is fine. Adding a constructor made the code look cleaner.

227

Fixed this by adding a struct called "Ancestor" which refers to either an op or a block argument.

258

I think that doing this might not be a good idea. It will increase the space complexity unnecessarily (OperandBFS is a big structure) and not help much with the time complexity, because the sets of indices are expected to be small. At worst, the number of indices will be <= the total number of operands, and each element in these sets occupies very little space (the size of an unsigned).

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
272

Fixed this by adding a struct called "Ancestor" which refers to either an op or a block argument.

srishti-pm edited the summary of this revision. (Show Details)Jun 28 2022, 5:28 AM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm marked 2 inline comments as done.

Fixed memory leak.

Mogball added inline comments.Jun 28 2022, 9:41 PM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
179

Please remove these. They don't improve readability.

Addressed all of Jeff's comments.

srishti-pm marked an inline comment as done.Jun 29 2022, 9:24 PM

I'm glad the DenseSets are gone, but my three-ish biggest gripes are:

  • The algorithm is conceptually simple, but there is way more code than is necessary to achieve it.
  • More comments (excluding "doc" comments) than code is generally not a good sign.
  • The implementation is still inefficient in a lot of obvious ways.
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
11–12
22–39

This class isn't necessary. Ancestor can just be Operation *. If it's null, then we know it's a block argument (the bool flag is redundant).
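
A minimal sketch of that suggestion (using a stand-in Operation struct, not MLIR's real mlir::Operation):

```cpp
#include <cassert>

// Stand-in for mlir::Operation, purely for illustration.
struct Operation {
  const char *name;
};

// A nullable pointer already encodes both cases: a non-null pointer is the
// defining op, and nullptr means "block argument" -- no extra bool is needed.
using Ancestor = Operation *;

bool isBlockArgument(Ancestor ancestor) { return ancestor == nullptr; }
```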

41
66–67

My biggest complaint with respect to readability is that there are more comments than code. This is fine if the comment has a big explanation about the algorithm and how keys are represented, especially with a nice ASCII diagram as you have below. But if this constructor had 0 comments except maybe "Only non-constant ops are sorted by name", it would be perfectly fine.

77

Constant ops could be sorted by name as well.

89–97

This should behave the same as the manually written comparator. operator< of std::tuple compares the first element and then the next if the first is equal.
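
To illustrate std::tuple's lexicographic operator< (the (rank, name) key shape here is a hypothetical stand-in, not the patch's exact type):

```cpp
#include <cassert>
#include <string>
#include <tuple>

// A key sketched as (type rank, op name). std::tuple's operator< is
// lexicographic: it compares the first elements and only consults the
// second on a tie, matching a hand-written comparator.
using AncestorKey = std::tuple<int, std::string>;

bool ancestorKeyLess(const AncestorKey &lhs, const AncestorKey &rhs) {
  return lhs < rhs;  // tuple's built-in lexicographic compare
}
```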

111

Since this is a set of pointers expected to be small, you can use SmallPtrSet for a big efficiency boost (linear scan instead of hashing when small).

149–150

Since you are moving sorted operands into their sorted position and tracking the unsorted range, you shouldn't even need this flag because you will always know the sorted and unsorted subranges.

There are multiple loops in which you iterate over the entire operand list but skip those where this flag is set/unset. In those cases, you can always just iterate the subrange of interest.

156–160
163–178

I would drop these helpers. std::queue will already assert if pop or front are called on an empty queue.

201–218
268
270

This flag is not necessary because you can just check bfsOfOperandsWithKey.size() == 1

278
279–280

You don't need to make a copy. In fact, I think you should just track the indices.

299–305

This shouldn't be necessary if you're tracking the unsorted subrange. Just quit when it gets to 1 element.

308–309
313

Argument names in comments are not necessary when the passed variable has a descriptive name.

330–354

There is no way you need this much code. A std::swap between the current operand and the first unsorted position should be enough.

361

This is possibly the longest function name I've ever seen. Please make it more concise.

401

This check could be moved into pushAncestor

536–539

You could just be returning a flag to indicate whether any swapping occurred so that you don't have to track before and after.

Mogball added inline comments.Jun 30 2022, 12:13 AM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
524

And then you'll never need to check isSorted again.

srishti-pm marked 4 inline comments as done.Jun 30 2022, 12:42 AM
srishti-pm added inline comments.
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
77

The only reason we separated constant ops from the non-constant ops was because the former are canonicalized to the right (as a stable sort) by existing canonicalizations. And, we didn't want our algorithm to conflict with these existing canonicalizations. That is the reason I am not sorting them by name and just keeping them to the right (as a stable sort).

270

.size() is an O(N) operation and that is why I usually try to avoid it. Do you still agree we should use it here? I understand that N is an expectedly small value.

279–280

I agree. But, we had discussed to store operands instead of indices and that's why I did this. I will change this to use indices again (keeping other things unchanged).

330–354

If I do a swap, the sorting will no longer be stable and I believe that there was a discussion that concluded with the fact that "we want stable sorting".

361

Could you please give a suggestion for the name? After a long thought, I came up with this name. It was better than all my other ideas.

Mogball added inline comments.Jun 30 2022, 7:36 AM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
77

I know. You can sort them separately from regular ops and also by name and the overall behaviour would be the same.
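
A sketch of what that could look like, assuming a hypothetical (isConstant, opName) key: comparing the tuples keeps every constant to the right of every non-constant (false < true) while still ordering the constants among themselves by name:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical per-operand key for illustration only.
struct Key {
  bool isConstant;
  std::string opName;
};

void sortKeys(std::vector<Key> &keys) {
  std::stable_sort(keys.begin(), keys.end(), [](const Key &a, const Key &b) {
    // false < true, so non-constants sort before (to the left of) constants;
    // ties fall through to the op name.
    return std::tie(a.isConstant, a.opName) < std::tie(b.isConstant, b.opName);
  });
}
```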

270

Size is constant time

279–280

I mean you can have a list (not a set) of indices to shift.

330–354

That's true, but shifting like this is very slow as well. At this point, you might want to give std::stable_sort with a custom comparator that does extra BFS iterations on demand a try.
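
A minimal sketch of the std::stable_sort approach, with fully materialized string keys standing in for the on-demand BFS keys (a real comparator would extend the keys lazily to break ties):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for an operand whose BFS key has already been computed.
struct Operand {
  std::string key;
  int originalIndex;
};

// std::stable_sort preserves the relative order of operands whose keys
// compare equal, which is exactly the stability property discussed above.
void sortOperands(std::vector<Operand> &operands) {
  std::stable_sort(operands.begin(), operands.end(),
                   [](const Operand &lhs, const Operand &rhs) {
                     return lhs.key < rhs.key;
                   });
}
```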

srishti-pm marked 4 inline comments as done.Jun 30 2022, 8:12 AM
srishti-pm added inline comments.
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
330–354

So, this is what I think:

The number of commutative operands is not expected to be huge. So, we can afford to do shifting. In most cases, we wouldn't have to shift more than 1 or 2 positions. But, the custom comparator might cost us a lot, considering that each BFS could potentially be very large, especially for deep learning models. So, doing the BFS traversals again and again for the same operand, even though caching will be involved, doesn't sound like a good idea to me.

What are your views?

Mogball added inline comments.Jun 30 2022, 8:28 AM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
330–354

Very rough estimate: on its own, this function is N. Finding the smallest key is N, and then finding all matching elements is N. This function is called for each operand that needs to be moved, but the number of such operands decreases. So the sort itself averages out to be 3N^2 iterations over the operand list.

Now, for traversals, doing BFS on demand inside the comparator doesn't mean it has to restart every time. It would do extra iterations on top of existing iteration results only when needed to break ties. In your case, you do an extra iteration of BFS for all operands if the current smallest key is identical, not just for the ones that need it. It's hard to estimate the number of iterations of BFS, but it's certainly more in your case. Using std::stable_sort would also bring the complexity down to N log N.

For this to be a usable canonicalization, it is really the case where the operands are already sorted (a no-op) that needs to be heavily optimized (that is, no complex data structures to populate, etc.).

I'm not sure how to check the no-op case without constructing at least a queue. Even assuming only 2 commutative operands, if they look the same at depth=1, then the comparison has to iterate.

Used the stable_sort function.

srishti-pm marked 21 inline comments as done.Jul 20 2022, 4:28 PM
srishti-pm edited the summary of this revision. (Show Details)
srishti-pm edited the summary of this revision. (Show Details)Jul 20 2022, 4:38 PM
srishti-pm edited the summary of this revision. (Show Details)
Mogball added inline comments.Jul 20 2022, 9:10 PM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
239

Why can't you sort the OperandBFS directly to avoid the hash map?

287

Can you use unique_ptr so that the memory doesn't leak?

Mogball requested changes to this revision.Jul 20 2022, 9:11 PM
This revision now requires changes to proceed.Jul 20 2022, 9:11 PM
srishti-pm marked an inline comment as done.Jul 20 2022, 9:29 PM
srishti-pm added inline comments.
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
239

Because comparators are not allowed to modify their input arguments.

Mogball added inline comments.Jul 20 2022, 9:35 PM
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
239

The arguments have to be const references? If so, just const_cast
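
A sketch of that workaround, with a hypothetical OperandBFS holding lazily updated state: the comparator receives const references, but it can still mutate cached state (e.g. extend a BFS on demand) by stripping the constness with const_cast:

```cpp
#include <cassert>

// Hypothetical structure with lazily updated, cached state.
struct OperandBFS {
  int key;
  int timesCompared = 0;  // stand-in for cached BFS progress
};

// Comparators take const references, so const_cast is used to update the
// cached state from inside the comparison.
bool compareOperands(const OperandBFS &lhs, const OperandBFS &rhs) {
  const_cast<OperandBFS &>(lhs).timesCompared++;
  const_cast<OperandBFS &>(rhs).timesCompared++;
  return lhs.key < rhs.key;
}
```

Note that const_cast is only safe here because the underlying objects are not actually declared const.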

srishti-pm marked an inline comment as done.Jul 21 2022, 3:50 AM
srishti-pm added inline comments.
mlir/lib/Transforms/Utils/CommutativityUtils.cpp
111

This set isn't expected to be small, right? There can be many ancestors. The only thing that is small in this context is the number of operands, right?

srishti-pm marked 5 inline comments as done.
  1. Addressed all the comments.
  2. Refactored the code.
srishti-pm edited the summary of this revision. (Show Details)Jul 27 2022, 10:36 AM

Updated an outdated comment.

Mogball accepted this revision.Jul 29 2022, 10:25 AM

Got a few small nits but otherwise LGTM. Thanks for all the hard work! This looks really solid now. I haven't thought too hard about the performance of that while loop but it seems good enough to land for now.

mlir/lib/Transforms/Utils/CommutativityUtils.cpp
247
253
256
This revision is now accepted and ready to land.Jul 29 2022, 10:25 AM

I haven't thought too hard about the performance of that while loop but it seems good enough to land for now.

What's the end goal of it? That is: outside of canonicalization, what is its purpose?

I'm referring to the nitty-gritty details of the while loop inside the comparator. It looks pretty tight to me right now. If the operands are already sorted, worst case each operand is compared against only its neighbours. Unfortunately, without extra bookkeeping, BFS will still need to iterate when necessary.

srishti-pm marked 3 inline comments as done.

Addressed the final NITs.