This is an archive of the discontinued LLVM Phabricator instance.

@nicolasvasilache this is the draft I have for CSE-ing ops with a single block. It's probably more complex than it needs to be, since I think a lot of the hash-computation logic can either be dropped or simplified a lot.

mravishankar added a child revision: D134307: [mlir][TilingInterface] Add callback to yield a produced value..Sep 20 2022, 1:25 PM

Harbormaster completed remote builds in B187820: Diff 461676.Sep 20 2022, 1:48 PM

Rebase.

Herald added a subscriber: zero9178.Sep 23 2022, 5:49 PM

mravishankar published this revision for review.Sep 23 2022, 5:51 PM

mravishankar retitled this revision from [WIP] CSE of ops with a single block. to [mlir][Transforms] CSE of ops with a single block..

mravishankar edited the summary of this revision.

Herald added a project: Restricted Project.Sep 23 2022, 5:51 PM

Herald added a subscriber: stephenneuendorffer.

@rriddle and @nicolasvasilache this is a first attempt at adding CSE support for ops with a single region and a single block. There are some failures with this, but I first want to get an idea of whether this is going in the right direction.

mravishankar added a reviewer: pzread.Sep 23 2022, 5:52 PM

Harbormaster completed remote builds in B188509: Diff 462634.Sep 23 2022, 6:22 PM

mravishankar added a reviewer: Mogball.Sep 26 2022, 12:12 PM

Mogball added inline comments.Sep 26 2022, 12:43 PM

mlir/include/mlir/IR/OperationSupport.h
862 ↗	(On Diff #462634)	I'm not sure operation equivalence needs to be modified to support this change, mainly because this doesn't support multiblock regions. You can hash the region separately from the operation and compare.
mlir/lib/Transforms/CSE.cpp
119	I'm surprised this works. Isn't this hashing the block pointer?

I'll take a more in depth look later.

Thanks @Mogball . I did need this, so happy to get input and adapt accordingly.

mlir/include/mlir/IR/OperationSupport.h
862 ↗	(On Diff #462634)	I think you are right.... The change to `computeHash` itself can be dropped. Needs the changes to `OperationEquivalence` though....
mlir/lib/Transforms/CSE.cpp
119	Yes. AFAIK, it's just a hash. Worst case, it has some collisions.

Mogball added inline comments.Sep 26 2022, 2:25 PM

mlir/lib/Transforms/CSE.cpp
119	What surprises me is that the test cases below work, which shouldn't happen if the hashes are different, but in this case they are because the block pointers are different for each region.

mravishankar added inline comments.Sep 26 2022, 6:10 PM

mlir/lib/Transforms/CSE.cpp
119	That was my starting assumption too. My starting understanding was: (1) It is the `isEqual` method that really compares two operations. If that returns true for two operations that can be CSEed, then that's enough. (2) The hash is just a way to do fast compares. If two ops have the same hash, they "might" be the same. So first you compare hashes, and if the hashes are the same you compare the ops explicitly. So in theory we can explicitly ignore the region hashes, and it should still work.

rriddle added inline comments.Sep 27 2022, 9:34 AM

mlir/include/mlir/IR/OperationSupport.h
863 ↗	(On Diff #462634)	I would either uncomment, or just remove the variable name (it adds nothing here).
mlir/lib/Transforms/CSE.cpp
52–53	This path looks broken. Why is this no longer calling into OperationEquivalence? Please separate out the region code into a different function so that this one is a bit easier to follow.
119	The point of hashing is to simplify the number of collisions. The point @Mogball is making is that this hash will never match with any other really, given that no other region will have `block` inside of it. How are you matching region operations if the hash never matches?

mravishankar added inline comments.Sep 28 2022, 10:42 AM

mlir/lib/Transforms/CSE.cpp
119	I think I understand what you guys are saying. Would it work if I just ignore the region during the hash computation?

Since we are looking at this and there is discussion of splitting, could we factor out some of the CSE logic into independent utility functions? I'd like to be able to call that for a single region without running a pass across everything, like we can do with various pattern rewriters.

Rebase and address some comments.

mravishankar added inline comments.Sep 29 2022, 10:16 AM

mlir/lib/Transforms/CSE.cpp
52–53	It was indeed broken. Fixing this fixed all the tests that were failing. Thanks for catching it! Now all tests pass.
119	So I dropped any hashing that accounted for regions. That makes the code simpler. It might cause more collisions, but IIUC the approach here is: (a) if the hashes are the same, use `isEqual` to actually disambiguate; (b) if the hashes are different, the ops are different.
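The hash/`isEqual` contract discussed in this thread can be sketched outside of MLIR. The following is a minimal illustration, not the code from this revision: `ToyOp`, `ToyOpEqual`, `RegionAgnosticHash`, `RegionPointerHash`, and `findsExisting` are all hypothetical names, and `std::unordered_set` stands in for the hash table that CSE uses. It shows that a hash which ignores the region still finds structurally equal ops (equality does the disambiguation), while a hash that mixes in a per-op address will typically never match a second, distinct op.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an operation with one single-block region.
struct ToyOp {
  std::string name;              // operation name
  std::vector<int> operands;     // ids of the operand values
  std::vector<std::string> body; // printed form of the ops in the region
};

// Structural equality: name, operands, and region body must all match.
struct ToyOpEqual {
  bool operator()(const ToyOp *a, const ToyOp *b) const {
    return a->name == b->name && a->operands == b->operands &&
           a->body == b->body;
  }
};

// Hash that ignores the region. Ops that differ only in their region
// collide, and ToyOpEqual then tells them apart.
struct RegionAgnosticHash {
  std::size_t operator()(const ToyOp *op) const {
    std::size_t h = std::hash<std::string>()(op->name);
    for (int v : op->operands)
      h = h * 31 + std::hash<int>()(v);
    return h;
  }
};

// Hash that mixes in the address of the region storage. Two distinct ops
// always get different hash values, even when they are structurally equal.
struct RegionPointerHash {
  std::size_t operator()(const ToyOp *op) const {
    const void *regionStorage = &op->body;
    return RegionAgnosticHash()(op) * 31 +
           std::hash<const void *>()(regionStorage);
  }
};

// Inserts `first` into a set and reports whether `second` is found in it.
template <typename Hash>
bool findsExisting(const ToyOp &first, const ToyOp &second) {
  std::unordered_set<const ToyOp *, Hash, ToyOpEqual> known;
  known.insert(&first);
  return known.count(&second) != 0;
}
```

With `RegionAgnosticHash`, a structurally identical copy of an op is found and an op with a different body is rejected by the equality check. With `RegionPointerHash`, equal objects no longer hash equally, so the container is free to miss the match, which is the concern the reviewers raised about hashing the block pointer.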

What's going on with the diff?

mlir/test/lib/Dialect/Test/TestOps.td
2992	super nit: can you capitalize CSE here?

In D134306#3824546, @Mogball wrote:

What's going on with the diff?

What specifically? The diff itself is fine. I made some changes, and Phabricator doesn't know how to realign.

Harbormaster completed remote builds in B189459: Diff 463953.Sep 29 2022, 11:44 AM

I don't see any changes to CSE.cpp anymore

In D134306#3825274, @Mogball wrote:

I don't see any changes to CSE.cpp anymore

That's surprising. I see changes to CSE.cpp here.

Handle commutative ops correctly.

Harbormaster completed remote builds in B189771: Diff 464390.Sep 30 2022, 2:43 PM

@Mogball do you still not see the diff for CSE.cpp?

I can see them now. I'm not sure what happened

Rebase and address nit.

@rriddle and @Mogball wondering if you guys have any more comments on this (or whether this is landable or not). The functionality is a blocker for other work, so I'd like to get an idea of how to proceed here (if this approach has issues). Happy to address/modify the approach based on your recommendations.

Harbormaster completed remote builds in B190330: Diff 465180.Oct 4 2022, 4:17 PM

Apologies for the delay, we have a company event this week.

mlir/lib/IR/OperationSupport.cpp
724–746	Why do you need two SmallVector? Should be easy enough to order OpResults before block arguments inside of the sort function. Either way, what's the benefit of ordering block arguments differently from results?
mlir/lib/Transforms/CSE.cpp
17	I would've expected that this would already be included.
51–52	The comment of `if they have a single region with a single block` seems like something that should be pushed down to where the context is relevant. Just saying "No regions take the easy path"(or something similar) seems fine here.
62
72
78
96–99	Why check getParent here instead of just checking areEquivalentValues?
105	When does this happen (aside from results of the parent op)?
mlir/test/Transforms/cse.mlir
328–334	Can you group the CHECK lines in these tests? It's a little hard for me to read given the size of the regions.

Rebase and address comments.

mravishankar marked 7 inline comments as done.Oct 5 2022, 11:28 AM

mravishankar added inline comments.

mlir/lib/IR/OperationSupport.cpp
724–746	I combined the two to have a single sort function. I had kept them separate for clarity, but there is really no need to do that. As to why block argument ordering matters: it helps in the equivalence comparison in the callback. For example, if you have the following two regions

^bb0(%a0 : f32, %a1: f32) :
  %0 = arith.addf %a0, %a1 : f32
  linalg.yield %0 : f32

and

^bb0(%b0 : f32, %b1: f32) :
  %1 = arith.addf %b0, %b1 : f32
  linalg.yield %1 : f32

These two regions are equivalent. But if you use the pointer values to sort, then when comparing the `arith.addf` operands you can end up with the operands of the `arith.addf` in the first region being `[%a0, %a1]` and those of the `arith.addf` in the second region being `[%b1, %b0]`. Then the equivalence check fails. If you account for the argument position as well, the block arguments are ordered consistently in both regions. Writing this out, though, it still doesn't cover all the cases. You could have

^bb0(%a0 : f32, %a1: f32) :
  %0 = arith.addf %a0, %a1 : f32
  %1 = arith.mulf %a0, %a1 : f32
  %2 = arith.addf %0, %1 : f32
  linalg.yield %2 : f32

and

^bb0(%b0 : f32, %b1: f32) :
  %3 = arith.addf %b0, %b1 : f32
  %4 = arith.mulf %b0, %b1 : f32
  %5 = arith.addf %3, %4 : f32
  linalg.yield %5 : f32

which would still fail because `%3` and `%4` would be ordered using the `Value`'s pointer value. That's an issue (and it is really hard to write a test case for it). Is there a way to completely avoid using the pointer value of `Value`?
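The ordering described in this comment can be sketched with a plain comparator. This is an illustrative model only, assuming a hypothetical `ToyValue` struct in place of `mlir::Value`; the field names and the helper `sortedArgNumbers` are made up for the example. Block arguments of the same block are ordered by argument number, block arguments sort before other values, and everything else falls back to address order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a value: either a block argument or an op result.
struct ToyValue {
  bool isBlockArg;        // true for block arguments
  std::uintptr_t block;   // owning block (meaningful for block arguments)
  unsigned argNumber;     // position in the block's argument list
  std::uintptr_t address; // stand-in for the value's pointer identity
};

// Strict weak ordering used to canonicalize commutative operands.
bool lessForCommutativeSort(const ToyValue &a, const ToyValue &b) {
  // Case 1: both are block arguments. Use the argument number within a
  // block, and the block identity across blocks.
  if (a.isBlockArg && b.isBlockArg) {
    if (a.block == b.block)
      return a.argNumber < b.argNumber;
    return a.block < b.block;
  }
  // Case 2: exactly one is a block argument. Block arguments come first.
  if (a.isBlockArg != b.isBlockArg)
    return a.isBlockArg;
  // Case 3: neither is a block argument. Fall back to pointer identity.
  return a.address < b.address;
}

// Sorts the operands and returns the resulting sequence of argument numbers.
std::vector<unsigned> sortedArgNumbers(std::vector<ToyValue> operands) {
  std::sort(operands.begin(), operands.end(), lessForCommutativeSort);
  std::vector<unsigned> order;
  for (const ToyValue &v : operands)
    order.push_back(v.argNumber);
  return order;
}
```

Sorting purely by `address` would order the arguments of two equivalent blocks differently whenever their addresses happen to be laid out differently; keying on `argNumber` removes that dependence for block arguments. As the comment notes, case 3 still depends on pointer values.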
mlir/lib/Transforms/CSE.cpp
96–99	I think you are right. Dropped the `getParent` check...
105	You need to treat the values created within a region by equivalent operations as equivalent. For example,

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %0 = arith.addf %arg0, %arg1 : f32
  %1 = arith.mulf %0, %arg2 : f32
  linalg.yield %1 : f32

and

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %2 = arith.addf %arg0, %arg1 : f32
  %3 = arith.mulf %2, %arg2 : f32
  linalg.yield %3 : f32

`%0` and `%2` are equivalent, and based on that `%3` and `%1` are equivalent. Then the regions are the same and the ops can be CSEed (if the other conditions are met: the operands are the same, the attributes are the same, etc.). So `getParent` here checks that this is an op within the region of the two operations being compared, and the results are then added to the equivalence set.
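The incremental equivalence map described here can be modeled without MLIR. The sketch below is an assumption-laden toy: `ToyInst` and `blocksEquivalent` are hypothetical, values are plain integer ids, and the two blocks are walked in lockstep. Block arguments seed the map, operands must either be identical or already mapped, and each pair of results is recorded so that later instructions can refer to it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical instruction: a name, operand ids, and a result id (-1 if none).
struct ToyInst {
  std::string name;
  std::vector<int> operands;
  int result;
};

// Returns true if the two single-block bodies are structurally equivalent.
bool blocksEquivalent(const std::vector<int> &lhsArgs,
                      const std::vector<ToyInst> &lhs,
                      const std::vector<int> &rhsArgs,
                      const std::vector<ToyInst> &rhs) {
  if (lhsArgs.size() != rhsArgs.size() || lhs.size() != rhs.size())
    return false;

  // Maps values of the lhs block to their equivalents in the rhs block.
  std::map<int, int> equivalent;
  for (std::size_t i = 0; i < lhsArgs.size(); ++i)
    equivalent[lhsArgs[i]] = rhsArgs[i];

  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const ToyInst &l = lhs[i];
    const ToyInst &r = rhs[i];
    if (l.name != r.name || l.operands.size() != r.operands.size())
      return false;
    for (std::size_t j = 0; j < l.operands.size(); ++j) {
      int lv = l.operands[j], rv = r.operands[j];
      if (lv == rv)
        continue; // the very same value, e.g. defined outside both regions
      auto it = equivalent.find(lv);
      if (it == equivalent.end() || it->second != rv)
        return false;
    }
    if ((l.result < 0) != (r.result < 0))
      return false;
    if (l.result >= 0) {
      // Record the result pair; fail if lhs was already mapped elsewhere.
      auto insertion = equivalent.insert({l.result, r.result});
      if (insertion.first->second != r.result)
        return false;
    }
  }
  return true;
}
```

Encoding the two regions from the comment with ids 10..12 and 20..22 for the block arguments makes the function return true; changing one operand of the second `arith.mulf` makes it return false.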
mlir/test/Transforms/cse.mlir
328–334	I was just being consistent with the file here. I prefer having the `CHECK*`s together too. Changed.

Drop unnecessary header.

mravishankar marked an inline comment as done.Oct 5 2022, 11:31 AM

mravishankar added inline comments.

mlir/lib/Transforms/CSE.cpp
17	I don't know right now why I needed it (I think it was to get the `mlir::hash_value(Value v)` definition). I don't need it now. Dropped.

Harbormaster completed remote builds in B190549: Diff 465482.Oct 5 2022, 12:51 PM

Mogball added inline comments.Oct 10 2022, 9:10 AM

mlir/lib/Transforms/CSE.cpp
79	`auto [lhsArg, rhsArg] =` ?
264	This comment needs an update

LG overall

mlir/lib/Transforms/CSE.cpp
85	This should really be added to `Value`. I've written this same code many times. Thoughts @rriddle ?

This revision is now accepted and ready to land.Oct 10 2022, 9:11 AM

Add an option to enable CSE of single block ops. This seems to have
pretty significant downstream effects. Making this enhancement opt-in
as a ramp to make it default available.

Fix comment.

I would prefer the option be true by default. If the pass is creating erroneous code, it's either a bug in the pass (which I assume isn't the case here), or something wrong with downstream users. It should always be legal to CSE a side-effect-free op.

@Mogball I made the CSE for single block ops guarded by a flag. I used it downstream, and it has some non-trivial effects. It is needed for some things, and this is probably an indication that we need a mechanism that allows conditional CSE (on a set of ops, or class of ops, etc.). For now the flag keeps the "new feature" guarded to use as needed. PTAL since this is a change to the patch from when you approved it.

In D134306#3869775, @mravishankar wrote:

@Mogball I made the CSE for single block ops guarded by a flag. I used it downstream, and it has some non-trivial effects. It is needed for some things, and this is probably an indication that we need a mechanism that allows conditional CSE (on a set of ops, or class of ops, etc.). For now the flag keeps the "new feature" guarded to use as needed. PTAL since this is a change to the patch from when you approved it.

What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.

This revision now requires changes to proceed.Oct 19 2022, 4:52 PM

Harbormaster completed remote builds in B193125: Diff 469083.Oct 19 2022, 5:15 PM

What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.

Basically CSE increases lifetimes of operations. So in the long term we probably need CSE to be a more controllable pass than it is today. The fact is that this is an issue today already. You could have an mhlo.dot, which is a GEMM operation, that will get CSE-ed, but when lowered to linalg.matmul it doesn't get CSE-ed, while the impact of CSE-ing these operations is the same....
It also has an impact on computation that is in destination-passing style. This is the issue I hit.

%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%3 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%4 = linalg.fill ins(%cst) outs(%3) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%4) : tensor<?x?xf32>

Earlier the linalg.fill would not CSE (because it's actually an op with a single block). Now it will:

%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>

While the semantics are correct, this is a more opinionated change than "just CSE". This is effectively adding false sharing of the output where none existed earlier. So bufferization (or some pre-processing before bufferization) will have to break this false dependency. All of this could have happened earlier as well with CSE. It is just exposed now because the linalg operations have a single block, and those just happened to not be covered by CSE. So what this change did is unearth an issue that already existed, which needs to be fixed by allowing CSE to be more controllable....
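The sharing effect described above can be reproduced with a toy value-numbering pass. This is a sketch under stated assumptions, not the MLIR CSE pass: `DpsOp` and `toyCSE` are hypothetical, every op is assumed to be side-effect free, and the operand names used for the elided `ins(...)` are placeholders chosen for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical op in destination-passing style: the last operand is `outs`.
struct DpsOp {
  std::string name;
  std::vector<std::string> operands;
  std::string result;
};

// Toy CSE: ops with the same name and the same (already rewritten) operands
// are merged, and uses of the erased result are redirected to the kept one.
std::vector<DpsOp> toyCSE(const std::vector<DpsOp> &ops) {
  std::map<std::pair<std::string, std::vector<std::string>>, std::string> seen;
  std::map<std::string, std::string> replacement;
  std::vector<DpsOp> kept;
  for (DpsOp op : ops) {
    // Rewrite operands that refer to results erased earlier.
    for (std::string &operand : op.operands) {
      auto it = replacement.find(operand);
      if (it != replacement.end())
        operand = it->second;
    }
    auto key = std::make_pair(op.name, op.operands);
    auto found = seen.find(key);
    if (found != seen.end()) {
      replacement[op.result] = found->second; // duplicate: erase and redirect
      continue;
    }
    seen[key] = op.result;
    kept.push_back(op);
  }
  return kept;
}
```

Feeding the six ops from the example (two `tensor.empty`, two `linalg.fill`, two `linalg.matmul` with distinct inputs) through `toyCSE` leaves four ops, and both matmuls end up with the same `outs` operand. That is the sharing bufferization would later have to undo.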
If having a flag is an issue, then please suggest a way forward. This is blocking quite a lot of things for me.....

FYI @ftynse @nicolasvasilache

In D134306#3871667, @mravishankar wrote:
What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.
Basically CSE increases lifetimes of operations. So in the long term we probably need CSE to be a more controllable pass than it is today. The fact is that this is an issue today already. You could have an mhlo.dot, which is a GEMM operation, that will get CSE-ed, but when lowered to linalg.matmul it doesn't get CSE-ed, while the impact of CSE-ing these operations is the same....
It also has an impact on computation that is in destination-passing style. This is the issue I hit.
%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%3 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%4 = linalg.fill ins(%cst) outs(%3) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%4) : tensor<?x?xf32>
Earlier the linalg.fill would not CSE (because it's actually an op with a single block). Now it will:
%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
While the semantics are correct, this is a more opinionated change than "just CSE". This is effectively adding false sharing of the output where none existed earlier. So bufferization (or some pre-processing before bufferization) will have to break this false dependency. All of this could have happened earlier as well with CSE. It is just exposed now because the linalg operations have a single block, and those just happened to not be covered by CSE. So what this change did is unearth an issue that already existed, which needs to be fixed by allowing CSE to be more controllable....

I really don't see the problem here, this is precisely what I would expect.
The fact that downstream assumptions are not valid anymore should not be a blocker for this to progress.

If having a flag is an issue, then please suggest a way forward. This is blocking quite a lot of things for me.....

FYI @ftynse @nicolasvasilache

It seems like you want downstream-specific behavior to be preserved. This suggests downstream-specific changes (either a custom CSE or, ideally, not relying on such assumptions).

I would drop the flag from this PR.

Regardless of the discussion on Linalg vs. MHLO CSE behavior, which is arguably up to the dialects to decide, I think we should (a) just enable this change without a flag, as it looks reasonable to have, and (b) consider finer-grained control over general simplification (canonicalization, CSE, constant folding and propagation) in the longer term.

Drop the optionality flag...

@rriddle dropped the flag. PTAL.

Harbormaster completed remote builds in B194698: Diff 471212.Oct 27 2022, 11:46 AM

Thanks Mahesh!

I am fine with this as it stands now.

rriddle accepted this revision.Nov 1 2022, 10:44 PM

rriddle added inline comments.

mlir/lib/Transforms/CSE.cpp
264	Not done yet?

This revision is now accepted and ready to land.Nov 1 2022, 10:44 PM

Herald added a subscriber: Moerafaat.Nov 1 2022, 10:44 PM

Sorry for the delay, swamped lately. The flang build breakages look real though.

Rebase and update.

Harbormaster completed remote builds in B196987: Diff 474389.Nov 9 2022, 5:00 PM

Rebase and add dependency on D137857

mravishankar added a parent revision: D137857: [mlir] Remove `Transforms/SideEffectUtils.h` and move the methods into `Interface/SideEffectInterfaces.h`..Nov 11 2022, 2:24 PM

Harbormaster completed remote builds in B197307: Diff 474860.Nov 11 2022, 2:38 PM

Rebase

Herald added a subscriber: hanchung.Nov 15 2022, 3:29 PM

Harbormaster completed remote builds in B197856: Diff 475607.Nov 15 2022, 4:02 PM

Fix failing lit test.

Herald added a reviewer: aartbik.Nov 15 2022, 6:26 PM

Herald added a subscriber: anlunx.

Harbormaster completed remote builds in B197890: Diff 475654.Nov 15 2022, 6:46 PM

This revision was landed with ongoing or failed builds.Nov 15 2022, 6:56 PM

Closed by commit rGfcaf6dd597ea: [mlir][Transforms] CSE of ops with a single block. (authored by mravishankar).

This revision was automatically updated to reflect the committed changes.

mravishankar added a commit: rGfcaf6dd597ea: [mlir][Transforms] CSE of ops with a single block..

aartbik added inline comments.Nov 15 2022, 9:06 PM

mlir/test/Dialect/SparseTensor/codegen_buffer_initialization.mlir
20	LGTM, but we need a FIXME for the windows determinism

Revision Contents

Path	Size
mlir/lib/IR/OperationSupport.cpp	36 lines
mlir/lib/Transforms/CSE.cpp	68 lines
mlir/test/Dialect/SparseTensor/codegen_buffer_initialization.mlir	1 line
mlir/test/Transforms/cse.mlir	124 lines
mlir/test/lib/Dialect/Test/TestOps.td	19 lines

Diff 475659

mlir/lib/IR/OperationSupport.cpp

@@ 715 lines hidden @@
    if (lhs->getName() != rhs->getName() ||
        lhs->getNumResults() != rhs->getNumResults())
      return false;
    if (!(flags & IgnoreLocations) && lhs->getLoc() != rhs->getLoc())
      return false;

    ValueRange lhsOperands = lhs->getOperands(), rhsOperands = rhs->getOperands();
    SmallVector<Value> lhsOperandStorage, rhsOperandStorage;
    if (lhs->hasTrait<mlir::OpTrait::IsCommutative>()) {
-     lhsOperandStorage.append(lhsOperands.begin(), lhsOperands.end());
-     llvm::sort(lhsOperandStorage, [](Value a, Value b) -> bool {
-       return a.getAsOpaquePointer() < b.getAsOpaquePointer();
-     });
-     lhsOperands = lhsOperandStorage;
-
-     rhsOperandStorage.append(rhsOperands.begin(), rhsOperands.end());
-     llvm::sort(rhsOperandStorage, [](Value a, Value b) -> bool {
-       return a.getAsOpaquePointer() < b.getAsOpaquePointer();
-     });
+     auto sortValues = [](ValueRange values) {
+       SmallVector<Value> sortedValues = llvm::to_vector(values);
+       llvm::sort(sortedValues, [](Value a, Value b) {
+         auto aArg = a.dyn_cast<BlockArgument>();
+         auto bArg = b.dyn_cast<BlockArgument>();
+
+         // Case 1. Both `a` and `b` are `BlockArgument`s.
+         if (aArg && bArg) {
+           if (aArg.getParentBlock() == bArg.getParentBlock())
+             return aArg.getArgNumber() < bArg.getArgNumber();
+           return aArg.getParentBlock() < bArg.getParentBlock();
+         }
+
+         // Case 2. One of then is a `BlockArgument` and other is not. Treat
+         // `BlockArgument` as lesser.
+         if (aArg && !bArg)
+           return true;
+         if (bArg && !aArg)
+           return false;
+
+         // Case 3. Both are values.
+         return a.getAsOpaquePointer() < b.getAsOpaquePointer();
+       });
+       return sortedValues;
+     };
+     lhsOperandStorage = sortValues(lhsOperands);
+     lhsOperands = lhsOperandStorage;
+     rhsOperandStorage = sortValues(rhsOperands);
      rhsOperands = rhsOperandStorage;
    }
    auto checkValueRangeMapping =
        [](ValueRange lhs, ValueRange rhs,
           function_ref<LogicalResult(Value, Value)> mapValues) {
          for (auto operandPair : llvm::zip(lhs, rhs)) {
            Value curArg = std::get<0>(operandPair);
            Value otherArg = std::get<1>(operandPair);
@@ 58 lines hidden @@

mlir/lib/Transforms/CSE.cpp

  //===- CSE.cpp - Common Sub-expression Elimination ------------------------===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//
  //
  // This transformation pass performs a simple common sub-expression elimination
  // algorithm on operations within a region.
  //
  //===----------------------------------------------------------------------===//

  #include "mlir/Transforms/Passes.h"
  #include "mlir/IR/Dominance.h"
  #include "mlir/Interfaces/SideEffectInterfaces.h"
  #include "mlir/Pass/Pass.h"
  #include "llvm/ADT/DenseMapInfo.h"
  #include "llvm/ADT/Hashing.h"
  #include "llvm/ADT/ScopedHashTable.h"
  #include "llvm/Support/Allocator.h"
  #include "llvm/Support/RecyclingAllocator.h"
  #include <deque>

@@ 16 lines hidden @@ struct SimpleOperationInfo : public llvm::DenseMapInfo<Operation *> {
    static bool isEqual(const Operation *lhsC, const Operation *rhsC) {
      auto *lhs = const_cast<Operation *>(lhsC);
      auto *rhs = const_cast<Operation *>(rhsC);
      if (lhs == rhs)
        return true;
      if (lhs == getTombstoneKey() || lhs == getEmptyKey() ||
          rhs == getTombstoneKey() || rhs == getEmptyKey())
        return false;
+
+     // If op has no regions, operation equivalence w.r.t operands alone is
+     // enough.
+     if (lhs->getNumRegions() == 0 && rhs->getNumRegions() == 0) {
        return OperationEquivalence::isEquivalentTo(
            const_cast<Operation *>(lhsC), const_cast<Operation *>(rhsC),
-           /*mapOperands=*/OperationEquivalence::exactValueMatch,
-           /*mapResults=*/OperationEquivalence::ignoreValueEquivalence,
+           OperationEquivalence::exactValueMatch,
+           OperationEquivalence::ignoreValueEquivalence,
            OperationEquivalence::IgnoreLocations);
      }

+
+     // If lhs or rhs does not have a single region with a single block, they
+     // aren't CSEed for now.
rriddle (Done), suggested edit: `// arent CSEed for now.` → `// aren't CSEed for now.`
+     if (lhs->getNumRegions() != 1 || rhs->getNumRegions() != 1 ||
+         !llvm::hasSingleElement(lhs->getRegion(0)) ||
+         !llvm::hasSingleElement(rhs->getRegion(0)))
+       return false;
+
+     // Compare the two blocks.
+     Block &lhsBlock = lhs->getRegion(0).front();
+     Block &rhsBlock = rhs->getRegion(0).front();
+
+     // Don't CSE if number of arguments differ.
rriddle (Done), suggested edit: `// If number of arguments differ, not CSEed` → `// Don't CSE if the number of arguments differ.`
+     if (lhsBlock.getNumArguments() != rhsBlock.getNumArguments())
+       return false;
+
+     // Map to store `Value`s from `lhsBlock` that are equivalent to `Value`s in
+     // `rhsBlock`. `Value`s from `lhsBlock` are the key.
+     DenseMap<Value, Value> areEquivalentValues;
rriddle (Done), suggested edit: `llvm::DenseMap<Value, Value> areEquivalentValues;` → `DenseMap<Value, Value> areEquivalentValues;`
+     for (auto bbArgs : llvm::zip(lhs->getRegion(0).getArguments(),
+                                  rhs->getRegion(0).getArguments())) {
Mogball (Not Done): `auto [lhsArg, rhsArg] =` ?
+       areEquivalentValues[std::get<0>(bbArgs)] = std::get<1>(bbArgs);
+     }

// Helper function to get the parent operation.

auto getParent = [](Value v) -> Operation * {

MogballUnsubmitted

Not Done

This should really be added to Value. I've written this same code many times. Thoughts @rriddle ?

Mogball: This should really be added to `Value`. I've written this same code many times. Thoughts…

if (auto blockArg = v.dyn_cast<BlockArgument>())

return blockArg.getParentBlock()->getParentOp();

return v.getDefiningOp()->getParentOp();

};

// Callback to compare if operands of ops in the region of `lhs` and `rhs`

// are equivalent.

auto mapOperands = [&](Value lhsValue, Value rhsValue) -> LogicalResult {

if (lhsValue == rhsValue)

return success();

if (areEquivalentValues.lookup(lhsValue) == rhsValue)

return success();

return failure();

};

rriddleUnsubmitted

Done

Why check getParent here instead of just checking areEquivalentValues?

rriddle: Why check getParent here instead of just checking areEquivalentValues?

mravishankarAuthorUnsubmitted

Done

I think you are right. Dropped the getParent check...

mravishankar: I think you are right. Dropped the `getParent` check...

// Callback to compare if results of ops in the region of `lhs` and `rhs`

// are equivalent.

auto mapResults = [&](Value lhsResult, Value rhsResult) -> LogicalResult {

if (getParent(lhsResult) == lhs && getParent(rhsResult) == rhs) {

auto insertion = areEquivalentValues.insert({lhsResult, rhsResult});

rriddleUnsubmitted

Done

When does this happen (aside from results of the parent op)?

rriddle: When does this happen (aside from results of the parent op)?

mravishankarAuthorUnsubmitted

Done

You need to treat the values created within a region by equivalent operations as equivalent. For example,

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %0 = arith.addf %arg0, %arg1 : f32
  %1 = arith.mulf %0, %arg2 : f32
  linalg.yield %1 : f32

and

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %2 = arith.addf %arg0, %arg1 : f32
  %3 = arith.mulf %2, %arg2 : f32
  linalg.yield %3 : f32

%0 and %2 are equivalent and based on that %3 and %1 are equivalent. Then the regions are the same and the ops can be CSEed (if the other conditions like operands are the same attributes are the same, etc. are met).

So getParent here checks that this is an op within the region of the two operations being compared, and they are then added to the equivalence set.


return success(insertion.first->second == rhsResult);

}

return success();

};

return OperationEquivalence::isEquivalentTo(

const_cast<Operation *>(lhsC), const_cast<Operation *>(rhsC),

mapOperands, mapResults, OperationEquivalence::IgnoreLocations);

}

};

} // namespace

namespace {

/// Simple common sub-expression elimination.

Mogball: I'm surprised this works. Isn't this hashing the block pointer?

mravishankar (author): Yes. AFAIK, it's just a hash. The worst case is some collisions.

Mogball: What surprises me is that the test cases below work. That shouldn't happen if the hashes are different, and in this case they are different because the block pointers are different for each region.

mravishankar (author): That was my starting assumption too. My starting understanding was:

1) It is the `isEqual` method that really compares two operations. If that returns true for two operations that can be CSE'd, then that's enough.
2) The hash is just a way to do fast compares. If two ops have the same hash, they "might" be the same. So first you compare hashes, and if the hashes are the same you compare the ops explicitly.

So in theory we can explicitly ignore the region hashes, and it should still work.

rriddle: The point of hashing is to reduce the number of collisions. The point @Mogball is making is that this hash will never really match any other, given that no other region will have that same block inside of it. How are you matching region operations if the hash never matches?
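To make the concern concrete, here is a small standalone sketch. It is not MLIR code: `ToyBlock`, `ToyRegionOp` and the helper functions are hypothetical names. Two ops with identical contents own distinct block objects, so any hash that mixes in the block's address will in practice differ between them, while a hash that ignores the body agrees for both.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for a block and for an op that owns a single block.
struct ToyBlock {
  std::vector<std::string> ops;
};
struct ToyRegionOp {
  std::string name;
  std::unique_ptr<ToyBlock> body;
};

ToyRegionOp makeOp(const std::string &name,
                   const std::vector<std::string> &ops) {
  return ToyRegionOp{name, std::make_unique<ToyBlock>(ToyBlock{ops})};
}

// Mixes the address of the block into the hash. Structurally identical ops
// own different blocks, so this value is tied to object identity.
std::size_t hashWithBlockPointer(const ToyRegionOp &op) {
  std::size_t nameHash = std::hash<std::string>{}(op.name);
  std::size_t blockHash = std::hash<const ToyBlock *>{}(op.body.get());
  return nameHash ^ blockHash;
}

// Ignores the body entirely: coarser, but equal for equal ops.
std::size_t hashIgnoringBody(const ToyRegionOp &op) {
  return std::hash<std::string>{}(op.name);
}

// Structural comparison of name and body contents.
bool sameContents(const ToyRegionOp &a, const ToyRegionOp &b) {
  return a.name == b.name && a.body->ops == b.body->ops;
}
```

For a hash table to find a match, equal keys must produce equal hashes; `hashIgnoringBody` satisfies that for ops where `sameContents` holds, whereas `hashWithBlockPointer` generally does not.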

mravishankar (author): I think I understand what you guys are saying. Would it work if I just ignore the region during the hash computation?

mravishankar (author): So I dropped any hashing that accounted for regions. Makes the code simpler. It might cause more collisions, but IIUC the approach here is
(a) If the hashes are the same, use `isEqual` to actually disambiguate.
(b) If the hashes are different, the ops are different.
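A minimal sketch of that two-step scheme, using a standard hash set rather than the scoped hash table used by the pass. All names here (`ToyOp`, `CoarseHash`, `FullEqual`, `isRedundant`) are hypothetical and the body is modeled as a list of strings purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an op with a single-block region.
struct ToyOp {
  std::string name;
  std::vector<std::string> body;
};

// (b) The hash looks only at the name, so ops with different hashes are
// certainly different, while ops that differ only in their body collide.
struct CoarseHash {
  std::size_t operator()(const ToyOp &op) const {
    return std::hash<std::string>{}(op.name);
  }
};

// (a) On a hash match, the full comparison (including the body) decides.
struct FullEqual {
  bool operator()(const ToyOp &a, const ToyOp &b) const {
    return a.name == b.name && a.body == b.body;
  }
};

using KnownOps = std::unordered_set<ToyOp, CoarseHash, FullEqual>;

// Returns true if an equivalent op was already recorded, i.e. `op` could be
// replaced by the earlier one. Otherwise records `op` and returns false.
bool isRedundant(KnownOps &known, const ToyOp &op) {
  return !known.insert(op).second;
}
```

The cost of the coarser hash is that all ops with the same name land in the same bucket and are told apart only by the full comparison, which is the "more collisions" trade-off mentioned above.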


struct CSE : public impl::CSEBase<CSE> {

/// Shared implementation of operation elimination and scoped map definitions.

using AllocatorTy = llvm::RecyclingAllocator<

llvm::BumpPtrAllocator,

llvm::ScopedHashTableVal<Operation *, Operation *>>;

using ScopedMapTy = llvm::ScopedHashTable<Operation *, Operation *,

SimpleOperationInfo, AllocatorTy>;

[...] LogicalResult CSE::simplifyOperation(ScopedMapTy &knownValues, Operation *op,

// If the operation is already trivially dead just add it to the erase list.

if (isOpTriviallyDead(op)) {

opsToErase.push_back(op);

++numDCE;

return success();

}

// Don't simplify operations with nested blocks. We don't currently model

// equality comparisons correctly among other things. It is also unclear

Mogball: This comment needs an update

rriddle: Not done yet?

// whether we would want to CSE such operations.

- if (op->getNumRegions() != 0)

+ if (!(op->getNumRegions() == 0 ||

+ (op->getNumRegions() == 1 && llvm::hasSingleElement(op->getRegion(0)))))

return failure();

// Some simple use case of operation with memory side-effect are dealt with

// here. Operations with no side-effect are done after.

if (!isMemoryEffectFree(op)) {

auto memEffects = dyn_cast<MemoryEffectOpInterface>(op);

// TODO: Only basic use case for operations with MemoryEffects::Read can be

// eleminated now. More work needs to be done for more complicated patterns


mlir/test/Dialect/SparseTensor/codegen_buffer_initialization.mlir

	// CHECK: %[[T2:.*]] = memref.alloc() : memref<16xindex>
	// CHECK: %[[T3:.*]] = memref.cast %[[T2]] : memref<16xindex> to memref<?xindex>
	// CHECK: linalg.fill ins(%[[C0]] : index) outs(%[[T2]] : memref<16xindex>)
	// CHECK: %[[T4:.*]] = memref.alloc() : memref<16xindex>
	// CHECK: %[[T5:.*]] = memref.cast %[[T4]] : memref<16xindex> to memref<?xindex>
	// CHECK: linalg.fill ins(%[[C0]] : index) outs(%[[T4]] : memref<16xindex>)
	// CHECK: %[[T6:.*]] = memref.alloc() : memref<16xf64>
	// CHECK: %[[T7:.*]] = memref.cast %[[T6]] : memref<16xf64> to memref<?xf64>
	// CHECK: linalg.fill ins(%{{.*}} : f64) outs(%[[T6]] : memref<16xf64>)
	aartbik: LGTM, but we need a FIXME for the Windows determinism
	// CHECK: linalg.fill ins(%[[C0]] : index) outs(%[[T1]] : memref<3xindex>)
	// CHECK: memref.store %[[A]], %[[T0]][%[[C0]]] : memref<1xindex>
	// CHECK: %[[P0:.*]] = sparse_tensor.push_back %[[T1]], %[[T3]]
	// CHECK: %[[P1:.*]] = sparse_tensor.push_back %[[T1]], %[[P0]]
	// CHECK: return %[[T0]], %[[T1]], %[[P1]], %[[T5]], %[[T7]] :
	func.func @sparse_alloc_sparse_vector(%arg0: index) -> tensor<?xf64, #SV> {
	%0 = bufferization.alloc_tensor(%arg0) : tensor<?xf64, #SV>
	%1 = sparse_tensor.load %0 : tensor<?xf64, #SV>
	return %1 : tensor<?xf64, #SV>
	}

mlir/test/Transforms/cse.mlir

	func.func @check_cummutative_cse(%a : i32, %b : i32) -> i32 {
	// CHECK: %[[ADD1:.*]] = arith.addi %{{.*}}, %{{.*}} : i32
	%1 = arith.addi %a, %b : i32
	%2 = arith.addi %b, %a : i32
	// CHECK-NEXT: arith.muli %[[ADD1]], %[[ADD1]] : i32
	%3 = arith.muli %1, %2 : i32
	return %3 : i32
	}

				// Check that an operation with a single region can CSE.
				func.func @cse_single_block_ops(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32):
				test.region_yield %arg0 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32):
				rriddle: Can you group the CHECK lines in these tests? It's a little hard for me to read given the size of the regions.
				mravishankar (author): I was just being consistent with the file here. I prefer having the `CHECK`s together too. Changed.
				test.region_yield %arg0 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @cse_single_block_ops
				// CHECK: %[[OP:.+]] = test.cse_of_single_block_op
				// CHECK-NOT: test.cse_of_single_block_op
				// CHECK: return %[[OP]], %[[OP]]

				// Operations with different number of bbArgs dont CSE.
				func.func @no_cse_varied_bbargs(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				test.region_yield %arg0 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32):
				test.region_yield %arg0 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @no_cse_varied_bbargs
				// CHECK: %[[OP0:.+]] = test.cse_of_single_block_op
				// CHECK: %[[OP1:.+]] = test.cse_of_single_block_op
				// CHECK: return %[[OP0]], %[[OP1]]

				// Operations with different regions dont CSE
				func.func @no_cse_region_difference_simple(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				test.region_yield %arg0 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				test.region_yield %arg1 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @no_cse_region_difference_simple
				// CHECK: %[[OP0:.+]] = test.cse_of_single_block_op
				// CHECK: %[[OP1:.+]] = test.cse_of_single_block_op
				// CHECK: return %[[OP0]], %[[OP1]]

				// Operation with identical region with multiple statements CSE.
				func.func @cse_single_block_ops_identical_bodies(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>, %c : f32, %d : i1)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.divf %arg0, %arg1 : f32
				%2 = arith.remf %arg0, %c : f32
				%3 = arith.select %d, %1, %2 : f32
				test.region_yield %3 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.divf %arg0, %arg1 : f32
				%2 = arith.remf %arg0, %c : f32
				%3 = arith.select %d, %1, %2 : f32
				test.region_yield %3 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @cse_single_block_ops_identical_bodies
				// CHECK: %[[OP:.+]] = test.cse_of_single_block_op
				// CHECK-NOT: test.cse_of_single_block_op
				// CHECK: return %[[OP]], %[[OP]]

				// Operation with non-identical regions dont CSE.
				func.func @no_cse_single_block_ops_different_bodies(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>, %c : f32, %d : i1)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.divf %arg0, %arg1 : f32
				%2 = arith.remf %arg0, %c : f32
				%3 = arith.select %d, %1, %2 : f32
				test.region_yield %3 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.divf %arg0, %arg1 : f32
				%2 = arith.remf %arg0, %c : f32
				%3 = arith.select %d, %2, %1 : f32
				test.region_yield %3 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @no_cse_single_block_ops_different_bodies
				// CHECK: %[[OP0:.+]] = test.cse_of_single_block_op
				// CHECK: %[[OP1:.+]] = test.cse_of_single_block_op
				// CHECK: return %[[OP0]], %[[OP1]]

				// Account for commutative ops within regions during CSE.
				func.func @cse_single_block_with_commutative_ops(%a : tensor<?x?xf32>, %b : tensor<?x?xf32>, %c : f32)
				-> (tensor<?x?xf32>, tensor<?x?xf32>) {
				%0 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.addf %arg0, %arg1 : f32
				%2 = arith.mulf %1, %c : f32
				test.region_yield %2 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				%1 = test.cse_of_single_block_op inputs(%a, %b) {
				^bb0(%arg0 : f32, %arg1 : f32):
				%1 = arith.addf %arg1, %arg0 : f32
				%2 = arith.mulf %c, %1 : f32
				test.region_yield %2 : f32
				} : tensor<?x?xf32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
				}
				// CHECK-LABEL: func @cse_single_block_with_commutative_ops
				// CHECK: %[[OP:.+]] = test.cse_of_single_block_op
				// CHECK-NOT: test.cse_of_single_block_op
				// CHECK: return %[[OP]], %[[OP]]

mlir/test/lib/Dialect/Test/TestOps.td

[...] def TestProducingBranchOp : TEST_Op<"producing_br",
let arguments = (ins Variadic<AnyType>:$firstOperands,
Variadic<AnyType>:$secondOperands);
let results = (outs I32:$dummy);
let successors = (successor AnySuccessor:$first,AnySuccessor:$second);
}

// Produces an error value on the error path
def TestInternalBranchOp : TEST_Op<"internal_br",
[DeclareOpInterfaceMethods<BranchOpInterface>, Terminator,
AttrSizedOperandSegments]> {

let arguments = (ins Variadic<AnyType>:$successOperands,
Variadic<AnyType>:$errorOperands);

let successors = (successor AnySuccessor:$successPath, AnySuccessor:$errorPath);
}

def AttrSizedOperandOp : TEST_Op<"attr_sized_operands",
[...] def TestReflectBoundsOp : TEST_Op<"reflect_bounds",

let assemblyFormat = "attr-dict $value";
}

//===----------------------------------------------------------------------===//
// Test ConditionallySpeculatable
//===----------------------------------------------------------------------===//

def ConditionallySpeculatableOp : TEST_Op<"conditionally_speculatable_op",
		Mogball: super nit: can you capitalize CSE here?
[ConditionallySpeculatable, NoMemoryEffect]> {
let description = [{
Op used to test conditional speculation. This op can be speculatively
executed if the input to it is an `arith.constant`.
}];

let arguments = (ins I32:$input);
let results = (outs I32:$result);
[...] def RecursivelySpeculatableOp : TEST_Op<"recursively_speculatable_op", [
let description = [{
Op used to test conditional speculation. This op can be speculatively
executed only if all the ops in the attached region can be.
}];
let results = (outs I32:$result);
let regions = (region SizedRegion<1>:$body);
}

		//===---------------------------------------------------------------------===//
		// Test CSE
		//===---------------------------------------------------------------------===//

		def TestCSEOfSingleBlockOp : TEST_Op<"cse_of_single_block_op",
		[SingleBlockImplicitTerminator<"RegionYieldOp">, Pure]> {
		let arguments = (ins Variadic<AnyType>:$inputs);
		let results = (outs Variadic<AnyType>:$outputs);
		let regions = (region SizedRegion<1>:$region);
		let assemblyFormat = [{
		attr-dict `inputs` `(` $inputs `)`
		$region `:` type($inputs) `->` type($outputs)
		}];
		}

#endif // TEST_OPS


[mlir][Transforms] CSE of ops with a single block. (Closed, Public)


Diff 475659

mlir/lib/IR/OperationSupport.cpp

mlir/lib/Transforms/CSE.cpp

mlir/test/Dialect/SparseTensor/codegen_buffer_initialization.mlir

mlir/test/Transforms/cse.mlir

mlir/test/lib/Dialect/Test/TestOps.td
