This is an archive of the discontinued LLVM Phabricator instance.

@nicolasvasilache this is the draft I have for CSE-ing ops with a single block. It's probably more complex than it needs to be, since I think a lot of the things related to hash computation can either be dropped or simplified a lot.

mravishankar added a child revision: D134307: [mlir][TilingInterface] Add callback to yield a produced value..Sep 20 2022, 1:25 PM

Harbormaster completed remote builds in B187820: Diff 461676.Sep 20 2022, 1:48 PM

Rebase.

Herald added a subscriber: zero9178. Sep 23 2022, 5:49 PM

mravishankar published this revision for review.Sep 23 2022, 5:51 PM

mravishankar retitled this revision from [WIP] CSE of ops with a single block. to [mlir][Transforms] CSE of ops with a single block..

mravishankar edited the summary of this revision.

Herald added a project: Restricted Project. Sep 23 2022, 5:51 PM

Herald added a subscriber: stephenneuendorffer.

@rriddle and @nicolasvasilache this is a first attempt at adding CSE support for ops with a single region and a single block. There are some failures with this, but I first want to get an idea of whether this is heading in the right direction.

mravishankar added a reviewer: pzread.Sep 23 2022, 5:52 PM

Harbormaster completed remote builds in B188509: Diff 462634.Sep 23 2022, 6:22 PM

mravishankar added a reviewer: Mogball.Sep 26 2022, 12:12 PM

Mogball added inline comments.Sep 26 2022, 12:43 PM

mlir/include/mlir/IR/OperationSupport.h
862	I'm not sure operation equivalence needs to be modified to support this change, mainly because this doesn't support multiblock regions. You can hash the region separately from the operation and compare.
mlir/lib/Transforms/CSE.cpp
97	I'm surprised this works. Isn't this hashing the block pointer?

I'll take a more in depth look later.

Thanks @Mogball . I did need this, so happy to get input and adapt accordingly.

mlir/include/mlir/IR/OperationSupport.h
862	I think you are right.... The change to `computeHash` itself can be dropped. Needs the changes to `OperationEquivalence` though....
mlir/lib/Transforms/CSE.cpp
97	Yes. AFAIK, it's just a hash. Worst case, it has some collisions.

Mogball added inline comments.Sep 26 2022, 2:25 PM

mlir/lib/Transforms/CSE.cpp
97	What surprises me is that the test cases below work, which shouldn't happen if the hashes are different, but in this case they are because the block pointers are different for each region.

mravishankar added inline comments.Sep 26 2022, 6:10 PM

mlir/lib/Transforms/CSE.cpp
97	That was my starting assumption too. My understanding was that it is the `isEqual` method that really compares two operations. If it returns true for two operations that can be CSE-ed, then that's enough. The hash is just a way to do fast compares: if two ops have the same hash, they "might" be the same. So first you compare hashes, and if the hashes are the same you compare the ops explicitly. So in theory we can explicitly ignore the region hashes, and it should still work.

rriddle added inline comments.Sep 27 2022, 9:34 AM

mlir/include/mlir/IR/OperationSupport.h
863	I would either uncomment, or just remove the variable name (it adds nothing here).
mlir/lib/Transforms/CSE.cpp
54–55	This path looks broken. Why is this no longer calling into OperationEquivalence? Please separate out the region code into a different function so that this one is a bit easier to follow.
97	The point of hashing is to simplify the number of collisions. The point @Mogball is making is that this hash will never match with any other really, given that no other region will have `block` inside of it. How are you matching region operations if the hash never matches?

mravishankar added inline comments.Sep 28 2022, 10:42 AM

mlir/lib/Transforms/CSE.cpp
97	I think I understand what you guys are saying. Would it work if I just ignore the region during the hash computation?

Since we are looking at this and there is discussion of splitting, could we factor out some of the CSE logic into independent utility functions? I'd like to be able to call that for a single region without running a pass across everything, like we can do with various pattern rewriters.

Rebase and address some comments.

mravishankar added inline comments.Sep 29 2022, 10:16 AM

mlir/lib/Transforms/CSE.cpp
54–55	It was indeed broken. Fixing this fixed all the tests that were failing. Thanks for catching it! Now all tests pass.
97	So I dropped any hashing that accounted for regions. That makes the code simpler. It might cause more collisions, but IIUC the approach here is: (a) if the hashes are the same, use `isEqual` to actually disambiguate; (b) if the hashes are different, the ops are different.
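
The hash/equality contract discussed in this thread can be illustrated outside MLIR. The following is a minimal sketch, assuming a toy `ToyOp` type with hypothetical `ToyOpHash` and `ToyOpEq` helpers (none of these names come from the patch): the hash ignores the region body, so ops that differ only in their body collide, and the equality check alone decides whether two ops are duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an operation (hypothetical, not an MLIR type): a name,
// operand ids, and a body listing nested op names in a single block.
struct ToyOp {
  std::string name;
  std::vector<int> operands;
  std::vector<std::string> body;
};

// The hash deliberately ignores `body`. Ops that differ only in their
// region collide, which costs time but never correctness.
struct ToyOpHash {
  std::size_t operator()(const ToyOp &op) const {
    std::size_t h = std::hash<std::string>{}(op.name);
    for (int v : op.operands)
      h = h * 31 + std::hash<int>{}(v);
    return h;
  }
};

// Equality is the source of truth and does compare the body.
struct ToyOpEq {
  bool operator()(const ToyOp &a, const ToyOp &b) const {
    return a.name == b.name && a.operands == b.operands && a.body == b.body;
  }
};

// Returns how many ops remain once exact duplicates are dropped.
std::size_t countUniqueOps(const std::vector<ToyOp> &ops) {
  std::unordered_set<ToyOp, ToyOpHash, ToyOpEq> seen;
  for (const ToyOp &op : ops)
    seen.insert(op);
  return seen.size();
}
```

Hashing the address of the block instead would give the opposite failure mode: two structurally identical regions would almost never share a hash, so the equality check would never get a chance to run.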

What's going on with the diff?

mlir/test/lib/Dialect/Test/TestOps.td
3016 ↗	(On Diff #463953)	super nit: can you capitalize CSE here?

In D134306#3824546, @Mogball wrote:

What's going on with the diff?

What specifically? The diff itself is fine. I made some changes, and Phabricator doesn't know how to realign.

Harbormaster completed remote builds in B189459: Diff 463953.Sep 29 2022, 11:44 AM

I don't see any changes to CSE.cpp anymore

In D134306#3825274, @Mogball wrote:

I don't see any changes to CSE.cpp anymore

That's surprising. I see changes to CSE.cpp here.

Handle commutative ops correctly.

Harbormaster completed remote builds in B189771: Diff 464390.Sep 30 2022, 2:43 PM

@Mogball do you still not see the diff for CSE.cpp?

I can see them now. I'm not sure what happened

Rebase and address nit.

@rriddle and @Mogball wondering if you guys have any more comments on this (or whether this is landable or not). The functionality is a blocker for other work, so I'd like to get an idea of how to proceed here (if this approach has issues). Happy to address/modify the approach based on your recommendations.

Harbormaster completed remote builds in B190330: Diff 465180.Oct 4 2022, 4:17 PM

Apologies for the delay, we have a company event this week.

mlir/lib/IR/OperationSupport.cpp
733–755	Why do you need two SmallVector? Should be easy enough to order OpResults before block arguments inside of the sort function. Either way, what's the benefit of ordering block arguments differently from results?
mlir/lib/Transforms/CSE.cpp
17	I would've expected that this would already be included.
53–54	The comment of `if they have a single region with a single block` seems like something that should be pushed down to where the context is relevant. Just saying "No regions take the easy path"(or something similar) seems fine here.
64
74
80
98–101	Why check getParent here instead of just checking areEquivalentValues?
107	When does this happen (aside from results of the parent op)?
mlir/test/Transforms/cse.mlir
328–334 ↗	(On Diff #465180)	Can you group the CHECK lines in these tests? It's a little hard for me to read given the size of the regions.

Rebase and address comments.

mravishankar marked 7 inline comments as done.Oct 5 2022, 11:28 AM

mravishankar added inline comments.

mlir/lib/IR/OperationSupport.cpp
733–755	I combined the two to have a single sort function. I kept them separate for clarity, but there is really no need to do that. As to why block argument ordering matters: it helps in the equivalence comparison in the callback. For example, if you have the following two regions
^bb0(%a0 : f32, %a1: f32) :
  %0 = arith.addf %a0, %a1 : f32
  linalg.yield %0 : f32
and
^bb0(%b0 : f32, %b1: f32) :
  %1 = arith.addf %b0, %b1 : f32
  linalg.yield %1 : f32
These two regions are equivalent. But if you use the pointer values to sort, then when comparing the `arith.addf` operands you can end up with the operands of the `arith.addf` in the first region being `[%a0, %a1]` and those of the `arith.addf` in the second region being `[%b1, %b0]`. Then the equivalence check fails. That does not happen if you account for the argument position as well. Writing this, though, it still doesn't cover all the cases. You could have
^bb0(%a0 : f32, %a1: f32) :
  %0 = arith.addf %a0, %a1 : f32
  %1 = arith.mulf %a0, %a1 : f32
  %2 = arith.addf %0, %1 : f32
  linalg.yield %2 : f32
and
^bb0(%b0 : f32, %b1: f32) :
  %3 = arith.addf %b0, %b1 : f32
  %4 = arith.mulf %b0, %b1 : f32
  %5 = arith.addf %3, %4 : f32
  linalg.yield %5 : f32
which would still fail because `%3` and `%4` would be ordered using the `Value`s' pointer values. That's an issue (and it is really hard to write a test case for it). Is there a way to completely avoid using the pointer value of `Value`?
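
The ordering problem described in this comment can be sketched in isolation. This is a toy model only, assuming a hypothetical `ToyArg` type that records nothing but a block-argument position; it is not the code in the patch. It orders the operands of a commutative op by argument number rather than by address, while remembering each operand's original index so the caller can still map operands pairwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for a block argument (hypothetical): only its position in
// the block's argument list is recorded.
struct ToyArg {
  unsigned argNumber;
};

// Returns the operand indices ordered by block-argument number. Unlike an
// address-based sort, the result does not depend on where the arguments
// happen to live in memory.
std::vector<std::size_t>
sortedOperandOrder(const std::vector<const ToyArg *> &operands) {
  std::vector<std::pair<unsigned, std::size_t>> keyed;
  for (std::size_t i = 0; i < operands.size(); ++i)
    keyed.emplace_back(operands[i]->argNumber, i);
  std::sort(keyed.begin(), keyed.end());
  std::vector<std::size_t> order;
  for (const auto &entry : keyed)
    order.push_back(entry.second);
  return order;
}
```

As the comment itself notes, this only helps for block arguments. Results of ops inside the region have no such position in this model, so a stable key for them would need a different choice.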
mlir/lib/Transforms/CSE.cpp
98–101	I think you are right. Dropped the `getParent` check...
107	You need to treat the values created within a region by equivalent operations as equivalent. For example, in
^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %0 = arith.addf %arg0, %arg1 : f32
  %1 = arith.mulf %0, %arg2 : f32
  linalg.yield %1 : f32
and
^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %2 = arith.addf %arg0, %arg1 : f32
  %3 = arith.mulf %2, %arg2 : f32
  linalg.yield %3 : f32
`%0` and `%2` are equivalent, and based on that `%3` and `%1` are equivalent. Then the regions are the same and the ops can be CSE-ed (if the other conditions, e.g. the operands and attributes being the same, are met). So `getParent` here checks that this is an op within the region of the two operations being compared, and the values are then added to the equivalence set.
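
The equivalence-set idea in this comment can be sketched as a lockstep walk over two single-block bodies. This is an illustrative toy, assuming hypothetical `ToyInst` and `areBodiesEquivalent` names and integer value ids; it is not the implementation in CSE.cpp. The map is seeded with corresponding block arguments and extended with each pair of matching results. As a simplification, any operand that is not already in the map is treated as a mismatch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an operation inside a single-block region
// (hypothetical). Values are plain integer ids.
struct ToyInst {
  std::string name;
  std::vector<int> operands; // ids of the values this op uses
  int result;                // id of the value this op defines
};

// Walks both bodies in lockstep. `equiv` maps lhs value ids to rhs value
// ids; the caller seeds it with corresponding block arguments, and it
// grows as matching ops are found.
bool areBodiesEquivalent(const std::vector<ToyInst> &lhs,
                         const std::vector<ToyInst> &rhs,
                         std::map<int, int> equiv) {
  if (lhs.size() != rhs.size())
    return false;
  for (std::size_t i = 0; i < lhs.size(); ++i) {
    const ToyInst &a = lhs[i];
    const ToyInst &b = rhs[i];
    if (a.name != b.name || a.operands.size() != b.operands.size())
      return false;
    for (std::size_t j = 0; j < a.operands.size(); ++j) {
      auto it = equiv.find(a.operands[j]);
      if (it == equiv.end() || it->second != b.operands[j])
        return false;
    }
    // The two ops match, so their results become equivalent values.
    equiv[a.result] = b.result;
  }
  return true;
}
```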
mlir/test/Transforms/cse.mlir
328–334 ↗	(On Diff #465180)	I was just being consistent with the file here. I prefer having the `CHECK*`s together too. Changed.

Drop unnecessary header.

mravishankar marked an inline comment as done.Oct 5 2022, 11:31 AM

mravishankar added inline comments.

mlir/lib/Transforms/CSE.cpp
17	I don't remember why I needed it (I think it was to get the `mlir::hash_value(Value v)` definition). I don't need it now. Dropped.

Harbormaster completed remote builds in B190549: Diff 465482.Oct 5 2022, 12:51 PM

Mogball added inline comments.Oct 10 2022, 9:10 AM

mlir/lib/Transforms/CSE.cpp
81	`auto [lhsArg, rhsArg] =` ?
295	This comment needs an update

LG overall

mlir/lib/Transforms/CSE.cpp
87	This should really be added to `Value`. I've written this same code many times. Thoughts @rriddle ?

This revision is now accepted and ready to land.Oct 10 2022, 9:11 AM

Add an option to enable CSE of single block ops. This seems to have
pretty significant downstream effects. Making this enhancement opt-in
as a ramp to make it default available.

Fix comment.

I would prefer the option be true by default. If the pass is creating erroneous code, it's either a bug in the pass (which I assume isn't the case here), or something wrong with downstream users. It should always be legal to CSE a side-effect-free op.

@Mogball I made the CSE for single block ops guarded by a flag. I used it downstream, and it has some non-trivial effects. It is needed for some things, and this is probably an indication that we need a mechanism that allows conditional CSE (on a set of ops, or class of ops, etc.). For now the flag keeps the "new feature" guarded to use as needed. PTAL since this is a change to the patch from when you approved it.

In D134306#3869775, @mravishankar wrote:

@Mogball I made the CSE for single block ops guarded by a flag. I used it downstream, and it has some non-trivial effects. It is needed for some things, and this is probably an indication that we need a mechanism that allows conditional CSE (on a set of ops, or class of ops, etc.). For now the flag keeps the "new feature" guarded to use as needed. PTAL since this is a change to the patch from when you approved it.

What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.

This revision now requires changes to proceed.Oct 19 2022, 4:52 PM

Harbormaster completed remote builds in B193125: Diff 469083.Oct 19 2022, 5:15 PM

What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.

Basically, CSE increases the lifetimes of operations. So in the long term we probably need CSE to be a more controllable pass than it is today. In fact, this is already an issue today. You could have an mhlo.dot, which is a GEMM operation, that will get CSE-ed, but when lowered to linalg.matmul it doesn't get CSE-ed, while the impact of CSE-ing these operations is the same....
It also has an impact on computation that is in destination-passing style. This is the issue I hit.

%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%3 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%4 = linalg.fill ins(%cst) outs(%3) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%4) : tensor<?x?xf32>

Earlier the linalg.fill would not be CSE-ed (because it is an op with a single block). Now it will:

%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>

While the semantics are correct, this is a more opinionated change than "just CSE". This is effectively adding false sharing of the output where none existed earlier. So bufferization (or some pre-processing before bufferization) will have to break this false dependency. All of this could have happened earlier as well with CSE. It is just exposed now because the linalg operations have a single block, and those just happened to not be covered by CSE. So what this change did is unearth an issue that already existed, which needs to be fixed by allowing CSE to be more controllable....
If having a flag is an issue, then please suggest a way forward. This is blocking quite a lot of things for me.....

FYI @ftynse @nicolasvasilache

In D134306#3871667, @mravishankar wrote:
What are the non-trivial effects? I'm quite concerned about guarding things behind a flag.
Basically, CSE increases the lifetimes of operations. So in the long term we probably need CSE to be a more controllable pass than it is today. In fact, this is already an issue today. You could have an mhlo.dot, which is a GEMM operation, that will get CSE-ed, but when lowered to linalg.matmul it doesn't get CSE-ed, while the impact of CSE-ing these operations is the same....
It also has an impact on computation that is in destination-passing style. This is the issue I hit.
%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%3 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%4 = linalg.fill ins(%cst) outs(%3) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%4) : tensor<?x?xf32>
Earlier the linalg.fill would not be CSE-ed (because it is an op with a single block). Now it will:
%0 = tensor.empty [%d0, %d1] : tensor<?x?xf32>
%1 = linalg.fill ins(%cst) outs(%0) : tensor<?x?xf32>
%2 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
%5 = linalg.matmul ins(...) outs(%1) : tensor<?x?xf32>
While the semantics are correct, this is a more opinionated change than "just CSE". This is effectively adding false sharing of the output where none existed earlier. So bufferization (or some pre-processing before bufferization) will have to break this false dependency. All of this could have happened earlier as well with CSE. It is just exposed now because the linalg operations have a single block, and those just happened to not be covered by CSE. So what this change did is unearth an issue that already existed, which needs to be fixed by allowing CSE to be more controllable....

I really don't see the problem here, this is precisely what I would expect.
The fact that downstream assumptions are not valid anymore should not be a blocker for this to progress.

If having a flag is an issue, then please suggest a way forward. This is blocking quite a lot of things for me.....

FYI @ftynse @nicolasvasilache

It seems like you want downstream-specific behavior to be preserved. This suggests downstream-specific changes (either a custom CSE or, ideally, not relying on such assumptions).

I would drop the flag from this PR.

Regardless of the discussion on Linalg vs. MHLO CSE behavior, which is arguably up to the dialects to decide, I think we should (a) just enable this change without a flag, as it looks reasonable to have, and (b) consider finer-grained control over general simplification (canonicalization, CSE, constant folding and propagation) in the longer term.

Drop the optionality flag...

@rriddle dropped the flag. PTAL.

Harbormaster completed remote builds in B194698: Diff 471212.Oct 27 2022, 11:46 AM

Thanks Mahesh!

I am fine with this as it stands now.

rriddle accepted this revision.Nov 1 2022, 10:44 PM

rriddle added inline comments.

mlir/lib/Transforms/CSE.cpp
295	Not done yet?

This revision is now accepted and ready to land.Nov 1 2022, 10:44 PM

Herald added a subscriber: Moerafaat. Nov 1 2022, 10:44 PM

Sorry for the delay, swamped lately. The flang build breakages look real though.

Rebase and update.

Harbormaster completed remote builds in B196987: Diff 474389.Nov 9 2022, 5:00 PM

Rebase and add dependency on D137857

mravishankar added a parent revision: D137857: [mlir] Remove `Transforms/SideEffectUtils.h` and move the methods into `Interface/SideEffectInterfaces.h`..Nov 11 2022, 2:24 PM

Harbormaster completed remote builds in B197307: Diff 474860.Nov 11 2022, 2:38 PM

Rebase

Herald added a subscriber: hanchung. Nov 15 2022, 3:29 PM

Harbormaster completed remote builds in B197856: Diff 475607.Nov 15 2022, 4:02 PM

Fix failing lit test.

Herald added a reviewer: aartbik. Nov 15 2022, 6:26 PM

Herald added a subscriber: anlunx.

Harbormaster completed remote builds in B197890: Diff 475654.Nov 15 2022, 6:46 PM

This revision was landed with ongoing or failed builds.Nov 15 2022, 6:56 PM

Closed by commit rGfcaf6dd597ea: [mlir][Transforms] CSE of ops with a single block. (authored by mravishankar).

This revision was automatically updated to reflect the committed changes.

mravishankar added a commit: rGfcaf6dd597ea: [mlir][Transforms] CSE of ops with a single block..

aartbik added inline comments.Nov 15 2022, 9:06 PM

mlir/test/Dialect/SparseTensor/codegen_buffer_initialization.mlir
20 ↗	(On Diff #475659)	LGTM, but we need a FIXME for the windows determinism

Revision Contents

Path    Size
mlir/include/mlir/IR/OperationSupport.h    5 lines
mlir/lib/IR/OperationSupport.cpp    64 lines
mlir/lib/Transforms/CSE.cpp    101 lines
mlir/lib/Transforms/Utils/RegionUtils.cpp    1 line

Diff 461676

mlir/include/mlir/IR/OperationSupport.h

...
  struct OperationEquivalence {
...
    /// The `hashOperands` and `hashResults` callbacks are expected to return a
    /// unique hash_code for a given Value.
    static llvm::hash_code computeHash(
        Operation *op,
        function_ref<llvm::hash_code(Value)> hashOperands =
            [](Value v) { return hash_value(v); },
        function_ref<llvm::hash_code(Value)> hashResults =
            [](Value v) { return hash_value(v); },
+       function_ref<llvm::hash_code(Region &r)> hashRegions =
+           [](Region & /*r*/) { return llvm::hash_code{}; },
        Flags flags = Flags::None);

rriddle (inline suggestion on the `hashRegions` default):
  function_ref<llvm::hash_code(Region &r)> hashRegions =
- [](Region & /*r*/) { return llvm::hash_code{}; },
+ [](Region &r) { return llvm::hash_code{}; },
  Flags flags = Flags::None);
I would either uncomment, or just remove the variable name (it adds nothing here).

    /// Helper that can be used with `computeHash` above to ignore operation
    /// operands/result mapping.
    static llvm::hash_code ignoreHashValue(Value) { return llvm::hash_code{}; }
    /// Helper that can be used with `computeHash` above to ignore operation
    /// operands/result mapping.
    static llvm::hash_code directHashValue(Value v) { return hash_value(v); }
+   static llvm::hash_code ignoreRegionHashValue(Region &r) {
+     return llvm::hash_code{};
+   }
    /// Compare two operations and return if they are equivalent.
    /// `mapOperands` and `mapResults` are optional callbacks that allows the
    /// caller to check the mapping of SSA value between the lhs and rhs
    /// operations. It is expected to return success if the mapping is valid and
    /// failure if it conflicts with a previous mapping.
    static bool
    isEquivalentTo(Operation *lhs, Operation *rhs,
...

mlir/lib/IR/OperationSupport.cpp

...
  }

  //===----------------------------------------------------------------------===//
  // Operation Equivalency
  //===----------------------------------------------------------------------===//

  llvm::hash_code OperationEquivalence::computeHash(
      Operation *op, function_ref<llvm::hash_code(Value)> hashOperands,
-     function_ref<llvm::hash_code(Value)> hashResults, Flags flags) {
+     function_ref<llvm::hash_code(Value)> hashResults,
+     function_ref<llvm::hash_code(Region &r)> hashRegions, Flags flags) {
    // Hash operations based upon their:
    //   - Operation Name
    //   - Attributes
    //   - Result Types
    llvm::hash_code hash = llvm::hash_combine(
        op->getName(), op->getAttrDictionary(), op->getResultTypes());

    //   - Operands
    ValueRange operands = op->getOperands();
    SmallVector<Value> operandStorage;
    if (op->hasTrait<mlir::OpTrait::IsCommutative>()) {
      operandStorage.append(operands.begin(), operands.end());
      llvm::sort(operandStorage, [](Value a, Value b) -> bool {
        return a.getAsOpaquePointer() < b.getAsOpaquePointer();
      });
      operands = operandStorage;
    }
    for (Value operand : operands)
      hash = llvm::hash_combine(hash, hashOperands(operand));

    //   - Operands
    for (Value result : op->getResults())
      hash = llvm::hash_combine(hash, hashResults(result));

+   //   - Regions
+   for (Region &r : op->getRegions())
+     hash = llvm::hash_combine(hash, hashRegions(r));
    return hash;
  }

  static bool
  isRegionEquivalentTo(Region *lhs, Region *rhs,
                       function_ref<LogicalResult(Value, Value)> mapOperands,
                       function_ref<LogicalResult(Value, Value)> mapResults,
                       OperationEquivalence::Flags flags) {
...
    if (lhs->getName() != rhs->getName() ||
        lhs->getNumRegions() != rhs->getNumRegions() ||
        lhs->getNumSuccessors() != rhs->getNumSuccessors() ||
        lhs->getNumOperands() != rhs->getNumOperands() ||
        lhs->getNumResults() != rhs->getNumResults())
      return false;
    if (!(flags & IgnoreLocations) && lhs->getLoc() != rhs->getLoc())
      return false;

+   auto getOperandsListFn =
+       [](ValueRange values) -> SmallVector<std::pair<Value, unsigned>> {
+     return llvm::to_vector(llvm::map_range(
+         llvm::enumerate(values), [](auto value) -> std::pair<Value, unsigned> {
+           return {value.value(), value.index()};
+         }));
+   };
    ValueRange lhsOperands = lhs->getOperands(), rhsOperands = rhs->getOperands();
-   SmallVector<Value> lhsOperandStorage, rhsOperandStorage;
+   // For commutative operations use a sorted list, but also track the
+   // original position of the operands to pass correct values to `mapOperands`
+   // function.
+   auto lhsOperandsSortedList = getOperandsListFn(lhsOperands);
+   auto rhsOperandsSortedList = getOperandsListFn(rhsOperands);
+   // Commutativity causes issues with the callback logic. For now disable.
    if (lhs->hasTrait<mlir::OpTrait::IsCommutative>()) {
-     lhsOperandStorage.append(lhsOperands.begin(), lhsOperands.end());
-     llvm::sort(lhsOperandStorage, [](Value a, Value b) -> bool {
-       return a.getAsOpaquePointer() < b.getAsOpaquePointer();
-     });
-     lhsOperands = lhsOperandStorage;
-
-     rhsOperandStorage.append(rhsOperands.begin(), rhsOperands.end());
-     llvm::sort(rhsOperandStorage, [](Value a, Value b) -> bool {
-       return a.getAsOpaquePointer() < b.getAsOpaquePointer();
-     });
-     rhsOperands = rhsOperandStorage;
+     auto compareFn = [](std::pair<Value, unsigned> a,
+                         std::pair<Value, unsigned> b) -> bool {
+       return a.first.getAsOpaquePointer() < b.first.getAsOpaquePointer();
+     };
+     llvm::sort(lhsOperandsSortedList, compareFn);
+     llvm::sort(rhsOperandsSortedList, compareFn);
    }

+   auto checkOperandRangeMapping =
+       [&](ArrayRef<std::pair<Value, unsigned>> lhs,
+           ArrayRef<std::pair<Value, unsigned>> rhs,
+           function_ref<LogicalResult(Value, Value)> mapValues) {
+         for (auto operandPair : llvm::zip(lhs, rhs)) {
+           std::pair<Value, unsigned> curArg = std::get<0>(operandPair);
+           std::pair<Value, unsigned> otherArg = std::get<1>(operandPair);
+           if (curArg.first.getType() != otherArg.first.getType())
+             return false;
+           if (failed(mapValues(lhsOperands[curArg.second],
+                                rhsOperands[otherArg.second])))
+             return false;
+         }
+         return true;
+       };
+   // Check mapping of operands.
+   if (!checkOperandRangeMapping(lhsOperandsSortedList, rhsOperandsSortedList,
+                                 mapOperands))
+     return false;
+
+   // Check mapping of results.
    auto checkValueRangeMapping =
        [](ValueRange lhs, ValueRange rhs,
           function_ref<LogicalResult(Value, Value)> mapValues) {
          for (auto operandPair : llvm::zip(lhs, rhs)) {
            Value curArg = std::get<0>(operandPair);
            Value otherArg = std::get<1>(operandPair);
            if (curArg.getType() != otherArg.getType())
              return false;
            if (failed(mapValues(curArg, otherArg)))
              return false;
          }
          return true;
        };
-   // Check mapping of operands and results.
-   if (!checkValueRangeMapping(lhsOperands, rhsOperands, mapOperands))
-     return false;
    if (!checkValueRangeMapping(lhs->getResults(), rhs->getResults(), mapResults))
      return false;
    for (auto regionPair : llvm::zip(lhs->getRegions(), rhs->getRegions()))
      if (!isRegionEquivalentTo(&std::get<0>(regionPair),
                                &std::get<1>(regionPair), mapOperands, mapResults,
                                flags))
        return false;
    return true;
  }

mlir/lib/Transforms/CSE.cpp

//===- CSE.cpp - Common Sub-expression Elimination ------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This transformation pass performs a simple common sub-expression elimination
// algorithm on operations within a region.
//
//===----------------------------------------------------------------------===//

#include "mlir/Transforms/Passes.h"

#include "mlir/IR/Dominance.h"

#include "mlir/IR/Value.h"

rriddleUnsubmitted

Done

I would've expected that this would already be included.

mravishankarAuthorUnsubmitted

Done

I don't recall why I needed it (I think it was to get the `mlir::hash_value(Value v)` definition). I don't need it now, so I dropped it.

#include "mlir/Interfaces/SideEffectInterfaces.h"
#include "mlir/Pass/Pass.h"
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/ScopedHashTable.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/RecyclingAllocator.h"
#include <deque>

namespace mlir {
#define GEN_PASS_DEF_CSE
#include "mlir/Transforms/Passes.h.inc"
} // namespace mlir

using namespace mlir;

namespace {
struct SimpleOperationInfo : public llvm::DenseMapInfo<Operation *> {
  static unsigned getHashValue(const Operation *opC) {
    return OperationEquivalence::computeHash(
        const_cast<Operation *>(opC),

/*hashOperands=*/OperationEquivalence::directHashValue, /*hashOperands=*/hashOperands,

/*hashResults=*/OperationEquivalence::ignoreHashValue, /*hashResults=*/OperationEquivalence::ignoreHashValue,

/*hashRegions=*/OperationEquivalence::ignoreRegionHashValue,

OperationEquivalence::IgnoreLocations); OperationEquivalence::IgnoreLocations);

} }

  static bool isEqual(const Operation *lhsC, const Operation *rhsC) {
    auto *lhs = const_cast<Operation *>(lhsC);
    auto *rhs = const_cast<Operation *>(rhsC);
    if (lhs == rhs)
      return true;
    if (lhs == getTombstoneKey() || lhs == getEmptyKey() ||
        rhs == getTombstoneKey() || rhs == getEmptyKey())
      return false;

llvm::DenseMap<Value, Value> areEquivalentValues;

rriddleUnsubmitted

Done

The comment about `if they have a single region with a single block` seems like something that should be pushed down to where the context is relevant. Just saying "No regions, take the easy path" (or something similar) seems fine here.

if (lhs->getNumRegions() == 1 && rhs->getNumRegions() == 1 &&

rriddleUnsubmitted

Done

This path looks broken. Why is this no longer calling into OperationEquivalence? Please separate out the region code into a different function so that this one is a bit easier to follow.

mravishankarAuthorUnsubmitted

Done

It was indeed broken. Fixing this fixed all the tests that were failing. Thanks for catching it! Now all tests pass.

llvm::hasSingleElement(lhs->getRegion(0)) &&

llvm::hasSingleElement(rhs->getRegion(0)) &&

lhs->getRegion(0).getNumArguments() ==

rhs->getRegion(0).getNumArguments()) {

for (auto bbArgs : llvm::zip(lhs->getRegion(0).getArguments(),

rhs->getRegion(0).getArguments())) {

areEquivalentValues[std::get<0>(bbArgs)] = std::get<1>(bbArgs);

}

rriddleUnsubmitted

Done

// If lhs or rhs does not have a single region with a single block, they

- // arent CSEed for now.

+ // aren't CSEed for now.

if (lhs->getNumRegions() != 1 || rhs->getNumRegions() != 1 ||

auto getParent = [](Value v) -> Operation * {

if (auto blockArg = v.dyn_cast<BlockArgument>())

return blockArg.getParentBlock()->getParentOp();

return v.getDefiningOp()->getParentOp();

};

auto mapOperands = [&](Value lhsValue, Value rhsValue) -> LogicalResult {

if (lhsValue == rhsValue)

return success();

rriddleUnsubmitted

Done

Block &rhsBlock = rhs->getRegion(0).front();

- // If number of arguments differ, not CSEed

+ // Don't CSE if the number of arguments differ.

if (lhsBlock.getNumArguments() != rhsBlock.getNumArguments())

if (getParent(lhsValue) == lhs && getParent(rhsValue) == rhs &&

areEquivalentValues.lookup(lhsValue) == rhsValue)

return success();

return failure();

};

rriddleUnsubmitted

Done

// `rhsBlock`. `Value`s from `lhsBlock` are the key.

- llvm::DenseMap<Value, Value> areEquivalentValues;

+ DenseMap<Value, Value> areEquivalentValues;

for (auto bbArgs : llvm::zip(lhs->getRegion(0).getArguments(),

auto mapResults = [&](Value lhsResult, Value rhsResult) -> LogicalResult {

MogballUnsubmitted

Not Done

auto [lhsArg, rhsArg] = ?
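
The suggestion above is to unpack each zipped pair with C++17 structured bindings instead of `std::get<0>` / `std::get<1>`. A minimal standalone sketch of the idiom follows; `zipPairs` and `countMatches` are hypothetical helpers that stand in for `llvm::zip` and the loop body so the snippet builds with the standard library alone.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical helper: pairs up two equally sized ranges. It stands in for
// llvm::zip so that this sketch needs only the standard library.
template <typename T>
std::vector<std::tuple<T, T>> zipPairs(const std::vector<T> &lhs,
                                       const std::vector<T> &rhs) {
  assert(lhs.size() == rhs.size() && "expected ranges of equal length");
  std::vector<std::tuple<T, T>> result;
  for (std::size_t i = 0; i < lhs.size(); ++i)
    result.emplace_back(lhs[i], rhs[i]);
  return result;
}

// Counts the positions where both ranges hold the same element. Each tuple is
// unpacked with structured bindings rather than std::get<0>/std::get<1>.
int countMatches(const std::vector<int> &lhs, const std::vector<int> &rhs) {
  int matches = 0;
  for (const auto &[lhsArg, rhsArg] : zipPairs(lhs, rhs))
    if (lhsArg == rhsArg)
      ++matches;
  return matches;
}
```

The named bindings make it obvious which side each element comes from, which is the readability gain the comment is after.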

if (getParent(lhsResult) == lhs && getParent(rhsResult) == rhs) {

auto insertion = areEquivalentValues.insert({lhsResult, rhsResult});

return success(insertion.first->second == rhsResult);

}

return success();

};

MogballUnsubmitted

Not Done

This should really be added to Value. I've written this same code many times. Thoughts @rriddle ?

return OperationEquivalence::isEquivalentTo( return OperationEquivalence::isEquivalentTo(

const_cast<Operation *>(lhsC), const_cast<Operation *>(rhsC), const_cast<Operation *>(lhsC), const_cast<Operation *>(rhsC),

/*mapOperands=*/OperationEquivalence::exactValueMatch, mapOperands, mapResults, OperationEquivalence::IgnoreLocations);

/*mapResults=*/OperationEquivalence::ignoreValueEquivalence, }

OperationEquivalence::IgnoreLocations); static llvm::hash_code hashBlockArguments(BlockArgument arg) {

Block *block = arg.getOwner();

llvm::hash_code hash = llvm::hash_value(block);

hash = llvm::hash_combine(hash, llvm::hash_value(arg.getArgNumber()));

return hash;

MogballUnsubmitted

Done

I'm surprised this works. Isn't this hashing the block pointer?

mravishankarAuthorUnsubmitted

Done

Yes. AFAIK, it's just a hash. In the worst case it has some collisions.

MogballUnsubmitted

Done

What surprises me is that the test cases below work. That shouldn't happen if the hashes are different, and in this case they are different, because the block pointers are different for each region.

mravishankarAuthorUnsubmitted

Done

That was my starting assumption too. My starting understanding was:

1) It is the `isEqual` method that really compares two operations. If that returns true for two operations that can be CSEed, then that's enough.
2) The hash is just a way to do fast compares. If two ops have the same hash, they "might" be the same. So first you compare hashes, and if the hashes are the same you compare the ops explicitly.

So in theory we can explicitly ignore the region hashes, and it should still work.
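
The reasoning above (a coarse hash is fine as long as equality does the real work) can be sketched with a standard hash set. This is a toy model, not MLIR: `ToyOp`, `CoarseHash`, `FullEqual` and `countDuplicates` are made-up names, and the hash deliberately ignores everything except the op name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an operation: a name plus operand ids. Not mlir::Operation.
struct ToyOp {
  std::string name;
  std::vector<int> operands;
};

// Deliberately coarse hash: only the name is hashed, the operands are ignored.
// Ops with the same name therefore always collide.
struct CoarseHash {
  std::size_t operator()(const ToyOp &op) const {
    return std::hash<std::string>{}(op.name);
  }
};

// Full comparison. This is what actually decides whether two ops are the same.
struct FullEqual {
  bool operator()(const ToyOp &a, const ToyOp &b) const {
    return a.name == b.name && a.operands == b.operands;
  }
};

// Returns how many ops in `ops` duplicate an earlier op in the sequence.
int countDuplicates(const std::vector<ToyOp> &ops) {
  std::unordered_set<ToyOp, CoarseHash, FullEqual> seen;
  int duplicates = 0;
  for (const ToyOp &op : ops)
    if (!seen.insert(op).second)
      ++duplicates;
  return duplicates;
}
```

Equal ops always get equal hashes here, so duplicates are still found; the cost of the coarse hash is only extra equality checks among same-named ops.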

rriddleUnsubmitted

Done

The point of hashing is to reduce the number of collisions. The point @Mogball is making is that this hash will essentially never match any other, given that no other region will have this block inside of it. How are you matching region operations if the hash never matches?
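
The objection can be made concrete with a small sketch: two structurally identical blocks live at different addresses, so a hash derived from the block pointer differs between them, whereas a hash derived from the contents agrees. `ToyBlock`, `hashByAddress` and `hashByContents` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy stand-in for a block: just the names of the ops it contains.
struct ToyBlock {
  std::vector<std::string> opNames;
};

// Hash derived from the block's address. Two distinct blocks get distinct
// values here, even when their contents are identical.
std::size_t hashByAddress(const ToyBlock *block) {
  return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(block));
}

// Hash derived from the block's contents. Structurally identical blocks get
// the same value regardless of where they live in memory.
std::size_t hashByContents(const ToyBlock *block) {
  std::size_t hash = 0;
  for (const std::string &name : block->opNames)
    hash = hash * 31 + std::hash<std::string>{}(name);
  return hash;
}
```

With an address-based hash, a hash-table lookup for one block would not be expected to reach the equality check against the other, which is why the tests passing was surprising.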

mravishankarAuthorUnsubmitted

Done

I think I understand what you guys are saying. Would it work if I just ignore the region during the hash computation?

mravishankarAuthorUnsubmitted

Done

So I dropped any hashing that accounted for regions. That makes the code simpler. It might cause more collisions, but IIUC the approach here is:
(a) If the hashes are the same, use `isEqual` to actually disambiguate.
(b) If the hashes are different, the ops are different.

}

static llvm::hash_code hashOperands(Value v) {

if (BlockArgument arg = v.dyn_cast<BlockArgument>()) {

hashBlockArguments(arg);

rriddleUnsubmitted

Done

Why check getParent here instead of just checking areEquivalentValues?

mravishankarAuthorUnsubmitted

Done

I think you are right. Dropped the getParent check...

}

return hash_value(v);

}

static llvm::hash_code hashRegion(Region &r) {

if (!llvm::hasSingleElement(r)) {

return llvm::hash_code{};

rriddleUnsubmitted

Done

When does this happen (aside from results of the parent op)?

mravishankarAuthorUnsubmitted

Done

You need to treat the values created within a region by equivalent operations as equivalent. For example,

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %0 = arith.addf %arg0, %arg1 : f32
  %1 = arith.mulf %0, %arg2 : f32
  linalg.yield %1 : f32

and

^bb0(%arg0 : f32, %arg1: f32, %arg2 : f32) :
  %2 = arith.addf %arg0, %arg1 : f32
  %3 = arith.mulf %2, %arg2 : f32
  linalg.yield %3 : f32

%0 and %2 are equivalent, and based on that %3 and %1 are equivalent. Then the regions are the same and the ops can be CSEed (if the other conditions are met: the operands are the same, the attributes are the same, etc.).

So `getParent` here checks that this is an op within the region of the two operations being compared, and they are then added to the equivalence set.
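
The equivalence-set idea described above can be sketched on a toy IR. This is only an illustration under simplifying assumptions (single-result ops, values named by strings, values not found in the map treated as defined outside the block and required to match exactly); `ToyOp`, `ToyBlock` and `areBlocksEquivalent` are hypothetical names, not the MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy single-result operation. Values are named by strings such as "%0".
struct ToyOp {
  std::string name;
  std::vector<std::string> operands;
  std::string result; // Empty when the op produces no value.
};

// Toy block: argument names followed by a list of ops.
struct ToyBlock {
  std::vector<std::string> args;
  std::vector<ToyOp> ops;
};

// Walks both blocks in lockstep while maintaining a map from values of `lhs`
// to the values of `rhs` that they are considered equivalent to.
bool areBlocksEquivalent(const ToyBlock &lhs, const ToyBlock &rhs) {
  if (lhs.args.size() != rhs.args.size() || lhs.ops.size() != rhs.ops.size())
    return false;
  std::map<std::string, std::string> equivalent; // Keyed by lhs value.
  for (std::size_t i = 0; i < lhs.args.size(); ++i)
    equivalent[lhs.args[i]] = rhs.args[i];
  for (std::size_t i = 0; i < lhs.ops.size(); ++i) {
    const ToyOp &l = lhs.ops[i];
    const ToyOp &r = rhs.ops[i];
    if (l.name != r.name || l.operands.size() != r.operands.size())
      return false;
    for (std::size_t j = 0; j < l.operands.size(); ++j) {
      auto it = equivalent.find(l.operands[j]);
      // Unmapped values are assumed to come from outside the block and must
      // be the very same value on both sides.
      std::string expected =
          it == equivalent.end() ? l.operands[j] : it->second;
      if (expected != r.operands[j])
        return false;
    }
    if (l.result.empty() != r.result.empty())
      return false;
    // Results of matching ops become equivalent for the ops that follow.
    if (!l.result.empty())
      equivalent[l.result] = r.result;
  }
  return true;
}
```

Run on the two blocks from the example, the map first pairs the block arguments, then records %0 with %2 and %1 with %3, so the two `linalg.yield` ops compare equal.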

}

// [DO NO SUBMIT YET] : Hash computation accounting for region. This is not

// really used but is added here just in case.

Block *body = &r.front();

Optional<llvm::hash_code> hash;

auto combineHash = [&hash](llvm::hash_code update) {

if (hash)

hash = llvm::hash_combine(hash, update);

else

hash = update;

};

llvm::hash_code bodyHash = llvm::hash_value(body);

llvm::DenseMap<Value, llvm::hash_code> localValueMap;

unsigned localValNum = 0;

for (BlockArgument arg : body->getArguments()) {

llvm::hash_code argHash =

llvm::hash_combine(bodyHash, llvm::hash_value(arg.getArgNumber()));

localValueMap[arg] = argHash;

combineHash(argHash);

localValueMap[arg] = localValNum++;

}

auto hashOperandsOfOpsInBlock = [&](Value v) {

auto iterator = localValueMap.find(v);

if (iterator == localValueMap.end())

return hash_value(v);

return iterator->second;

};

for (Operation &op : *body) {

llvm::hash_code opHash = OperationEquivalence::computeHash(

&op, hashOperandsOfOpsInBlock, OperationEquivalence::ignoreHashValue,

hashRegion, OperationEquivalence::IgnoreLocations);

combineHash(opHash);

}

return llvm::hash_value(hash);

} }

}; };

} // namespace } // namespace

namespace { namespace {

/// Simple common sub-expression elimination. /// Simple common sub-expression elimination.

struct CSE : public impl::CSEBase<CSE> { struct CSE : public impl::CSEBase<CSE> {

/// Shared implementation of operation elimination and scoped map definitions. /// Shared implementation of operation elimination and scoped map definitions.

LogicalResult CSE::simplifyOperation(ScopedMapTy &knownValues, Operation *op,

// If the operation is already trivially dead just add it to the erase list. // If the operation is already trivially dead just add it to the erase list.

if (isOpTriviallyDead(op)) { if (isOpTriviallyDead(op)) {

opsToErase.push_back(op); opsToErase.push_back(op);

++numDCE; ++numDCE;

return success(); return success();

} }

// Don't simplify operations with nested blocks. We don't currently model // Don't simplify operations with nested blocks. We don't currently model

// equality comparisons correctly among other things. It is also unclear // equality comparisons correctly among other things. It is also unclear

MogballUnsubmitted

Done

This comment needs an update

rriddleUnsubmitted

Not Done

Not done yet?

// whether we would want to CSE such operations. // whether we would want to CSE such operations.

if (op->getNumRegions() != 0) if (!(op->getNumRegions() == 0 ||

(op->getNumRegions() == 1 && llvm::hasSingleElement(op->getRegion(0)))))

return failure(); return failure();

// Some simple use case of operation with memory side-effect are dealt with // Some simple use case of operation with memory side-effect are dealt with

// here. Operations with no side-effect are done after. // here. Operations with no side-effect are done after.

if (!MemoryEffectOpInterface::hasNoEffect(op)) { if (!MemoryEffectOpInterface::hasNoEffect(op)) {

auto memEffects = dyn_cast<MemoryEffectOpInterface>(op); auto memEffects = dyn_cast<MemoryEffectOpInterface>(op);

// TODO: Only basic use case for operations with MemoryEffects::Read can be // TODO: Only basic use case for operations with MemoryEffects::Read can be

// eleminated now. More work needs to be done for more complicated patterns // eleminated now. More work needs to be done for more complicated patterns

mlir/lib/Transforms/Utils/RegionUtils.cpp

BlockEquivalenceData::BlockEquivalenceData(Block *block)
for (Operation &op : *block) {		for (Operation &op : *block) {
if (unsigned numResults = op.getNumResults()) {		if (unsigned numResults = op.getNumResults()) {
opOrderIndex.try_emplace(&op, orderIt);		opOrderIndex.try_emplace(&op, orderIt);
orderIt += numResults;		orderIt += numResults;
}		}
auto opHash = OperationEquivalence::computeHash(		auto opHash = OperationEquivalence::computeHash(
&op, OperationEquivalence::ignoreHashValue,		&op, OperationEquivalence::ignoreHashValue,
OperationEquivalence::ignoreHashValue,		OperationEquivalence::ignoreHashValue,
		OperationEquivalence::ignoreRegionHashValue,
OperationEquivalence::IgnoreLocations);		OperationEquivalence::IgnoreLocations);
hash = llvm::hash_combine(hash, opHash);		hash = llvm::hash_combine(hash, opHash);
}		}
}		}

unsigned BlockEquivalenceData::getOrderOf(Value value) const {		unsigned BlockEquivalenceData::getOrderOf(Value value) const {
assert(value.getParentBlock() == block && "expected value of this block");		assert(value.getParentBlock() == block && "expected value of this block");

[mlir][Transforms] CSE of ops with a single block. (Closed, Public)

Revision Contents

Diff 461676

mlir/include/mlir/IR/OperationSupport.h

mlir/lib/IR/OperationSupport.cpp

mlir/lib/Transforms/CSE.cpp

mlir/lib/Transforms/Utils/RegionUtils.cpp
