This is an archive of the discontinued LLVM Phabricator instance.

Paths

- mlir/lib/Transforms/LoopFusion.cpp
- mlir/test/Transforms/loop-fusion.mlir

Differential D97030

[MLIR][affine-loop-fusion] Handle defining ops between the source and dest loops
Closed, Public

Authored by tungld on Feb 18 2021, 8:55 PM.


Details

Reviewers

dcaballe
andydavis1
bondhugula

Commits

rG203d5eeec55b: [MLIR][affine-loop-fusion] Handle defining ops between the source and dest loops

Summary

This patch handles defining ops between the source and dest loop nests, and prevents loop nests with iter_args from being fused.

If there is any SSA value in the dest loop nest whose defining op has a dependence from the source loop nest, we cannot fuse the loop nests.

If there is an affine.for with iter_args, prevent it from being fused.
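The first rule can be illustrated with a minimal C++ sketch. It assumes a toy dependence graph rather than MLIR's `MemRefDependenceGraph`: `ToyGraph`, `ToyEdge` and `canFuse` are hypothetical names, each edge only records whether it is carried by a memref or by a plain SSA value, and a DFS stands in for `hasDependencePath`. The filter in `gatherDefiningNodes` mirrors the one in the diff (keep in-edges whose value is not a memref).

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy model, not the MLIR MemRefDependenceGraph API.
// An edge records the node on the other end and whether the dependence is
// carried by a memref (true) or by a plain SSA value (false).
struct ToyEdge {
  unsigned id;
  bool isMemRef;
};

struct ToyGraph {
  std::map<unsigned, std::vector<ToyEdge>> inEdges, outEdges;

  void addEdge(unsigned src, unsigned dst, bool isMemRef) {
    outEdges[src].push_back({dst, isMemRef});
    inEdges[dst].push_back({src, isMemRef});
  }

  // Nodes defining a non-memref SSA value that node 'id' uses.
  std::set<unsigned> gatherDefiningNodes(unsigned id) const {
    std::set<unsigned> defs;
    auto it = inEdges.find(id);
    if (it == inEdges.end())
      return defs;
    for (const ToyEdge &e : it->second)
      if (!e.isMemRef)
        defs.insert(e.id);
    return defs;
  }

  // True if 'dst' is reachable from 'src' along out-edges (DFS).
  bool hasDependencePath(unsigned src, unsigned dst) const {
    std::set<unsigned> visited;
    std::function<bool(unsigned)> dfs = [&](unsigned n) -> bool {
      if (!visited.insert(n).second)
        return false;
      auto it = outEdges.find(n);
      if (it == outEdges.end())
        return false;
      for (const ToyEdge &e : it->second)
        if (e.id == dst || dfs(e.id))
          return true;
      return false;
    };
    return dfs(src);
  }

  // Reject fusing 'src' into 'dst' if any SSA producer used by 'dst'
  // depends, directly or transitively, on 'src'.
  bool canFuse(unsigned src, unsigned dst) const {
    for (unsigned def : gatherDefiningNodes(dst))
      if (hasDependencePath(src, def))
        return false;
    return true;
  }
};
```

For the `should_not_fuse_defining_node_has_dependence_from_source_loop` test, node 0 would be the src loop, node 1 the top-level `affine.load %b[]`, and node 2 the dst loop: the memref edge 0 -> 1 on `%b` makes `canFuse(0, 2)` false, while the load-only variant (no such edge) leaves it true.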

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests Failed

Time	Test
580 ms	x64 windows > MLIR.Examples/Toy/Ch6::jit.toy
630 ms	x64 windows > MLIR.Examples/Toy/Ch7::jit.toy

Event Timeline

tungld created this revision. Feb 18 2021, 8:55 PM

Herald added subscribers: teijeong, rdzhabarov, tatianashp and 14 others. Feb 18 2021, 8:55 PM

tungld requested review of this revision. Feb 18 2021, 8:55 PM

Herald added a project: Restricted Project. Feb 18 2021, 8:55 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

tungld added reviewers: dcaballe, andydavis1. Feb 18 2021, 9:03 PM

tungld edited the summary of this revision.

Remove a redundant method

Harbormaster completed remote builds in B89861: Diff 324872. Feb 18 2021, 11:08 PM

Harbormaster completed remote builds in B89873: Diff 324889. Feb 19 2021, 12:09 AM

Clean the code

tungld edited the summary of this revision. Feb 19 2021, 1:15 AM

Harbormaster completed remote builds in B89884: Diff 324908. Feb 19 2021, 1:51 AM

Check causal dependence

Harbormaster completed remote builds in B89913: Diff 324957. Feb 19 2021, 5:43 AM

dcaballe added a reviewer: bondhugula. Feb 19 2021, 10:52 AM

Hey tungld,

Thanks for the patch! There is a different but somewhat related issue that is being addressed here: https://reviews.llvm.org/D97032.
I would like to know what @bondhugula and @andydavis1 think, but I think we might want to add the proper %0 -> %i1 dependence to the MDG
instead of special-casing the load scenario. Even though it's not strictly a memref dependence, it's a memref-related dependence between two graph nodes.

Interestingly, we should also consider more generic cases. For example:

affine.for %i0 = 0 to 10 {
  affine.store %cst, %b[] : memref<f32>
  affine.store %cst, %a[%i0] : memref<10xf32>
}

%0 = "unknown_op" (%b) : (memref<f32>) -> f32

affine.for %i1 = 0 to 10 {
  %1 = affine.load %a[%i1] : memref<10xf32>
  %2 = divf %0, %1 : f32
}

and:

affine.for %i0 = 0 to 10 {
  affine.store %cst, %b[] : memref<f32>
  affine.store %cst, %a[%i0] : memref<10xf32>
}

%0 = affine.for ...

affine.for %i1 = 0 to 10 {
  %1 = affine.load %a[%i1] : memref<10xf32>
  %2 = divf %0, %1 : f32
}

I think the former would be addressed by https://reviews.llvm.org/D97032 and the latter would need support for iter_args in the fusion algorithm,
so I wouldn't try to handle them right now. However, we have to make sure that this solution paves the way to support the more complex cases.

Thanks,
Diego

dcaballe mentioned this in D97032: [MLIR][affine] Prevent fusion when ops with memory effect free are present between producer and consumer. Feb 19 2021, 11:34 AM

In D97030#2575203, @dcaballe wrote:

Hey tungld,

Thanks for the patch! There is a different but somewhat related issue that is being addressed here: https://reviews.llvm.org/D97032.
I would like to know what @bondhugula and @andydavis1 think, but I think we might want to add the proper %0 -> %i1 dependence to the MDG,
instead of special-casing the load scenario. Even though it's not strictly a memref dependence, it's a memref-related dependence between two graph nodes.

Not really. This one is just a pure SSA dependence and has nothing to do with memory dependences. The value may be loaded from a memref, but it's still just a value. The memref user op is a load, which may participate in separate/other memref dependences.

I think the former would be addressed by https://reviews.llvm.org/D97032 and the latter would need support for iter_args in the fusion algorithm

The author of D97032 is already taking care of all side-effecting intervening ops (writes, frees, and unknown side-effecting ops like calls) in a generic way, using memory effects / side-effect interfaces, across two revisions; those are pretty simple patches. The first of them, D97032, takes care of memref writes and frees. But this patch addresses a different issue: this is not a memory dependence but an SSA dependence, like other SSA edges.

so I wouldn't try to handle them right now. However, we have to make sure that this solution paves the way to support the more complex cases.

Some comments on clarifying doc / code comments.

mlir/lib/Transforms/LoopFusion.cpp
809–810	This line always needed a comment. Stores anyway don't define SSA values.
1487–1488	Nit on terminology: please either use "... depends on ..." or "has a dependence from" / "has a dependence to".
1493–1494	in between the loops -> in between the loop nests ? It's otherwise ambiguous. "having dependence on the source loop" -> I think this needs to be rephrased for accuracy as well. "... with a user in .."?

This revision now requires changes to proceed. Feb 20 2021, 8:52 AM

Reflect reviewers' comments

Hi @bondhugula, @dcaballe,

Thanks for your comments! I just updated the patch.

This patch adds additional edges to the MDG, from the defining op of an SSA value to its users. So, I think it can handle this:

affine.for %i0 = 0 to 10 {
  affine.store %cst, %b[] : memref<f32>
  affine.store %cst, %a[%i0] : memref<10xf32>
}

%0 = affine.for ...

affine.for %i1 = 0 to 10 {
  %1 = affine.load %a[%i1] : memref<10xf32>
  %2 = divf %0, %1 : f32
}

because %0 = affine.for defines an SSA value.
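To make the edge rule concrete, here is a hedged C++ sketch of the SSA-edge construction in `MemRefDependenceGraph::init`, before and after the change in the diff. It is a toy model: `ToyNode`, `buildSSAEdges` and the `skipLoadNodes` flag are hypothetical, and users are given directly as node ids instead of being mapped through `getLoopIVs` to their enclosing loop nest.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical toy node (not the MLIR Node struct): whether the node's op
// contains loads/stores and, for each SSA result of the op, the ids of the
// loop-nest nodes that use it.
struct ToyNode {
  unsigned id;
  bool hasLoads;
  bool hasStores;
  std::vector<std::vector<unsigned>> resultUsers;
};

// Returns (producer, user) node pairs for SSA edges. With 'skipLoadNodes'
// set, nodes containing loads are skipped too (the condition before the
// patch); otherwise only nodes containing stores are skipped (the condition
// after the patch).
std::vector<std::pair<unsigned, unsigned>>
buildSSAEdges(const std::vector<ToyNode> &nodes, bool skipLoadNodes) {
  std::vector<std::pair<unsigned, unsigned>> edges;
  for (const ToyNode &node : nodes) {
    if (node.hasStores || (skipLoadNodes && node.hasLoads))
      continue;
    for (const std::vector<unsigned> &users : node.resultUsers)
      for (unsigned user : users)
        edges.emplace_back(node.id, user);
  }
  return edges;
}
```

With `skipLoadNodes = true` (the old `!node.loads.empty() || !node.stores.empty()` test) a top-level load, or an `affine.for` containing loads, gets no SSA out-edge; with `false` (the new `!node.stores.empty()` test) it gets an edge to the loop nest using its result.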

I tried running it on the following program:

affine.for %arg2 = 0 to 10 {
  affine.store %cst, %arg1[] : memref<f32>
  affine.store %cst, %arg0[%arg2] : memref<10xf32>
}
%0 = affine.for %arg2 = 0 to 10 step 2 iter_args(%arg3 = %cst_0) -> (f32) {
  %1 = affine.load %arg0[%arg2] : memref<10xf32>
  %2 = addf %arg3, %1 : f32
  affine.yield %2 : f32
}
affine.for %arg2 = 0 to 10 {
  %1 = affine.load %arg0[%arg2] : memref<10xf32>
  %2 = divf %0, %1 : f32
}

and got

%0 = affine.for %arg2 = 0 to 10 step 2 iter_args(%arg3 = %cst_0) -> (f32) {
  affine.store %cst, %arg1[] : memref<f32>
  affine.store %cst, %arg0[%arg2] : memref<10xf32>
  %1 = affine.load %arg0[%arg2] : memref<10xf32>
  %2 = addf %arg3, %1 : f32
  affine.yield %2 : f32
}
affine.for %arg2 = 0 to 10 {
  %1 = affine.load %arg0[%arg2] : memref<10xf32>
  %2 = divf %0, %1 : f32
}

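It seemed it could identify the defining op %0 = affine.for..., but the output program was not semantically equivalent to the input program because of the step in the affine.for: after fusion, the stores to %arg0 only run for even values of %arg2, so the odd elements read by the last loop are never written. I think this is somewhat related to what @dcaballe mentioned: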

so I wouldn't try to handle them right now. However, we have to make sure that this solution paves the way to support the more complex cases.
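The step problem can be reproduced in plain C++. This is only a re-enactment of the example above under simplifying assumptions: `a` stands for `%arg0`, unwritten elements are marked with a sentinel, the reduction itself is omitted, and `originalStores`, `fusedStores` and `countUnset` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>

// Hypothetical re-enactment of the example: 'a' plays the role of %arg0.
// Elements start at a sentinel so that unwritten entries can be counted.
constexpr int kN = 10;
constexpr float kUnset = -1.0f;

// Before fusion: the first loop has step 1 and writes every a[i].
std::array<float, kN> originalStores(float cst) {
  std::array<float, kN> a;
  a.fill(kUnset);
  for (int i = 0; i < kN; ++i)
    a[i] = cst;
  return a;
}

// After fusion: the stores now live in the iter_args loop, which has
// step 2, so only even indices are written.
std::array<float, kN> fusedStores(float cst) {
  std::array<float, kN> a;
  a.fill(kUnset);
  for (int i = 0; i < kN; i += 2)
    a[i] = cst;
  return a;
}

// Number of elements the last (step 1) loop would read without any
// prior write.
int countUnset(const std::array<float, kN> &a) {
  int n = 0;
  for (float v : a)
    if (v == kUnset)
      ++n;
  return n;
}
```

Here `countUnset(originalStores(0.0f))` is 0 while `countUnset(fusedStores(0.0f))` is 5, which is the miscompile described above.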

Harbormaster completed remote builds in B90166: Diff 325357. Feb 21 2021, 5:54 PM

I would like to know what @bondhugula and @andydavis1 think, but I think we might want to add the proper %0 -> %i1 dependence to the MDG,

Actually, as you could infer from my answer, I completely misunderstood the implementation of the patch! As @tungld mentioned, it seems we were already adding edges for some SSA dependences:

Add dependence edges between nodes which produce SSA values and their
users. Load ops can be considered as the ones producing SSA values.

This is what I was referring to. I was not aware of it :).

However, the following code would still fail after the patch, right @tungld?

affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg1[] : memref<f32>
  affine.store %2, %arg0[%arg2] : memref<10xf32>
}
%0 = affine.load %arg1[] : memref<f32>
%1 = addf %0, %0 : f32
affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg0[%arg2] : memref<10xf32>
  %3 = divf %1, %2 : f32
}

Maybe we should create a single node that includes all the top-level ops between loops?
That would require further discussion, though. The current fix should be good for now.

It seemed it could identify the defining op %0 = affine.for..., but the output program was not semantically similar to the input program due to step in affine.for. I think this is somewhat related to what @dcaballe mentioned

Sorry, I should've mentioned that loops with iter_args are not supported. Affine analysis only looks at memory dependences and does not take SSA loop-carried dependences into account. We should prevent loops with iter_args from being fused. IIRC, @bondhugula suggested that we could demote these SSA values to memrefs (reg2mem) before fusion so that all the analysis happens through memory.
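A minimal C++ sketch of the reg2mem idea, assuming a simple scalar reduction: the first function threads the accumulator through iterations like an `iter_args` value, the second keeps it in a one-element buffer standing in for a `memref<f32>`, so the loop-carried dependence turns into a store/load dependence that a memory-based analysis can see. Both function names are hypothetical; this is not an MLIR pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop-carried SSA form, analogous to an affine.for with iter_args: the
// running sum is a value threaded from one iteration to the next.
float sumWithIterArg(const std::vector<float> &a, std::size_t step) {
  assert(step > 0);
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); i += step)
    acc += a[i];
  return acc;
}

// The same reduction after a (hypothetical) reg2mem demotion: the
// accumulator lives in a one-element buffer standing in for a memref<f32>,
// so the dependence between iterations is a store followed by a load.
float sumThroughMemory(const std::vector<float> &a, std::size_t step) {
  assert(step > 0);
  std::vector<float> buf(1, 0.0f);
  for (std::size_t i = 0; i < a.size(); i += step) {
    float cur = buf[0];  // load
    buf[0] = cur + a[i]; // store
  }
  return buf[0]; // a final load replaces the loop result
}
```

Both versions compute the same value; the point of the demotion is only that the second form exposes the cross-iteration dependence through memory.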

The patch looks great! Just one more comment.

mlir/lib/Transforms/LoopFusion.cpp
979	Thanks!
1496	We are just checking for a dependence that would prevent finding an insertion point for the fused loop. Could we please move this to `getFusedLoopNestInsertionPoint`? In that way, sibling fusion will also get it.
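One way to see why the check fits in `getFusedLoopNestInsertionPoint` is a simplified positional model, sketched below in C++. It is not the actual body of that method; `fusedInsertionPos` and its parameters are hypothetical. The fused nest has to go after every op the dst nest depends on and before every op that depends on the src nest, so a defining op that is used by the dst nest and also depends on the src nest leaves no valid position. Doing the check there means, as noted above, that sibling fusion gets it too.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical positional model. Positions 0..numOps-1 index the top-level
// ops strictly between the src and dst loop nests. 'srcDeps' holds positions
// of ops that depend on the src nest; 'dstDeps' holds positions of ops the
// dst nest depends on. Returns the position before which the fused nest can
// be inserted ('numOps' meaning "right before the dst nest"), or
// std::nullopt if no position preserves all dependences.
std::optional<int> fusedInsertionPos(int numOps, const std::set<int> &srcDeps,
                                     const std::set<int> &dstDeps) {
  // The fused nest must come before every op that depends on src...
  int firstSrcDep = srcDeps.empty() ? numOps : *srcDeps.begin();
  // ...and after every op that dst depends on.
  int lastDstDep = dstDeps.empty() ? -1 : *dstDeps.rbegin();
  if (firstSrcDep <= lastDstDep)
    return std::nullopt;
  return firstSrcDep;
}
```

A single defining op at position 0 that is in both sets, e.g. `fusedInsertionPos(1, {0}, {0})`, yields no insertion point, which is the case the new check rejects (the diff uses `hasDependencePath`, so transitive dependences are caught as well).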

This revision now requires changes to proceed. Feb 21 2021, 11:43 PM

However, the following code would still fail after the patch, right @tungld?

affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg1[] : memref<f32>
  affine.store %2, %arg0[%arg2] : memref<10xf32>
}
%0 = affine.load %arg1[] : memref<f32>
%1 = addf %0, %0 : f32
affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg0[%arg2] : memref<10xf32>
  %3 = divf %1, %2 : f32
}

It worked because the first loop only loads from %arg1, so there is no write-read dependence from it to %0 = affine.load %arg1[] : memref<f32>. I got this output using this patch:

%0 = affine.load %arg1[] : memref<f32>
%1 = addf %0, %0 : f32
affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg1[] : memref<f32>
  affine.store %2, %arg0[%arg2] : memref<10xf32>
  %3 = affine.load %arg0[%arg2] : memref<10xf32>
  %4 = divf %1, %3 : f32
}
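The reason this variant fuses can be sketched with a coarse, whole-memref conflict test in C++: two groups of accesses only depend on each other if they touch the same memref and at least one of the accesses is a write. `Access` and `hasMemoryDependence` are hypothetical helpers, and a real analysis can be more precise, for example by comparing access indices.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of one memory access: which memref it touches and
// whether it writes.
struct Access {
  std::string memref;
  bool isWrite;
};

// Two groups of accesses conflict if they touch the same memref and at
// least one of the pair is a write (RAW, WAR or WAW). Read-read pairs never
// create a dependence. Indices are ignored, so this is conservative.
bool hasMemoryDependence(const std::vector<Access> &a,
                         const std::vector<Access> &b) {
  for (const Access &x : a)
    for (const Access &y : b)
      if (x.memref == y.memref && (x.isWrite || y.isWrite))
        return true;
  return false;
}
```

With the first loop modelled as `{{"%arg1", false}, {"%arg0", true}}` and the top-level load as `{{"%arg1", false}}`, there is no dependence; turning the `%arg1` access into a write, as in the next example, creates one.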

To create a dependence, I changed the first loop to use a store instead:

%cst = constant 0.000000e+00 : f32
affine.for %arg2 = 0 to 10 {
  affine.store %cst, %arg1[] : memref<f32>
  affine.store %cst, %arg0[%arg2] : memref<10xf32>
}
%0 = affine.load %arg1[] : memref<f32>
%1 = addf %0, %0 : f32
affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg0[%arg2] : memref<10xf32>
  %3 = divf %1, %2 : f32
}

and the output was correct (the loops were not fused):

%cst = constant 0.000000e+00 : f32
affine.for %arg2 = 0 to 10 {
  affine.store %cst, %arg1[] : memref<f32>
  affine.store %cst, %arg0[%arg2] : memref<10xf32>
}
%0 = affine.load %arg1[] : memref<f32>
%1 = addf %0, %0 : f32
affine.for %arg2 = 0 to 10 {
  %2 = affine.load %arg0[%arg2] : memref<10xf32>
  %3 = divf %1, %2 : f32
}

I think the add between affine.load and affine.for was handled by hasNonAffineUsersOnThePath.
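For reference, the behaviour described by the corrected doc comment of `hasNonAffineUsersOnThePath` (return true if any non-affine op strictly between the two nodes accesses `memref`, false otherwise) can be sketched in C++ as follows. `TopLevelOp` and `hasNonAffineUsersBetween` are hypothetical stand-ins; the real helper walks MDG nodes and the users of the memref value.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of a top-level op between the src and dst nodes:
// the memrefs it uses and whether it is an affine op.
struct TopLevelOp {
  std::set<std::string> memrefUses;
  bool isAffine;
};

// Walking the ops strictly between src and dst (in block order), return
// true if any non-affine op accesses 'memref', and false otherwise.
bool hasNonAffineUsersBetween(const std::vector<TopLevelOp> &between,
                              const std::string &memref) {
  for (const TopLevelOp &op : between)
    if (!op.isAffine && op.memrefUses.count(memref) > 0)
      return true;
  return false;
}
```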

Could we please move this to getFusedLoopNestInsertionPoint? In that way, sibling fusion will also get it.

Yes, I'll do this and get back to you soon.

We should prevent loops with iter_args from being fused

Hi Diego, would you prefer to have it in this patch? I could do it by simply checking the source and dest affine.for: if either of them uses iter_args (or returns an SSA value), we prevent them from being fused.

Move the check into getFusedLoopNestInsertionPoint and prevent fusing loop nests that return values

I moved the check of defining ops into getFusedLoopNestInsertionPoint and prevented loop nests that return values from being fused. I also added a TODO for loop nests that return values. Thanks!

In D97030#2578354, @tungld wrote:

We should prevent loops with iter_args from being fused

Hi Diego, would you prefer to have it in this patch? I could do it by simply checking the source and dest affine.for: if either of them uses iter_args (or returns an SSA value), we prevent them from being fused.

Several affine passes need to be updated to correctly handle, or bail out on, iter_args; loop unrolling was recently fixed to work in their presence. I think it could be done in another patch.

Harbormaster completed remote builds in B90203: Diff 325421. Feb 22 2021, 5:48 AM

LGTM.

mlir/lib/Transforms/LoopFusion.cpp
373	Should this be named `gatherNonMemRefDefiningNodes`? I'm a bit confused. Or is it clear from other context?
mlir/test/Transforms/loop-fusion.mlir
2959	This TODO could be confusing. It's not clear whether handling return values/iter_args in fusion is worthwhile; it may be better/simpler to do a reg2mem, do the fusion without any iter_args, and then again do mem2reg. You can either update the comment to reflect this or simply drop the comment.

If you have added the iter_args handling, please also update the commit summary to reflect that.

I think the add between affine.load and affine.for was handled by hasNonAffineUsersOnThePath.

Great!

LGTM. It should be good to go after addressing the remaining minor comments. Thanks!

mlir/test/Transforms/loop-fusion.mlir
2938	typo
2952	Please add checks for the ops in the loop body. Loops could be fused without removing the src loop.
2959	+1

This revision is now accepted and ready to land. Feb 22 2021, 10:08 AM

Edit comments and lit tests

tungld edited the summary of this revision. Feb 23 2021, 4:39 AM

Hi Uday and Diego,

Thank you so much for your comments! I just updated the patch.

I don't have commit access, so could you please land this for me? My "name (email)" is "Tung D. Le (tung@jp.ibm.com)".

Thanks,
Tung.

mlir/lib/Transforms/LoopFusion.cpp
373	By the definition of an MDG edge, a non-memref value refers to a dependence in which one op defines values that are used in another op. Anyway, I updated the comments here to make it clearer.

Harbormaster completed remote builds in B90372: Diff 325738. Feb 23 2021, 4:50 AM

@tungld could you please rebase and fix a conflict due to another revision that updated fusion?

Rebase and fix a conflict

@tungld could you please rebase and fix a conflict due to another revision that updated fusion?

Done. Thanks!

Harbormaster completed remote builds in B90505: Diff 325936. Feb 23 2021, 7:36 PM

@dcaballe @bondhugula any further comments? Thanks!

Herald added a subscriber: cota. Feb 24 2021, 4:25 PM

LGTM! Just minor comments. I think Uday planned to commit it. Otherwise, I could commit it tomorrow together with one of mine.

mlir/test/Transforms/loop-fusion.mlir
2953	You can remove the `%{{.*}} =` if you are not capturing the return value.
2955	Just matching the ops here would suffice: `// CHECK-NEXT: affine.load` followed by `// CHECK-NEXT: affine.yield`.

Edit lit tests according to @dcaballe's comments.

Just minor comments.

Done. Thanks!

Harbormaster completed remote builds in B90730: Diff 326267. Feb 24 2021, 9:50 PM

Closed by commit rG203d5eeec55b: [MLIR][affine-loop-fusion] Handle defining ops between the source and dest loops (authored by tungld, committed by dcaballe). Feb 25 2021, 8:23 AM

This revision was automatically updated to reflect the committed changes.

dcaballe added a commit: rG203d5eeec55b: [MLIR][affine-loop-fusion] Handle defining ops between the source and dest loops.

Revision Contents

Path	Size
mlir/lib/Transforms/LoopFusion.cpp	41 lines
mlir/test/Transforms/loop-fusion.mlir	140 lines

Diff 325421

mlir/lib/Transforms/LoopFusion.cpp

@@ ... @@ struct Edge {
     // If this edge is stored in Edge = Node.outEdges[i], then
     // 'Node.outEdges[i].id' is the identifier of the dest node of the edge.
     unsigned id;
     // The SSA value on which this edge represents a dependence.
     // If the value is a memref, then the dependence is between graph nodes
     // which contain accesses to the same memref 'value'. If the value is a
     // non-memref value, then the dependence is between a graph node which
     // defines an SSA value and another graph node which uses the SSA value
-    // (e.g. a constant operation defining a value which is used inside a loop
-    // nest).
+    // (e.g. a constant or load operation defining a value which is used inside
+    // a loop nest).
     Value value;
   };

   // Map from node id to Node.
   DenseMap<unsigned, Node> nodes;
   // Map from node id to list of input edges.
   DenseMap<unsigned, SmallVector<Edge, 2>> inEdges;
   // Map from node id to list of output edges.
@@ ... @@ unsigned getOutEdgeCount(unsigned id, Value memref = nullptr) {
     unsigned outEdgeCount = 0;
     if (outEdges.count(id) > 0)
       for (auto &outEdge : outEdges[id])
         if (!memref || outEdge.value == memref)
           ++outEdgeCount;
     return outEdgeCount;
   }

+  /// Return all defining nodes of a given node.
+  void gatherDefiningNodes(unsigned id, DenseSet<unsigned> &definingNodes) {
+    for (MemRefDependenceGraph::Edge edge : inEdges[id])
+      // Defining node is the one on an edge with non-memref value.
+      if (!edge.value.getType().isa<MemRefType>())
+        definingNodes.insert(edge.id);
+  }
+
   // Computes and returns an insertion point operation, before which the
   // the fused <srcId, dstId> loop nest can be inserted while preserving
   // dependences. Returns nullptr if no such insertion point is found.
   Operation *getFusedLoopNestInsertionPoint(unsigned srcId, unsigned dstId) {
     if (outEdges.count(srcId) == 0)
       return getNode(dstId)->op;

+    // Skip if there is any defining node of 'dstId' that depends on 'srcId'.
+    DenseSet<unsigned> definingNodes;
+    gatherDefiningNodes(dstId, definingNodes);
+    if (llvm::any_of(definingNodes, [&](unsigned id) {
+          return hasDependencePath(srcId, id);
+        })) {
+      LLVM_DEBUG(llvm::dbgs()
+                 << "Can't fuse: a defining op with a user in the dst "
+                    "loop has dependence from the src loop\n");
+      return nullptr;
+    }
+
     // Build set of insts in range (srcId, dstId) which depend on 'srcId'.
     SmallPtrSet<Operation *, 2> srcDepInsts;
     for (auto &outEdge : outEdges[srcId])
       if (outEdge.id != dstId)
         srcDepInsts.insert(getNode(outEdge.id)->op);

     // Build set of insts in range (srcId, dstId) on which 'dstId' depends.
     SmallPtrSet<Operation *, 2> dstDepInsts;
@@ ... @@ bool MemRefDependenceGraph::init(FuncOp f) {

   for (auto &idAndNode : nodes) {
     LLVM_DEBUG(llvm::dbgs() << "Create node " << idAndNode.first << " for:\n"
                             << *(idAndNode.second.op) << "\n");
     (void)idAndNode;
   }

   // Add dependence edges between nodes which produce SSA values and their
-  // users.
+  // users. Load ops can be considered as the ones producing SSA values.
   for (auto &idAndNode : nodes) {
     const Node &node = idAndNode.second;
-    if (!node.loads.empty() || !node.stores.empty())
+    // Stores don't define SSA values, skip them.
+    if (!node.stores.empty())
       continue;
     auto *opInst = node.op;
     for (auto value : opInst->getResults()) {
       for (auto *user : value.getUsers()) {
         SmallVector<AffineForOp, 4> loops;
         getLoopIVs(*user, &loops);
         if (loops.empty())
           continue;
@@ ... @@ static Value createPrivateMemRef(AffineForOp forOp, Operation *srcStoreOpInst,
   assert(succeeded(res) &&
          "replaceAllMemrefUsesWith should always succeed here");
   (void)res;
   return newMemRef;
 }

 /// Walking from node 'srcId' to node 'dstId' (exclusive of 'srcId' and
 /// 'dstId'), if there is any non-affine operation accessing 'memref', return
-/// false. Otherwise, return true.
+/// true. Otherwise, return false.
 static bool hasNonAffineUsersOnThePath(unsigned srcId, unsigned dstId,
                                        Value memref,
                                        MemRefDependenceGraph *mdg) {
   auto *srcNode = mdg->getNode(srcId);
   auto *dstNode = mdg->getNode(dstId);
   Value::user_range users = memref.getUsers();
   // For each MemRefDependenceGraph's node that is between 'srcNode' and
   // 'dstNode' (exclusive of 'srcNodes' and 'dstNode'), check whether any
@@ ... @@ while (!worklist.empty()) {
       // Skip if this node was removed (fused into another node).
       if (mdg->nodes.count(dstId) == 0)
         continue;
       // Get 'dstNode' into which to attempt fusion.
       auto *dstNode = mdg->getNode(dstId);
       // Skip if 'dstNode' is not a loop nest.
       if (!isa<AffineForOp>(dstNode->op))
         continue;
+      // Skip if 'dstNode' is a loop nest returning values.
+      // TODO: support loop nests that return values.
+      if (dstNode->op->getNumResults() > 0)
+        continue;

       LLVM_DEBUG(llvm::dbgs() << "Evaluating dst loop " << dstId << "\n");

       // Sink sequential loops in 'dstNode' (and thus raise parallel loops)
       // while preserving relative order. This can increase the maximum loop
       // depth at which we can fuse a slice of a producer loop nest into a
       // consumer loop nest.
       sinkSequentialLoops(dstNode);
@@ ... @@ while (!worklist.empty()) {

       for (unsigned srcId : llvm::reverse(srcIdCandidates)) {
         // Get 'srcNode' from which to attempt fusion into 'dstNode'.
         auto *srcNode = mdg->getNode(srcId);
         auto srcAffineForOp = cast<AffineForOp>(srcNode->op);
         LLVM_DEBUG(llvm::dbgs() << "Evaluating src loop " << srcId
                                 << " for dst loop " << dstId << "\n");

+        // Skip if 'srcNode' is a loop nest returning values.
+        // TODO: support loop nests that return values.
+        if (isa<AffineForOp>(srcNode->op) && srcNode->op->getNumResults() > 0)
+          continue;
+
         DenseSet<Value> producerConsumerMemrefs;
         gatherProducerConsumerMemrefs(srcId, dstId, mdg,
                                       producerConsumerMemrefs);

         // Skip if 'srcNode' out edge count on any memref is greater than
         // 'maxSrcUserCount'.
         if (any_of(producerConsumerMemrefs, [&](Value memref) {
               return mdg->getOutEdgeCount(srcNode->id, memref) >
@@ ... @@ while (!worklist.empty()) {
         if (!srcEscapingMemRefs.empty() &&
             hasNonAffineUsersOnThePath(srcId, dstId, mdg)) {
           LLVM_DEBUG(
               llvm::dbgs()
               << "Can't fuse: non-affine users in between the loops\n.");
           continue;
         }

         // Compute an operation list insertion point for the fused loop
         // nest which preserves dependences.
         Operation *fusedLoopInsPoint =
             mdg->getFusedLoopNestInsertionPoint(srcNode->id, dstNode->id);
         if (fusedLoopInsPoint == nullptr)
           continue;

         // Compute the innermost common loop depth for dstNode
         // producer-consumer loads/stores.
         SmallVector<Operation *, 2> dstMemrefOps;
         for (Operation *op : dstNode->loads)
           if (producerConsumerMemrefs.count(
                   cast<AffineReadOpInterface>(op).getMemRef()) > 0)
             dstMemrefOps.push_back(op);
         for (Operation *op : dstNode->stores)
           if (producerConsumerMemrefs.count(
                   cast<AffineWriteOpInterface>(op).getMemRef()))
             dstMemrefOps.push_back(op);
         unsigned dstLoopDepthTest = getInnermostCommonLoopDepth(dstMemrefOps);

         // Check the feasibility of fusing src loop nest into dst loop nest
         // at loop depths in range [1, dstLoopDepthTest].
         unsigned maxLegalFusionDepth = 0;

mlir/test/Transforms/loop-fusion.mlir

Show First 20 Lines • Show All 2,829 Lines • ▼ Show 20 Lines	func @should_fuse_multi_store_producer_with_scaping_memrefs_and_preserve_src(
// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NOT: affine.for		// CHECK-NOT: affine.for

return		return
}		}

		// -----

		// CHECK-LABEL: func @should_fuse_defining_node_has_no_dependence_from_source_node
		func @should_fuse_defining_node_has_no_dependence_from_source_node(
		%a : memref<10xf32>, %b : memref<f32>) -> () {
		affine.for %i0 = 0 to 10 {
		%0 = affine.load %b[] : memref<f32>
		affine.store %0, %a[%i0] : memref<10xf32>
		}
		%0 = affine.load %b[] : memref<f32>
		affine.for %i1 = 0 to 10 {
		%1 = affine.load %a[%i1] : memref<10xf32>
		%2 = divf %0, %1 : f32
		}

		// Loops '%i0' and '%i1' should be fused even though there is a defining
		// node between the loops. It is because the node has no dependence from '%i0'.
		// CHECK: affine.load %{{.*}}[] : memref<f32>
		// CHECK-NEXT: affine.for %{{.*}} = 0 to 10 {
		// CHECK-NEXT: affine.load %{{.*}}[] : memref<f32>
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
		// CHECK-NEXT: divf
		// CHECK-NEXT: }
		// CHECK-NOT: affine.for
		return
		}

		// -----

		// CHECK-LABEL: func @should_not_fuse_defining_node_has_dependence_from_source_loop
		func @should_not_fuse_defining_node_has_dependence_from_source_loop(
		%a : memref<10xf32>, %b : memref<f32>) -> () {
		%cst = constant 0.000000e+00 : f32
		affine.for %i0 = 0 to 10 {
		affine.store %cst, %b[] : memref<f32>
		affine.store %cst, %a[%i0] : memref<10xf32>
		}
		%0 = affine.load %b[] : memref<f32>
		affine.for %i1 = 0 to 10 {
		%1 = affine.load %a[%i1] : memref<10xf32>
		%2 = divf %0, %1 : f32
		}

		// Loops '%i0' and '%i1' should not be fused because the defining node
		// of '%0', which is used in '%i1', has a dependence from loop '%i0'.
		// CHECK: affine.for %{{.*}} = 0 to 10 {
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[] : memref<f32>
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: }
		// CHECK-NEXT: affine.load %{{.*}}[] : memref<f32>
		// CHECK: affine.for %{{.*}} = 0 to 10 {
		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
		// CHECK-NEXT: divf
		// CHECK-NEXT: }
		return
		}

		// -----

		// CHECK-LABEL: func @should_not_fuse_defining_node_has_transitive_dependence_from_source_loop
		func @should_not_fuse_defining_node_has_transitive_dependence_from_source_loop(
		%a : memref<10xf32>, %b : memref<10xf32>, %c : memref<f32>) -> () {
		%cst = constant 0.000000e+00 : f32
		affine.for %i0 = 0 to 10 {
		affine.store %cst, %a[%i0] : memref<10xf32>
		affine.store %cst, %b[%i0] : memref<10xf32>
		}
		affine.for %i1 = 0 to 10 {
		%1 = affine.load %b[%i1] : memref<10xf32>
		affine.store %1, %c[] : memref<f32>
		}
		%0 = affine.load %c[] : memref<f32>
		affine.for %i2 = 0 to 10 {
		%1 = affine.load %a[%i2] : memref<10xf32>
		%2 = divf %0, %1 : f32
		}

		// Loops '%i0' and '%i2' are evaluated first and should not be fused: the
		// defining node of '%0' used in loop '%i2' has a transitive dependence
		// from loop '%i0' (through loop '%i1'). Loops '%i0' and '%i1' are evaluated
		// next and are fused as usual.
		// CHECK: affine.for %{{.*}} = 0 to 10 {
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[] : memref<f32>
		// CHECK-NEXT: }
		// CHECK-NEXT: affine.load %{{.*}}[] : memref<f32>
		// CHECK: affine.for %{{.*}} = 0 to 10 {
		// CHECK-NEXT: affine.load %{{.}}[%{{.}}] : memref<10xf32>
		// CHECK-NEXT: divf
		// CHECK-NEXT: }
		// CHECK-NOT: affine.for
		return
		}

		// -----

		// TODO: fuse loop nests that returnvalues.
		dcaballe: typo
		// CHECK-LABEL: func @should_not_fuse_dest_loop_nest_return_value
		func @should_not_fuse_dest_loop_nest_return_value(
		%a : memref<10xf32>) -> () {
		%cst = constant 0.000000e+00 : f32
		affine.for %i0 = 0 to 10 {
		affine.store %cst, %a[%i0] : memref<10xf32>
		}
		%b = affine.for %i1 = 0 to 10 step 2 iter_args(%b_iter = %cst) -> f32 {
		%load_a = affine.load %a[%i1] : memref<10xf32>
		affine.yield %load_a: f32
		}

		// CHECK: affine.for %{{.}} = {{.}}
		// CHECK: {{.}} = affine.for %{{.}} = {{.*}}
		dcaballe: Please add checks for ops in the loop body. Loops could be fused without removing the src loop.

		dcaballe: You can remove the `%{{.*}} = ` if you are not capturing the return value.
		return
		}
		dcaballe: Just matching the ops here would suffice: `// CHECK-NEXT: affine.load` and `// CHECK-NEXT: affine.yield`.

		// -----

		// TODO: fuse loop nests that return values.
		bondhugula: This TODO could be confusing. It's not clear whether handling return values/iter_args in fusion is worthwhile; it may be better/simpler to do a reg2mem, do the fusion without any iter_args, and then do mem2reg again. You can either update the comment to reflect this or simply drop the comment.
		dcaballe: +1
		// CHECK-LABEL: func @should_not_fuse_src_loop_nest_return_value
		func @should_not_fuse_src_loop_nest_return_value(
		%a : memref<10xf32>) -> () {
		%cst = constant 1.000000e+00 : f32
		%b = affine.for %i = 0 to 10 step 2 iter_args(%b_iter = %cst) -> f32 {
		%c = addf %b_iter, %b_iter : f32
		affine.store %c, %a[%i] : memref<10xf32>
		affine.yield %c: f32
		}
		affine.for %i1 = 0 to 10 {
		%1 = affine.load %a[%i1] : memref<10xf32>
		}

		// CHECK: {{.}} = affine.for %{{.}} = {{.*}}
		// CHECK: affine.for %{{.}} = {{.}}

		return
		}