This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add naive fusion of parallel loops.
AbandonedPublic

Authored by pifon2a on Feb 13 2020, 4:35 AM.

Diff Detail

Event Timeline

pifon2a created this revision.Feb 13 2020, 4:35 AM
rriddle requested changes to this revision.Feb 13 2020, 11:23 AM
rriddle added inline comments.
mlir/lib/Transforms/ParallelLoopFusion.cpp
21 ↗(On Diff #244389)

using namespace mlir;

26 ↗(On Diff #244389)

Use /// for top-level comments.

27 ↗(On Diff #244389)

Only classes should be in anonymous namespaces, functions should be in the global namespace and marked static.

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

29 ↗(On Diff #244389)

nit: -> WalkResult is unnecessary.

40 ↗(On Diff #244389)

nit: Can you just use std::equal for now?

56 ↗(On Diff #244389)

nit: Use LogicalResult instead of boolean.

75 ↗(On Diff #244389)

Drop trivial braces.

85 ↗(On Diff #244389)

nit: Remove this temporary value and just return the result directly. It doesn't really help readability at all.

117 ↗(On Diff #244389)

nit: block.getOps<ParallelOp>()

123 ↗(On Diff #244389)

nit: Drop all of these trivial braces.

136 ↗(On Diff #244389)

Drop the mlir:: on each of these.

This revision now requires changes to proceed.Feb 13 2020, 11:23 AM
mehdi_amini added inline comments.Feb 13 2020, 10:46 PM
mlir/lib/Transforms/ParallelLoopFusion.cpp
137 ↗(On Diff #244389)

Can this pass be located under the loop dialect?

pifon2a updated this revision to Diff 244576.Feb 13 2020, 11:18 PM
pifon2a marked 12 inline comments as done.

Addressed the comments.

pifon2a added inline comments.Feb 13 2020, 11:19 PM
mlir/lib/Transforms/ParallelLoopFusion.cpp
40 ↗(On Diff #244389)

fair enough :)

85 ↗(On Diff #244389)

thanks. It was an artefact from logging that I used to have here when debugging.

pifon2a marked an inline comment as done.Feb 13 2020, 11:20 PM
pifon2a marked an inline comment as done.Feb 13 2020, 11:26 PM
pifon2a added inline comments.
mlir/lib/Transforms/ParallelLoopFusion.cpp
137 ↗(On Diff #244389)

@mehdi_amini Do you mean moving this code to mlir/lib/Transforms/LoopFusion.cpp where fusion on affine loops is implemented? That file is quite big already. Or do you mean having it in Dialect/LoopOps/Transforms similarly to Dialect/Linalg/Transforms? If it is the latter, then there might be several other passes in lib/Transforms that have to be moved to respective directories.

mehdi_amini added inline comments.Feb 13 2020, 11:31 PM
mlir/lib/Transforms/ParallelLoopFusion.cpp
137 ↗(On Diff #244389)

yes there are several passes we need to move under their respective dialects.
Most of the affine stuff is not under the affine dialect, because that's how MLIR started at the very beginning, before the dialect folder was even created.

pifon2a updated this revision to Diff 244609.Feb 14 2020, 3:29 AM

Fixed CMake build.

herhut requested changes to this revision.Feb 14 2020, 4:16 AM

Cool, great start!

mlir/include/mlir/Dialect/LoopOps/Passes.h
2

This is lacking the LLVM header comment.

mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
40

Avoid named TODO in llvm code base. I am an offender, as well, but I was told to avoid this in the future.

49

You also need that ploop2 does not write to buffers that ploop1 reads. Both loops are fully independent if WRITES(ploop1) U READS(ploop2) is empty and READS(ploop1) U WRITES(ploop2) is empty. As we have sequential execution order within one iteration, we can ignore reads and writes between the two loops that are on the same index in the context of fusion.

50

Can you add a comment what the map contains? This is a map from ploop1 indices to ploop2 indices I assume?

66

This is sufficient for memrefs that are defined outside of the loops. However, memrefs that are created within the loop, e.g. by a subview, will not be handled by this. As a starter, it would be ok to bail out in these cases.

71

Does something like llvm::all_of(llvm::zip(storeIndices, loadIndices), std::equal) work? Just curious, no need to go there.

111

What happens if there is a read or write inbetween the ploops? As a start, you could consider immediately adjacent loops or loops where we have no sideffecting ops inbetween.

This revision now requires changes to proceed.Feb 14 2020, 4:16 AM
pifon2a updated this revision to Diff 245106.Feb 18 2020, 1:46 AM
pifon2a marked 7 inline comments as done.

Addressed the comments.

pifon2a added inline comments.Feb 18 2020, 1:48 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
71

should work with a custom comparator

herhut requested changes to this revision.Feb 18 2020, 3:09 AM
herhut added inline comments.
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
56

Storing to a locally defined buffer also should bail.

91

This should be the dual to the above function with the same comments regarding aliasing. Without alias information, we have to bail on the case where we load or store to a locally defined buffer.

mlir/test/Dialect/Loops/parallel-loop-fusion.mlir
33

Can you drop the {temp = true} from the tests. They are just noise here.

277

This would already fail due to not-matching indices. Maybe load from A instead?

280

Why is this case illegal? If they store to the same buf but at the same index it should be fine.

292

There are 4 cases for this. Read/write to local allocation in loop1/loop2. Essentially any effect on a memref we do not understand.

This revision now requires changes to proceed.Feb 18 2020, 3:09 AM
pifon2a updated this revision to Diff 245129.Feb 18 2020, 4:56 AM
pifon2a marked 5 inline comments as done.

Addressed the comments again.

pifon2a marked 2 inline comments as done.Feb 18 2020, 4:57 AM
pifon2a updated this revision to Diff 245131.Feb 18 2020, 4:59 AM

AAAAAAAAAAAAA

herhut added inline comments.Feb 19 2020, 1:36 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
58

Maybe `haveNoReadsAfterWriteExceptSameIndex'?

101

Would it not suffice to call haveNoReadAfterWritesExpectSameIndex with arguments reversed?

pifon2a updated this revision to Diff 245354.Feb 19 2020, 2:13 AM

Unify deps check.

herhut added inline comments.Feb 19 2020, 3:34 AM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
49

the the -> the

143

Can you extend this so it applies to any operation and traverses its regions? That way it can also be used, e.g., on a ParallelLoop to fuse loops in its body.

173

This could be a pass that runs on any Operation then, not necessarily a function.

184

This is over-committing a bit. It does not fuse loop nests.

mlir/test/Dialect/Loops/parallel-loop-fusion.mlir
262

How is this different from above?

pifon2a marked 9 inline comments as done.Feb 19 2020, 4:41 AM
pifon2a added inline comments.
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
101

No, because we are using BlockAndValueMapping that has only a map from IVs of ploop1 to IVs to ploop2. We can have the inverse map though.

mlir/test/Dialect/Loops/parallel-loop-fusion.mlir
262

renamed to common_buf to make it more visible.

pifon2a updated this revision to Diff 245378.Feb 19 2020, 4:41 AM
pifon2a marked 2 inline comments as done.

Blabla

herhut accepted this revision.Feb 19 2020, 5:26 AM
rriddle added inline comments.Feb 19 2020, 12:20 PM
mlir/lib/Dialect/LoopOps/Transforms/ParallelLoopFusion.cpp
55

nit: Please use /// for top-level comments.

161

Please remove the debugging here.

rriddle accepted this revision.Feb 19 2020, 12:20 PM

Thanks!

This revision is now accepted and ready to land.Feb 19 2020, 12:20 PM

Are we sure we want another parallel SW stack to do fusion here?
I'd love to understand the use cases that are fundamentally different from what we'd expect to do with Linalg + Affine (with or without Intel's multi-for).

nicolasvasilache requested changes to this revision.Feb 23 2020, 7:34 PM

Marking as blocker to be sure my question is addressed, sorry for being late to the party as I was away last week.

This revision now requires changes to proceed.Feb 23 2020, 7:34 PM

Are we sure we want another parallel SW stack to do fusion here?
I'd love to understand the use cases that are fundamentally different from what we'd expect to do with Linalg + Affine (with or without Intel's multi-for).

I do not see a parallel SW stack. This is pretty much structural fusion on parallel loop nests, which are the bottom layer before we go to GPU. The reason this exists is that we do not have all code generation come through LinAlg and would like to combine loops at the lowest level. I do not expect for this to grow sophisticated dependency analysis and get complex. I agree that we will need a common analysis engine that can be reused for different loop-like structures. That is probably easier designed once we have more approaches implemented in full.

Regarding affine, I'd be happy to use loop fusion on affine.parallel loops, as well. It would need to support reduction, though.

Until then, I'd like to land this structural fusion to serve a current code-generation need we have.

ftynse added a subscriber: ftynse.Feb 24 2020, 6:26 AM

I had a similar comment on the ploop tiling diff, @nicolasvasilache... Linalg fusion is also structural in that sense, but we need to decouple the transformation implementation from any of the dialect-specific concerns. I don't see how many abstraction layers and templates we'd have to add to reuse code across dialects and whether it will be worth the effort. I suppose somebody with deep experience in loop stuff (that is you or me) should look into refactoring this.

pifon2a abandoned this revision.Feb 28 2020, 8:45 AM