This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Affine: parallelize affine loops with reductions
ClosedPublic

Authored by ftynse on Apr 23 2021, 8:54 AM.

Details

Summary

Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.

Diff Detail

Event Timeline

ftynse created this revision.Apr 23 2021, 8:54 AM
ftynse requested review of this revision.Apr 23 2021, 8:55 AM
chelini accepted this revision.Apr 23 2021, 11:19 AM
chelini added inline comments.
mlir/lib/Dialect/Affine/Utils/Utils.cpp
167

I would remove braces.

170

Same here.

207

Here we expect *all* the iterators to be valid reductions. I would make this explicit using a comment.

279

It may be not clear what this "1" means. Can we add a comment like: /* induction var */ or use an explicit method like getNumInductionVars() ?

This revision is now accepted and ready to land.Apr 23 2021, 11:19 AM
kumasento accepted this revision.Apr 24 2021, 3:05 PM

LGTM!

mlir/lib/Dialect/Affine/Utils/Utils.cpp
167

Just wondering is it possible that both operands can be iter_args?

bondhugula requested changes to this revision.Apr 25 2021, 12:06 AM
bondhugula added a subscriber: bondhugula.

Great to see this functionality! Some comments.

mlir/include/mlir/Analysis/Utils.h
356–357

Update doc comment please to capture the new argument.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
163–164

This check can be moved up and the isa_and_nonull check and the getReductionOperationKind can be unified?

267–271

assert on reductionOp not being null?

mlir/test/Dialect/Affine/parallelize.mlir
168–169

Better to match the whole affine.parallel op itself for clarity at least in this one case?

This revision now requires changes to proceed.Apr 25 2021, 12:06 AM
ftynse updated this revision to Diff 340465.Apr 26 2021, 2:59 AM
ftynse marked 8 inline comments as done.

Address review.

ftynse added inline comments.Apr 26 2021, 3:03 AM
mlir/lib/Dialect/Affine/Utils/Utils.cpp
167

It is possible in the IR, but wouldn't pass the single-use check above.

207

How do you suggest to parallelize reduction loops where only *some* reductions are parallelizable? Sounds impossible to me, but still added a comment.

279

That's why there is a comment above saying "erase the block arguments that correspond to reductions". The /*name=*/ comments are used to indicate the argument names of the callee, llvm::seq in this case, so the correct name would be /*Begin=*/. The linter will justifiably complain about any other comment because it only makes code more confusing for whoever familiar with the style.

chelini added inline comments.Apr 26 2021, 11:33 AM
mlir/lib/Dialect/Affine/Utils/Utils.cpp
207

I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions.

bondhugula added inline comments.Apr 26 2021, 12:22 PM
mlir/lib/Dialect/Affine/Utils/Utils.cpp
138–139

This part of the comment is now stale I think. There's nothing to keep in sync.

157

Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting.

164–167

This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this definition is being yielded at the right position.

bondhugula requested changes to this revision.Apr 26 2021, 7:33 PM
This revision now requires changes to proceed.Apr 26 2021, 7:33 PM

This functionality overlaps with my reduction vectorization patch https://reviews.llvm.org/D100694
We can merge both (our changes to the isLoopParallel function are essentially the same), but having two separate reduction recognizers would be strange, so maybe we should discuss if there is a way to converge them.

ftynse updated this revision to Diff 340890.Apr 27 2021, 9:57 AM
ftynse marked 3 inline comments as done.

Address more review.

ftynse added inline comments.Apr 27 2021, 9:57 AM
mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167

You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg;

Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.

furthermore, you'll also need to check that the result of this definition is being yielded at the right position.

definition is the defining op of the pos-th yield operand...

@sgrechanik I am happy to use your version when it lands, or have this code updated if it lands first.

bondhugula accepted this revision.Apr 27 2021, 6:30 PM

This looks good to me. Please take a look at the two more comments below.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167

Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled"

mlir/test/Dialect/Affine/parallelize.mlir
168

Could you please check the body as well as you are replacing the old op if that sounds reasonable?

This revision is now accepted and ready to land.Apr 27 2021, 6:30 PM
bondhugula requested changes to this revision.Apr 27 2021, 6:45 PM
bondhugula added inline comments.
mlir/include/mlir/Analysis/Utils.h
359

I'll have the same concern with this method as with D100694 of @sgrechanik. At this point, we don't even know whether the forOp's iter_args are reduction vars - so this argument naming is weird. Also, the returned result is technically inaccurate when reductionsAreParallel is set to true. In fact, ignoreIterArgs or ignoreIterArgDeps as the name is less confusing FWIW the way @sgrechanik has it.

If there are iter_args, can we integrate this method with the reduction detection logic itself that you have? If reductionsAreParallel is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.

This revision now requires changes to proceed.Apr 27 2021, 6:45 PM
bondhugula added inline comments.Apr 27 2021, 6:49 PM
mlir/test/Dialect/Affine/parallelize.mlir
201

How about addf %it1, %it1? That was the one I was referring to. You could also have:

%0 = addf %it1, (%A[%i])
%1 = addf %it2, %0
yield %0, %1 : f32, f32

Just making sure the weird cases (non reductions) are guarded.

ftynse updated this revision to Diff 341092.Apr 28 2021, 1:11 AM
ftynse marked 6 inline comments as done.

Address even more review.

mlir/include/mlir/Analysis/Utils.h
359

Changed to ignoreReductionIterArgs. We don't ignore _all_ iter args so anything that doesn't "reduction" in the name would be actually confusing.

If reductionsAreParallel is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection.

This is exactly what the implementation has been doing all along.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167

You might have missed it, I added a full-blown check in dependsOnIterArgs that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering.

mlir/test/Dialect/Affine/parallelize.mlir
201

The first one is easily caught because there are two uses of %it1, not one.

The second case is caught by the backward slice check I added.

Added both as tests.

Sorry, it looks like I wasn't clear or I'm still missing something.

mlir/include/mlir/Analysis/Utils.h
358

I assume you meant set to false here? There's no check if it's set to true.

359

Looks like I'm missing something. I only see a single line change to isLoopParallel - the logic you refer to is in affineParallelize and not isLoopParallel unless I'm looking at an older diff! affineParallelize was mostly meant to be "just perform the replacement" while I was expecting the check including the one for reductions in isLoopParallel.

mlir/lib/Dialect/Affine/Utils/Utils.cpp
198–213

This requires a slight rephrase. This replacement is contingent on the reduction checks you have.

202–211

It's this part that I thought should go into isLoopParallel it's really a reduction check. You can have it return (sa output arg) the reduced values?

sgrechanik added inline comments.Apr 28 2021, 6:13 PM
mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167

Do the weird examples that can be caught by dependsOnIterArgs and cannot be caught by checking hasOneUse() really exist? Because it seems to me that checking hasOneUse() on every iter arg will exclude ALL weird cases, but maybe I miss something.

ftynse marked 3 inline comments as done.Apr 29 2021, 12:38 AM
ftynse added inline comments.
mlir/include/mlir/Analysis/Utils.h
358

No, I meant what I said. If it is set to true, the parallelism check will ignore iter args that it can prove to be reductions; other iter args still make the loop non-parallel. If it is unset (or set to false if you wish), any iter arg makes loop non-parallel. Let's please not get too much into bikeshedding with names.

359

Yes, you are missing something. Probably because Phabricator only shows you the diff between the latest version and the version you reviewed previously, not the base version. Line 1275 in Analysis/Utils.cpp contains the check and it was there since the first iteration.

mlir/lib/Analysis/Utils.cpp
1279

Here's the early exit @bondhugula ^

mlir/lib/Dialect/Affine/Utils/Utils.cpp
164–167

Yes, see the strange_butterfly case in tests. Each iter arg is used exactly once, in the same addf operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg.

ftynse updated this revision to Diff 341432.Apr 29 2021, 1:34 AM
ftynse marked 5 inline comments as done.

Adress the hopefully final iteration of review.

ftynse added inline comments.Apr 29 2021, 1:34 AM
mlir/include/mlir/Analysis/Utils.h
358

This is no longer relevant as I opted for an explicit list of reductions instead.

ftynse added inline comments.Apr 29 2021, 1:35 AM
mlir/lib/Analysis/AffineAnalysis.cpp
108 ↗(On Diff #341432)

Here's the early exit @bondhugula, it has always been here.

mlir/lib/Analysis/Utils.cpp
1279

This has moved to AffineAnalysis.cpp.

bondhugula accepted this revision.Apr 29 2021, 3:51 AM

LGTM. Just minor comments now.

mlir/include/mlir/Analysis/Utils.h
359

Not really. I did notice that one line change. But now I see the whole unification into isLoopParallel which was what I was requesting for.

mlir/lib/Analysis/AffineAnalysis.cpp
108 ↗(On Diff #341432)

Yes, I knew this had always been there. This wasn't the confusion.

111–158 ↗(On Diff #341432)

I assume all of this code beyond the early exit were just added in the recent iteration. This was what I was requesting for! I hope I was indeed looking at the whole diff earlier and not the last increment. The change to isLoopParallel in the previous one was just one line. Now it's integrated!

146–148 ↗(On Diff #341432)

Nit: The Inst suffix is legacy back when they were called instructions. Just srcOp and dstOp is fine. Likewise for Insts.

mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp
43

Doc comment for this one please.

This revision is now accepted and ready to land.Apr 29 2021, 3:51 AM
ftynse updated this revision to Diff 341461.Apr 29 2021, 4:15 AM
ftynse marked 3 inline comments as done.

Fix nits.

ftynse marked an inline comment as done.Apr 29 2021, 4:16 AM
ftynse added inline comments.
mlir/include/mlir/Analysis/Utils.h
359

Now you may be able to see it because it moved to a different file.

mlir/lib/Analysis/AffineAnalysis.cpp
111–158 ↗(On Diff #341432)

Part of this code (the original contents of isLoopParallel) is moved from lib/Analysis/Utils.cpp, Phabricator indicates this by a light yellow vertical bar next to the lines it can detect as moved.

146–148 ↗(On Diff #341432)

I'm just moving this code from a different file and I'm not a fan of sneaking in irrelevant changes, but okay.

This revision was landed with ongoing or failed builds.Apr 29 2021, 4:16 AM
This revision was automatically updated to reflect the committed changes.