Introduce a basic support for parallelizing affine loops with reductions
expressed using iteration arguments. Affine parallelism detector now has a flag
to assume such reductions are parallel. The transformation handles a subset of
parallel reductions that are can be expressed using affine.parallel:
integer/float addition and multiplication. This requires to detect the
reduction operation since affine.parallel only supports a fixed set of
reduction operators.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
143 | Here we expect *all* the iterators to be valid reductions. I would make this explicit using a comment. | |
166 | I would remove braces. | |
169 | Same here. | |
215 | It may be not clear what this "1" means. Can we add a comment like: /* induction var */ or use an explicit method like getNumInductionVars() ? |
LGTM!
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
166 | Just wondering is it possible that both operands can be iter_args? |
Great to see this functionality! Some comments.
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
356–357 | Update doc comment please to capture the new argument. | |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
162–163 | This check can be moved up and the isa_and_nonull check and the getReductionOperationKind can be unified? | |
203–207 | assert on reductionOp not being null? | |
mlir/test/Dialect/Affine/parallelize.mlir | ||
169–170 | Better to match the whole affine.parallel op itself for clarity at least in this one case? |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
143 | How do you suggest to parallelize reduction loops where only *some* reductions are parallelizable? Sounds impossible to me, but still added a comment. | |
166 | It is possible in the IR, but wouldn't pass the single-use check above. | |
215 | That's why there is a comment above saying "erase the block arguments that correspond to reductions". The /*name=*/ comments are used to indicate the argument names of the callee, llvm::seq in this case, so the correct name would be /*Begin=*/. The linter will justifiably complain about any other comment because it only makes code more confusing for whoever familiar with the style. |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
143 | I don't have anything in particular in mind. I suggested adding a comment only to clarify when the function does not parallelize reductions. |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
137–138 | This part of the comment is now stale I think. There's nothing to keep in sync. | |
156 | Nit: A comment here perhaps saying AtomicRMWKind supports additional reduction kinds that we aren't detecting. | |
163–166 | This check doesn't appear to be enough unless I missed something. You'll need to check this "value being reduced" isn't itself an iter_arg or a function of the iter_arg; furthermore, you'll also need to check that the result of this definition is being yielded at the right position. |
This functionality overlaps with my reduction vectorization patch https://reviews.llvm.org/D100694
We can merge both (our changes to the isLoopParallel function are essentially the same), but having two separate reduction recognizers would be strange, so maybe we should discuss if there is a way to converge them.
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
163–166 |
Most of the cases were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled.
definition is the defining op of the pos-th yield operand... |
@sgrechanik I am happy to use your version when it lands, or have this code updated if it lands first.
This looks good to me. Please take a look at the two more comments below.
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
163–166 | Sounds good to me. But it'd be good to add a comment on that note that you mention - "... were actually filtered out because we need all iter args to be reductions to transform them, but there are indeed weird cases that need to be handled" | |
mlir/test/Dialect/Affine/parallelize.mlir | ||
169 | Could you please check the body as well as you are replacing the old op if that sounds reasonable? |
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
358 | I'll have the same concern with this method as with D100694 of @sgrechanik. At this point, we don't even know whether the forOp's iter_args are reduction vars - so this argument naming is weird. Also, the returned result is technically inaccurate when reductionsAreParallel is set to true. In fact, ignoreIterArgs or ignoreIterArgDeps as the name is less confusing FWIW the way @sgrechanik has it. If there are iter_args, can we integrate this method with the reduction detection logic itself that you have? If reductionsAreParallel is set to false, you can immediately return false the way you have it if there is at least one iter_arg; if it's true, then you can go ahead and do the actual reduction detection. |
mlir/test/Dialect/Affine/parallelize.mlir | ||
---|---|---|
206 | How about addf %it1, %it1? That was the one I was referring to. You could also have: %0 = addf %it1, (%A[%i]) %1 = addf %it2, %0 yield %0, %1 : f32, f32 Just making sure the weird cases (non reductions) are guarded. |
Address even more review.
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
358 | Changed to ignoreReductionIterArgs. We don't ignore _all_ iter args so anything that doesn't "reduction" in the name would be actually confusing.
This is exactly what the implementation has been doing all along. | |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
163–166 | You might have missed it, I added a full-blown check in dependsOnIterArgs that fails if any operand in the backward slice is an iter_arg, so this no longer depends on the filtering. | |
mlir/test/Dialect/Affine/parallelize.mlir | ||
206 | The first one is easily caught because there are two uses of %it1, not one. The second case is caught by the backward slice check I added. Added both as tests. |
Sorry, it looks like I wasn't clear or I'm still missing something.
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
357 | I assume you meant set to false here? There's no check if it's set to true. | |
358 | Looks like I'm missing something. I only see a single line change to isLoopParallel - the logic you refer to is in affineParallelize and not isLoopParallel unless I'm looking at an older diff! affineParallelize was mostly meant to be "just perform the replacement" while I was expecting the check including the one for reductions in isLoopParallel. | |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
134–145 | This requires a slight rephrase. This replacement is contingent on the reduction checks you have. | |
138–147 | It's this part that I thought should go into isLoopParallel it's really a reduction check. You can have it return (sa output arg) the reduced values? |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
---|---|---|
163–166 | Do the weird examples that can be caught by dependsOnIterArgs and cannot be caught by checking hasOneUse() really exist? Because it seems to me that checking hasOneUse() on every iter arg will exclude ALL weird cases, but maybe I miss something. |
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
357 | No, I meant what I said. If it is set to true, the parallelism check will ignore iter args that it can prove to be reductions; other iter args still make the loop non-parallel. If it is unset (or set to false if you wish), any iter arg makes loop non-parallel. Let's please not get too much into bikeshedding with names. | |
358 | Yes, you are missing something. Probably because Phabricator only shows you the diff between the latest version and the version you reviewed previously, not the base version. Line 1275 in Analysis/Utils.cpp contains the check and it was there since the first iteration. | |
mlir/lib/Analysis/Utils.cpp | ||
1273 | Here's the early exit @bondhugula ^ | |
mlir/lib/Dialect/Affine/Utils/Utils.cpp | ||
163–166 | Yes, see the strange_butterfly case in tests. Each iter arg is used exactly once, in the same addf operation that becomes the reduction. One of the operands may also be a value transitively dependent on the single use of another iter arg. |
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
357 | This is no longer relevant as I opted for an explicit list of reductions instead. |
mlir/lib/Analysis/AffineAnalysis.cpp | ||
---|---|---|
108 | Here's the early exit @bondhugula, it has always been here. | |
mlir/lib/Analysis/Utils.cpp | ||
1273 | This has moved to AffineAnalysis.cpp. |
LGTM. Just minor comments now.
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
358 | Not really. I did notice that one line change. But now I see the whole unification into isLoopParallel which was what I was requesting for. | |
mlir/lib/Analysis/AffineAnalysis.cpp | ||
108 | Yes, I knew this had always been there. This wasn't the confusion. | |
111–158 | I assume all of this code beyond the early exit were just added in the recent iteration. This was what I was requesting for! I hope I was indeed looking at the whole diff earlier and not the last increment. The change to isLoopParallel in the previous one was just one line. Now it's integrated! | |
146–148 | Nit: The Inst suffix is legacy back when they were called instructions. Just srcOp and dstOp is fine. Likewise for Insts. | |
mlir/lib/Dialect/Affine/Transforms/AffineParallelize.cpp | ||
44 | Doc comment for this one please. |
mlir/include/mlir/Analysis/Utils.h | ||
---|---|---|
358 | Now you may be able to see it because it moved to a different file. | |
mlir/lib/Analysis/AffineAnalysis.cpp | ||
111–158 | Part of this code (the original contents of isLoopParallel) is moved from lib/Analysis/Utils.cpp, Phabricator indicates this by a light yellow vertical bar next to the lines it can detect as moved. | |
146–148 | I'm just moving this code from a different file and I'm not a fan of sneaking in irrelevant changes, but okay. |
Update doc comment please to capture the new argument.