Phabricator

Do not destroy attrs in MergeNestedParallelLoops

Authored by gflegar on May 24 2022, 10:58 AM.



We can only merge nested loops if doing so does not destroy
attribute information. While it might be possible to merge
loops with attributes in some cases, for now we disallow
merging if either of the two loops carries additional attributes.

This change prevents the canonicalizer from destroying
information added with greedilyMapParallelSCFToGPU.
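
The guard described above can be sketched outside of MLIR. This is a simplified stand-in, not the actual pattern code: attributes are modeled as a plain string map, and the inherent-attribute name (`operand_segment_sizes`) is illustrative of what a real implementation would query from the op itself.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for an op's attribute dictionary; real MLIR code
// would inspect the op's attributes directly.
using AttrDict = std::map<std::string, std::string>;

// Attributes inherent to the op itself, which every scf.parallel carries
// and which merging may therefore safely ignore (name is illustrative).
const std::set<std::string> kInherentAttrs = {"operand_segment_sizes"};

// True only if every attribute on the op is inherent, i.e. merging the
// loop cannot drop user-visible information such as a GPU mapping.
bool hasNoExtraAttrs(const AttrDict &attrs) {
  for (const auto &entry : attrs)
    if (kInherentAttrs.count(entry.first) == 0)
      return false;
  return true;
}

// Merging is allowed only when neither loop carries extra attributes.
bool canMerge(const AttrDict &outer, const AttrDict &inner) {
  return hasNoExtraAttrs(outer) && hasNoExtraAttrs(inner);
}
```

With this guard, a nest where either loop carries, say, a GPU mapping attribute is left untouched rather than merged.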

Diff Detail

Event Timeline

gflegar created this revision.May 24 2022, 10:58 AM
gflegar requested review of this revision.May 24 2022, 10:58 AM

I think the intended way is to call addWithLabel() instead. But I'm not sure that's what we want to do; it does seem clunky to exclude this pattern from canonicalization until we get to greedilyMapParallelSCFToGPU.

Would it be possible to instead apply the mapping to gpu early (when we tile the loops) and carry the attributes forward during canonicalization?

Sure, I can change to addWithLabel instead.

I tried delaying the canonicalizer pass until after the MapParallelLoops pass, but that does not help: the canonicalizer still merges multiple loops (and destroys the attributes created by MapParallelLoops). So to make it work without removing the pattern, we would need to delay it all the way until after the ConvertParallelLoopToGpu pass. However, that pass fails unless we've run the canonicalizer before it; I haven't investigated exactly why yet. (Here is the evolution of the IR in the working pipeline that just disables this pattern, while here is what happens if we try to delay the canonicalizer until after ConvertParallelLoopToGpu.)

There's a bigger design question here about the current solution, where we implicitly encode how a parallel loop maps to specific hardware by generating a specific nest of loops that should be semantically equivalent to a single loop. This is where things break, since the canonicalizer rightfully makes that equivalence assumption. Arguably, we should encode the mapping explicitly in the future, so I consider this CL only a quick workaround while we figure out how best to do that.

However, there might be a nicer workaround: I would say it's wrong for the canonicalizer to merge loops with different attributes, as that leads to the loss of some of them, so I can try to fix the pattern to not do that. That would work around our problem, so we can get the pipeline working (assuming all goes well). I would still consider it a workaround, as we're still being implicit about what the loop nest means, but it's at least not as much of a hack as the current approach.

gflegar updated this revision to Diff 432888.May 30 2022, 4:33 AM

Prevent merging loops with attrs instead

gflegar retitled this revision from Label MergeNestedParallelLoops patterns to Do not destroy attrs in MergeNestedParallelLoops.May 30 2022, 4:42 AM
gflegar edited the summary of this revision. (Show Details)

Worked like a charm.

Now I'm also thinking that encoding hardware mapping information in scf.parallel's attributes is probably explicit enough, as long as various passes do not destroy this information (and if they do, that should be treated as a bug).

csigg added a comment.May 30 2022, 5:51 AM

Looks good, thanks!

Could you please get another approval from e.g. Stephan or Nicolas? I'm not feeling confident enough to put the official stamp on it.


Checking whether attr is not in ParallelOp::getAttributeNames() might be slightly better.

gflegar added inline comments.May 30 2022, 9:50 AM

Is it? Do you know what exactly the semantics of getAttributeNames() is? Are all of the values it returns mandatory attributes of ParallelOp?

My thinking here was that we should err on the side of not merging loops if merging could lead to information loss. We obviously have to ignore mandatory attributes (otherwise we would never merge anything), but for everything else we should not merge unless someone explicitly states in which situations it is OK to merge a loop with a specific attribute.

csigg added inline comments.May 30 2022, 10:28 AM

Yes, getAttributeNames() returns all known op attributes. At the moment, this is equivalent to ParallelOp::getOperandSegmentSizeAttr().

My thinking is that if the op knows about a certain attribute, the canonicalizer should handle it.

gflegar added inline comments.May 30 2022, 10:58 AM

What I'm trying to avoid is a situation like the following:

You add a new known attribute is_awesome (an optional boolean) that signifies that awesome optimizations can be performed on your parallel loop. When adding it, you are not aware of this loop-merging canonicalization pattern.
You write a test that checks your loop is being awesomely optimized, e.g.:

scf.parallel (%arg0) = (%c0) to (%c16) step (%c1) {
  <some operations>
} {is_awesome = 1}

and that works like a charm. You build your pipeline, everything is working, and you're happy and blissfully unaware of the loop merging canonicalization.
However, there's a potential problem. At some point your compiler gets the following IR:

scf.parallel (%arg0) = (%c0) to (%c16) step (%c1) {
  scf.parallel (%arg1) = (%c0) to (%c16) step (%c1) {
      <some operations>
  } {is_awesome = 1}
} {is_awesome = 1}

If we use getAttributeNames here, the canonicalizer will happily replace this loop with:

scf.parallel (%arg0, %arg1) = (%c0, %c0) to (%c16, %c16) step (%c1, %c1) {
  <some operations>
}

and now your loop is no longer as awesome.

If we explicitly list only the attributes we want to ignore while merging (as this CL does), the loop will not be merged.
Better yet, the canonicalize.mlir test (below) will likely fail when you add a new attribute, since loop merging will be broken, letting you know that you should decide how your new attribute should be handled during loop merging.
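
The difference between the two policies discussed above boils down to which set of attribute names the pattern is allowed to ignore when deciding whether to merge. A minimal sketch, with illustrative names (`is_awesome`, `operand_segment_sizes`) rather than real MLIR API calls:

```cpp
#include <set>
#include <string>

using AttrNames = std::set<std::string>;

// True if every attribute on the loop is covered by the "ignorable" set,
// i.e. merging would not drop anything outside that set. The policy choice
// is entirely in which set of names the caller passes as ignorable.
bool mergeAllowed(const AttrNames &loopAttrs, const AttrNames &ignorable) {
  for (const auto &attr : loopAttrs)
    if (ignorable.count(attr) == 0)
      return false;
  return true;
}
```

Passing all op-known names (the getAttributeNames() policy) makes `is_awesome` ignorable as soon as it is registered, so the merge silently drops it; passing only an explicit list of attributes proven safe to drop (what this CL does) blocks the merge instead.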

csigg added inline comments.May 30 2022, 11:32 AM

Good point, you convinced me.

gflegar marked 3 inline comments as done.May 31 2022, 3:31 AM

I don't think there are any guarantees that MLIR preserves unknown attributes during canonicalization. So this would be the first precedent in that direction.

Also, this only hides the problem, as the loops would still be combined if we ran canonicalize twice before attaching the mapping attribute. In essence, the issue is that tiling, mapping to gpu and the transformation to gpu dialect have to run after each other without being disturbed by other passes. So the best fix would be to ensure this is the case.

> In essence, the issue is that tiling, mapping to gpu and the transformation to gpu dialect have to run after each other without being disturbed by other passes.

That wouldn't work either, as transformation to gpu relies on canonicalizing the output from tiling (specifically, folding the loop steps into constants).

gflegar abandoned this revision.May 31 2022, 7:20 AM

We can achieve what we need by adding LoopInvariantCodeMotionPass. Abandoning this.