This is an archive of the discontinued LLVM Phabricator instance.

[mlir][SCF] Canonicalize nested ParallelOps's
ClosedPublic

Authored by Hardcode84 on May 19 2021, 12:10 PM.

Diff Detail

Event Timeline

Hardcode84 created this revision.May 19 2021, 12:10 PM
Hardcode84 requested review of this revision.May 19 2021, 12:10 PM
ftynse requested changes to this revision.May 20 2021, 5:05 AM
ftynse added a reviewer: ftynse.
ftynse added inline comments.
mlir/lib/Dialect/SCF/SCF.cpp
1736

Could you rather use rewriter.mergeBlocks here? It lets you remap values and doesn't require copying the operation. Not necessarily in the bodyBuilder, just create a new ParallelOp and merge the body of the inner loop into that of newly created loop, then erase the duplicate terminator.

This revision now requires changes to proceed.May 20 2021, 5:07 AM
Hardcode84 added inline comments.May 20 2021, 8:35 AM
mlir/lib/Dialect/SCF/SCF.cpp
1736

I didn't quite get how to do this. If we create create new outer ParallelOp and try to merge block from old inner ParallelOp into it, this new block will still reference iterArgs from old outer ParallelOp (we cannot remap them via mergeBlocks api).

ftynse added inline comments.May 20 2021, 8:39 AM
mlir/lib/Dialect/SCF/SCF.cpp
1736
ftynse accepted this revision.May 20 2021, 8:43 AM
ftynse added inline comments.
mlir/lib/Dialect/SCF/SCF.cpp
1736

My bad, you mean the outer loop.

This revision is now accepted and ready to land.May 20 2021, 8:43 AM
Hardcode84 added inline comments.May 20 2021, 8:45 AM
mlir/lib/Dialect/SCF/SCF.cpp
1736

It can remap only source block arguments as I understand but iter vars from outer loop are no inner block arguments, they are just captured in inner block.

Anyway, there is another issue, I should also add check that outer block iter vars not being used as inner loop bounds/steps

check outer iter vars not used as bounds/steps for inner loop

This revision was automatically updated to reflect the committed changes.
herhut added a subscriber: herhut.May 25 2022, 7:15 AM

What was the motivation behind this canonicalization? In particular, why would a combined parallel loop be considered the canonical form?

We found this pattern because it undoes the tiling we do in some cases. I agree the two forms are semantically equivalent, so the transformation is correct. It is inconvenient for our use case, though.

Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2022, 7:15 AM

Isn't the whole point of canonicalization to convert semantically equivalent codes to the single form? So we should either always merge nested loops or always split them (which makes nested loops support in scf.parallel useless).

Isn't the whole point of canonicalization to convert semantically equivalent codes to the single form? So we should either always merge nested loops or always split them (which makes nested loops support in scf.parallel useless).

I agree with this. As a single scf.parallel op can represent multi-dimensional nests, the canonical form would be one where successive (1-d or n-d) scf.parallel ops are combined into a larger n-d scf.parallel op. However, a few issues with this revision:

  1. It's missing a commit summary; please don't leave it empty in such cases.
  2. Test cases are missing nests where one or more of the input scf.parallel were already multi-dimensional. This is important to exercise.
  3. The rewrite pattern is missing a doc comment and code comments at places.
mlir/lib/Dialect/SCF/SCF.cpp
1722

Can you just use llvm::is_contained? (In that case, this lambda isn't adding anything.)