This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Introduce full/partial tile separation using if/else
ClosedPublic

Authored by bondhugula on Mar 24 2020, 6:30 AM.

Details

Summary

This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests where trip counts are unknown or
where tile sizes don't divide trip counts. A conditional guard is
generated to separate out the full tile (with constant trip count loops)
into the then block of an 'affine.if' and the partial tile to the else
block. The separation allows the 'then' block (which has constant trip
count loops) to be optimized better subsequently: for eg. for
unroll-and-jam, register tiling, vectorization without leading to
cleanup code, or to offload to accelerators. Among techniques from the
literature, the if/else based separation leads to the most compact
cleanup code for multi-dimensional cases (because a single version is
used to model all partial tiles).

INPUT

affine.for %i0 = 0 to %M {
  affine.for %i1 = 0 to %N {
    "foo"() : () -> ()
  }
}

OUTPUT AFTER TILING W/O SEPARATION

map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

affine.for %arg2 = 0 to %M step 32 {
  affine.for %arg3 = 0 to %N step 32 {
    affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
      affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
        "foo"() : () -> ()
      }
    }
  }
}

OUTPUT AFTER TILING WITH SEPARATION

map0 = affine_map<(d0) -> (d0)>
map1 = affine_map<(d0) -> (d0 + 32)>
map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

#set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>

affine.for %arg2 = 0 to %M step 32 {
  affine.for %arg3 = 0 to %N step 32 {
    affine.if #set0(%arg2, %arg3)[%M, %N] {
      // Full tile.
      affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
        affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
          "foo"() : () -> ()
        }
      }
    } else {
      // Partial tile.
      affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
        affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
          "foo"() : () -> ()
        }
      }
    }
  }
}

The separation is tested via a cmd line flag on the loop tiling pass.
The utility itself allows one to pass in any band of contiguously nested
loops, and can be used by other transforms/utilities. The current
implementation works for hyperrectangular loop nests.

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Diff Detail

Event Timeline

bondhugula created this revision.Mar 24 2020, 6:30 AM
bondhugula edited the summary of this revision. (Show Details)Mar 24 2020, 6:43 AM
bondhugula edited the summary of this revision. (Show Details)
andydavis1 added inline comments.Mar 25 2020, 4:52 PM
mlir/include/mlir/Analysis/AffineStructures.h
508

"Ind" is a bit too mysterious. How about "removeIndependentConstraints" or "removeIndepConstraints".

mlir/lib/Analysis/AffineStructures.cpp
1245

Document what this block of code does.

1249

Can this loop be moved out into a findEqWithNonZero helper function or lambda?

1908

Add this before the resize:
assert(inequalities.size() >= numReservedCols);

2809

assert(NumDims >= 1)

mlir/lib/Transforms/Utils/LoopUtils.cpp
2234

This function is kind of long. Would make sense to break it in two, and start the bottom half here?

bondhugula marked 4 inline comments as done.

Address review comments

Thanks for the review!

mlir/lib/Analysis/AffineStructures.cpp
1249

Done - but called it containsConstraintDependentOnRange and reused it for the inequalities loop above as well.

bondhugula marked 3 inline comments as done.Mar 25 2020, 11:07 PM
bondhugula added inline comments.
mlir/lib/Analysis/AffineStructures.cpp
2809

Added this at the very top as
assert(pos < numDims)

bondhugula marked an inline comment as done.

Address some more review comments.

bondhugula marked 2 inline comments as done.

Address last review comment - split separateFullTiles

mlir/lib/Transforms/Utils/LoopUtils.cpp
2234

Done. I actually moved the first half (along with the 'add the body' part') into its own function - createFullTiles. This was logically one thing. That simplifies the cleanup on error as well and leads to a smaller separateFullTiles. Thanks.

Harbormaster completed remote builds in B50492: Diff 252750.
andydavis1 accepted this revision.Mar 27 2020, 3:09 PM
This revision is now accepted and ready to land.Mar 27 2020, 3:09 PM
This revision was automatically updated to reflect the committed changes.

Is this a transformation that can be done on a tiled loop nest without the if/else separation?

mlir/lib/Dialect/Affine/Transforms/LoopTiling.cpp
48

Can you replace all these with pass options?