This patch introduces a utility to separate full tiles from partial
tiles when tiling affine loop nests whose trip counts are unknown or
whose tile sizes don't divide the trip counts. A conditional guard is
generated that places the full tile (with constant trip count loops)
in the 'then' block of an 'affine.if' and the partial tile in the
'else' block. The separation lets the 'then' block (which has constant
trip count loops) be optimized better afterwards, e.g., by
unroll-and-jam, register tiling, or vectorization without generating
cleanup code, or by offloading to accelerators. Among the techniques in
the literature, if/else-based separation yields the most compact
cleanup code for multi-dimensional cases, because a single version
models all partial tiles.
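The following Python sketch roughly illustrates what the guard does. It
is hypothetical: `classify_tiles` is not part of MLIR, and it assumes
zero lower bounds and a tile size of T = 32. It walks the tile origins
of an M x N iteration space and counts how many tiles take the 'then'
(full) branch and how many take the 'else' (partial) branch. It uses the
same condition that the 'affine.if' checks. For M = 100 and N = 70, the
partial tiles include both boundary strips and the corner tile, yet all
of them land in the single 'else' version.

```python
from __future__ import annotations


# Hypothetical sketch (not the MLIR API): classify the tiles of an M x N
# iteration space tiled by T, assuming both loops start at 0.
def classify_tiles(M: int, N: int, T: int = 32) -> tuple[int, int]:
    """Return (num_full_tiles, num_partial_tiles)."""
    full = partial = 0
    for i in range(0, M, T):        # tile origins along the first loop
        for j in range(0, N, T):    # tile origins along the second loop
            # Same test as the guard: M - i - T >= 0 and N - j - T >= 0.
            if M - i - T >= 0 and N - j - T >= 0:
                full += 1           # 'then' block: T x T constant trip counts
            else:
                partial += 1        # 'else' block: one version covers every
                                    # boundary strip and the corner tile
    return full, partial


print(classify_tiles(100, 70))  # (6, 6)
```

When T divides both M and N, every tile is full and the 'else' block is
never taken. In that case, `classify_tiles(64, 64)` gives `(4, 0)`.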

INPUT

  affine.for %i0 = 0 to %M {
    affine.for %i1 = 0 to %N {
      "foo"() : () -> ()
    }
  }

OUTPUT AFTER TILING W/O SEPARATION

  #map0 = affine_map<(d0) -> (d0)>
  #map1 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.for %arg4 = #map0(%arg2) to min #map1(%arg2)[%M] {
        affine.for %arg5 = #map0(%arg3) to min #map1(%arg3)[%N] {
          "foo"() : () -> ()
        }
      }
    }
  }

OUTPUT AFTER TILING WITH SEPARATION

  #map0 = affine_map<(d0) -> (d0)>
  #map1 = affine_map<(d0) -> (d0 + 32)>
  #map2 = affine_map<(d0)[s0] -> (d0 + 32, s0)>

  #set0 = affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>

  affine.for %arg2 = 0 to %M step 32 {
    affine.for %arg3 = 0 to %N step 32 {
      affine.if #set0(%arg2, %arg3)[%M, %N] {
        // Full tile.
        affine.for %arg4 = #map0(%arg2) to #map1(%arg2) {
          affine.for %arg5 = #map0(%arg3) to #map1(%arg3) {
            "foo"() : () -> ()
          }
        }
      } else {
        // Partial tile.
        affine.for %arg4 = #map0(%arg2) to min #map2(%arg2)[%M] {
          affine.for %arg5 = #map0(%arg3) to min #map2(%arg3)[%N] {
            "foo"() : () -> ()
          }
        }
      }
    }
  }
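The Python sketch below checks that separation preserves semantics. It
transliterates the two tiled outputs above into hypothetical functions,
`tiled_without_separation` and `tiled_with_separation`. Each records
the (i, j) point where "foo" would execute. The sketch assumes T = 32
and zero lower bounds, as in the example IR. If separation is correct,
both versions visit exactly the same points in the same order.

```python
from __future__ import annotations

T = 32  # tile size used in the example IR


def tiled_without_separation(M: int, N: int) -> list[tuple[int, int]]:
    trace = []
    for i0 in range(0, M, T):
        for j0 in range(0, N, T):
            # Bounds #map0 .. min #map1: every tile is clipped at M and N.
            for i in range(i0, min(i0 + T, M)):
                for j in range(j0, min(j0 + T, N)):
                    trace.append((i, j))  # stands in for "foo"()
    return trace


def tiled_with_separation(M: int, N: int) -> list[tuple[int, int]]:
    trace = []
    for i0 in range(0, M, T):
        for j0 in range(0, N, T):
            if -i0 + M - T >= 0 and -j0 + N - T >= 0:  # #set0
                # Full tile: constant trip count T in both loops (#map1).
                for i in range(i0, i0 + T):
                    for j in range(j0, j0 + T):
                        trace.append((i, j))
            else:
                # Partial tile: same clipped bounds as the unseparated nest (#map2).
                for i in range(i0, min(i0 + T, M)):
                    for j in range(j0, min(j0 + T, N)):
                        trace.append((i, j))
    return trace


for M, N in [(31, 33), (64, 64), (100, 70)]:
    assert tiled_with_separation(M, N) == tiled_without_separation(M, N)
```

Only the full-tile branch changes. Its bounds become constant-distance
(i0 to i0 + T), and that is what later transforms can exploit.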

The separation is tested via a command-line flag on the loop tiling
pass. The utility itself accepts any band of contiguously nested loops
and can be used by other transforms and utilities. The current
implementation works for hyperrectangular loop nests.
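The full-tile guard generalizes from the 2-D case to a band of any
depth. Each loop in a hyperrectangular band contributes one
constraint. The Python sketch below is a hypothetical helper,
`full_tile_guard`, and is not the MLIR C++ utility. It assumes every
loop starts at 0 and has a single symbolic upper bound s_k. It builds
the textual 'affine_set' that guards the full tile. For tile sizes
[32, 32], it reproduces #set0 from the example above.

```python
from __future__ import annotations


# Hypothetical helper (not the MLIR API): textual full-tile guard for a
# hyperrectangular band with zero lower bounds, tile origins d_k, symbolic
# upper bounds s_k, and tile sizes tile_sizes[k].
def full_tile_guard(tile_sizes: list[int]) -> str:
    n = len(tile_sizes)
    dims = ", ".join(f"d{k}" for k in range(n))
    syms = ", ".join(f"s{k}" for k in range(n))
    # One constraint per loop: the whole tile fits, i.e. s_k - d_k - t_k >= 0.
    constraints = ", ".join(
        f"-d{k} + s{k} - {t} >= 0" for k, t in enumerate(tile_sizes)
    )
    return f"affine_set<({dims})[{syms}] : ({constraints})>"


print(full_tile_guard([32, 32]))
# affine_set<(d0, d1)[s0, s1] : (-d0 + s0 - 32 >= 0, -d1 + s1 - 32 >= 0)>
```

Because the band is hyperrectangular, a conjunction of independent
per-loop constraints is enough. Non-rectangular bands, whose bounds
depend on outer induction variables, would need a different guard.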

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>