Instead of modeling the schedule (execution order) of a piece of code as a flat
multi-dimensional mapping, we now retain the inherent tree structure of the
schedule. This makes the schedule a little bit more verbose, but the exposed
structure makes it both easier to work with and also faster to process.
Specifically, we can now easily walk the schedule tree to decide where to tile
loops, apply register tiling at the inner level and also perform full/partial
tile separation a lot easier.
This patch does not yet exploit these capabilities, but is mostly a 1-to-1
translation from flat schedules to schedule trees.
For more information see:
impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf
XXX: This patch is not yet finished, but just a proof of concept.
Open Issues
- The dependence analysis for reductions does not yet work. Specifically, this patch does not yet support the computation of reduction dependences. The current approach uses flat schedules extensively. It might make sense to first change the way we model reductions (as may-writes as suggested by Sven) and only then add this patch.
Performance measurements
0 1 2 3 Before Patch Patch+64bit-int NoPolly
gemm: 121ms 92ms 74ms 46ms
3mm: 600ms 340ms 203ms 60ms
NAS Parallel Benchmark (bt): 6m+ 1.2s 627ms 80ms
The performance compares
0) Polly trunk 1) Polly + this patch 2) Polly + this patch + 64bit integers 3) clang -O3 without Polly
The NAS benchmark seems to still have a performance problem. Adding/deriving
run-time assumptions seems to take a significant amount of time, making ScopInfo
with about 75% the most costly pass, whereas it commonly does not show up high
on the profile.
Example
Source Code
for (long i = 0; i < 100; i++) {
S1: A[i] += 1;
for (long j = 0; j < 100; j++)
S2: A[i+j] += 1;
S3: A[i] += 1;
}
Flat Schedule
{
Stmt_S1[i0] -> [i0, 0, 0]: i0 >= 0 and i0 <= 99; Stmt_S2[i0, i1] -> [i0, 1, i1]: i0 >= 0 and i0 <= 99 and i1 >= 0 and i1 <= 99 Stmt_S3[i0] -> [i0, 2, 0]: i0 >= 0 and i0 <= 99
}
Schedule Tree
domain: "{ Stmt_S1[i0] : i0 >= 0 and i0 <= 99;
Stmt_S2[i0, i1] : i0 >= 0 and i0 <= 99 and i1 >= 0 and i1 <= 99; Stmt_S3[i0] : i0 >= 0 and i0 <= 99 }"
child:
schedule: "[{ Stmt_S1[i0] -> [(i0)]; Stmt_S2[i0, i1] -> [(i0)]; Stmt_S3[i0] -> [(i0)]; }]" child: sequence: - filter: "{ Stmt_S1[i0] }" - filter: "{ Stmt_S2[i0, i1] }" child: schedule: "[{ Stmt_S2[i0, i1] -> [(i1)] }]" - filter: "{ Stmt_S3[i0] }"