This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Move to schedule trees to store our schedule
AbandonedPublic

Authored by grosser on Apr 23 2015, 8:15 AM.

Details

Summary

Instead of modeling the schedule (execution order) of a piece of code as a flat
multi-dimensional mapping, we now retain the inherent tree structure of the
schedule. This makes the schedule a little bit more verbose, but the exposed
structure makes it both easier to work with and also faster to process.

Specifically, we can now easily walk the schedule tree to decide where to tile
loops, apply register tiling at the inner level and also perform full/partial
tile separation a lot easier.

This patch does not yet exploit these capabilities, but is mostly a 1-to-1
translation from flat schedules to schedule trees.

For more information see:

impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf

XXX: This patch is not yet finished, but just a proof of concept.

Open Issues

  • The dependence analysis for reductions does not yet work. Specifically, this patch does not yet support the computation of reduction dependences. The current approach uses flat schedules extensively. It might make sense to first change the way we model reductions (as may-writes as suggested by Sven) and only then add this patch.

Performance measurements

  0           1         2                 3
Before      Patch   Patch+64bit-int    NoPolly

gemm: 121ms 92ms 74ms 46ms
3mm: 600ms 340ms 203ms 60ms
NAS Parallel Benchmark (bt): 6m+ 1.2s 627ms 80ms

The performance compares

0) Polly trunk
1) Polly + this patch
2) Polly + this patch + 64bit integers
3) clang -O3 without Polly

The NAS benchmark seems to still have a performance problem. Adding/deriving
run-time assumptions seems to take a significant amount of time, making ScopInfo
with about 75% the most costly pass, whereas it commonly does not show up high
on the profile.

Example

Source Code

for (long i = 0; i < 100; i++) {

S1: A[i] += 1;

for (long j = 0; j < 100; j++)

S2: A[i+j] += 1;
S3: A[i] += 1;

}

Flat Schedule

{

Stmt_S1[i0]     -> [i0, 0, 0]: i0 >= 0 and i0 <= 99;
Stmt_S2[i0, i1] -> [i0, 1, i1]: i0 >= 0 and i0 <= 99 and i1 >= 0 and i1 <= 99
Stmt_S3[i0]     -> [i0, 2, 0]: i0 >= 0 and i0 <= 99

}

Schedule Tree

domain: "{ Stmt_S1[i0] : i0 >= 0 and i0 <= 99;

Stmt_S2[i0, i1] : i0 >= 0 and i0 <= 99 and i1 >= 0 and i1 <= 99;
Stmt_S3[i0] : i0 >= 0 and i0 <= 99 }"

child:

schedule: "[{ Stmt_S1[i0] -> [(i0)];
              Stmt_S2[i0, i1] -> [(i0)];
              Stmt_S3[i0] -> [(i0)];
            }]"
child:
  sequence:
  - filter: "{ Stmt_S1[i0] }"
  - filter: "{ Stmt_S2[i0, i1] }"
    child:
      schedule: "[{ Stmt_S2[i0, i1] -> [(i1)] }]"
  - filter: "{ Stmt_S3[i0] }"

Diff Detail

Event Timeline

grosser updated this revision to Diff 24306.Apr 23 2015, 8:15 AM
grosser retitled this revision from to [RFC] Move to schedule trees to store our schedule.
grosser updated this object.
grosser edited the test plan for this revision. (Show Details)
grosser added a subscriber: Unknown Object (MLST).

Test comment to see if pollydev receives review messages.

If it does: have a look, this is the schedule-tree commit

grosser updated this revision to Diff 24381.Apr 24 2015, 5:32 AM

The previous patch had some correctness issues slipping in, which invalidated
the original performance results. This patch should be more correct, but
again slower. We now convert from the schedule tree back to flat dependences
to allow our normal reduction dependence analysis to work.

jdoerfert edited edge metadata.Apr 26 2015, 9:17 AM

I'll work on the may write reduction dependences next week so we can integrate this.

jdoerfert resigned from this revision.Sep 27 2015, 6:05 PM
jdoerfert removed a reviewer: jdoerfert.

Already commited. Please close.

grosser abandoned this revision.Sep 27 2015, 6:09 PM

This has been integrated already.