Page MenuHomePhabricator

[ModuloSchedule] Introduce PeelingModuloScheduleExpander

Authored by jmolloy on Sep 2 2019, 8:19 AM.



This is the beginnings of a reimplementation of ModuloScheduleExpander. It works
by generating a single-block correct pipelined kernel and then peeling out the
prolog and epilogs.

This patch implements kernel generation as well as a validator that will
confirm the number of phis added is the same as the ModuloScheduleExpander.

Prolog and epilog peeling will come in a different patch.

Diff Detail


Event Timeline

jmolloy created this revision.Sep 2 2019, 8:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2019, 8:19 AM
ThomasRaoux added inline comments.Sep 2 2019, 9:33 PM
1398 ↗(On Diff #218373)

Outdated comment?

1400 ↗(On Diff #218373)

I'm trying to understand this formula.

For instance, if the ProducerStage is 2 and consumer stage is 0. When we pipeline we would run stage 2 iteration N with stage 0 iteration N+2, in this case wouldn't that mean we want 2 phi to get the value from stage 2 from iteration N? Maybe I'm missing something though.

1401 ↗(On Diff #218373)

Is the modulo needed? Looks like StageDiff should always be < NumStages

1519 ↗(On Diff #218373)

nit: should it be const?

jmolloy updated this revision to Diff 218421.Sep 3 2019, 3:03 AM
jmolloy marked 4 inline comments as done.

Thanks for the review Thomas!

1400 ↗(On Diff #218373)

Thanks for making me rethink this formula from the ground up. Some of these formulae have been quantatively formed rather than qualatitavely.

If ProducerStage > ConsumerStage, this can only be represented if ProducerStage - ConsumerStage == 1. Consider the following loop:

p = phi(z, D) // D is the phi init value
y = A p  // stage = 0
z = B y   // stage = 1 (cannot be stage=2 as distance(B, A) = 1).

Then the unrolled schedule is:

      A1(B0), B0
             A2(B1), B1

That is, B0 must be available in the same iteration as A1. Therefore the stage difference is bounded by 1 and additionally B0 must be produced before A1 in the rescheduled loop (Cycle(B) < Cycle(A)).

Note that this is only in the case where ProducerStage > ConsumerStage. In the opposite case we can always pad out the difference between stages with extra phis.

I've updated this formula and all the tests pass :)

ThomasRaoux accepted this revision.Sep 3 2019, 4:50 PM
ThomasRaoux added inline comments.
1400 ↗(On Diff #218421)

Thanks for explaining. Those properties are enforced by the swing scheduler ASAP and ALAP functions? Maybe a comment would make it easier to understand that part?

This revision is now accepted and ready to land.Sep 3 2019, 4:50 PM
jmolloy marked an inline comment as done.Sep 4 2019, 5:52 AM

Done, thanks!

This revision was automatically updated to reflect the committed changes.

No-asserts build fixed in rL370894. Apologies for the breakage, I was holding git wrong.