Change of WAR, WAW generation:
- buildFlow(Sink, MustSource, MaySource, Sink) treates any flow of the form sink <- may source <- must source as a *may* dependence.
- we used to call:
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow);
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
Correct WAW:
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow);
- Straightforward call.
Correct WAR:
Flow = buildFlow(Write, Read, Write, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow);
- We want the "shortest" WAR possible (exact dependences).
- We mark all the writes as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a write before reaching a sink.
- Leaves with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
New StrictWAW for Reductions:
- We used to call: Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow);
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any reads in between.
Example for strict WAW:
void f(int *A, int *B) { for(int i = 0; i <= 100; i++) { S0: *A += i; --WAW (S0 -> S0) --* | if (i >= 98) { WAR (S0 -> S1) S1: *B = *A; <--------------* } } }
- Since the writes to S0 happen *between* reads at S1, the entire loop is not a legal reduction. It is only a reduction in (0 <= i <= 98).
- To detect these sorts of patterns, we need to generate strict WAW that do not have reads between them.
Explanation: Why the new WAR dependences in tests are correct:
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
Code:
; void manyreductions(long *A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: *A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: *A += 42; ; ...
WAR dependence:
{ S0[1023, 1023] -> S1[0, 0] }
- Between S0[1023, 1023] and S1[0, 0], we will have the dependences:
S0[1023, 1023]: *-- tmp = *A (load0)--* WAR 2 add = tmp + 42 | *-> *A = add (store0) | WAR 1 S1[0, 0]: | tmp = *A (load1) | add = tmp + 42 | A = add (store1)<-*
- One may assume that WAR2 *hides* WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 | WAR 1 S1[0, 0]: | tmp = A (load1) | add = A + 42 | A = add (store1) <-*
- So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model.