Change of WAR, WAW generation:
- buildFlow(Sink, MustSource, MaySource, Sink) treates any flow of the form sink <- may source <- must source as a *may* dependence.
- we used to call:
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow);
- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.
- Now, we call WAR and WAW correctly.
Correct WAW:
Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow);
- Straightforward call.
Correct WAR:
Flow = buildFlow(Write, Read, Write, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow);
- We want the "shortest" WAR possible (exact dependences).
- We mark all the writes as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a write before reaching a sink.
- Leaves with direct (R -> W).
- This affects reduction generation since RED is built using WAW and WAR.
New StrictWAW for Reductions:
- We used to call: Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow);
- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any reads in between.
Example for strict WAW:
void f(int *A, int *B) { for(int i = 0; i <= 100; i++) { S0: *A += i; --WAW (S0 -> S0) --* | if (i >= 98) { WAR (S0 -> S1) S1: *B = *A; <--------------* } } }
- Since the writes to S0 happen *between* reads at S1, the entire loop is not a legal reduction. It is only a reduction in (0 <= i <= 98).
- To detect these sorts of patterns, we need to generate strict WAW that do not have reads between them.
Explanation: Why the new WAR dependences in tests are correct:
- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.
Code:
; void manyreductions(long *A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: *A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: *A += 42; ; ...
WAR dependence:
{ S0[1023, 1023] -> S1[0, 0] }
- Between S0[1023, 1023] and S1[0, 0], we will have the dependences:
S0[1023, 1023]: *-- tmp = *A (load0)--* WAR 2 add = tmp + 42 | *-> *A = add (store0) | WAR 1 S1[0, 0]: | tmp = *A (load1) | add = tmp + 42 | A = add (store1)<-*
- One may assume that WAR2 *hides* WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores.
- Hence, according to Polly, the code may have looked like this:
S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 | WAR 1 S1[0, 0]: | tmp = A (load1) | add = A + 42 | A = add (store1) <-*
- So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model.
My opinion is still that reads do not imply side-effects. Consider using a different explanation why they do not count as reduction.