[Polly][WIP] Scalar fully indexed expansion
ClosedPublic

Authored by niosega on Aug 12 2017, 1:50 PM.

Details

Summary

This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.

MemoryKind::Value seems to be working with no majors modifications of D34982. A test case has been added. Unfortunatly, no "run time" checks can be done for now because as @Meinersbur explains in a comment on D34982, DependenceInfo need to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.

MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :

#define Ni 10000
#define Nj 10000
void mse(double A[Ni], double B[Nj]) {
  int i,j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    for (int j = 0; j<Nj; j++) {
      tmp = tmp + 2;
    }
    B[i] = tmp;
  }
}

Polly gives us the following statements and memory accesses :

Statements {
	Stmt_for_body
        Domain :=
            { Stmt_for_body[i0] : 0 <= i0 <= 9999 };
        Schedule :=
            { Stmt_for_body[i0] -> [i0, 0, 0] };
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
        Instructions {
              %tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
        }
	Stmt_for_inc
        Domain :=
            { Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
        Schedule :=
            { Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
        Instructions {
              %tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
              %add = fadd double %tmp.11, 2.000000e+00
              %exitcond = icmp ne i32 %inc, 10000
        }
	Stmt_for_end
        Domain :=
            { Stmt_for_end[i0] : 0 <= i0 <= 9999 };
        Schedule :=
            { Stmt_for_end[i0] -> [i0, 2, 0] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
            { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
            { Stmt_for_end[i0] -> MemRef_B[i0] };
        Instructions {
              %add.lcssa = phi double [ %add, %for.inc ]
              store double %add.lcssa, double* %arrayidx, align 8
              %exitcond5 = icmp ne i64 %indvars.iv.next, 10000
        }
}

and the following dependences :

{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999; 
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998; 
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999; 
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }

When trying to expand this memory access :

{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };

The new access map would look like this :

{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }

The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I should have miss something in the understanding of the idea.

Diff Detail

Repository
rL LLVM
niosega created this revision.Aug 12 2017, 1:50 PM

We talked about it on Friday but I think I can help out more with a small writeup.

First to your SCoP. The accesses to 'add_lcssa_phi' result from LLVM transforming the loop into "loop closed static single assignment" form, i.e.,
providing a single definition for values that live accross the loop boundaries outside of the loop. LLVM does this by inserting redundant PHI nodes for these values.
That's why you see a PHI node with a single operand in the LLVM IR right after the innermost loop.

You got (roughly) this after LCSSA (actually only tmp_lcssa):

#define Ni 10000
#define Nj 10000
void mse(double A[Ni], double B[Nj]) {
  int i,j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    // Either outside, or previous
    tmp_0 = PHI(tmp, tmp_lcssa);

    for (int j = 0; j<Nj; j++) {
      // Either outside, or previous
      tmp_1 = PHI(tmp_0, tmp_2);
      tmp_2 = tmp_1 + 2;
    }

    // tmp_2's lifetime exceeds the innermost loop
    tmp_lcssa = PHI(tmp_2);
    B[i] = tmp_lcssa;
  }
}

Polly then models each PHI node as a 'load' of new memory that was written at the end of the incoming block (numbering is differnt, sorry).

Statements {
	Stmt_for_body
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1] // Load current value of 'tmp' (PHI_0)
            { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1] // Write for PHI_1
            { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
        
	Stmt_for_inc
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1] // Write for PHI_1
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1] // Load current value of 'tmp' (PHI_1)
            { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] }; 
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1] // Write for PHI_2
            { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
        
	Stmt_for_end
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1] // Write for PHI_0
            { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
        ReadAccess :=	[Reduction Type: NONE] [Scalar: 1] // Load current value of 'tmp_lcssa' (PHI_2)
            { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
        MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
            { Stmt_for_end[i0] -> MemRef_B[i0] };
}

Now when you switch the roles of Reads and Writes for PHI nodes you should always get a single read and one or more writes (see above). After PHI expansion it should virtually look like this:

void mse(double A[Ni], double B[Nj]) {
  int i,j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    // Either outside, or previous
    tmp_0 = PHI(tmp, tmp_lcssa); // READ: tmp_0[i] (Problem: Nobody is writing this value
                                                        (Copy in required), you should actually bail out on this example).
    // WRITE: tmp_1[i][0] (Initial value).

    for (int j = 0; j<Nj; j++) {
      // Either outside, or previous
      tmp_1 = PHI(tmp_0, tmp_2); // READ: tmp_1[i][j]
      tmp_2 = tmp_1 + 2;

      // WRITE: tmp_1[i][j+1] (We need to write the current value into the 'next' iteration because we will read the value from there).
      // WRITE: tmp_lcssa[i] (Got this one from LCSSA)
    }

    tmp_lcssa = PHI(tmp_2); // READ: tmp_lcssa[i]
    B[i] = tmp_lcssa;

    // WRITE: tmp_0[i+1] (We need to write the current value into the 'next' iteration because we will read the value from there).
  }
}

I can't see yet why your code specifically does the wrong thing, but I hope this helps anyway. @everyone correct me if i missed something, it was late :-)

lib/Transform/MaximalStaticExpansion.cpp
398 ↗(On Diff #110855)

or not of.

Besides, no need to verbally describe the if statement. Just remove the comment.

406 ↗(On Diff #110855)

Why 'expandAccordingToDependences'?

Style: I would handle that loop inside expandRead(s) and pass the Reads as parameter (You have to handle them all anyway inside the function, might as well iterate in there).

I think what would make everything clearer (to me at least) would be a naming as follows:

expandAccess (Read/Write): Your expandAccordingToDomain.
redirect/mapAccess (Read/Write): Your expandAccordingToDependences.

The first does the array allocation, the second redirects/maps the access(es) to the new array.

Meinersbur added inline comments.Aug 14 2017, 5:33 AM
lib/Transform/MaximalStaticExpansion.cpp
84–85 ↗(On Diff #110855)

[Style] Doxygen expects @param ParameterName, that is, it interprets If as a parameter name.

369–371 ↗(On Diff #110855)

[Style] No parenthesis around single statements.

test/MaximalStaticExpansion/non_working_phi.ll
1 ↗(On Diff #110855)

Try not to add other passes than the one you are testing. That makes the test case more robust against changes in those passes (now that -polly-position=before-vectorizer is default, and the new pass manager doesn't even have an "early" extension point, -polly-canonicalize might be removed in the mid-term).

Use opt -polly-canonicalize -S non_working_phi.ll > non_working_phi.ll to produce an .ll file which doesn't need --polly-canonicalize.

If you are using clang, use -polly-dump-before-file=non_working_phi.ll to generate an .ll file that does not need any processing.

niosega updated this revision to Diff 111684.Aug 18 2017, 9:17 AM

Rebase with the new version of master to fix the bug of not correct dependencies (https://reviews.llvm.org/D36791).
Take Michael and Andreas comments into account.

Some test cases for phi expansion need to be added. But the one in place output the expected memory accesses.

I need to add a check before expansion to avoid expansion of Phi which need read from original scalar.

niosega added inline comments.Aug 18 2017, 9:23 AM
test/MaximalStaticExpansion/working_phi_expansion.ll
22 ↗(On Diff #111684)

This array should not be expanded because the expansion would lead to a read to the original array.
But the expanded memory accesses look like what is expected from @simbuerg analysis (in a previous comment).

A check before expansion will be added in a next revision.

niosega updated this revision to Diff 111698.Aug 18 2017, 10:30 AM

Add a check that verify, before expansion (in isExpandable) if the expansion would lead to a read from the original value of the scalar.

niosega updated this revision to Diff 111821.Aug 19 2017, 8:05 AM

Add test cases for Phi expansion and one 'non-trivial' for array expansion.

niosega added inline comments.Aug 19 2017, 8:08 AM
test/MaximalStaticExpansion/working_deps_between_inners_phi.ll
33 ↗(On Diff #111821)

The arrays are not always coming in the same order. Is there a better solution than CHECK-DAG to handle such checks ?

Meinersbur accepted this revision.Aug 20 2017, 2:14 PM
Meinersbur added inline comments.
test/MaximalStaticExpansion/working_deps_between_inners_phi.ll
33 ↗(On Diff #111821)

If there is an indeterminism, we should make it deterministic. Most ofthen this is because we are using some DenseMap or DenseMap whose iteration order depends on memory addresses. We most commonly resolve this be replacing them by a MapVector/SetVector.

In this case, ArrayInfoSetTy is already a SetVector. Maybe it is where the elements are added.

Fixing that would be nice, but not required.

This revision is now accepted and ready to land.Aug 20 2017, 2:14 PM
This revision was automatically updated to reflect the committed changes.