This is an archive of the discontinued LLVM Phabricator instance.

[Polly][WIP] Fully-Indexed static expansion
ClosedPublic

Authored by niosega on Jul 4 2017, 7:18 AM.

Details

Summary

The idea of this patch is to implement a mechanism of fully-indexed static expansion.

The goal of this patch is to be able to expand every memory access to a fully indexed one. For example from this original source code :

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[j] = j;
T: A[i] = B[i]

After the pass, we want this :

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[i, j] = j;
T: A[i] = B[i, i]

We are, for now, unable to apply expansion on some cases :

  • Scalar access
  • Multiple writes per SAI
  • MayWrite Access
  • Expansion that leads to an access to the original array

Diff Detail

Event Timeline

niosega created this revision.Jul 4 2017, 7:18 AM
niosega created this object with visibility "Custom Policy".
niosega created this object with edit policy "Custom Policy".
niosega edited the summary of this revision. (Show Details)Jul 5 2017, 2:04 AM
niosega added a reviewer: Meinersbur.
niosega changed the visibility from "Custom Policy" to "Public (No Login Required)".
niosega changed the edit policy from "Custom Policy" to "All Users".
niosega added subscribers: pollydev, llvm-commits.
niosega added inline comments.Jul 5 2017, 2:10 AM
lib/Transform/MaximalStaticExpansion.cpp
68

By reading and trying to understand the work done in DeLICM and JSONImporter::importAccesses, I had begin the implementation of my pass. But I am stuck in building the new map. For example, if we have the following c code :

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     tmp = A[N+i]*tmp;

After the pass, we want this :

 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
T:      tmp[i, j] = A[N+i]*tmp;

This means that I want to transform the map of S, [N] -> {S[i,j] -> tmp}, to the map of T, [N] -> {S[i,j] -> tmp[i, j]}. But i was not able to find a way to do that. Anybody has an idea on how to proceed ?

test/MaximalStaticExpansion/no_optim.ll
1 ↗(On Diff #105173)

I try to create a test case with the folowing c code :

#define Ni 2000

double mse(double A[Ni], int N) {
  int i;
  double tmp = 2;
  for (i = 0; i < Ni; i++) {
    for (int j = 0; j<Ni; j++) {
      tmp = A[N+i]*tmp;
    }
  }
  return tmp;
}

I use the following commands to generate the IR :

clang -O2 -S -emit-llvm pure_c_main.c
opt -S  -mem2reg -polly-scops -polly-export-jscop pure_c_main.ll

and this command to detect the SCOP :

opt -polly-process-unprofitable  -polly-remarks-minimal  -polly-use-llvm-names -polly-scops -analyze  -polly-export-jscop  no_optim.ll

When I use this commands, polly detects scop but the IR is optimize due to the O2 in clang. I'd like to have a non optimize version of the IR so that the IR is directly linked to the c code. But when I remove the -02, polly does not detect any scop. Anybody has an idea why ?

niosega updated this revision to Diff 106279.EditedJul 12 2017, 12:28 PM

Fully-Indexed expansion of the write accesses :

  • Build the new access map from the current access map
  • Create a new SAI for the expanded version of the access array or scalar
  • Modify the memory access to the new SAI

Seems to be working for both scalars and non-scalars access.

For now, the test is only based on dump comparison because the code is broken : the reads are still to the old SAI.

Meinersbur added inline comments.Jul 13 2017, 3:40 AM
lib/Support/RegisterPasses.cpp
319–320

Suggestion: Put it behind DeadCodeElimination? It uses DependenceInfo and should have the same results even before MSE.

lib/Transform/MaximalStaticExpansion.cpp
2

Remove - Expand the memory access. No other header does this.

13

To be consistent with the other files, add an empty line before //===------

Meinersbur added inline comments.Jul 13 2017, 3:40 AM
lib/Transform/MaximalStaticExpansion.cpp
41

Typo: transformations

86

Check for whther this returns nullptr. At least an assertion.

Suggestion: Get the domain sizes from the writing statement's domain.

106–109

isl_dim_in and isl_dim_out dimensions have no dim_id, so this is not necessary.

112

At this point, in the regression test, we have

{ Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[o0] }

(i0 and o0 are unconnected)

That is, every statement instance accesses all array elements. It should be something like:

{ Stmt_for_cond1_preheader[i0] -> MemRef_tmp_11__phi1_expanded[i0] }
136

Use inline declarations:

bool CorrectWrite = expandWrite(S, MA);
138

We cannot return early with transformations only partially applied. The SCoP representation will be inconsistent and at best passes after this will crash, at worst we miscompile.

Please check in advance if a transformation can be applied before trying to apply it.

In this case, you may want to check to each ScopArrayInfo whether it is applicable. The class ScalarDefUseChains in DeLICM.cpp might be helpful to get the accesses of a SAI. I think we will sooner-or-later integrate it into ScopInfo.

148

runOnScop returns true if and only if the IR has been modified. We only modify the SCoP representation, therefore we only return false.

152

In printScop, you must print to OS. It will print to stdout instead to stderr. This should also make the regression test simpler.

156

You will need to add AU.addRequired<DependenceInfo>(); here at some point.

test/MaximalStaticExpansion/mse___%for.cond1.preheader---%for.end8.jscop
1 ↗(On Diff #106279)

What is this file needed for?

niosega updated this revision to Diff 107505.Jul 20 2017, 7:47 AM
niosega marked 5 inline comments as done.

In this revision, we have done :

  • Use of c++ bindings for isl instead of direct isl
  • Implementation of the reads expansion
  • Add constraint for the write map so that the in dims are link to the out dims
  • Remove useless JSCoP file
  • Create a new expanded SAI for each statement
  • Add a check method before doing the expansion to avoid partial expansion

Problems :

  • Do I have to expand already fully-indexed array (for example, the first write of B in the test case "too-many-writes.ll" ) ?

For now, we can not expand :

  • Scalar access
  • MayWrite access
  • SAI with more than one write
  • SAI with read that can cause partial read access (because polly can not handle partial read access)
niosega marked 6 inline comments as done.Jul 20 2017, 7:52 AM
Meinersbur edited edge metadata.Jul 20 2017, 12:34 PM

[Suggestion] Add a regression test where generated LLVM-IR is tested.

include/polly/ScopInfo.h
840–842

[Style] Why was this moved?

lib/Transform/MaximalStaticExpansion.cpp
47

[Style] Instead of a set of not expandable arrays, why not using a set of expandable arrays? It feels safer because if an array is just missing in the set for whatever reason, it would not default to expand it.

[Style] std::set is rarely used in LLVM. There are alternitives: See http://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallset-h .

51

[Typo] checkExpandability

59–61

[Style] Please make them doxygen comments (///)

The doxygen style for parameters we usually use is.

@param Dependences The RAW dependences of the SCoP @p S.
77

[Style] Negation in variable name and double negation. Why not bool Expandable = true?

124–129

[Style] NumerElementMap = isl_union_map_n_map(CurrentReadWriteDependences.get())

143–144

[Serious] In normal operation, do not print anything. Users expect only clang warnings and errors.

Alternatives are:

  1. DEBUG(dbgs() << "") (discurraged for regression tests, this is meant for debugging)
  2. Print remarks using -pass-remarks-missed
  3. Print information at -analysis
  4. STATISTIC
214–218

[Serious] This is not a sufficient condition for full expansion. E.g. one dimension can be a static 0.
There is also more than one possibility for full expansion (e.g. one starting at 0, another at 1). The reads must access the correct element. So if you want to implement this as a heuristic that expansion is not worth it in this case, implement it in checkExpandability such that reads are also not modified.

249–250

[Serious] I think getMaxBackedgeTakenCount can fail in cases where Polly is still able to detect affine loops. Please add at least an assert that SCEV is not SCEVCountNotCompute.

255

[Suggestion] Should this be getLatestScopArrayInfo() ?

297–298

[Remark] The structure I had in mind was

for (array : S.arrays()) {
  if (!checkExpendability(array))
    continue;

  for (ScopStmt &Stmt : S) 
    for (MemoryAccess *MA : Stmt) {
       if (MA->isRead())
         expandRead(MA);
       if (MA->isWrite())
         expandRead(MA);
  }
}

This does not need a NotExpandable set, but has worse asymptotic runtime. So I guess your version has an advantage.

302–304

[Style] We usually do not use parenthesis around single statements:

if (isExpandable(SAI))
  continue;
313

[Style] In Polly's coding style, all sentences end with a dot (but I personally don't care).

Thank you for this mostly working version. I hope my comments are not too daunting.

lib/Transform/MaximalStaticExpansion.cpp
90

[Suggestion] One could skip the check for the current array once it is known to be unexpandable and continue with the next one.

188–199

[Suggestion] Instead of searching for the correct id, you could derive the name as in expandWrite and look it up in ScopArrayNameMap.

[Serious] What if the Id is not found? Please add an assertion for that case.

250

[Suggestion] Or use ScalarEvolution::getAddExpr(SCEV, ScalarEvolution::getConstant(1))

[Serious] I don't see where +1 is added to the ISL version.

test/MaximalStaticExpansion/working_expansion.ll
24

Shouldn't this be

MemRef_B_Stmt_for_body3_expanded[2000][3000]

?

niosega added inline comments.Jul 21 2017, 9:18 AM
include/polly/ScopInfo.h
840–842

This was moved from private to public section because I need this method to find the expanded SAI name during read expansion. But if I use the solution you suggest, I will not need it anymore.

lib/Transform/MaximalStaticExpansion.cpp
90

That is what I wanted to do in the beginning. But I didn't find a clean solution to "escape" from the two innermost loops and go to next iteration of the loop that iterate over the SAI

250

We discuss during the last phone call about an other solution that involves methods that are in FlattenAlgo.cpp to get the boundary of the loop iterations variables. That's what I meant by "ISL version". But for now, I can not access the methods from FlattenAlgo.cpp because they are defined inside an unamed namespace.

297–298

If I am not wrong, we must first expand all writes before expanding the reads. Because otherwise problems can happened during trying to find expanded SAI during read expansion.

Meinersbur added inline comments.Jul 21 2017, 2:03 PM
include/polly/ScopInfo.h
840–842

ok.

lib/Transform/MaximalStaticExpansion.cpp
90

A possible solution is to refactor the inner two loops to a function, which can return true/false for a ScopArrayInfo at any point.

bool isExpandable(SAI) {
 for (ScopStmt &Stmt : S) 
      for (MemoryAccess *MA : Stmt) {
         if (MA->isMayWrite())
           return false;
        
           ...
      }     
}

for (auto &SAI : S.arrays()) {
  if (!isExpandable())
    NotExpandables.insert(SAI);
}
250

You may need to modify them anyway, so don't hesitate to copy them over, especially for a prototype.

If later we find they share a significant amount of code, we can find a common file for them.

297–298

Let me refine what I some time ago I had in mind.

for (array : S.arrays()) {
  MemoryAccess *TheWrite = nullptr;
  List<MemoryAccess*> *AllReads;
  if (!isExpandable(array, TheWrite, AllReads))
    continue;
  assert(TheWrite);
  assert(AllReads.size() > 0);

  ScopArrayInfo *ExpandedArray =  expandWrite(TheWrite);

  for (MemoryAccess *MA : AllReads) 
    expandRead(MA, ExpandedArray);
}
niosega marked 23 inline comments as done.Jul 24 2017, 12:18 PM
niosega updated this revision to Diff 107942.Jul 24 2017, 12:44 PM

Take into account remarks from Michael.

Change the structure of the expansion. Now iterate over SAI of the Scop.
Get the boundaries of the loop iterations variables with ISL.
Style modifications.

Implementation for remarks is in place but not working. One test case is broken due to change in structure. But the output seems to be correct. I will correct it as soon as possible.

niosega updated this revision to Diff 107946.Jul 24 2017, 12:52 PM

Remove debug if condition.

Meinersbur added inline comments.Jul 25 2017, 8:49 AM
lib/Transform/MaximalStaticExpansion.cpp
194

I get a compile error here. MA->getAccessRelation() has been updated to use C++ object. Please update to Polly trunk.

340

[Suggestion] Pass string as llvm::StringRef (or const std::string & to avoid a copy)

test/MaximalStaticExpansion/partial_access.ll
2

[Style] Please remove trailing whitespace.

simbuerg added inline comments.Jul 25 2017, 9:17 AM
lib/Transform/MaximalStaticExpansion.cpp
182

Reminder: These should/will become diagnostics

253

As far as I remember, this will fail, if there are more than one map in the union_map. I would check that with at least an assert.

259

Where do you use this name?

315

This takes a const string &, no need to go over the c_str().

343

Why 'AssumpRestrict'?

niosega marked 7 inline comments as done.Jul 26 2017, 4:01 AM
niosega updated this revision to Diff 108249.Jul 26 2017, 4:38 AM

Take into account Michael and Andreas comments.

Diagnostic still does not work.

niosega updated this revision to Diff 108256.Jul 26 2017, 5:50 AM

Emit remarks instead of stderr printing. Test case works.

Meinersbur accepted this revision.Jul 26 2017, 7:10 AM

To have any effect, we need to clear the DependenceInfo, otherwise -polly-opt-isl will still use the unexpanded dependence info and the mse was useless.

DependenceInfo has no facility yet to reset a previous analysis. You might want want to add one into this patch or a follow-up one.

lib/Support/RegisterPasses.cpp
144

[Suggestion] To be consistent with other switches that add passes, name this -polly-enable-mse and the pass itself -polly-mse (instead of -polly-opt-mse)?

lib/Transform/MaximalStaticExpansion.cpp
68

[Style] Use SmallPtrSetImpl<MemoryAccess *> to not require the small size in the parameter.

188

[Typo] extand -> expand

219

The consequence would not be a partial read access, but it would need to read the original value the element had before entering the SCoP. That's a special case similar to having more than one write.

308

[Nit] The UpperBound could overflow a long. Add an assertion for that?

329–332

[Style] This could be simpler using

NewAccessMap = NewAccessMap->equate(isl::dim::in, dim. isl::dim::out, dim);

or, even, better, use basic_map::equal.

348

[Nit] OptimizationRemarkEmitterWrapperPass

358–360

[Style] No braces around single statements.

Also possible:

SmallPtrSet<ScopArrayInfo *, 4> CurrentSAI(S.array_begin(), array_end());
test/MaximalStaticExpansion/partial_access.ll
2

The interleaving of stdout (-analyze) and stderr (-pass-remarks-analysis) is undefined. It is better to have two separate RUN lines, one checking analyze, the other -pass-remarks-analysis.

This revision is now accepted and ready to land.Jul 26 2017, 7:10 AM
niosega marked 8 inline comments as done.Jul 26 2017, 8:01 AM
niosega added inline comments.
lib/Transform/MaximalStaticExpansion.cpp
219

Are you sure ?

Because if I remember well. Let say that we are analyzing this code :

for (i = 0; i < Ni; i++) {
  B[j] = i;
  for (int j = 0; j<Nj; j++) {
    B[j] = j;
  }
  A[i] = B[i];

When I try to set the new access relation for the B read, the setNewAccessRelation method of class MemoryAccess failed with the assert "Partial READ accesses not supported".

308

How can I efficiently check that there is an overflow ?

329–332

I'd like to use isl_basic_map_equal but I did not find the documentation of this method on the online isl doc. There is also no example of uses in Polly. Can you explain me how it works ?

Meinersbur added inline comments.Jul 27 2017, 4:44 AM
lib/Transform/MaximalStaticExpansion.cpp
219

How were you able to do that? setNewAccessRelation accepts only an isl_map, not isl_union_map.

Let me explain in more detail.

for (int k = 0; k < M; k+=1) {
  for (int i = 0; i<= N/2; i+=1) {
S:  B[i] = i;
  }
  for (int i = 0; i<N; i+=1) {
T:  = ... B[i]
  }
}

The flow dependence would look something like:

{ T[k, i] -> S[k, i] : 0 < i <= N/2 }

We could naively expand B to B_expanded:

for (int k = 0; k < M; k+=1) {
  for (int i = 0; i<= N/2; i+=1) {
S:  B_expanded[k][i] = i;
  }
  for (int i = 0; i<N; i+=1) {
T:  = ... B_expanded[k][i]
  }
}

The problem here is that B_expanded[k][i] for i > N/2 never gets written (And T would read uninitialized memory). The flow dependence doesn't tell which instance of S wrote it in the first place!! If you try to apply it naively anyway using setNewAccessRelation, we need a source of the value for all instances of S, but we don't have one for i > N/2! This is way partial read accesses are unsupported.

The correct thing to do would be to read the value from the original array B (which then becomes read-only).

{ T[k,i] -> B_expanded[k,i] : i < N/2; T[k,i] -> B[i] : i>=N/2 }

This again is an isl_union_map (NOT a partial access since it is defined for all instances of T), which we currently do not support support.

Please try to understand what the problem with partial read accesses is. Not the partial read accesses are the problem, but the reason why you would want to use one.

308

UpperBound.le(INT_MAX) (I think there is no implicit conversion from int to isl::val, but you gget the idea)

329–332
isl_map_space(SpaceMap.copy(), SpaceMap.dim(isl::in))

should get you a basic_map of that space where the n_equal = SpaceMap.dim(isl::in) in- and out-
dimensions are equal. Something like.

{ Stmt[i0, i1] -> MemRef[o0, o1] : i0 = o0 and i1 = o1 }

However, no documention could mean that the function was not intended to be public.

niosega updated this revision to Diff 108467.Jul 27 2017, 7:56 AM
niosega marked 9 inline comments as done.

Take Michael comments into account.

Meinersbur added inline comments.Jul 27 2017, 8:57 AM
lib/Support/RegisterPasses.cpp
144

I'm ok with the switch name, but doesn't the e in "mse" already stand for "expansion" (therefore expand-mse is short for "expand-maximal-array-expansion")

lib/Transform/MaximalStaticExpansion.cpp
308–309

This assertion fails on Windows (and 32 bit platforms): The isl::val constructor takes a long, which is 32 bit this platforms. UINT_MAX exceeds its range.

310

It has been tested for the range of an unsigned int.

311

If UpperBound.get_num_si() is UINT_MAX, you get an overflow when adding +1.

test/MaximalStaticExpansion/read_from_original.ll
1 ↗(On Diff #108467)

Nice new testcase!

niosega added inline comments.Jul 29 2017, 2:03 PM
lib/Transform/MaximalStaticExpansion.cpp
308–309

It's a mistake from my side to compare it with UINT_MAX.

If I replace

std::numeric_limits<unsigned>::max()

with

std::numeric_limits<long>::max()

it should work on every platform, right ?

Meinersbur added inline comments.Jul 30 2017, 11:02 AM
lib/Transform/MaximalStaticExpansion.cpp
308–309

Except that it is stored into a std::vector<unsigned>. Storing a long into as an unsigned int may get you another overflow.

My suggestion is to stick with the lowest common maximum: std::numeric_limits<int>::max(). I don't think you would want to allocate memory larger than that anyway.

niosega updated this revision to Diff 109640.Aug 3 2017, 3:18 PM
niosega edited the summary of this revision. (Show Details)

Take into account Michaels comments.
Update setNewAccessRelation call (isl::map as parameter instead of isl map * )

niosega edited the summary of this revision. (Show Details)Aug 3 2017, 3:20 PM
niosega edited the summary of this revision. (Show Details)Aug 3 2017, 3:23 PM

LGTM.

Andreas, do you want to commit?

lib/Transform/MaximalStaticExpansion.cpp
309

Why - 1?

This revision was automatically updated to reflect the committed changes.