Page MenuHomePhabricator

[LoopNest]: Analysis to discover properties of a loop nest.
ClosedPublic

Authored by etiotto on Oct 10 2019, 7:38 AM.

Details

Summary

This patch adds an analysis pass to collect loop nests and summarize properties of the nest (e.g the nest depth, whether the nest is perfect, what's the innermost loop, etc...).

The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/v/llvm-loop-nest-analysis) where we discussed
the unimodular loop transformation framework ( “A Loop Transformation Theory and an Algorithm to Maximize Parallelism”, Michael E. Wolf and Monica S. Lam, IEEE TPDS, October 1991). The unimodular framework provides a convenient way to unify legality checking and code generation for several loop nest transformations (e.g. loop reversal, loop interchange, loop skewing) and their compositions. Given that the unimodular framework is applicable to perfect loop nests this is one property of interest we expose in this analysis. Several other utility functions are also provided. In the future other properties of interest can be added in a centralized place.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2019, 7:38 AM
etiotto edited the summary of this revision. (Show Details)Oct 10 2019, 7:39 AM
fhahn added a comment.Oct 10 2019, 8:42 AM

The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/s/xn1sfd80jkpfssiqhal6t88yvzxebcex).

I think it would be great to summarize the motivation in the summary as well in a few sentences.

The motivation for this patch was discussed at the latest meeting of the LLVM loop group (https://ibm.box.com/s/xn1sfd80jkpfssiqhal6t88yvzxebcex).

Please change the link to: https://ibm.box.com/v/llvm-loop-nest-analysis

typo in title: [LoopNext] -> [LoopNest]

etiotto edited the summary of this revision. (Show Details)
etiotto retitled this revision from [LoopNext]: Analysis to discover properties of a loop nest. to [LoopNest]: Analysis to discover properties of a loop nest..
etiotto edited the summary of this revision. (Show Details)Oct 10 2019, 1:52 PM

[serious] I think this pass should determine the maximum set of loops that are perfectly nested, as getMaxPerfectDepth returns. This would make

for (int i)
  for (int j) {
    preStmt();
    for (int k) ...
    postStmt();
  }

a perfectly nested loop nest of depth 2 with a body that contains a loop. In fact, any loop would be perfectly nested, but possibly just with depth one. This would avoid different analysis result between the above loop nest and

for (int i)
  for (int j) {
    OutlinedBody(i,j);
  }

[serious] There could be irreducible control flow between two loops, that needs to be ruled-out.

llvm/include/llvm/Analysis/LoopNestAnalysis.h
66

[nit] outermost is a proper english word, so I suggest: getOutermostLoop()

78

[typo] breadth

125

[discussion] I understand "perfectly nested" as a property of a set of loops. That is, in

for (int i)
  for (int j) {
    preStmt();
    for (int k) ...
    postStmt();
  }

I'd understand the loop i and j to be a perfect loop nest. What code is in the body does not matter.

llvm/lib/Analysis/LoopNestAnalysis.cpp
24

Is this an established pattern? Never seen this before, but looks practical. The other pattern is VerboseFusionDebugging. Maybe we should agree on one patten.

71

[unrelated to this patch] Why isn't this prefixed with LLVM?

[suggestion] We could #define LLVM_DEBUG_VERBOSE(...) DEBUG_WITH_TYPE(DEBUG_TYPE "-verbose", __VA_ARGS__)

93–95

[serious] Should this just use llvm::isSafeToSpeculativelyExecute? It basically contains anything that could just be moved into the innermost loop such that it is executed multiple times.

178–181

[serious] This looks inconsistent with getMaxPerfectDepth() which understand "perfectly nested" as a property of a set of loops. That is, in

for (int i)
  for (int j) {
    preStmt();
    for (int k) ...
    postStmt();
  }

I'd understand the loop i and j to be a perfect loop nest. What code is in the body does not matter.

It doesn't seem to be used, so I'd just remove this function.

I am very much not convinced of either this analysis or the motivation.

First, restricting transformations to perfectly nested loops when there are known techniques to handle imperfect loop nests does not seem worthwhile. As such, I strongly reject the notion that we should make perfect nesting a requirement for all of our loop transformations.

Secondly, even if I accept the premise that perfect nesting is useful, I do not see the value of this analysis pass. We have an existing description of a loop and loop nest in loop info. How is this anything more than a set of helper function implemented over the existing analysis? (i.e. LoopNestUtils.hpp/cpp?)

First, restricting transformations to perfectly nested loops when there are known techniques to handle imperfect loop nests does not seem worthwhile. As such, I strongly reject the notion that we should make perfect nesting a requirement for all of our loop transformations.

I'd see it as a tool for transformations that need perfect loop nests (interchange, tiling, unrollandjam, ...) rather as a new requirement.

Secondly, even if I accept the premise that perfect nesting is useful, I do not see the value of this analysis pass. We have an existing description of a loop and loop nest in loop info. How is this anything more than a set of helper function implemented over the existing analysis? (i.e. LoopNestUtils.hpp/cpp?)

We already had this discussion in D63459. In this case, I don't have a strong opinion on whether it should be a LoopAnalysis or a utility. In this case an advantage could be that passes may mark the analysis as preserved (e.g. it doesn't change the loop nest) rather it being recomputed.

etiotto marked 11 inline comments as done.Oct 29 2019, 1:48 PM
etiotto added inline comments.
llvm/include/llvm/Analysis/LoopNestAnalysis.h
66

ok, I will also rename getInnerMost() to getInnermost() for consistency.

125

Correct. In your example loops i and j are perfectly nested with respect to each other. Loop j and k on the other hand are not perfectly nested because of the present of stmts between the loops. The entire loop nest is imperfect because of that.

So the isPerfect member function returns true if and only if any adjacent pair of loops in the nest are perfectly nested with respect to each other (not only the othermost 2 loops). That is the only loop containing user code is the innermost loop.

llvm/lib/Analysis/LoopNestAnalysis.cpp
24

I personally like this one because is shorter than --debug-only=loopnest-verbose-debug.... but if ppl objects then I can change it.

71

Yep I like it. Would be useful to have the LLVM_DEBUG_VERBOSE macro to drive consistency in how debug traces could be toggled.

It would be even better to have a way to pass a verbosity level to the debug/tracing infrastructure. Users would then be able to select which level of verbosity to associated with a particular debug stmt:

#define LLVM_DEBUG_VERBOSE(DbgLvl, ...) DEBUG_WITH_TYPE(DEBUG_TYPE "-verbose-" + DbgLvl, VA_ARGS)

178–181

This function would return false for the`j` loop in the example because the loop nest it 'roots' is imperfect given that there is code between loops j and k. It would also return false of loop i given that its containing loop nest is imperfect. Does that answer your question?

etiotto updated this revision to Diff 226963.Oct 29 2019, 1:59 PM
etiotto marked 5 inline comments as done.

Partially address code review comments from @Meinersbur.

Meinersbur added inline comments.Oct 30 2019, 9:18 AM
llvm/include/llvm/Analysis/LoopNestAnalysis.h
125

I am arguing that we should remove isPerfect entirely and replace by something that returns the maximum perfect loop nest depth. To avoid a difference whether in

for (int i)
  for (int j) {
    OutlinedBody(i,j);
  }

the function OutlinedBody is inlined or not.

Also, since this is a loop analysis, there should be a function determining whether the currently analyzed loop is the outermost loop nest. A loop pass could use this to skip optimizing a loop of depth 2 when when including another outer loop it could optimize a loop nest depth of 3 (since the order of optimization is determined by the LoopPassManager and can be arbitrary).

bmahjour added inline comments.Nov 1 2019, 11:18 AM
llvm/lib/Analysis/LoopNestAnalysis.cpp
93–95

isSafeToSpeculativelyExecute returns false for phi nodes, branches and most integer divisions.
We need to allow phi and branches due to inner loop induction and control.
As far as I know integer divisions are safe to speculate most of the time, except on some platforms where zero-division generate a trap.
It puzzles me why isSafeToSpeculativelyExecute checks for integer division but completely ignores floating-point division which would have visible side-effects on more platforms!

[serious] Could you add a test with early exits, e.g.

for (i = 0; i<ni; ++i) {
    for (j=0; j<nj; j+=1) {
       if (c)
        return i+j;
    }
}
llvm/lib/Analysis/LoopNestAnalysis.cpp
93–95

I guess control flow instructions are out of the scope of isSafeToSpeculativelyExecute, but could be checked explicitly.

isSafeToSpeculativelyExecute should be the reference for whether a non-control-flow instruction can be executed speculatively/redundantly. If it returns true for something that has side-effect, it's a bug, not just if called here. If it can be improved, then we can improve it. For instance, passing the target determining whether division by zero is undefined or poison (by the LangRef, it's always undefined).

Another question is whether isSafeToSpeculativelyExecute is what we need. It may return true on loads, but I'd exclude loads here (which may give us problems since LICM likes to move loads out if the innermost loop). If something can be moved into the innermost loop, then it does not matter whether it might be undefined behavior since execution of the innermost loop is required for anything in the innermost to execute.
On the other side, hoisting out of the outermost loop might be required if needed to compute the iteration count of the innermost loop (e.g. for loop interchange). We cannot risk a division-by-zero to occur of it did not occur in the original loop nest.

I think isSafeToSpeculativelyExecute is a safe default and we may think about loosen the requirements if we know what should happen with instructions of this kind for loop nest transformations. It's also better to have a more central place for the property we need.

llvm/test/Analysis/LoopNestAnalysis/perfectnest.ll
225

Does the guard condition be consistent with the loop exist condition? I.e. is

for (i = 0; i<ni; ++i) {
  if (5<nj) { // this a guard?
    for (j=0; j<nj; j+=1) 
      x+=(i+j);
  }
}
etiotto marked 8 inline comments as done.Nov 20 2019, 9:30 AM
etiotto added inline comments.
llvm/include/llvm/Analysis/LoopNestAnalysis.h
125

Hi Michael, the isPerfect() member function is IMO convenient for consumers that want to know whether the entire loop nest is perfect (from a syntactic standpoint). However is not strictly necessary so I'll remove it and add this member function:

/// Return true if the given loop \p L is the outermost loop in the nest.
bool isOutermost(const Loop &L) const;

llvm/lib/Analysis/LoopNestAnalysis.cpp
93–95

In the next patch I'll will use isSafeToSpeculativelyExecute as suggested, and augment it for PHI nodes and cmp instructions.

llvm/test/Analysis/LoopNestAnalysis/perfectnest.ll
225

I rely on the LoopInfo getLoopGuardBranch() member function to determine whether (5<nj) is a guard for the inner loop (it is currently considered to be a guard). So the answer is yes, if (5<nj) would be considered a guard and therefore the nest considered perfect.

etiotto updated this revision to Diff 230283.Nov 20 2019, 9:39 AM
etiotto marked 3 inline comments as done.

Addressing code review comments from Michael.

[suggestion] Add a methods that returns/fills a vector with all the Loop*s that are part of the perfect loop nest.

llvm/include/llvm/Analysis/LoopNestAnalysis.h
125

In my domain it is very common to have additional loops inside otherwise perfectly nested loops (e.g. stencils). Here is a simple example of a grayscale conversion:

for (int y = 0; y < Y; ++y)
  for (int x = 0; x < X; ++x) {
      double luminance = 0;
      for (int c = 0; c < 3; ++c)
        luminance += colored[x][y][c] / 3.0;
      grayscale[x][y] = luminance;
    }

For optimization, we would e.g. interchange the x and y loops. If the interchange uses isPerfect to determine whether the loop can be transformed, it will have to bail out (Let's ignore that the inner loop could be unrolled before loop interchange, many stencils in HPC have inner loops with dynamic trip count or are too large).

That is, using isPerfect would stop many HPC applications, maybe even a majority of our programs in production, to be optimized. For the syntactic property of two loops having no in-between code (with size-effects), what's inside the innermost body should be irrelevant and it would be surprising to me if it did. This is why I'd to not even include isPerfect in the API.

If a pass really cannnot handle nested loops, it can still ask LoopInfo's innermost Loop whether it contains another loop. This would be an explicit restriction by the pass.

[suggestion] Add a methods that returns/fills a vector with all the Loop*s that are part of the perfect loop nest.

That sounds like a good idea. I'd suggest returning a vector of vectors actually in case where there are multiple perfect sub nests in a loop nest. ie:

L1
  L2
   <code>
    L3
      L4
        L5

return:
{{L1,L2},{L3,L4,L5}}

[suggestion] Add a methods that returns/fills a vector with all the Loop*s that are part of the perfect loop nest.

That sounds like a good idea. I'd suggest returning a vector of vectors actually in case where there are multiple perfect sub nests in a loop nest. ie:

L1
  L2
   <code>
    L3
      L4
        L5

return:
{{L1,L2},{L3,L4,L5}}

I see how that would be useful. Assuming data dependencies are preserved such a vector could be used to interchange loops in the 2 groups ({L1,L2} and {L3,L4,L5}}. I will add this functionality in the next patch.

etiotto updated this revision to Diff 233182.Dec 10 2019, 1:38 PM

Based on the feedback received during code review I have added a new public member function called 'getPerfectLoops' which can be used to retrieve a list of loops that are perfect with respect to each other.
For example, given the following loop nest containing 4 loops, 'getPerfectLoops' would return {{L1,L2},{L3,L4}}.

for(i) // L1
  for(j) // L2
    <code>
    for(k) // L3
       for(l) // L4
etiotto marked 2 inline comments as done.Dec 10 2019, 1:43 PM
etiotto added inline comments.
llvm/include/llvm/Analysis/LoopNestAnalysis.h
125

@Meinersbur I have uploaded a patch that adds the a function which can be used to retrieve the list of loops that are perfect within each other in the nest. In your example it would return loop-y and loo-x into a list (and loop-c into another list by itself). Please take a look, it should address your concern.

I'd only store the list of perfectly nested loops that are have the analysis loop as outermost loop in the analysis results. This should be the most expensive analysis that might be worth caching/preserving between passes.

llvm/include/llvm/Analysis/LoopNestAnalysis.h
35–36

Could you clarify whether this allows other perfectly nested loops between outer and inner? Eg.

for outer:
   for middle:
      for inner:
37–38

[suggestion] Did you consider moving the static methods to LoopUtils?

63

[serious] As already mentioned, I don't see a use case for this. Transformations should not have more requirements than necessary, including the content of the innermost loop of a perfect loop nest, and I don't see how the presence of a nested loop should be a requirement of any transformation.

Please either try to convince me otherwise or remove it to avoid it being used accidentally.

Same applies to getNestDepth.

68–69

[subjective] I'd prefer *Loops.front() over const_cast.

103

[suggestion] Return a ArrayRef<Loop*>. Returning LoopVector exposes more implementation details than necessary.

117

[typo] "i.e." (that is) instead of "e.g." (for example)

llvm/lib/Analysis/LoopNestAnalysis.cpp
31–35

[discussion] The analysis looks like a cache for the the perfect loop depth and the loops in breadth-first order. Everything else is computed on-the-fly. Does this justify being a pass?

[serious] I also don't see the advantage of storing the loop list over iterating the LoopInfo structure itself when needed. Is is for speed? Also note that loop analysis might be instantiated for every loop, each storing its own list.

92

[name] containsOnlySafeInstructions

218–222

[style] The LLVM code base does not use const for function-local variables.

etiotto marked 17 inline comments as done.Dec 18 2019, 1:47 PM
etiotto added inline comments.
llvm/include/llvm/Analysis/LoopNestAnalysis.h
35–36

Added example in a comment.

37–38

Yes, I think is nice to keep all the code related to the loop nest analysis grouped together in this class.

63

OK removing these 2 static functions

68–69

ok, done

llvm/lib/Analysis/LoopNestAnalysis.cpp
31–35

OK I will change this from an analysis pass to an utility class in the next revision. Perhaps over time, as we find the need to add more functionality to this class, we can revisit this point and determine whether it makes more sense to have an analysis pass.

218–222

I am trying to ensure the basic blocks are not mutated in the function. I am not sure what was the rationale (in the LLVM code base) for not wanting to const qualify function-local variables if they are intended to remain immutable ... perhaps the thinking was that the original author of the code would not make a mistake and modify something that was intended to stay immutable, but I think is safer to const qualify variables to prevent unintended changes down the road.

etiotto updated this revision to Diff 234611.Dec 18 2019, 1:48 PM
etiotto marked 6 inline comments as done.

Addressing code review comments.

You may have though about ArrayRef<const Loop*> getLoops() const instead of const ArrayRef<Loop*> getLoops() const.

However, I think the const qualifier on the method applies to the analysis result, not the loops it holds (which are the analysis result of LoopInfo). The loops are identity objects: modifying them does not change which loops they represent. To create a LoopNest, one requires a non-const Loop* object anyway.

A language-layer might say that when requiring a non-const Loop* at construction, we already know that the Loop* is stored in modifyable memory (not a const global), hence can be modified.

llvm/include/llvm/Analysis/LoopNestAnalysis.h
65

[suggestion] Add a remark that the outermost and the returned are not necessarily perfectly nested.

85–88

[serious] This accessor makes no sense. The same Loop* (non-const) can be obtained from a const LoopNest using getLoops()[Index].

LoopNest does not own the Loops and therefore has no control over its const-ness. Just use a single Loop *getLoop(unsigned) const accessor. const on LoopNest refers to the constness of the analysis result. The Loops themselves are identity objects: after e.g. removing a BB out of its lists (e.g. because it is dead), it still represents the same loop. Same applies to getOutermostLoop().

To make the accessor useful, also provide a getNumLoops() member (or just remove all of it and require users to use getLoops()).

95

[style] const ArrayRef is useless. It's an r-value/temporary.

141–142

[style] Since it is private, move it as a free function into the .cpp file. No need to expose it in a header file.

146

[typo] breadth

llvm/lib/Analysis/LoopNestAnalysis.cpp
33–34

I still think storing all a list of all loops doesn't have a lot of benefits.

etiotto updated this revision to Diff 243869.Feb 11 2020, 7:41 AM
etiotto marked 9 inline comments as done.

Addressed review comments from @Meinersbur

etiotto added inline comments.Feb 12 2020, 7:05 AM
llvm/include/llvm/Analysis/LoopNestAnalysis.h
85–88

Removed the member functions that return a const Loop. Added getNumLoops().

141–142

ok

llvm/lib/Analysis/LoopNestAnalysis.cpp
33–34

I think is useful to the transformation to know which loops are in the loop nest. It is used in several places. For example is handy tp have a list of loops in the nest to check whether they are all in simplified form. It is also used to determine the nst depth, etc...

Meinersbur accepted this revision.Feb 17 2020, 9:25 AM

LGTM

llvm/include/llvm/Analysis/LoopNestAnalysis.h
35–36

What I meant is what does it return for:

for i
  for j
    for k

arePerfectlyNested(loop_i, loop_k, SE) == ?

87

[style] SmallVector::size() is of type size_t, why not also return size_t?

114

[suggestion] Add an assertion ensuring that the result would not be negative.

This revision is now accepted and ready to land.Feb 17 2020, 9:25 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Mar 3 2020, 5:57 AM

Looks like this broke the build with modules enabled: http://green.lab.llvm.org/green/job/lldb-cmake/10655/console .