This is an archive of the discontinued LLVM Phabricator instance.

Ignore Singletons in statement domains
Needs ReviewPublic

Authored by sabuasal on Jun 13 2019, 4:01 PM.

Details

Summary

Code generation in Polly doesn't do a good job with disjunctions in ,
one statement will get materialized more than once which will increase code
size.

  
For example, for a do-while loop would be modeled like so:
  
  :: isl ast :: foo :: %do.body---%do.end
   [p_0] -> {  : -2147483648 <= p_0 <= 2147483647 }
    { domain: "[p_0] -> { Stmt0[i0] : 0 <= i0 < p_0; Stmt0[0] : p_0 <= 0 }",
         child: { schedule: "[p_0] -> [{ Stmt0[i0] -> [(i0)] }]" } }
    if (1 && 0 == (p_0 >= 268435457 || p_0 <= -1))
          {
            #pragma simd
            #pragma known-parallel
            for (int c0 = 0; c0 < p_0; c0 += 1)
              Stmt0(c0);
            if (p_0 <= 0)
              Stmt0(0);    <---- will be rematerialized.
          }
  
      else
            {  /* original code */ }
In the resulting AST There is a separate instance for the 0 case (of p_0) and
different one for the non zero case.
  
By pruning the domain set to ignore singletons we can generate a contiguous set,
we can do that by injecting the assumption that the loop trip count
is always > 0 to generate a domain (and corresponding AST) that looks like so:
  
        :: isl ast :: foo :: %do.body---%do.end
        [p_0] -> {  : -2147483648 <= p_0 <= 2147483647 }
        { domain: "[p_0] -> { Stmt0[i0] : 0 <= i0 < p_0 }",
            child: { schedule: "[p_0] -> [{ Stmt0[i0] -> [(i0)] }]" } }
        if (p_0 >= 1 && 0 == (p_0 >= 268435457 || p_0 <= -1))  --> added a condition for P_0 > 1
  
            #pragma simd
            #pragma known-parallel
            for (int c0 = 0; c0 < p_0; c0 += 1)
              Stmt0(c0);
  
        else
            {  /* original code */ }

This patch received intensive testing and showed considerable code size saving when used to build android.
This patch will have significant code size saving by eliminating excessive versioning in the AST for the values
of parameters. Look at the full AST listing for the test case test/ScopDetect/model-dowhile.ll

Diff Detail

Event Timeline

sabuasal created this revision.Jun 13 2019, 4:01 PM
Herald added a project: Restricted Project. · View Herald Transcript
sabuasal edited the summary of this revision. (Show Details)Jun 13 2019, 4:07 PM
sabuasal edited the summary of this revision. (Show Details)Jun 13 2019, 4:42 PM

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);
include/polly/ScopInfo.h
69

Can you use something more descriptive?

lib/Analysis/ScopInfo.cpp
195

Can you explain the why you chose the name model-do-while?

1215

setDomain would better describe what this function does.

2182

No need to check plain_is_universe(); intersection with the universe would not change the set. ISL also has a fast-path for plain_universe arguments.

2183–2184

[serious] simplifyAssumptionContext should not logically modify the AssumptionContext. The intersection should have happened earlier.

2896

Please document/doxygen what this function is doing

2905

Condition not necessary: project_out will not do anything when projecting-out zero dimensions.

2996

[serious] Instead of lexmax on the scatter function, shouldn't we subtract the parameter conditions that cause the singleton set to be executed? For instance, I imagine this would break the lexmax logic (smaller n means more loop iterations):

for (int i = 0; i < -n; i+=1)
  Stmt(i);

What is the reason for this logic to be in addLoopBoundsToHeaderDomain and not e.g. in a separate function like intersectBBDomainParams? I prefer having the logic handling this in one place.

2997

We are trying to move the functionality that is only required for the construction of a SCoP into ScopBuilder.

3749

[suggestion] A different description such as "Avoid single iteration loops?

test/ScopDetect/model-dowhile.ll
2

[serious] Please move RUN lines to the beginning

29

[serious] Use %loadPolly

To avoid mixing stdout and stderr (how they are interleaved is undefined), you can use --disable-output

32

[serious] Use | to directly pipe to FileCheck

74

[serious] Please unsure that every value has a name (anon values tend to cause less table output). You can do this using opt -instnamer.

108

Remove these markers? They should be irrelevant for this test case.

118–119

[nit] attributes are unused.

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using quering "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

sabuasal marked 2 inline comments as done.Jun 14 2019, 10:30 AM

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

As for numbers, we observed code size saving of around 1 MB when compiling a version of android Q.

include/polly/ScopInfo.h
69

Hmm, I'll change to TRIPCOUNT , how does that sound?

lib/Analysis/ScopInfo.cpp
195

The problem with disjunction in domains is very common in *do while* style loops where the domain of the statement has either one iteration (if the loop condition is false) or more than one.

The idea in this patch is that if the loop condition is actually false we should really not try to run the transformed version of the loop anyway since it not beneficial for a loop of 1 iteration.

I am open t other names if you have suggestions :)

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

As for numbers, we observed code size saving of around 1 MB when compiling a version of android Q. SPEC show very little to no performance changes.

You can detect this simple loop domain directly by using quering "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

What I meant is NOT using isl at all for detecting the situation, but deriving it from the control flow graph (such as 'conditional branch found at the end of the loop instead of before'). Keep in mind this is only a [suggestion], not a requirement for me to accept this patch.

As for numbers, we observed code size saving of around 1 MB when compiling a version of android Q.

What is 1 MB relative to?

include/polly/ScopInfo.h
69

Better, but still does not describe what the assumption is.

lib/Analysis/ScopInfo.cpp
195

The problem I see is that this does not only apply to do-while loops (which is a syntactical element), but e.g. to loop-rotated other loops as well.

[suggestion] polly-avoid-singular-loops, polly-assume-multi-iteration-loops, polly-fallback-on-single-iteration-loops, -polly-expect-loops-to-loop, -polly-optimize-looping-loops-only, ....

(in compiler lingo, 'assume' usually means the compiler is free to use the fact for optimization without requiring it to be verified, 'expect' that violating the fact must still result in correct code).

sabuasal marked 5 inline comments as done.Jun 14 2019, 2:50 PM

You can detect this simple loop domain directly by using quering "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

What I meant is NOT using isl at all for detecting the situation, but deriving it from the control flow graph (such as 'conditional branch found at the end of the loop instead of before'). Keep in mind this is only a [suggestion], not a requirement for me to accept this patch.

Oh I see, I only thought about this in terms of modeling in ISL instead of hacking something in ScopDetection\Info. Thanks for the suggestion.

As for numbers, we observed code size saving of around 1 MB when compiling a version of android Q.

What is 1 MB relative to?

This accounted for about 0.4% of code size of shared objects we compile in Android with our internal Polly enabled for vectorizetion. Hope that answers your question.
One extra benefit for this patch was the simplicity of the resulting AST for analysis too, which is certainly a big plus.

include/polly/ScopInfo.h
69

I'll use AVOIDSINGLEITERATION :)

lib/Analysis/ScopInfo.cpp
195

You are right, it does happen with rotated loops. It iis actually why it was significant enough that I figured we could model it in polly. I'll use polly-avoid-singular-loops.

sabuasal marked an inline comment as done.Jun 14 2019, 2:56 PM
sabuasal added inline comments.
lib/Analysis/ScopInfo.cpp
2183–2184

Actually, this has to be done after simplification. The AssumptionContext is simplified (gisted-ed) with the domain parameters as you can see in line 2178. If you look at line 3025 you'll see that we also gist the domain parameters with the RestrictDomainParameters we are building. this can result in a situation where the parameters restriction we add can be completely removed from both the assumption, and hence the runtime check, and the materialized domain.

the reason it is important to gist the domain parameters is that it we'd rather push all the complex params to the RTC and keep the materialized statement as simple as possible.

Can you suggest a better place for the intersection that insures our restriction params are always present in the AssumptionContext? I think intersectBBDomainParams() might work?

sabuasal marked 3 inline comments as done.Jun 14 2019, 3:30 PM
sabuasal added inline comments.
lib/Analysis/ScopInfo.cpp
2896

will do.

2905

will do.

2996

Thanks for the suggestion here, I think this could actually fit in intersectBBDomainParams(), let me experiment with it.

sabuasal marked 6 inline comments as done.Jun 17 2019, 7:17 PM
sabuasal added inline comments.
lib/Analysis/ScopInfo.cpp
2183–2184

Here is an example from an internal test case that shows the problem:

Avoid Single Iter Assumptions: [p_0, p_1, p_2] -> { : p_1 = 1 and p_2 > 0 and (p_0 >= 32 or p_0 <= 30) }

Assumed context Before gist with domain params: [p_0, p_1, p_2] -> { : (p_1 = 1 and p_0 >= 32 and p_2 > 0) or (p_1 = 1 and p_0 <= 30 and p_2 > 0) }
Domain parameters: [p_0, p_1, p_2] -> { : (p_0 >= 32 and p_2 > 0) or (p_0 <= 30 and p_2 > 0) }
Assumed context After simplifying with domain params: [p_0, p_1, p_2] -> { : (p_1 = 1 and p_0 >= 32) or (p_1 = 1 and p_0 <= 30) } *Note the loss of p_2 >0*

sabuasal updated this revision to Diff 205233.Jun 17 2019, 7:19 PM

Addressed comments from @Meinersbur.

sabuasal updated this revision to Diff 205234.Jun 17 2019, 7:20 PM

Thank you for applying most of my suggestions.

lib/Analysis/ScopInfo.cpp
2908

Condition is unnecessay: .project_out(isl::dim::set, 0, 0) will just return the original set.

2997

Any reasons to not move this to ScopBuilder?

3005

A basic_set should convert implicitly to a set when passed to hasSingleton

3011

[serious] I am quite sure that this method removing the single-iteration-domain is not reliable. Examples include:

for (int i = 0; i < -n; i+=1)
  Stmt(i);
if (unsigned i = 0; i < (n%8); i+=1)
  Stmt(i);

and combinations thereof. At least the first would do the exact opposite: Remove all contexts that have more than one iteration.

3020

Is this a safeguard to avoid excluding everything? Could you write a comment about its intention?

3033

Could you explain why RestrictDomainParams has to be collected separately from AssumedContext and only to be merged afterwards?

test/ScopDetect/model-dowhile.ll
2

Even before the REQUIRES line

3

The -S flag is unnecessary with --disable-output

Meinersbur added inline comments.Jun 26 2019, 1:23 PM
test/ScopDetect/model-dowhile.ll
3

[serious] Is -O3 should not be necessary; if some transformation is required by -polly-ast, it should be baked-into the test case.

Meinersbur added inline comments.Jun 26 2019, 2:44 PM
lib/Analysis/ScopInfo.cpp
3011

What I would try to do:

InvalidContext = InvalidContext.unite(BSet.params())

This might not directly solve the problem since the invalid context is not used for gisting the iterations space. So the alterntive is to collect the params .params() of all non-singleton basic_sets and intersect those with the AssumedContext.